easy_vision.python.model.faster_rcnn

easy_vision.python.model.faster_rcnn.faster_rcnn

class easy_vision.python.model.faster_rcnn.faster_rcnn.FasterRcnn(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

Bases: easy_vision.python.model.detection_model.DetectionModel

__init__(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

build_loss_graph()[source]
build_predict_graph()[source]
classmethod create_class(name)
get_scopes_of_levels()[source]

Return a list of variable scope lists, ordered by level (outputs -> heads -> backbone -> inputs).

easy_vision.python.model.faster_rcnn.rcnn_head

class easy_vision.python.model.faster_rcnn.rcnn_head.RCNNHead(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]

Bases: easy_vision.python.model.cv_head.CVHead

For the second stage of Faster R-CNN: classification and box refinement.

__init__(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]
Parameters:
  • feature_dict – input dict of features
  • head_config – rcnn head config
  • label_dict – a dict of labels; may be None during prediction
  • fpn_config – config of fpn
  • mode – ‘train’ for the train phase, ‘evaluate’ for the evaluate phase, ‘predict’ for the predict phase
  • region_feature_extractor – a block that reuses part of the backbone to extract box features in the second stage
build_loss_graph()[source]

Build the loss of the rcnn stage, including classification loss and regression loss. Variables involved are: proposals, proposal scores, proposal box offsets, groundtruth_boxes, groundtruth_classes. Key steps are:

  1. find matches between proposal boxes and groundtruth boxes; proposals with large enough IoU with a groundtruth box are assigned that groundtruth class label, the others are assigned the background class label
  2. for proposals assigned a groundtruth class label, regression targets (i.e. offsets) are computed
  3. compute the regression and classification losses, normalized by the number of proposals and then by the batch size
Returns: rcnn_reg loss, rcnn_cls loss
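The matching step above can be sketched in plain Python. The IoU helper, the 0.5 threshold, and the [ymin, xmin, ymax, xmax] box layout are illustrative assumptions, not the library's actual configuration:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign_targets(proposals, gt_boxes, gt_classes, iou_threshold=0.5, background=0):
    """For each proposal, find the best-overlapping groundtruth box;
    above the threshold it inherits that box's class, otherwise it is
    labelled background (hypothetical helper, threshold is an assumption)."""
    labels = []
    for prop in proposals:
        ious = [iou(prop, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=lambda i: ious[i]) if ious else None
        if best is not None and ious[best] >= iou_threshold:
            labels.append(gt_classes[best])
        else:
            labels.append(background)
    return labels
```

For example, a proposal exactly on a groundtruth box of class 3 gets label 3, while a non-overlapping proposal gets the background label 0.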
build_postprocess_graph()[source]
Postprocess of the rcnn stage, including box decoding, clipping and NMS. Variables involved are: refined_box_encodings, class_predictions_with_background.
Returns: a dict of nmsed_boxes, nmsed_scores, nmsed_classes, num_detections
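The clipping and NMS steps can be sketched as follows; this greedy, single-class version is a simplification of the batched multiclass NMS the library uses, and the box layout is an assumption:

```python
def iou(a, b):
    """IoU of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def clip_to_window(box, window):
    """Clip a box to the image window [ymin, xmin, ymax, xmax]."""
    return [min(max(box[0], window[0]), window[2]),
            min(max(box[1], window[1]), window[3]),
            min(max(box[2], window[0]), window[2]),
            min(max(box[3], window[1]), window[3])]

def nms(boxes, scores, iou_threshold=0.5, max_output=100):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop the
    remaining boxes that overlap it by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_output:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```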
build_predict_graph()[source]

Build the predict graph of the rcnn stage:

  1. crop region features from backbone feature map.
  2. a classify block to classify the region features.
  3. a box predictor to predict box scores and box encodings.

easy_vision.python.model.faster_rcnn.rcnn_helper

easy_vision.python.model.faster_rcnn.rcnn_helper.add_gts_to_proposal(proposal_boxlists, groundtruth_boxlists)[source]

Add gts (groundtruth boxes) to the proposals.

Parameters:
  • proposal_boxlists – A list of BoxList containing proposal boxes in absolute coordinates.
  • groundtruth_boxlists – A list of Boxlist containing groundtruth object boxes in absolute coordinates.
Returns:

a list of BoxList containing sampled proposals. num_proposals: number of sampled proposals per image. max_num_proposals: max number of sampled proposals.

Return type:

sampled_proposal_boxlists

easy_vision.python.model.faster_rcnn.rcnn_helper.build_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1, fpn_config=None)[source]

RoiPooling Function Builder

Parameters:
  • initial_crop_size – the initial bilinear interpolation based cropping during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
  • fpn_config – config of fpn.
Returns:

A roi_pooling function
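The builder returns a closure with the pooling configuration baked in. The sketch below mimics that pattern in plain Python, substituting a nearest-neighbour crop over a 2-D list for the bilinear-interpolation crop the real op performs:

```python
def build_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1):
    """Return a roi_pooling function with the config captured in a closure."""
    def roi_pool(feature_map, box):
        # feature_map: 2-D list [height][width]; box: [ymin, xmin, ymax, xmax]
        # in feature-map coordinates (illustrative layout, not the real API).
        h = len(feature_map)
        w = len(feature_map[0])
        # Nearest-neighbour "crop" to initial_crop_size x initial_crop_size;
        # the real op uses bilinear interpolation here.
        crop = []
        for i in range(initial_crop_size):
            y = min(h - 1, int(box[0] + (box[2] - box[0]) * i / initial_crop_size))
            row = []
            for j in range(initial_crop_size):
                x = min(w - 1, int(box[1] + (box[3] - box[1]) * j / initial_crop_size))
                row.append(feature_map[y][x])
            crop.append(row)
        # Max pool with the captured kernel size and stride.
        k, s = maxpool_kernel_size, maxpool_stride
        out_size = (initial_crop_size - k) // s + 1
        pooled = []
        for i in range(out_size):
            row = []
            for j in range(out_size):
                window = [crop[i * s + di][j * s + dj]
                          for di in range(k) for dj in range(k)]
                row.append(max(window))
            pooled.append(row)
        return pooled
    return roi_pool
```

Capturing the config in a closure lets callers pass the resulting function around without re-threading the pooling parameters at every call site.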

easy_vision.python.model.faster_rcnn.rcnn_helper.fpn_roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1, roi_min_level=2, roi_max_level=5, roi_canonical_level=4, roi_canonical_scale=224)[source]

Crops a set of proposals from the feature maps for a batch of images.

Parameters:
  • features_list_to_crop – A list of float32 tensor with shape [batch_size, height, width, depth]
  • proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates in preprocessed image.
  • image_shape – A 1-D tensor of shape [4] containing image tensor shape.
  • initial_crop_size – the initial bilinear interpolation based cropping during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
  • roi_min_level – the lowest feature map level used for ROI pooling; a level corresponds to a feature map of stride 2 ^ level
  • roi_max_level – the highest feature map level (coarsest FPN feature map) used for ROI pooling
  • roi_canonical_level – roi_canonical_level (k0) specifies how proposals are distributed to feature maps: k = floor(k0 + log2(sqrt(w * h) / roi_canonical_scale))
  • roi_canonical_scale – the canonical scale in the formula above; default is 224
Returns:

A float32 tensor with shape [K, new_height, new_width, depth].
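The level-assignment formula above can be written out directly. The defaults mirror the documented parameters; the per-box function and the [ymin, xmin, ymax, xmax] layout are illustrative assumptions (the real op works on batched tensors):

```python
import math

def assign_fpn_level(box, k0=4, canonical_scale=224.0, min_level=2, max_level=5):
    """Map a proposal box [ymin, xmin, ymax, xmax] (absolute pixels) to an
    FPN level via k = floor(k0 + log2(sqrt(w * h) / canonical_scale)),
    clamped to [min_level, max_level]."""
    h = box[2] - box[0]
    w = box[3] - box[1]
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical_scale))
    return max(min_level, min(max_level, k))
```

A 224 x 224 proposal lands on the canonical level 4; halving the side length moves it one level finer, and boxes beyond the range are clamped.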

easy_vision.python.model.faster_rcnn.rcnn_helper.roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]

Crops a set of proposals from the feature maps for a batch of images.

Parameters:
  • features_list_to_crop – A list of float32 tensor with shape [batch_size, height, width, depth]
  • proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates in preprocessed image.
  • image_shape – A 1-D tensor of shape [4] containing image tensor shape.
  • initial_crop_size – the initial bilinear interpolation based cropping during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns:

A float32 tensor with shape [K, new_height, new_width, depth].

easy_vision.python.model.faster_rcnn.rcnn_helper.sample_minbatch_train_box(proposal_boxlists, groundtruth_boxlists, detector_target_assigner, minibatch_balance_fraction, minibatch_batch_size)[source]

Samples a mini-batch of proposals to be sent to the box classifier.

Parameters:
  • proposal_boxlists – A list of BoxList containing proposal boxes in absolute coordinates.
  • groundtruth_boxlists – A list of Boxlist containing groundtruth object boxes in absolute coordinates.
  • detector_target_assigner – detection target assigner.
  • minibatch_balance_fraction – balance fraction of BalancedPositiveNegativeSampler
  • minibatch_batch_size – sample batch size
Returns:

a list of BoxList containing sampled proposals. num_proposals: number of sampled proposals per image. max_num_proposals: max number of sampled proposals.

Return type:

sampled_proposal_boxlists
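Balanced sampling can be sketched as follows; the function name, arguments, and defaults are illustrative, not the BalancedPositiveNegativeSampler API:

```python
import random

def sample_minibatch(labels, batch_size=64, positive_fraction=0.25, seed=None):
    """Sample proposal indices for one image: up to
    positive_fraction * batch_size positives (label > 0), with the
    remainder filled by negatives (label == 0)."""
    rng = random.Random(seed)
    positives = [i for i, lbl in enumerate(labels) if lbl > 0]
    negatives = [i for i, lbl in enumerate(labels) if lbl == 0]
    num_pos = min(len(positives), int(batch_size * positive_fraction))
    sampled_pos = rng.sample(positives, num_pos)
    num_neg = min(len(negatives), batch_size - num_pos)
    sampled_neg = rng.sample(negatives, num_neg)
    return sampled_pos + sampled_neg
```

Capping the positive fraction keeps the classifier from being swamped by the (usually far more numerous) background proposals.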

easy_vision.python.model.faster_rcnn.rpn_head

class easy_vision.python.model.faster_rcnn.rpn_head.RPNHead(feature_dict, head_config, label_dict=None, is_fpn=False, mode='predict')[source]

Bases: easy_vision.python.model.cv_head.CVHead

For the first stage of Faster R-CNN: region proposal.

__init__(feature_dict, head_config, label_dict=None, is_fpn=False, mode='predict')[source]
Parameters:
  • feature_dict – must include three parts: 1. backbone output features 2. preprocessed batched image shape (input_shape) 3. preprocessed per-image shapes (image_shapes)
  • label_dict – groundtruth_boxes
  • is_fpn – whether to use fpn (multiscale features)
  • head_config – protos.rpn_head_pb2.RPNHead
  • mode – ‘train’ for the train phase; otherwise eval/predict
build_loss_graph()[source]
Parameters:
  • label_dict – must include two fields: ‘groundtruth_boxes’ (bounding boxes of each object) and ‘num_groundtruth_boxes’ (number of objects in each image)
  • box objectness scores – self._prediction_dict[‘cls’]
  • box offsets – self._prediction_dict[‘reg’]
Returns:

a dict of {‘rpn_reg’: reg_loss, ‘rpn_cls’: cls_loss}

Return type:

loss_dict

build_postprocess_graph()[source]

Inputs: self._prediction_dict[‘reg’], self._prediction_dict[‘cls’], self._prediction_dict[‘anchors’], self._image_shapes
Returns: a dict of proposal_boxes, proposal_scores, num_proposals; the results are also merged into self._prediction_dict
Steps:
  1. box decoding: _batch_decode_boxes
  2. nms and pad to max_num_proposals: batch_multiclass_non_max_suppression; the second step is done image by image
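The box-decoding step can be sketched with the standard Faster R-CNN parameterisation. This is a single-box illustration and omits the per-coordinate scale factors some implementations apply to the offsets:

```python
import math

def decode_box(anchor, offsets):
    """Decode one [ty, tx, th, tw] offset against an anchor
    [ymin, xmin, ymax, xmax]: the centre is shifted in units of the
    anchor size, and height/width are scaled exponentially."""
    ha = anchor[2] - anchor[0]
    wa = anchor[3] - anchor[1]
    ycenter_a = anchor[0] + ha / 2.0
    xcenter_a = anchor[1] + wa / 2.0
    ty, tx, th, tw = offsets
    ycenter = ty * ha + ycenter_a
    xcenter = tx * wa + xcenter_a
    h = ha * math.exp(th)
    w = wa * math.exp(tw)
    return [ycenter - h / 2.0, xcenter - w / 2.0,
            ycenter + h / 2.0, xcenter + w / 2.0]
```

A zero offset reproduces the anchor itself, and ty = 0.5 shifts the box down by half the anchor height.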

build_predict_graph()[source]