easy_vision.python.model.faster_rcnn

easy_vision.python.model.faster_rcnn.faster_rcnn

class easy_vision.python.model.faster_rcnn.faster_rcnn.FasterRcnn(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

Bases: easy_vision.python.model.detection_model.DetectionModel

__init__(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

build_loss_graph()[source]
build_predict_graph()[source]
classmethod create_class(name)
get_scopes_of_levels()[source]

Return a list of variable scope lists, ordered by level (outputs -> heads -> backbone -> inputs).

easy_vision.python.model.faster_rcnn.rcnn_head

class easy_vision.python.model.faster_rcnn.rcnn_head.RCNNHead(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]

Bases: easy_vision.python.model.cv_head.CVHead

For the second stage of Faster R-CNN: classification and box refinement.

__init__(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]
Parameters:
  • feature_dict – input dict of features
  • head_config – rcnn head config
  • label_dict – a dict of labels; may be None during prediction
  • fpn_config – config of fpn
  • mode – ‘train’ for the train phase, ‘evaluate’ for the evaluate phase, ‘predict’ for the predict phase
  • region_feature_extractor – a block that reuses part of the backbone to extract box features in the second stage
build_loss_graph()[source]

Build the loss of the rcnn stage, including classification loss and regression loss. Variables involved are: proposals, proposal scores, proposal box offsets, groundtruth_boxes, groundtruth_classes. Key steps are:

  1. find matches between proposal boxes and groundtruth boxes; proposals with large enough IoU with a groundtruth box are assigned that groundtruth class label, the others are assigned the background class label
  2. for proposals assigned a groundtruth class label, regression targets (i.e. offsets) are computed
  3. compute the regression and classification losses, normalized by the number of proposals and then by the batch size
Returns: rcnn_reg loss, rcnn_cls loss
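The matching step above can be sketched in plain Python. The IoU helper, the 0.5 threshold, and the [ymin, xmin, ymax, xmax] box layout are illustrative assumptions, not the library's actual configuration:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign_targets(proposals, gt_boxes, gt_classes, iou_threshold=0.5, background=0):
    """For each proposal, find the best-overlapping groundtruth box;
    above the threshold it inherits that box's class, otherwise it is
    labelled background (hypothetical helper, threshold is an assumption)."""
    labels = []
    for prop in proposals:
        ious = [iou(prop, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=lambda i: ious[i]) if ious else None
        if best is not None and ious[best] >= iou_threshold:
            labels.append(gt_classes[best])
        else:
            labels.append(background)
    return labels
```

For example, a proposal exactly on a groundtruth box of class 3 gets label 3, while a non-overlapping proposal gets the background label 0.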
build_postprocess_graph()[source]
Postprocess of the rcnn stage, including box decoding, clipping and NMS. Variables involved are: refined_box_encodings, class_predictions_with_background.
Returns: a dict of nmsed_boxes, nmsed_scores, nmsed_classes, num_detections
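The clipping and NMS steps can be sketched as follows; this greedy, single-class version is a simplification of the batched multiclass NMS the library uses, and the box layout is an assumption:

```python
def iou(a, b):
    """IoU of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def clip_to_window(box, window):
    """Clip a box to the image window [ymin, xmin, ymax, xmax]."""
    return [min(max(box[0], window[0]), window[2]),
            min(max(box[1], window[1]), window[3]),
            min(max(box[2], window[0]), window[2]),
            min(max(box[3], window[1]), window[3])]

def nms(boxes, scores, iou_threshold=0.5, max_output=100):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop the
    remaining boxes that overlap it by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_output:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```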
build_predict_graph()[source]

Build the predict graph of the rcnn stage:

  1. crop region features from backbone feature map.
  2. a classify block to classify the region features.
  3. a box predictor to predict box scores and box encodings.

easy_vision.python.model.faster_rcnn.rcnn_helper

easy_vision.python.model.faster_rcnn.rcnn_helper.add_gts_to_proposal(proposal_boxlists, groundtruth_boxlists)[source]

Add gts (groundtruth boxes) to the proposals.

Parameters:
  • proposal_boxlists – A list of BoxList containing proposal boxes in absolute coordinates.
  • groundtruth_boxlists – A list of Boxlist containing groundtruth object boxes in absolute coordinates.
Returns:

a list of BoxList containing sampled proposals. num_proposals: number of sampled proposals per image. max_num_proposals: max number of sampled proposals.

Return type:

sampled_proposal_boxlists

easy_vision.python.model.faster_rcnn.rcnn_helper.build_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1, fpn_config=None)[source]

RoiPooling Function Builder

Parameters:
  • initial_crop_size – the initial bilinear interpolation based cropping during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
  • fpn_config – config of fpn.
Returns:

A roi_pooling function
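The builder returns a closure with the pooling configuration baked in. The sketch below mimics that pattern in plain Python, substituting a nearest-neighbour crop over a 2-D list for the bilinear-interpolation crop the real op performs:

```python
def build_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1):
    """Return a roi_pooling function with the config captured in a closure."""
    def roi_pool(feature_map, box):
        # feature_map: 2-D list [height][width]; box: [ymin, xmin, ymax, xmax]
        # in feature-map coordinates (illustrative layout, not the real API).
        h = len(feature_map)
        w = len(feature_map[0])
        # Nearest-neighbour "crop" to initial_crop_size x initial_crop_size;
        # the real op uses bilinear interpolation here.
        crop = []
        for i in range(initial_crop_size):
            y = min(h - 1, int(box[0] + (box[2] - box[0]) * i / initial_crop_size))
            row = []
            for j in range(initial_crop_size):
                x = min(w - 1, int(box[1] + (box[3] - box[1]) * j / initial_crop_size))
                row.append(feature_map[y][x])
            crop.append(row)
        # Max pool with the captured kernel size and stride.
        k, s = maxpool_kernel_size, maxpool_stride
        out_size = (initial_crop_size - k) // s + 1
        pooled = []
        for i in range(out_size):
            row = []
            for j in range(out_size):
                window = [crop[i * s + di][j * s + dj]
                          for di in range(k) for dj in range(k)]
                row.append(max(window))
            pooled.append(row)
        return pooled
    return roi_pool
```

Capturing the config in a closure lets callers pass the resulting function around without re-threading the pooling parameters at every call site.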

easy_vision.python.model.faster_rcnn.rcnn_helper.fpn_roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1, roi_min_level=2, roi_max_level=5, roi_canonical_level=4, roi_canonical_scale=224)[source]

Crops a set of proposals from the feature maps for a batch of images.

Parameters:
  • features_list_to_crop – A list of float32 tensor with shape [batch_size, height, width, depth]
  • proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates in preprocessed image.
  • image_shape – A 1-D tensor of shape [4] containing image tensor shape.
  • initial_crop_size – the initial bilinear interpolation based cropping during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
  • roi_min_level – the lowest feature map level used for ROI pooling; a level corresponds to a feature map of stride 2 ^ level
  • roi_max_level – the highest feature map level (coarsest FPN feature map) used for ROI pooling
  • roi_canonical_level – roi_canonical_level (k0) specifies how proposals are distributed to feature maps: k = floor(k0 + log2(sqrt(w * h) / roi_canonical_scale))
  • roi_canonical_scale – the canonical scale in the formula above; default is 224
Returns:

A float32 tensor with shape [K, new_height, new_width, depth].
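The level-assignment formula above can be written out directly. The defaults mirror the documented parameters; the per-box function and the [ymin, xmin, ymax, xmax] layout are illustrative assumptions (the real op works on batched tensors):

```python
import math

def assign_fpn_level(box, k0=4, canonical_scale=224.0, min_level=2, max_level=5):
    """Map a proposal box [ymin, xmin, ymax, xmax] (absolute pixels) to an
    FPN level via k = floor(k0 + log2(sqrt(w * h) / canonical_scale)),
    clamped to [min_level, max_level]."""
    h = box[2] - box[0]
    w = box[3] - box[1]
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical_scale))
    return max(min_level, min(max_level, k))
```

A 224 x 224 proposal lands on the canonical level 4; halving the side length moves it one level finer, and boxes beyond the range are clamped.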

easy_vision.python.model.faster_rcnn.rcnn_helper.roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]

Crops a set of proposals from the feature maps for a batch of images.

Parameters:
  • features_list_to_crop – A list of float32 tensor with shape [batch_size, height, width, depth]
  • proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates in preprocessed image.
  • image_shape – A 1-D tensor of shape [4] containing image tensor shape.
  • initial_crop_size – the initial bilinear interpolation based cropping during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns:

A float32 tensor with shape [K, new_height, new_width, depth].

easy_vision.python.model.faster_rcnn.rcnn_helper.sample_minbatch_train_box(proposal_boxlists, groundtruth_boxlists, detector_target_assigner, minibatch_balance_fraction, minibatch_batch_size)[source]

Samples a mini-batch of proposals to be sent to the box classifier.

Parameters:
  • proposal_boxlists – A list of BoxList containing proposal boxes in absolute coordinates.
  • groundtruth_boxlists – A list of Boxlist containing groundtruth object boxes in absolute coordinates.
  • detector_target_assigner – detection target assigner.
  • minibatch_balance_fraction – balance fraction of BalancedPositiveNegativeSampler
  • minibatch_batch_size – sample batch size
Returns:

a list of BoxList containing sampled proposals. num_proposals: number of sampled proposals per image. max_num_proposals: max number of sampled proposals.

Return type:

sampled_proposal_boxlists
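Balanced sampling can be sketched as follows; the function name, arguments, and defaults are illustrative, not the BalancedPositiveNegativeSampler API:

```python
import random

def sample_minibatch(labels, batch_size=64, positive_fraction=0.25, seed=None):
    """Sample proposal indices for one image: up to
    positive_fraction * batch_size positives (label > 0), with the
    remainder filled by negatives (label == 0)."""
    rng = random.Random(seed)
    positives = [i for i, lbl in enumerate(labels) if lbl > 0]
    negatives = [i for i, lbl in enumerate(labels) if lbl == 0]
    num_pos = min(len(positives), int(batch_size * positive_fraction))
    sampled_pos = rng.sample(positives, num_pos)
    num_neg = min(len(negatives), batch_size - num_pos)
    sampled_neg = rng.sample(negatives, num_neg)
    return sampled_pos + sampled_neg
```

Capping the positive fraction keeps the classifier from being swamped by the (usually far more numerous) background proposals.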

easy_vision.python.model.faster_rcnn.rpn_head

class easy_vision.python.model.faster_rcnn.rpn_head.RPNHead(feature_dict, head_config, label_dict=None, is_fpn=False, mode='predict')[source]

Bases: easy_vision.python.model.cv_head.CVHead

For the first stage of Faster R-CNN: region proposal.

__init__(feature_dict, head_config, label_dict=None, is_fpn=False, mode='predict')[source]
Parameters:
  • feature_dict – must include three parts: 1. backbone output features 2. preprocessed batched image shape (input_shape) 3. preprocessed per-image shapes (image_shapes)
  • label_dict – groundtruth_boxes
  • is_fpn – whether to use fpn (multiscale features)
  • head_config – protos.rpn_head_pb2.RPNHead
  • mode – ‘train’ for the train phase; otherwise eval/predict
build_loss_graph()[source]
Parameters:
  • label_dict – must include two fields: ‘groundtruth_boxes’ (bounding boxes of each object) and ‘num_groundtruth_boxes’ (number of objects in each image)
  • box objectness scores – self._prediction_dict[‘cls’]
  • box offsets – self._prediction_dict[‘reg’]
Returns:

a dict of {‘rpn_reg’: reg_loss, ‘rpn_cls’: cls_loss}

Return type:

loss_dict

build_postprocess_graph()[source]

Inputs: self._prediction_dict[‘reg’], self._prediction_dict[‘cls’], self._prediction_dict[‘anchors’], self._image_shapes
Returns: a dict of proposal_boxes, proposal_scores, num_proposals; the results are also merged into self._prediction_dict
Steps:
  1. box decoding: _batch_decode_boxes
  2. nms and pad to max_num_proposals: batch_multiclass_non_max_suppression; the second step is done image by image
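The box-decoding step can be sketched with the standard Faster R-CNN parameterisation. This is a single-box illustration and omits the per-coordinate scale factors some implementations apply to the offsets:

```python
import math

def decode_box(anchor, offsets):
    """Decode one [ty, tx, th, tw] offset against an anchor
    [ymin, xmin, ymax, xmax]: the centre is shifted in units of the
    anchor size, and height/width are scaled exponentially."""
    ha = anchor[2] - anchor[0]
    wa = anchor[3] - anchor[1]
    ycenter_a = anchor[0] + ha / 2.0
    xcenter_a = anchor[1] + wa / 2.0
    ty, tx, th, tw = offsets
    ycenter = ty * ha + ycenter_a
    xcenter = tx * wa + xcenter_a
    h = ha * math.exp(th)
    w = wa * math.exp(tw)
    return [ycenter - h / 2.0, xcenter - w / 2.0,
            ycenter + h / 2.0, xcenter + w / 2.0]
```

A zero offset reproduces the anchor itself, and ty = 0.5 shifts the box down by half the anchor height.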

build_predict_graph()[source]