easy_vision.python.model.rc3d¶
easy_vision.python.model.rc3d.action_detection_helper¶
easy_vision.python.model.rc3d.action_detection_helper.batch_decode_ts_boxes(box_coder, box_encodings, anchor_boxes)[source]¶
Decodes box encodings with respect to the anchor boxes.
Parameters:
- box_coder – the box coder used to decode box_encodings with respect to the anchor boxes.
- box_encodings – a 4-D tensor with shape [batch_size, num_anchors, num_classes, box_coder.code_size] representing box encodings. If using a shared box across classes, the shape will instead be [total_num_proposals, 1, box_coder.code_size].
- anchor_boxes – a 3-D tensor with shape [batch_size, num_anchors, box_coder.code_size] representing anchor boxes.
Returns: a [batch_size, num_anchors, num_classes, box_coder.code_size] float tensor representing bounding box predictions (for each image in the batch, proposal and class). If using a shared box across classes, the shape will instead be [batch_size, num_anchors, 1, box_coder.code_size].
Return type: decoded_boxes
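For intuition, here is a minimal NumPy sketch of the center/length decoding commonly used for temporal segments (assuming code_size = 2 and a single class); the actual transform is whatever box_coder implements, so treat the formulas as illustrative assumptions rather than the library's definition:

    import numpy as np

    def decode_ts_segments(encodings, anchors):
      # anchors: [num_anchors, 2] absolute (start, end) segments
      # encodings: [num_anchors, 2] as (center offset, log-length offset)
      anchor_len = anchors[:, 1] - anchors[:, 0]
      anchor_ctr = anchors[:, 0] + 0.5 * anchor_len
      ctr = encodings[:, 0] * anchor_len + anchor_ctr  # shift scaled by anchor length
      length = np.exp(encodings[:, 1]) * anchor_len    # length decoded from log space
      return np.stack([ctr - 0.5 * length, ctr + 0.5 * length], axis=1)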
easy_vision.python.model.rc3d.action_detection_helper.change_ts_coordinate_to_original_image(detection_boxes, true_image_shape, original_image_shape)[source]¶
Change detection box coordinates to the original image.
Parameters:
- detection_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates on the preprocessed image.
- true_image_shape – An int32 tensor with shape [batch_size, 3] containing the valid shapes of the preprocessed images.
- original_image_shape – An int32 tensor with shape [batch_size, 3] containing the valid shapes of the original images.
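As a sketch of the rescaling (an assumption based on the argument names, not the verbatim implementation), each segment is scaled by the ratio of the original length to the valid preprocessed length along the temporal axis:

    import numpy as np

    def to_original_coordinates(boxes, true_length, original_length):
      # boxes: [num_proposals, 2] absolute (start, end) on the preprocessed clip
      scale = float(original_length) / float(true_length)
      return boxes * scale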
easy_vision.python.model.rc3d.rc3d¶
class easy_vision.python.model.rc3d.rc3d.RC3D(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]¶
Bases: easy_vision.python.model.detection_model.DetectionModel
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
__init__(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]¶
x.__init__(…) initializes x; see help(type(x)) for signature
build_metric_graph(eval_config)[source]¶
Add metric ops to the graph.
Parameters: eval_config – protobuf object, see python/protos/eval.proto.
Returns: a dict of metric ops; each metric op is a tuple of (update_op, value_op).
Return type: metric_dict
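A minimal sketch of the (update_op, value_op) convention, using a hypothetical accuracy metric; the metrics actually built are determined by eval_config:

    import tensorflow as tf

    def build_metric_graph_sketch(labels, predictions):
      # tf.metrics.* return (value_op, update_op); the dict stores them
      # in the (update_op, value_op) order described above.
      value_op, update_op = tf.metrics.accuracy(labels, predictions)
      return {'accuracy': (update_op, value_op)}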
classmethod create_class(name)¶
easy_vision.python.model.rc3d.trcnn_head¶
class easy_vision.python.model.rc3d.trcnn_head.TRCNNHead(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]¶
Bases: easy_vision.python.model.cv_head.CVHead
For the second stage of Faster R-CNN: classification.
__init__(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]¶
Parameters:
- feature_dict – input dict of features
- head_config – rcnn head config
- label_dict – a dict of labels; during prediction it can be None
- fpn_config – config of fpn
- mode – train for the train phase, evaluate for the evaluate phase, predict for the predict phase
- region_feature_extractor – a block that reuses part of the backbone to extract time segment features in the second stage
build_loss_graph()[source]¶
Build the loss of the rcnn stage, including classification loss and regression loss.
Variables involved: proposals, proposal scores, proposal box offsets, groundtruth_boxes, groundtruth_classes.
Key steps (see the sketch after this list):
1. Find matches between proposal boxes and groundtruth boxes; proposal boxes with larger IoUs against groundtruth boxes are assigned the groundtruth class label, the others are assigned the background class label.
2. For proposals assigned a groundtruth class label, regression targets (i.e. offsets) are computed.
3. Compute the regression and classification losses, normalized by the number of proposals and then by the batch size.
Returns: rcnn_reg loss, rcnn_cls loss
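A minimal NumPy sketch of steps 1 and 2, assuming 1-D temporal segments and a hypothetical foreground IoU threshold fg_thresh; the real matcher and its thresholds come from the head config:

    import numpy as np

    def temporal_iou(proposals, gt):
      # proposals: [N, 2], gt: [M, 2], absolute (start, end); returns [N, M]
      inter = np.clip(np.minimum(proposals[:, None, 1], gt[None, :, 1])
                      - np.maximum(proposals[:, None, 0], gt[None, :, 0]), 0, None)
      union = ((proposals[:, 1] - proposals[:, 0])[:, None]
               + (gt[:, 1] - gt[:, 0])[None, :] - inter)
      return inter / np.maximum(union, 1e-8)

    def assign_targets(proposals, gt_boxes, gt_classes, fg_thresh=0.5):
      iou = temporal_iou(proposals, gt_boxes)
      best_gt = iou.argmax(axis=1)
      # proposals above the threshold get the matched class, others background (0)
      labels = np.where(iou.max(axis=1) >= fg_thresh, gt_classes[best_gt], 0)
      return labels, gt_boxes[best_gt]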
build_postprocess_graph()[source]¶
Postprocess of the rcnn stage, including box decoding, clipping and nms. Variables involved: refined_box_encodings, class_predictions_with_background.
Returns: a dict of nmsed_boxes, nmsed_scores, nmsed_classes, num_detections
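The temporal nms step can reuse the standard 2-D NMS op by embedding each segment as a degenerate box of unit height, so 2-D IoU reduces to temporal IoU; a sketch, not the library's implementation:

    import tensorflow as tf

    def ts_non_max_suppression(segments, scores, max_output_size, iou_threshold=0.5):
      # segments: [N, 2] (start, end). Embed as [y1=0, x1=start, y2=1, x2=end].
      n = tf.shape(segments)[0]
      boxes = tf.stack(
          [tf.zeros([n]), segments[:, 0], tf.ones([n]), segments[:, 1]], axis=1)
      keep = tf.image.non_max_suppression(boxes, scores, max_output_size, iou_threshold)
      return tf.gather(segments, keep), tf.gather(scores, keep)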
build_predict_graph()[source]¶
Input: proposal_boxes, feature_map. Output: refined_box_encodings_with_background, class_predictions_with_background.
Steps (see the sketch after this list):
1. Crop region features from the backbone feature map.
2. A classify block classifies the region features.
3. A box predictor predicts box scores and box encodings.
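A sketch of steps 2 and 3, with the hidden size and the single fully connected layer as assumptions (the actual blocks are configured by head_config):

    import tensorflow as tf

    def trcnn_predict_sketch(region_features, num_classes, code_size=2):
      # region_features: [num_proposals, feature_dim] pooled per-proposal features
      hidden = tf.layers.dense(region_features, 1024, activation=tf.nn.relu)
      class_predictions_with_background = tf.layers.dense(hidden, num_classes + 1)
      refined_box_encodings = tf.layers.dense(hidden, num_classes * code_size)
      return class_predictions_with_background, refined_box_encodings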
easy_vision.python.model.rc3d.trcnn_helper¶
easy_vision.python.model.rc3d.trcnn_helper.build_ts_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]¶
RoiPooling function builder.
Parameters:
- initial_crop_size – size of the initial bilinear-interpolation-based crop during ROI pooling.
- maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
- maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns: a roi_pooling function
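Such a builder is plausibly a closure that binds the hyperparameters and leaves the runtime tensors to the call site; a sketch:

    import functools

    def build_ts_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1,
                                maxpool_stride=1):
      # Bind pooling hyperparameters; the returned function only takes the
      # runtime tensors (features, proposals, image shape).
      return functools.partial(ts_roi_pooling,
                               initial_crop_size=initial_crop_size,
                               maxpool_kernel_size=maxpool_kernel_size,
                               maxpool_stride=maxpool_stride)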
easy_vision.python.model.rc3d.trcnn_helper.ts_roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]¶
Crops a set of proposals from the feature map for a batch of images.
Parameters:
- features_list_to_crop – A list of float32 tensors with shape [batch_size, length, height, width, depth]
- proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates on the preprocessed image.
- image_shape – A 1-D tensor of shape [5] containing the image tensor shape.
- initial_crop_size – size of the initial bilinear-interpolation-based crop during ROI pooling.
- maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
- maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns: A float32 tensor with shape [K, new_height, new_width, depth].
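A simplified sketch of the crop-then-maxpool pattern on a single 4-D feature map (the real function takes 5-D video features; treating the temporal axis as the crop height here is an assumption):

    import tensorflow as tf

    def ts_roi_pooling_sketch(features, proposal_boxes, clip_length,
                              initial_crop_size, maxpool_kernel_size=1,
                              maxpool_stride=1):
      # features: [batch, length, width, depth]; proposal_boxes:
      # [batch, num_proposals, 2] absolute (start, end) in frames.
      batch = tf.shape(proposal_boxes)[0]
      num = tf.shape(proposal_boxes)[1]
      start = proposal_boxes[:, :, 0] / tf.cast(clip_length, tf.float32)
      end = proposal_boxes[:, :, 1] / tf.cast(clip_length, tf.float32)
      # Normalized [y1, x1, y2, x2] boxes spanning the full width.
      boxes = tf.reshape(
          tf.stack([start, tf.zeros_like(start), end, tf.ones_like(end)], axis=2),
          [-1, 4])
      box_ind = tf.reshape(
          tf.tile(tf.expand_dims(tf.range(batch), 1), [1, num]), [-1])
      crops = tf.image.crop_and_resize(
          features, boxes, box_ind, [initial_crop_size, initial_crop_size])
      return tf.nn.max_pool(
          crops, ksize=[1, maxpool_kernel_size, maxpool_kernel_size, 1],
          strides=[1, maxpool_stride, maxpool_stride, 1], padding='SAME')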
easy_vision.python.model.rc3d.trpn_head¶
class easy_vision.python.model.rc3d.trpn_head.TRPNHead(feature_dict, head_config, label_dict=None, mode='predict')[source]¶
Bases: easy_vision.python.model.cv_head.CVHead
For the first stage of RC3D: temporal region proposal.
__init__(feature_dict, head_config, label_dict=None, mode='predict')[source]¶
Parameters:
- feature_dict – must include three parts: 1. backbone output features; 2. preprocessed batched image shape (input_shape); 3. preprocessed per-image shapes (image_shapes)
- label_dict – groundtruth_boxes
- head_config – protos.rpn_head_pb2.RPNHead
- mode – train for the train phase, evaluate for the evaluate phase, predict for the predict phase
build_loss_graph()[source]¶
Parameters:
- label_dict – must include two fields: 'groundtruth_boxes', the bounding boxes of each object, and 'num_groundtruth_boxes', the number of objects in each image
- objectness scores (box) – self._prediction_dict['cls']
- offsets (box) – self._prediction_dict['reg']
Returns: a dict of {'rpn_reg': reg_loss, 'rpn_cls': cls_loss}
Return type: loss_dict
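A sketch assuming the standard RPN recipe (sigmoid cross-entropy objectness plus smooth-L1 regression on foreground anchors); the actual losses are configured by head_config:

    import tensorflow as tf

    def rpn_losses_sketch(cls_logits, reg_pred, cls_targets, reg_targets, fg_mask):
      # cls_logits: [num_anchors] objectness logits; cls_targets in {0., 1.};
      # reg_pred/reg_targets: [num_anchors, 2]; fg_mask: [num_anchors] floats.
      cls_loss = tf.reduce_mean(
          tf.nn.sigmoid_cross_entropy_with_logits(
              labels=cls_targets, logits=cls_logits))
      reg_loss = tf.losses.huber_loss(
          reg_targets, reg_pred, weights=tf.expand_dims(fg_mask, -1))
      return {'rpn_reg': reg_loss, 'rpn_cls': cls_loss}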
build_postprocess_graph()[source]¶
Inputs: self._prediction_dict['reg'], self._prediction_dict['cls'], self._prediction_dict['anchors'], self._image_shapes.
Returns: a dict of proposal_boxes, proposal_scores, num_proposals; the results are also merged into self._prediction_dict.
Steps (see the sketch after this list):
1. box decoding: _batch_decode_boxes
2. nms and pad to max_num_proposals: batch_multiclass_non_max_suppression; this second step is done image by image
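A sketch of the per-image padding in step 2, assuming each image's post-NMS proposals are truncated or zero-padded to max_num_proposals so they can be re-batched:

    import tensorflow as tf

    def pad_proposals(boxes, scores, max_num_proposals):
      # boxes: [num, 2], scores: [num] for one image after nms
      num = tf.shape(boxes)[0]
      pad = tf.maximum(max_num_proposals - num, 0)
      boxes = tf.pad(boxes[:max_num_proposals], [[0, pad], [0, 0]])
      scores = tf.pad(scores[:max_num_proposals], [[0, pad]])
      return boxes, scores, tf.minimum(num, max_num_proposals)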