easy_vision.python.model.rc3d

easy_vision.python.model.rc3d.action_detection_helper

easy_vision.python.model.rc3d.action_detection_helper.batch_decode_ts_boxes(box_coder, box_encodings, anchor_boxes)[source]

Decodes box encodings with respect to the anchor boxes.

Parameters:
  • box_coder – the box coder used to decode the box encodings.
  • box_encodings – a 4-D tensor with shape [batch_size, num_anchors, num_classes, box_coder.code_size] representing box encodings. If using a shared box across classes the shape will instead be [batch_size, num_anchors, 1, box_coder.code_size].
  • anchor_boxes – a 3-D tensor with shape [batch_size, num_anchors, box_coder.code_size] representing the anchor boxes to decode against.
Returns:

a [batch_size, num_anchors, num_classes, box_coder.code_size] float tensor representing bounding box predictions (for each image in the batch, proposal, and class). If using a shared box across classes the shape will instead be [batch_size, num_anchors, 1, box_coder.code_size].

Return type:

decoded_boxes
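
A minimal numpy sketch of what the decode step computes, assuming a center/length style temporal coder with code_size == 2; the actual op runs on TensorFlow tensors and delegates to box_coder.decode, and the function name below is hypothetical.

  import numpy as np

  def decode_ts_boxes(box_encodings, anchor_boxes):
      # box_encodings: [batch, num_anchors, num_classes, 2] as (dt_center, dt_length)
      # anchor_boxes:  [batch, num_anchors, 2] as (t_start, t_end)
      a_center = (anchor_boxes[..., 0] + anchor_boxes[..., 1]) / 2.0
      a_length = anchor_boxes[..., 1] - anchor_boxes[..., 0]
      # broadcast anchors over the num_classes dimension of the encodings
      a_center, a_length = a_center[..., None], a_length[..., None]
      center = box_encodings[..., 0] * a_length + a_center
      length = np.exp(box_encodings[..., 1]) * a_length
      # back to (t_start, t_end): [batch, num_anchors, num_classes, 2]
      return np.stack([center - length / 2.0, center + length / 2.0], axis=-1)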

easy_vision.python.model.rc3d.action_detection_helper.change_ts_coordinate_to_original_image(detection_boxes, true_image_shape, original_image_shape)[source]

Changes detection box coordinates to the original image.

Parameters:
  • detection_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates on the preprocessed image.
  • true_image_shape – An int32 tensor with shape [batch_size, 3] containing the valid shapes of the preprocessed images.
  • original_image_shape – An int32 tensor with shape [batch_size, 3] containing the valid shapes of the original images.
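
The rescaling this implies for temporal boxes can be sketched as below, assuming index 0 of the shape tensors holds the clip length; the real op works on TensorFlow tensors inside the graph.

  import numpy as np

  def change_ts_coordinate(detection_boxes, true_image_shape, original_image_shape):
      # detection_boxes: [batch, num_proposals, 2] absolute (start, end) on the
      # preprocessed clip; *_image_shape: [batch, 3] with clip length at index 0
      true_length = true_image_shape[:, 0].astype(np.float32)
      original_length = original_image_shape[:, 0].astype(np.float32)
      scale = (original_length / true_length)[:, None, None]  # [batch, 1, 1]
      return detection_boxes * scale  # boxes on the original clip
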
easy_vision.python.model.rc3d.action_detection_helper.compute_clip_ts_window(clip_shapes)[source]
Parameters:clip_shapes – the shapes of the (padded) videos in a batch: [[l0], [l1], …]
Returns:the valid clip segment for each clip, computed from the lengths l0, l1, …
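
A minimal sketch of a plausible implementation, assuming the returned segment is the valid [0, length] window later used to clip proposals.

  import numpy as np

  def compute_clip_ts_window(clip_shapes):
      # clip_shapes: [[l0], [l1], ...] -> windows [[0, l0], [0, l1], ...]
      lengths = np.asarray(clip_shapes, dtype=np.float32)[:, 0]
      return np.stack([np.zeros_like(lengths), lengths], axis=1)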

easy_vision.python.model.rc3d.rc3d

class easy_vision.python.model.rc3d.rc3d.RC3D(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

Bases: easy_vision.python.model.detection_model.DetectionModel

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

__init__(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

build_loss_graph()[source]
build_metric_graph(eval_config)[source]

Adds metric ops to the graph.

Parameters:eval_config – a protobuf object; see python/protos/eval.proto.
Returns:a dict of metric_op, where each metric_op is a tuple of (update_op, value_op)
Return type:metric_dict
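
For illustration, a metric_dict of this form is typically consumed in a TensorFlow 1.x evaluation loop as below; the tf.metrics.accuracy entry is a stand-in, not one of RC3D's actual metrics.

  import tensorflow as tf  # assumes TF 1.x graph mode

  labels = tf.placeholder(tf.int64, [None])
  predictions = tf.placeholder(tf.int64, [None])
  # stand-in for one metric_dict entry: (update_op, value_op)
  value_op, update_op = tf.metrics.accuracy(labels, predictions)
  metric_dict = {'accuracy': (update_op, value_op)}

  with tf.Session() as sess:
      sess.run(tf.local_variables_initializer())  # metric accumulators
      for y_true, y_pred in [([1, 0], [1, 1]), ([1], [1])]:
          # run every update_op once per evaluation batch
          sess.run([u for u, _ in metric_dict.values()],
                   feed_dict={labels: y_true, predictions: y_pred})
      # read the final metric values after all batches are consumed
      print(sess.run({name: v for name, (_, v) in metric_dict.items()}))
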
build_predict_graph()[source]
classmethod create_class(name)

easy_vision.python.model.rc3d.trcnn_head

class easy_vision.python.model.rc3d.trcnn_head.TRCNNHead(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]

Bases: easy_vision.python.model.cv_head.CVHead

for the second stage of R-C3D (analogous to Faster R-CNN's second stage): classification

__init__(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]
Parameters:
  • feature_dict – input dict of features
  • head_config – rcnn head config
  • label_dict – a dict of labels; during prediction it can be None
  • fpn_config – config of fpn
  • mode – train for the train phase, evaluate for the evaluate phase, predict for the predict phase
  • region_feature_extractor – a block that reuses part of the backbone to extract time-segment features in the second stage
build_loss_graph()[source]
Build loss of the rcnn stage, including classification loss and regression loss.

Variables involved are: proposals, proposal scores, proposal box offsets, groundtruth_boxes, groundtruth_classes. Key steps are:

  1. find matches between proposal boxes and groundtruth boxes; proposal boxes with large enough IoU against a groundtruth box are assigned that groundtruth class label, while the others are assigned the background class label
  2. for proposals assigned a groundtruth class label, regression targets (i.e. offsets) are computed
  3. compute the regression and classification losses, normalized by the number of proposals and then by the batch size (see the sketch below)

Returns: rcnn_reg loss, rcnn_cls loss
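
A condensed numpy sketch of these three steps for 1-D time segments, assuming (start, end) boxes, a center/length offset encoding, and an illustrative 0.5 IoU threshold; the real graph uses easy_vision's matcher and box coder on TensorFlow tensors.

  import numpy as np

  def temporal_iou(proposals, gt):
      # proposals: [N, 2], gt: [M, 2] -> pairwise IoU matrix [N, M]
      inter = np.clip(np.minimum(proposals[:, None, 1], gt[None, :, 1]) -
                      np.maximum(proposals[:, None, 0], gt[None, :, 0]), 0, None)
      union = ((proposals[:, None, 1] - proposals[:, None, 0]) +
               (gt[None, :, 1] - gt[None, :, 0]) - inter)
      return inter / np.maximum(union, 1e-8)

  def match_and_targets(proposals, gt_boxes, gt_classes, iou_thresh=0.5):
      iou = temporal_iou(proposals, gt_boxes)                # step 1: match
      best = iou.argmax(axis=1)
      matched = iou.max(axis=1) >= iou_thresh
      cls_targets = np.where(matched, gt_classes[best], 0)   # 0 = background
      p_center = proposals.mean(axis=1)
      p_length = proposals[:, 1] - proposals[:, 0]
      g = gt_boxes[best]                                     # matched groundtruth
      # step 2: regression targets (center/length offsets) per proposal
      reg_targets = np.stack([(g.mean(axis=1) - p_center) / p_length,
                              np.log((g[:, 1] - g[:, 0]) / p_length)], axis=1)
      return cls_targets, reg_targets, matched               # inputs to step 3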

build_postprocess_graph()[source]

Postprocess of the rcnn stage, including box decoding, clipping, and NMS. Variables involved are: refined_box_encodings, class_predictions_with_background

Returns:a dict of nmsed_boxes, nmsed_scores, nmsed_classes, num_detections
build_predict_graph()[source]

inputs: proposal_boxes, feature_map
outputs: refined_box_encodings_with_background, class_predictions_with_background
steps:
  1. crop region features from the backbone feature map.
  2. a classify block to classify the region features.
  3. a box predictor to predict box scores and box encodings.
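
Schematically, with roi_pooling, classify_block, and box_predictor as hypothetical stand-ins for the configured blocks:

  def trcnn_predict(feature_map, proposal_boxes, image_shape,
                    roi_pooling, classify_block, box_predictor):
      # 1. crop region features from the backbone feature map
      region_features = roi_pooling([feature_map], proposal_boxes, image_shape)
      # 2. classify block applied to the cropped region features
      region_features = classify_block(region_features)
      # 3. predict class scores and box encodings for each region
      return box_predictor(region_features)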

easy_vision.python.model.rc3d.trcnn_helper

easy_vision.python.model.rc3d.trcnn_helper.build_ts_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]

RoiPooling Function Builder

Parameters:
  • initial_crop_size – output size of the initial bilinear-interpolation-based crop during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns:

A roi_pooling function
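
A plausible reading of the builder, assuming it simply binds the pooling hyperparameters onto ts_roi_pooling; the concrete values below are illustrative.

  import functools

  from easy_vision.python.model.rc3d.trcnn_helper import ts_roi_pooling

  roi_pooling_fn = functools.partial(
      ts_roi_pooling,
      initial_crop_size=16,    # resolution of the initial bilinear crop
      maxpool_kernel_size=2,   # max pool applied to the cropped features
      maxpool_stride=2)
  # later: roi_pooling_fn(features_list_to_crop, proposal_boxes, image_shape)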

easy_vision.python.model.rc3d.trcnn_helper.ts_roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]

Crops a set of proposal segments out of the feature maps for a batch of images.

Parameters:
  • features_list_to_crop – A list of float32 tensors with shape [batch_size, length, height, width, depth]
  • proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates on the preprocessed image.
  • image_shape – A 1-D tensor of shape [5] containing the image tensor shape.
  • initial_crop_size – output size of the initial bilinear-interpolation-based crop during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns:

A float32 tensor with shape [K, new_length, new_height, new_width, depth].
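
A simplified single-image numpy sketch of the pooling, assuming (t_start, t_end) proposals in frame indices and nearest-neighbour resampling in place of the op's bilinear interpolation.

  import numpy as np

  def ts_roi_pooling_np(features, proposals, initial_crop_size,
                        maxpool_kernel_size=1, maxpool_stride=1):
      # features: [length, height, width, depth] for a single image
      # proposals: [num_proposals, 2] absolute (t_start, t_end)
      length = features.shape[0]
      crops = []
      for t_start, t_end in proposals:
          # resample each segment to a fixed number of temporal steps
          idx = np.linspace(t_start, max(t_end - 1, t_start), initial_crop_size)
          idx = np.clip(np.round(idx).astype(int), 0, length - 1)
          crops.append(features[idx])
      crops = np.stack(crops)  # [K, initial_crop_size, height, width, depth]
      # max pool along the temporal axis
      k, s = maxpool_kernel_size, maxpool_stride
      t_out = (initial_crop_size - k) // s + 1
      return np.stack([crops[:, i * s:i * s + k].max(axis=1)
                       for i in range(t_out)], axis=1)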

easy_vision.python.model.rc3d.trpn_head

class easy_vision.python.model.rc3d.trpn_head.TRPNHead(feature_dict, head_config, label_dict=None, mode='predict')[source]

Bases: easy_vision.python.model.cv_head.CVHead

for the first stage of rc3d: temporal region proposal

__init__(feature_dict, head_config, label_dict=None, mode='predict')[source]
Parameters:
  • feature_dict – must include three parts: 1. backbone output features 2. preprocessed batched image shape (input_shape) 3. preprocessed per-image shapes (image_shapes)
  • label_dict – groundtruth_boxes
  • head_config – protos.rpn_head_pb2.RPNHead
  • mode – train for the train phase; otherwise evaluate or predict
build_loss_graph()[source]
Parameters:
  • label_dict – must include two fields: 'groundtruth_boxes', the bounding boxes of each object, and 'num_groundtruth_boxes', the number of objects in each image
  • box objectness scores – self._prediction_dict['cls']
  • box offsets – self._prediction_dict['reg']
Returns:

a dict of {'rpn_reg': reg_loss, 'rpn_cls': cls_loss}

Return type:

loss_dict

build_postprocess_graph()[source]
inputs: self._prediction_dict['reg'], self._prediction_dict['cls'], self._prediction_dict['anchors'], self._image_shapes
return: a dict of proposal_boxes, proposal_scores, num_proposals; the results are also merged into self._prediction_dict
steps:
  1. box decoding: _batch_decode_boxes
  2. nms and pad to max_num_proposals: batch_multiclass_non_max_suppression; this second step is done image by image
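
A hedged numpy sketch of step 2 for a single image: greedy 1-D NMS over the decoded segments, padded to max_num_proposals; the threshold values are illustrative, not the configured defaults.

  import numpy as np

  def temporal_nms(boxes, scores, iou_thresh=0.7, max_num_proposals=300):
      order = scores.argsort()[::-1]  # process highest-scoring segments first
      keep = []
      while order.size and len(keep) < max_num_proposals:
          i = order[0]
          keep.append(i)
          rest = order[1:]
          inter = np.clip(np.minimum(boxes[i, 1], boxes[rest, 1]) -
                          np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
          union = ((boxes[i, 1] - boxes[i, 0]) +
                   (boxes[rest, 1] - boxes[rest, 0]) - inter)
          order = rest[inter / np.maximum(union, 1e-8) <= iou_thresh]
      pad = max_num_proposals - len(keep)  # pad up to a fixed proposal count
      return (np.pad(boxes[keep], ((0, pad), (0, 0)), mode='constant'),
              np.pad(scores[keep], (0, pad), mode='constant'),
              len(keep))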

build_predict_graph()[source]