easy_vision.python.model.rc3d

easy_vision.python.model.rc3d.action_detection_helper

easy_vision.python.model.rc3d.action_detection_helper.batch_decode_ts_boxes(box_coder, box_encodings, anchor_boxes)[source]

Decodes box encodings with respect to the anchor boxes.

Parameters:
  • box_coder – the box coder used to decode the box encodings.
  • box_encodings – a 4-D tensor with shape [batch_size, num_anchors, num_classes, box_coder.code_size] representing box encodings. If using a shared box across classes the shape will instead be [batch_size, num_anchors, 1, box_coder.code_size].
  • anchor_boxes – a 3-D tensor with shape [batch_size, num_anchors, box_coder.code_size] representing the anchor boxes to decode against.
Returns:

a [batch_size, num_anchors, num_classes, box_coder.code_size] float tensor representing bounding box predictions (for each image in the batch, proposal, and class). If using a shared box across classes the shape will instead be [batch_size, num_anchors, 1, box_coder.code_size].

Return type:

decoded_boxes
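
A minimal numpy sketch of what the decode step computes, assuming a center/length style temporal coder with code_size == 2; the actual op runs on TensorFlow tensors and delegates to box_coder.decode, and the function name below is hypothetical.

  import numpy as np

  def decode_ts_boxes(box_encodings, anchor_boxes):
      # box_encodings: [batch, num_anchors, num_classes, 2] as (dt_center, dt_length)
      # anchor_boxes:  [batch, num_anchors, 2] as (t_start, t_end)
      a_center = (anchor_boxes[..., 0] + anchor_boxes[..., 1]) / 2.0
      a_length = anchor_boxes[..., 1] - anchor_boxes[..., 0]
      # broadcast anchors over the num_classes dimension of the encodings
      a_center, a_length = a_center[..., None], a_length[..., None]
      center = box_encodings[..., 0] * a_length + a_center
      length = np.exp(box_encodings[..., 1]) * a_length
      # back to (t_start, t_end): [batch, num_anchors, num_classes, 2]
      return np.stack([center - length / 2.0, center + length / 2.0], axis=-1)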

easy_vision.python.model.rc3d.action_detection_helper.change_ts_coordinate_to_original_image(detection_boxes, true_image_shape, original_image_shape)[source]

Changes detection box coordinates to the original image.

Parameters:
  • detection_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates on the preprocessed image.
  • true_image_shape – An int32 tensor with shape [batch_size, 3] containing the valid shapes of the preprocessed images.
  • original_image_shape – An int32 tensor with shape [batch_size, 3] containing the valid shapes of the original images.
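
The rescaling this implies for temporal boxes can be sketched as below, assuming index 0 of the shape tensors holds the clip length; the real op works on TensorFlow tensors inside the graph.

  import numpy as np

  def change_ts_coordinate(detection_boxes, true_image_shape, original_image_shape):
      # detection_boxes: [batch, num_proposals, 2] absolute (start, end) on the
      # preprocessed clip; *_image_shape: [batch, 3] with clip length at index 0
      true_length = true_image_shape[:, 0].astype(np.float32)
      original_length = original_image_shape[:, 0].astype(np.float32)
      scale = (original_length / true_length)[:, None, None]  # [batch, 1, 1]
      return detection_boxes * scale  # boxes on the original clip
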
easy_vision.python.model.rc3d.action_detection_helper.compute_clip_ts_window(clip_shapes)[source]
Parameters:clip_shapes – the shapes of the (padded) videos in a batch: [[l0], [l1], …]
Returns:the valid clip segment for each clip, computed from the lengths l0, l1, …
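
A minimal sketch of a plausible implementation, assuming the returned segment is the valid [0, length] window later used to clip proposals.

  import numpy as np

  def compute_clip_ts_window(clip_shapes):
      # clip_shapes: [[l0], [l1], ...] -> windows [[0, l0], [0, l1], ...]
      lengths = np.asarray(clip_shapes, dtype=np.float32)[:, 0]
      return np.stack([np.zeros_like(lengths), lengths], axis=1)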

easy_vision.python.model.rc3d.rc3d

class easy_vision.python.model.rc3d.rc3d.RC3D(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

Bases: easy_vision.python.model.detection_model.DetectionModel

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

__init__(model_config, feature_dict, label_dict=None, mode='predict', categories=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

build_loss_graph()[source]
build_metric_graph(eval_config)[source]

Adds metric ops to the graph.

Parameters:eval_config – a protobuf object; see python/protos/eval.proto.
Returns:a dict of metric_op, where each metric_op is a tuple of (update_op, value_op)
Return type:metric_dict
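
For illustration, a metric_dict of this form is typically consumed in a TensorFlow 1.x evaluation loop as below; the tf.metrics.accuracy entry is a stand-in, not one of RC3D's actual metrics.

  import tensorflow as tf  # assumes TF 1.x graph mode

  labels = tf.placeholder(tf.int64, [None])
  predictions = tf.placeholder(tf.int64, [None])
  # stand-in for one metric_dict entry: (update_op, value_op)
  value_op, update_op = tf.metrics.accuracy(labels, predictions)
  metric_dict = {'accuracy': (update_op, value_op)}

  with tf.Session() as sess:
      sess.run(tf.local_variables_initializer())  # metric accumulators
      for y_true, y_pred in [([1, 0], [1, 1]), ([1], [1])]:
          # run every update_op once per evaluation batch
          sess.run([u for u, _ in metric_dict.values()],
                   feed_dict={labels: y_true, predictions: y_pred})
      # read the final metric values after all batches are consumed
      print(sess.run({name: v for name, (_, v) in metric_dict.items()}))
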
build_predict_graph()[source]
classmethod create_class(name)

easy_vision.python.model.rc3d.trcnn_head

class easy_vision.python.model.rc3d.trcnn_head.TRCNNHead(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]

Bases: easy_vision.python.model.cv_head.CVHead

for the second stage of R-C3D (analogous to Faster R-CNN's second stage): classification

__init__(feature_dict, head_config, label_dict=None, fpn_config=None, mode='predict', region_feature_extractor=None)[source]
Parameters:
  • feature_dict – input dict of features
  • head_config – rcnn head config
  • label_dict – a dict of labels; during prediction it can be None
  • fpn_config – config of fpn
  • mode – train for the train phase, evaluate for the evaluate phase, predict for the predict phase
  • region_feature_extractor – a block that reuses part of the backbone to extract time-segment features in the second stage
build_loss_graph()[source]
Build loss of the rcnn stage, including classification loss and regression loss.

Variables involved are: proposals, proposal scores, proposal box offsets, groundtruth_boxes, groundtruth_classes. Key steps are:

  1. find matches between proposal boxes and groundtruth boxes; proposal boxes with large enough IoU against a groundtruth box are assigned that groundtruth class label, while the others are assigned the background class label
  2. for proposals assigned a groundtruth class label, regression targets (i.e. offsets) are computed
  3. compute the regression and classification losses, normalized by the number of proposals and then by the batch size (see the sketch below)

Returns: rcnn_reg loss, rcnn_cls loss
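
A condensed numpy sketch of these three steps for 1-D time segments, assuming (start, end) boxes, a center/length offset encoding, and an illustrative 0.5 IoU threshold; the real graph uses easy_vision's matcher and box coder on TensorFlow tensors.

  import numpy as np

  def temporal_iou(proposals, gt):
      # proposals: [N, 2], gt: [M, 2] -> pairwise IoU matrix [N, M]
      inter = np.clip(np.minimum(proposals[:, None, 1], gt[None, :, 1]) -
                      np.maximum(proposals[:, None, 0], gt[None, :, 0]), 0, None)
      union = ((proposals[:, None, 1] - proposals[:, None, 0]) +
               (gt[None, :, 1] - gt[None, :, 0]) - inter)
      return inter / np.maximum(union, 1e-8)

  def match_and_targets(proposals, gt_boxes, gt_classes, iou_thresh=0.5):
      iou = temporal_iou(proposals, gt_boxes)                # step 1: match
      best = iou.argmax(axis=1)
      matched = iou.max(axis=1) >= iou_thresh
      cls_targets = np.where(matched, gt_classes[best], 0)   # 0 = background
      p_center = proposals.mean(axis=1)
      p_length = proposals[:, 1] - proposals[:, 0]
      g = gt_boxes[best]                                     # matched groundtruth
      # step 2: regression targets (center/length offsets) per proposal
      reg_targets = np.stack([(g.mean(axis=1) - p_center) / p_length,
                              np.log((g[:, 1] - g[:, 0]) / p_length)], axis=1)
      return cls_targets, reg_targets, matched               # inputs to step 3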

build_postprocess_graph()[source]

Postprocess of the rcnn stage, including box decoding, clipping, and NMS. Variables involved are: refined_box_encodings, class_predictions_with_background

Returns:a dict of nmsed_boxes, nmsed_scores, nmsed_classes, num_detections
build_predict_graph()[source]

inputs: proposal_boxes, feature_map
outputs: refined_box_encodings_with_background, class_predictions_with_background
steps:
  1. crop region features from the backbone feature map.
  2. a classify block to classify the region features.
  3. a box predictor to predict box scores and box encodings.
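
Schematically, with roi_pooling, classify_block, and box_predictor as hypothetical stand-ins for the configured blocks:

  def trcnn_predict(feature_map, proposal_boxes, image_shape,
                    roi_pooling, classify_block, box_predictor):
      # 1. crop region features from the backbone feature map
      region_features = roi_pooling([feature_map], proposal_boxes, image_shape)
      # 2. classify block applied to the cropped region features
      region_features = classify_block(region_features)
      # 3. predict class scores and box encodings for each region
      return box_predictor(region_features)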

easy_vision.python.model.rc3d.trcnn_helper

easy_vision.python.model.rc3d.trcnn_helper.build_ts_roi_pooling_fn(initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]

RoiPooling Function Builder

Parameters:
  • initial_crop_size – output size of the initial bilinear-interpolation-based crop during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns:

A roi_pooling function
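
A plausible reading of the builder, assuming it simply binds the pooling hyperparameters onto ts_roi_pooling; the concrete values below are illustrative.

  import functools

  from easy_vision.python.model.rc3d.trcnn_helper import ts_roi_pooling

  roi_pooling_fn = functools.partial(
      ts_roi_pooling,
      initial_crop_size=16,    # resolution of the initial bilinear crop
      maxpool_kernel_size=2,   # max pool applied to the cropped features
      maxpool_stride=2)
  # later: roi_pooling_fn(features_list_to_crop, proposal_boxes, image_shape)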

easy_vision.python.model.rc3d.trcnn_helper.ts_roi_pooling(features_list_to_crop, proposal_boxes, image_shape, initial_crop_size, maxpool_kernel_size=1, maxpool_stride=1)[source]

Crops a set of proposal segments out of the feature maps for a batch of images.

Parameters:
  • features_list_to_crop – A list of float32 tensors with shape [batch_size, length, height, width, depth]
  • proposal_boxes – A float32 tensor with shape [batch_size, num_proposals, box_code_size] containing proposal boxes in absolute coordinates on the preprocessed image.
  • image_shape – A 1-D tensor of shape [5] containing the image tensor shape.
  • initial_crop_size – output size of the initial bilinear-interpolation-based crop during ROI pooling.
  • maxpool_kernel_size – kernel size of the max pool op on the cropped feature map during ROI pooling.
  • maxpool_stride – stride of the max pool op on the cropped feature map during ROI pooling.
Returns:

A float32 tensor with shape [K, new_length, new_height, new_width, depth].
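
A simplified single-image numpy sketch of the pooling, assuming (t_start, t_end) proposals in frame indices and nearest-neighbour resampling in place of the op's bilinear interpolation.

  import numpy as np

  def ts_roi_pooling_np(features, proposals, initial_crop_size,
                        maxpool_kernel_size=1, maxpool_stride=1):
      # features: [length, height, width, depth] for a single image
      # proposals: [num_proposals, 2] absolute (t_start, t_end)
      length = features.shape[0]
      crops = []
      for t_start, t_end in proposals:
          # resample each segment to a fixed number of temporal steps
          idx = np.linspace(t_start, max(t_end - 1, t_start), initial_crop_size)
          idx = np.clip(np.round(idx).astype(int), 0, length - 1)
          crops.append(features[idx])
      crops = np.stack(crops)  # [K, initial_crop_size, height, width, depth]
      # max pool along the temporal axis
      k, s = maxpool_kernel_size, maxpool_stride
      t_out = (initial_crop_size - k) // s + 1
      return np.stack([crops[:, i * s:i * s + k].max(axis=1)
                       for i in range(t_out)], axis=1)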

easy_vision.python.model.rc3d.trpn_head

class easy_vision.python.model.rc3d.trpn_head.TRPNHead(feature_dict, head_config, label_dict=None, mode='predict')[source]

Bases: easy_vision.python.model.cv_head.CVHead

for the first stage of rc3d: temporal region proposal

__init__(feature_dict, head_config, label_dict=None, mode='predict')[source]
Parameters:
  • feature_dict – must include three parts: 1. backbone output features 2. preprocessed batched image shape (input_shape) 3. preprocessed per-image shapes (image_shapes)
  • label_dict – groundtruth_boxes
  • head_config – protos.rpn_head_pb2.RPNHead
  • mode – train for the train phase; otherwise evaluate or predict
build_loss_graph()[source]
Parameters:
  • label_dict – must include two fields: 'groundtruth_boxes', the bounding boxes of each object, and 'num_groundtruth_boxes', the number of objects in each image
  • box objectness scores – self._prediction_dict['cls']
  • box offsets – self._prediction_dict['reg']
Returns:

a dict of {'rpn_reg': reg_loss, 'rpn_cls': cls_loss}

Return type:

loss_dict

build_postprocess_graph()[source]
inputs: self._prediction_dict['reg'], self._prediction_dict['cls'], self._prediction_dict['anchors'], self._image_shapes
return: a dict of proposal_boxes, proposal_scores, num_proposals; the results are also merged into self._prediction_dict
steps:
  1. box decoding: _batch_decode_boxes
  2. nms and pad to max_num_proposals: batch_multiclass_non_max_suppression; this second step is done image by image
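
A hedged numpy sketch of step 2 for a single image: greedy 1-D NMS over the decoded segments, padded to max_num_proposals; the threshold values are illustrative, not the configured defaults.

  import numpy as np

  def temporal_nms(boxes, scores, iou_thresh=0.7, max_num_proposals=300):
      order = scores.argsort()[::-1]  # process highest-scoring segments first
      keep = []
      while order.size and len(keep) < max_num_proposals:
          i = order[0]
          keep.append(i)
          rest = order[1:]
          inter = np.clip(np.minimum(boxes[i, 1], boxes[rest, 1]) -
                          np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
          union = ((boxes[i, 1] - boxes[i, 0]) +
                   (boxes[rest, 1] - boxes[rest, 0]) - inter)
          order = rest[inter / np.maximum(union, 1e-8) <= iou_thresh]
      pad = max_num_proposals - len(keep)  # pad up to a fixed proposal count
      return (np.pad(boxes[keep], ((0, pad), (0, 0)), mode='constant'),
              np.pad(scores[keep], (0, pad), mode='constant'),
              len(keep))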

build_predict_graph()[source]