easy_vision.python.core.detection_predictors

easy_vision.python.core.detection_predictors.convolutional_box_predictor

Convolutional Box Predictors with and without weight sharing.

class easy_vision.python.core.detection_predictors.convolutional_box_predictor.Convolutional3DBoxPredictor(is_training, num_classes, box_prediction_head, class_prediction_head, other_heads, conv_hyperparams_fn, num_layers_before_predictor, min_depth, max_depth)[source]

Bases: easy_vision.python.core.detection_predictors.convolutional_box_predictor.ConvolutionalBoxPredictor

Convolutional Box Predictor.

Optionally add an intermediate 1x1 convolutional layer after features and predict in parallel branches box_encodings and class_predictions_with_background.

Currently this box predictor assumes that predictions are “shared” across classes; that is, each anchor makes box predictions that do not depend on class.

__init__(is_training, num_classes, box_prediction_head, class_prediction_head, other_heads, conv_hyperparams_fn, num_layers_before_predictor, min_depth, max_depth)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • box_prediction_head – The head that predicts the boxes.
  • class_prediction_head – The head that predicts the classes.
  • other_heads – A dictionary mapping head names to convolutional head classes.
  • conv_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for convolution ops.
  • num_layers_before_predictor – Number of the additional conv layers before the predictor.
  • min_depth – Minimum feature depth prior to predicting box encodings and class predictions.
  • max_depth – Maximum feature depth prior to predicting box encodings and class predictions. If max_depth is set to 0, no additional feature map will be inserted before location and class predictions.
Raises:

ValueError – if min_depth > max_depth.
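The num_classes convention described above can be illustrated with a minimal sketch. The helper name classification_output_size is hypothetical and not part of easy_vision; it only restates the rule that the class axis carries one extra slot for the background category.

```python
def classification_output_size(num_classes):
    """Size of the class axis once the background slot is included.

    Hypothetical helper for illustration; not an easy_vision function.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be a positive integer")
    return num_classes + 1


# Groundtruth labels take values in {0, 1, 2}, so K = 3 and num_classes = 3.
K = 3
num_classes = K

# Assigned classification targets then range over {0, ..., K}.
targets = list(range(classification_output_size(num_classes)))
```

With K = 3 the predictor is constructed with num_classes=3, while class_predictions_with_background has a last dimension of size 4.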

num_classes
class easy_vision.python.core.detection_predictors.convolutional_box_predictor.ConvolutionalBoxPredictor(is_training, num_classes, box_prediction_head, class_prediction_head, other_heads, conv_hyperparams_fn, num_layers_before_predictor, min_depth, max_depth)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.BoxPredictor

Convolutional Box Predictor.

Optionally add an intermediate 1x1 convolutional layer after features and predict in parallel branches box_encodings and class_predictions_with_background.

Currently this box predictor assumes that predictions are “shared” across classes; that is, each anchor makes box predictions that do not depend on class.

__init__(is_training, num_classes, box_prediction_head, class_prediction_head, other_heads, conv_hyperparams_fn, num_layers_before_predictor, min_depth, max_depth)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • box_prediction_head – The head that predicts the boxes.
  • class_prediction_head – The head that predicts the classes.
  • other_heads – A dictionary mapping head names to convolutional head classes.
  • conv_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for convolution ops.
  • num_layers_before_predictor – Number of the additional conv layers before the predictor.
  • min_depth – Minimum feature depth prior to predicting box encodings and class predictions.
  • max_depth – Maximum feature depth prior to predicting box encodings and class predictions. If max_depth is set to 0, no additional feature map will be inserted before location and class predictions.
Raises:

ValueError – if min_depth > max_depth.
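One plausible reading of how min_depth and max_depth interact is sketched below. The clamping rule is an assumption modelled on the convention used by similar convolutional box predictors; the actual easy_vision implementation may differ, and both function names are hypothetical.

```python
def intermediate_depth(features_depth, min_depth, max_depth):
    """Clamp the incoming feature depth into [min_depth, max_depth].

    Assumed rule, not confirmed against the easy_vision source.
    """
    if min_depth > max_depth:
        raise ValueError("min_depth should be less than or equal to max_depth")
    return max(min(features_depth, max_depth), min_depth)


def inserts_intermediate_layers(features_depth, min_depth, max_depth,
                                num_layers_before_predictor):
    """Return True if extra conv layers would be added before the heads."""
    depth = intermediate_depth(features_depth, min_depth, max_depth)
    return depth > 0 and num_layers_before_predictor > 0
```

Under this reading, max_depth=0 (with min_depth=0) clamps the depth to zero, which matches the documented behaviour that no additional feature map is inserted before the location and class predictions.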

num_classes
class easy_vision.python.core.detection_predictors.convolutional_box_predictor.WeightSharedConvolutionalBoxPredictor(is_training, num_classes, box_prediction_head, class_prediction_head, other_heads, conv_hyperparams_fn, depth, num_layers_before_predictor, kernel_size=3, apply_batch_norm=False, share_prediction_tower=False, use_depthwise=False)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.BoxPredictor

Convolutional Box Predictor with weight sharing.

Defines the box predictor as defined in https://arxiv.org/abs/1708.02002. This class differs from ConvolutionalBoxPredictor in that it shares weights and biases while predicting from different feature maps. However, batch_norm parameters are not shared because the statistics of the activations vary among the different feature maps.

Also note that separate multi-layer towers are constructed for the box encoding and class predictors respectively.

__init__(is_training, num_classes, box_prediction_head, class_prediction_head, other_heads, conv_hyperparams_fn, depth, num_layers_before_predictor, kernel_size=3, apply_batch_norm=False, share_prediction_tower=False, use_depthwise=False)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • box_prediction_head – The head that predicts the boxes.
  • class_prediction_head – The head that predicts the classes.
  • other_heads – A dictionary mapping head names to convolutional head classes.
  • conv_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for convolution ops.
  • depth – depth of conv layers.
  • num_layers_before_predictor – Number of the additional conv layers before the predictor.
  • kernel_size – Size of final convolution kernel.
  • apply_batch_norm – Whether to apply batch normalization to conv layers in this predictor.
  • share_prediction_tower – Whether to share the multi-layer tower between box prediction and class prediction heads.
  • use_depthwise – Whether to use depthwise separable conv2d instead of regular conv2d.
num_classes
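The weight-sharing idea can be sketched with NumPy. This is an illustration under simplifying assumptions, not the library's code: it uses a 1x1 convolution (written as a matrix product) instead of the default 3x3 kernel, and a parameter-free normalization stands in for batch norm. All names are hypothetical.

```python
import numpy as np


def conv1x1(features, weights, bias):
    """1x1 convolution: features [batch, h, w, c_in], weights [c_in, c_out]."""
    return features @ weights + bias


def normalize_per_level(x, eps=1e-5):
    """Normalize with statistics computed from this feature map only."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)

# One set of conv weights and biases, reused for every feature map.
shared_weights = rng.normal(size=(8, 4))
shared_bias = np.zeros(4)

# Two feature maps with different resolutions and activation scales.
feature_maps = [
    rng.normal(size=(2, 16, 16, 8)),
    rng.normal(size=(2, 8, 8, 8)) * 5.0,
]

# Shared convolution, but normalization statistics are taken per level.
outputs = [
    normalize_per_level(conv1x1(f, shared_weights, shared_bias))
    for f in feature_maps
]
```

The second feature map has a much larger activation scale than the first, which is why normalization statistics are kept separate per level even though the convolution parameters are shared.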

easy_vision.python.core.detection_predictors.mask_rcnn_box_predictor

Mask R-CNN Box Predictor.

class easy_vision.python.core.detection_predictors.mask_rcnn_box_predictor.MaskRCNN3DBoxPredictor(is_training, num_classes, box_prediction_head, class_prediction_head, fc_hyperparams_fn, num_layers_before_predictor, depth)[source]

Bases: easy_vision.python.core.detection_predictors.mask_rcnn_box_predictor.MaskRCNNBoxPredictor

Mask R-CNN Box Predictor.

See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.

This is used for the second stage of the Mask R-CNN detector where proposals cropped from an image are arranged along the batch dimension of the input image_features tensor. Notice that locations are not shared across classes, thus for each anchor, a separate prediction is made for each class.

Currently this box predictor makes per-class predictions; that is, each anchor makes a separate box prediction for each class.

__init__(is_training, num_classes, box_prediction_head, class_prediction_head, fc_hyperparams_fn, num_layers_before_predictor, depth)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • box_prediction_head – The head that predicts the boxes in second stage.
  • class_prediction_head – The head that predicts the classes in second stage.
  • fc_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for fc ops.
  • num_layers_before_predictor – Number of the additional fc layers before the predictor.
  • depth – feature depth prior to predicting box encodings and class predictions.
class easy_vision.python.core.detection_predictors.mask_rcnn_box_predictor.MaskRCNNBoxPredictor(is_training, num_classes, box_prediction_head, class_prediction_head, fc_hyperparams_fn, num_layers_before_predictor, depth)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.BoxPredictor

Mask R-CNN Box Predictor.

See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.

This is used for the second stage of the Mask R-CNN detector where proposals cropped from an image are arranged along the batch dimension of the input image_features tensor. Notice that locations are not shared across classes, thus for each anchor, a separate prediction is made for each class.

Currently this box predictor makes per-class predictions; that is, each anchor makes a separate box prediction for each class.

__init__(is_training, num_classes, box_prediction_head, class_prediction_head, fc_hyperparams_fn, num_layers_before_predictor, depth)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • box_prediction_head – The head that predicts the boxes in second stage.
  • class_prediction_head – The head that predicts the classes in second stage.
  • fc_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for fc ops.
  • num_layers_before_predictor – Number of the additional fc layers before the predictor.
  • depth – feature depth prior to predicting box encodings and class predictions.
get_second_stage_prediction_heads()[source]
num_classes
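Because box predictions are made per class, a downstream step has to pick one box per proposal. A minimal NumPy sketch of that selection follows. The function name is hypothetical, the singleton anchor axis is assumed to be squeezed out already, and placing the background score at index 0 is an assumption.

```python
import numpy as np


def select_boxes_for_predicted_class(box_encodings,
                                     class_predictions_with_background):
    """Pick, for each proposal, the box encoding of its best foreground class.

    box_encodings: [num_proposals, num_classes, code_size]
    class_predictions_with_background: [num_proposals, num_classes + 1],
        with the background score assumed to sit at index 0.
    """
    foreground_scores = class_predictions_with_background[:, 1:]
    best_class = foreground_scores.argmax(axis=1)
    proposal_index = np.arange(box_encodings.shape[0])
    return box_encodings[proposal_index, best_class], best_class


# Two proposals, three classes, four-value box code.
example_boxes = np.arange(2 * 3 * 4).reshape(2, 3, 4)
example_scores = np.array([
    [0.10, 0.20, 0.60, 0.10],
    [0.70, 0.05, 0.05, 0.20],
])
selected, classes = select_boxes_for_predicted_class(example_boxes,
                                                     example_scores)
```

The second proposal has its highest score on the background entry, yet a foreground class is still selected; filtering out background-dominated proposals would be a separate step.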

easy_vision.python.core.detection_predictors.mask_rcnn_mask_predictor

Mask R-CNN Mask Predictor.

class easy_vision.python.core.detection_predictors.mask_rcnn_mask_predictor.MaskRCNNMaskPredictor(is_training, num_classes, mask_prediction_head)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.MaskPredictor

Mask R-CNN Mask Predictor.

See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.

This is used for the third stage of the Mask R-CNN detector where proposals cropped from an image are arranged along the batch dimension of the input image_features tensor. Notice that locations are not shared across classes, thus for each anchor, a separate prediction is made for each class.

Currently this mask predictor makes per-class predictions; that is, each anchor makes a separate mask prediction for each class.

__init__(is_training, num_classes, mask_prediction_head)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the MaskPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • mask_prediction_head – The head that predicts the masks in third stage.
num_classes

easy_vision.python.core.detection_predictors.predictor

Base predictors for object detectors.

Predictors are classes that take a high-level image feature map as input and produce at least one of the following predictions: (1) a tensor encoding box locations, (2) a tensor encoding classes for each box, (3) a tensor encoding an instance mask for each box, and (4) a tensor encoding instance keypoints for each box.

These components are passed directly to loss functions in our detection models.

These modules are separated from the main model since the same few box predictor architectures are shared across many models.

class easy_vision.python.core.detection_predictors.predictor.BoxPredictor(is_training, num_classes)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.Predictor

BoxPredictor.

__init__(is_training, num_classes)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
num_classes
class easy_vision.python.core.detection_predictors.predictor.KeypointPredictor(is_training, num_keypoints)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.Predictor

KeypointPredictor.

__init__(is_training, num_keypoints)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the KeypointPredictor is in training mode.
  • num_keypoints – number of keypoints.
num_keypoints
class easy_vision.python.core.detection_predictors.predictor.MaskPredictor(is_training, num_classes)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.Predictor

MaskPredictor.

__init__(is_training, num_classes)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the MaskPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
num_classes
class easy_vision.python.core.detection_predictors.predictor.Predictor(is_training)[source]

Bases: object

Predictor.

__init__(is_training)[source]

Constructor.

Parameters:is_training – Indicates whether the Predictor is in training mode.
predict(image_features, num_predictions_per_location, scope=None, **params)[source]

Computes encoded object locations and corresponding confidences.

Takes a list of high-level image feature maps as input and produces a list of box encodings and a list of class scores, where each element in the output lists corresponds to a feature map in the input list.

Parameters:
  • image_features – A list of float tensors of shape [batch_size, height_i, width_i, channels_i] containing features for a batch of images.
  • num_predictions_per_location – A list of integers representing the number of box predictions to be made per spatial location for each feature map.
  • scope – Variable and Op scope name.
  • **params – Additional keyword arguments for specific implementations of BoxPredictor.
Returns:

A dictionary containing at least one of the following tensors.

  • box_encodings: A list of float tensors. Each entry in the list corresponds to a feature map in the input image_features list. All tensors in the list have one of the two following shapes: (a) [batch_size, num_anchors_i, q, code_size], representing the location of the objects, where q is 1 or the number of classes; (b) [batch_size, num_anchors_i, code_size].
  • class_predictions_with_background: A list of float tensors of shape [batch_size, num_anchors_i, num_classes + 1] representing the class predictions for the proposals. Each entry in the list corresponds to a feature map in the input image_features list.
  • mask_predictions: A float tensor of shape [batch_size, 1, num_classes, image_height, image_width].
  • keypoint_predictions: A float tensor of shape [batch_size, 1, num_keypoints, 2].

Raises:

ValueError – If length of image_features is not equal to length of num_predictions_per_location.
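The length check and the resulting anchor counts can be sketched as follows. The helper name is hypothetical, and the relation num_anchors_i = height_i * width_i * num_predictions_per_location_i is an assumption based on the usual dense-prediction layout; the source does not state it.

```python
def anchors_per_feature_map(feature_map_shapes, num_predictions_per_location):
    """Return the assumed num_anchors_i for each feature map.

    feature_map_shapes: list of (batch_size, height_i, width_i, channels_i).
    num_predictions_per_location: list of ints, one per feature map.
    """
    if len(feature_map_shapes) != len(num_predictions_per_location):
        raise ValueError(
            "image_features and num_predictions_per_location must have "
            "the same length")
    return [
        height * width * predictions
        for (_, height, width, _), predictions
        in zip(feature_map_shapes, num_predictions_per_location)
    ]


# Two feature maps with 3 and 6 predictions per spatial location.
example_shapes = [(2, 38, 38, 256), (2, 19, 19, 512)]
example_counts = anchors_per_feature_map(example_shapes, [3, 6])
```

Under that assumption, each entry of box_encodings and class_predictions_with_background would have a second dimension equal to the matching count.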

easy_vision.python.core.detection_predictors.rfcn_box_predictor

RFCN Box Predictor.

class easy_vision.python.core.detection_predictors.rfcn_box_predictor.RfcnBoxPredictor(is_training, num_classes, conv_hyperparams_fn, num_spatial_bins, depth, crop_size, box_code_size, agnostic=True)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.BoxPredictor

RFCN Box Predictor.

Applies a position sensitive ROI pooling on position sensitive feature maps to predict classes and refined locations. See https://arxiv.org/abs/1605.06409 for details.

This is used for the second stage of the RFCN meta architecture. Notice that locations are not shared across classes, thus for each anchor, a separate prediction is made for each class.

__init__(is_training, num_classes, conv_hyperparams_fn, num_spatial_bins, depth, crop_size, box_code_size, agnostic=True)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes. Note that num_classes does not include the background category, so if groundtruth labels take values in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range from {0,… K}).
  • conv_hyperparams_fn – A function to construct tf-slim arg_scope with hyperparameters for convolutional layers.
  • num_spatial_bins – A list of two integers [spatial_bins_y, spatial_bins_x].
  • depth – Target depth to reduce the input feature maps to.
  • crop_size – A list of two integers [crop_height, crop_width].
  • box_code_size – Size of encoding for each box.
num_classes
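Position-sensitive ROI pooling can be sketched for a single, already cropped region. This is a simplified NumPy illustration, not the library's implementation: the function name is hypothetical, the channel grouping order (bin-major) is an assumption, and the crop size is required to be divisible by the number of bins.

```python
import numpy as np


def position_sensitive_pool(crop, num_spatial_bins, num_outputs):
    """Average-vote position-sensitive pooling over one cropped region.

    crop: [crop_height, crop_width, bins_y * bins_x * num_outputs].
    num_spatial_bins: [spatial_bins_y, spatial_bins_x].
    Each spatial bin reads only its own group of num_outputs channels.
    """
    bins_y, bins_x = num_spatial_bins
    crop_height, crop_width, channels = crop.shape
    if channels != bins_y * bins_x * num_outputs:
        raise ValueError("channel count must equal bins_y * bins_x * num_outputs")
    if crop_height % bins_y or crop_width % bins_x:
        raise ValueError("crop size must be divisible by the number of bins")

    step_y, step_x = crop_height // bins_y, crop_width // bins_x
    votes = np.zeros((bins_y, bins_x, num_outputs))
    for by in range(bins_y):
        for bx in range(bins_x):
            # Channel group dedicated to this bin (assumed bin-major order).
            start = (by * bins_x + bx) * num_outputs
            cell = crop[by * step_y:(by + 1) * step_y,
                        bx * step_x:(bx + 1) * step_x,
                        start:start + num_outputs]
            votes[by, bx] = cell.mean(axis=(0, 1))
    # Every bin casts one vote per output; the votes are averaged.
    return votes.mean(axis=(0, 1))


# A 6x6 crop with 3x3 bins and 2 outputs; channel c holds the constant value c.
example_crop = np.broadcast_to(np.arange(18, dtype=float), (6, 6, 18))
example_votes = position_sensitive_pool(example_crop, [3, 3], 2)
```

In this example, num_outputs would correspond to num_classes + 1 for the class branch or to the box code length for the location branch.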

easy_vision.python.core.detection_predictors.text_resnet_keypoint_predictor

class easy_vision.python.core.detection_predictors.text_resnet_keypoint_predictor.TextResnetKeypointPredictor(is_training, num_keypoints, keypoint_prediction_head, conv_hyperparams_fn=None, num_blocks_before_predictor=3, num_units_per_block=2, base_depth_before_predictor=64, se_rate=0)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.KeypointPredictor

Text ResNet Keypoint Predictor.

__init__(is_training, num_keypoints, keypoint_prediction_head, conv_hyperparams_fn=None, num_blocks_before_predictor=3, num_units_per_block=2, base_depth_before_predictor=64, se_rate=0)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the KeypointPredictor is in training mode.
  • num_keypoints – number of keypoints.
  • keypoint_prediction_head – The head that predicts the keypoints.
  • conv_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for conv ops.
  • num_blocks_before_predictor – Number of additional ResNet blocks before the predictor.
  • num_units_per_block – Number of units in each ResNet block.
  • base_depth_before_predictor – The feature depth of the first ResNet block.
  • se_rate – The squeeze-and-excite rate; a value less than or equal to zero disables it.

easy_vision.python.core.detection_predictors.yolo_box_predictor

Yolo Box Predictors

class easy_vision.python.core.detection_predictors.yolo_box_predictor.YOLOBoxPredictor(is_training, num_classes, conv_hyperparams_fn, num_layers_before_predictor)[source]

Bases: easy_vision.python.core.detection_predictors.predictor.BoxPredictor

YOLO Box Predictor.

Optionally add an intermediate 3x3 convolutional layer after features and predict in parallel branches box_encodings, confidence_predictions, and class_predictions.

Currently this box predictor assumes that predictions are “shared” across classes; that is, each anchor makes box predictions that do not depend on class.

__init__(is_training, num_classes, conv_hyperparams_fn, num_layers_before_predictor)[source]

Constructor.

Parameters:
  • is_training – Indicates whether the BoxPredictor is in training mode.
  • num_classes – number of classes.
  • conv_hyperparams_fn – A function to generate tf-slim arg_scope with hyperparameters for convolution ops.
  • num_layers_before_predictor – Number of the additional conv layers before the predictor.
Raises:

ValueError – if min_depth > max_depth.

num_classes