easy_vision.python.core.ops

easy_vision.python.core.ops.attention_ops

class easy_vision.python.core.ops.attention_ops.LocationSensitiveBahdanauAttention(num_units, memory, memory_sequence_length=None, filters=16, kernel_size=7, name='LocationSensitiveBahdanauAttention')[source]

Bases: tensorflow.contrib.seq2seq.python.ops.attention_wrapper.BahdanauAttention

Implements Location Sensitive Attention from: Chorowski, Jan et al. ‘Attention-Based Models for Speech Recognition’ https://arxiv.org/abs/1506.07503

__init__(num_units, memory, memory_sequence_length=None, filters=16, kernel_size=7, name='LocationSensitiveBahdanauAttention')[source]

Construct the Attention mechanism. See superclass for argument details.

easy_vision.python.core.ops.attention_ops.create_attention_mechanism(attention_option, num_units, memory, source_sequence_length)[source]

Create an attention mechanism based on attention_option.

Parameters:
  • attention_option – attention mechanism type.
  • num_units – number of units of the hidden states.
  • memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • source_sequence_length – sequence lengths for the batch entries in memory.
Returns:A subclass of AttentionMechanism.
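
A minimal usage sketch follows; the option string 'bahdanau' is an assumption, so consult the implementation for the set of values attention_option actually accepts.

```python
import tensorflow as tf
from easy_vision.python.core.ops import attention_ops

# Encoder outputs to attend over: [batch_size, max_time, encoder_dim].
memory = tf.random_normal([8, 50, 256])
source_sequence_length = tf.fill([8], 50)

# 'bahdanau' is an assumed option value, not confirmed by this reference.
mechanism = attention_ops.create_attention_mechanism(
    attention_option='bahdanau',
    num_units=256,
    memory=memory,
    source_sequence_length=source_sequence_length)
```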

easy_vision.python.core.ops.box_coder

Base box coder.

Box coders convert between coordinate frames, namely image-centric (with (0,0) on the top left of image) and anchor-centric (with (0,0) being defined by a specific anchor).

Users of a BoxCoder can call two methods:
  • encode: encodes a box with respect to a given anchor (or rather, a tensor of boxes with respect to a corresponding tensor of anchors), and
  • decode: inverts this encoding.

In both cases, the arguments are assumed to be in 1-1 correspondence already; it is not the job of a BoxCoder to perform matching.
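
To make the contract concrete, here is a hypothetical identity coder (not part of this library) that satisfies the encode/decode round trip; it assumes subclasses override encode and decode directly, which the implementation may organize differently.

```python
from easy_vision.python.core.ops.box_coder import BoxCoder
from easy_vision.python.core.ops.box_list import BoxList


class IdentityBoxCoder(BoxCoder):
  """Hypothetical coder: a code is the corner offset from its anchor."""

  @property
  def code_size(self):
    return 4

  def encode(self, boxes, anchors):
    # [N, 4] relative codes: box corners minus anchor corners.
    return boxes.get() - anchors.get()

  def decode(self, rel_codes, anchors):
    # Inverts encode: corners = rel_codes + anchor corners.
    return BoxList(rel_codes + anchors.get())
```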

class easy_vision.python.core.ops.box_coder.BoxCoder[source]

Bases: object

Abstract base class for box coder.

code_size

Return the size of each code.

This number is a constant and should agree with the output of the encode op (e.g. if rel_codes is the output of self.encode(…), then it should have shape [N, code_size()]). This abstractproperty should be overridden by implementations.

Returns:an integer constant
decode(rel_codes, anchors)[source]

Decode boxes that are encoded relative to an anchor collection.

Parameters:
  • rel_codes – a tensor representing N relative-encoded boxes
  • anchors – BoxList of anchors
Returns:

BoxList holding N boxes encoded in the ordinary way (i.e., with corners y_min, x_min, y_max, x_max)

Return type:

boxlist

encode(boxes, anchors)[source]

Encode a box list relative to an anchor collection.

Parameters:
  • boxes – BoxList holding N boxes to be encoded
  • anchors – BoxList of N anchors
Returns:

a tensor representing N relative-encoded boxes

easy_vision.python.core.ops.box_coder.batch_decode(encoded_boxes, box_coder, anchors)[source]

Decode a batch of encoded boxes.

This op takes a batch of encoded bounding boxes and transforms them to a batch of bounding boxes specified by their corners in the order of [y_min, x_min, y_max, x_max].

Parameters:
  • encoded_boxes – a float32 tensor of shape [batch_size, num_anchors, code_size] representing the location of the objects.
  • box_coder – a BoxCoder object.
  • anchors – a BoxList of anchors used to encode encoded_boxes.
Returns:

a float32 tensor of shape [batch_size, num_anchors, code_size] representing the corners of the objects in the order of [y_min, x_min, y_max, x_max].

Return type:

decoded_boxes

Raises:

ValueError – if batch sizes of the inputs are inconsistent, or if the number of anchors inferred from encoded_boxes and anchors is inconsistent.

easy_vision.python.core.ops.box_list

Bounding Box List definition.

BoxList represents a list of bounding boxes as tensorflow tensors, where each bounding box is represented as a row of 4 numbers, [y_min, x_min, y_max, x_max]. It is assumed that all bounding boxes within a given list correspond to a single image. See also box_list_ops.py for common box related operations (such as area, iou, etc).

Optionally, users can add additional related fields (such as weights). We assume the following things to be true about fields:
  • they correspond to boxes in the box_list along the 0th dimension
  • they have inferrable rank at graph construction time
  • all dimensions except for possibly the 0th can be inferred (i.e., are not None) at graph construction time

Some other notes:
  • Following tensorflow conventions, we use height, width ordering, and correspondingly, y,x (or ymin, xmin, ymax, xmax) ordering
  • Tensors are always provided as (flat) [N, 4] tensors.

class easy_vision.python.core.ops.box_list.BoxList(boxes)[source]

Bases: object

Box collection.

__init__(boxes)[source]

Constructs box collection.

Parameters:boxes – a tensor of shape [N, 4] representing box corners
Raises:ValueError – if invalid dimensions for bbox data or if bbox data is not in float32 format.
add_field(field, field_data)[source]

Add field to box list.

This method can be used to add related box data such as weights/labels, etc.

Parameters:
  • field – a string key to access the data via get
  • field_data – a tensor containing the data to store in the BoxList
as_tensor_dict(fields=None)[source]

Retrieves specified fields as a dictionary of tensors.

Parameters:fields – (optional) list of fields to return in the dictionary. If None (default), all fields are returned.
Returns:A dictionary of tensors specified by fields.
Return type:tensor_dict
Raises:ValueError – if specified field is not contained in boxlist.
get()[source]

Convenience function for accessing box coordinates.

Returns:a tensor with shape [N, 4] representing box coordinates.
get_all_fields()[source]

Returns all fields.

get_center_coordinates_and_sizes(scope=None)[source]

Computes the center coordinates, height and width of the boxes.

Parameters:scope – name scope of the function.
Returns:a list of 4 1-D tensors [ycenter, xcenter, height, width].
get_extra_fields()[source]

Returns all non-box fields (i.e., everything not named ‘boxes’).

get_field(field)[source]

Accesses a box collection and associated fields.

This function returns specified field with object; if no field is specified, it returns the box coordinates.

Parameters:field – this optional string parameter can be used to specify a related field to be accessed.
Returns:a tensor representing the box collection or an associated field.
Raises:ValueError – if invalid field
has_field(field)[source]

Returns whether the box collection has the given field.

num_boxes()[source]

Returns number of boxes held in collection.

Returns:a tensor representing the number of boxes held in the collection.
num_boxes_static()[source]

Returns number of boxes held in collection.

This number is inferred at graph construction time rather than run-time.

Returns:Number of boxes held in collection (integer) or None if this is not inferrable at graph construction time.
set(boxes)[source]

Convenience function for setting box coordinates.

Parameters:boxes – a tensor of shape [N, 4] representing box corners
Raises:ValueError – if invalid dimensions for bbox data
set_field(field, value)[source]

Sets the value of a field.

Updates the field of a box_list with a given value.

Parameters:
  • field – (string) name of the field to set value.
  • value – the value to assign to the field.
Raises:

ValueError – if the box_list does not have specified field.

transpose_coordinates(scope=None)[source]

Transpose the coordinate representation in a boxlist.

Parameters:scope – name scope of the function.
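
A minimal usage sketch of the class above (import paths follow the module names in this reference):

```python
import tensorflow as tf
from easy_vision.python.core.ops.box_list import BoxList

# Two boxes as [y_min, x_min, y_max, x_max] rows, float32 as required.
boxes = tf.constant([[0.1, 0.2, 0.5, 0.6],
                     [0.3, 0.3, 0.9, 0.8]], dtype=tf.float32)
boxlist = BoxList(boxes)

# Attach a related per-box field, indexed along the 0th dimension.
boxlist.add_field('scores', tf.constant([0.9, 0.4], dtype=tf.float32))

corners = boxlist.get()               # [2, 4] tensor of corners
scores = boxlist.get_field('scores')  # [2] tensor
num = boxlist.num_boxes_static()      # 2, known at graph construction time
```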

easy_vision.python.core.ops.box_list_ops

Bounding Box List operations.

Example box operations that are supported:
  • areas: compute bounding box areas
  • iou: pairwise intersection-over-union scores
  • sq_dist: pairwise distances between bounding boxes

Whenever box_list_ops functions output a BoxList, the fields of the incoming BoxList are retained unless documented otherwise.

class easy_vision.python.core.ops.box_list_ops.SortOrder[source]

Bases: object

Enum class for sort order.

ascend

ascend order.

descend

descend order.

ascend = 1
descend = 2
easy_vision.python.core.ops.box_list_ops.area(boxlist, scope=None)[source]

Computes area of boxes.

Parameters:
  • boxlist – BoxList holding N boxes
  • scope – name scope.
Returns:

a tensor with shape [N] representing box areas.

easy_vision.python.core.ops.box_list_ops.boolean_mask(boxlist, indicator, fields=None, scope=None)[source]

Select boxes from BoxList according to indicator and return new BoxList.

boolean_mask returns the subset of boxes that are marked as “True” by the indicator tensor. By default, boolean_mask returns boxes corresponding to the input index list, as well as all additional fields stored in the boxlist (indexing into the first dimension). However one can optionally only draw from a subset of fields.

Parameters:
  • boxlist – BoxList holding N boxes
  • indicator – a rank-1 boolean tensor
  • fields – (optional) list of fields to also gather from. If None (default), all fields are gathered from. Pass an empty fields list to only gather the box coordinates.
  • scope – name scope.
Returns:

a BoxList corresponding to the subset of the input BoxList specified by indicator

Return type:

subboxlist

Raises:

ValueError – if indicator is not a rank-1 boolean tensor.

easy_vision.python.core.ops.box_list_ops.box_voting(selected_boxes, pool_boxes, iou_thresh=0.5)[source]

Performs box voting as described in S. Gidaris and N. Komodakis, ICCV 2015.

Performs box voting as described in ‘Object detection via a multi-region & semantic segmentation-aware CNN model’, Gidaris and Komodakis, ICCV 2015. For each box ‘B’ in selected_boxes, we find the set ‘S’ of boxes in pool_boxes with iou overlap >= iou_thresh. The location of B is set to the weighted average location of boxes in S (scores are used for weighting). And the score of B is set to the average score of boxes in S.

Parameters:
  • selected_boxes – BoxList containing a subset of boxes in pool_boxes. These boxes are usually selected from pool_boxes using non max suppression.
  • pool_boxes – BoxList containing a set of (possibly redundant) boxes.
  • iou_thresh – (float scalar) iou threshold for matching boxes in selected_boxes and pool_boxes.
Returns:

BoxList containing averaged locations and scores for each box in selected_boxes.

Raises:

ValueError – if a) selected_boxes or pool_boxes is not a BoxList. b) if iou_thresh is not in [0, 1]. c) pool_boxes does not have a scores field.

easy_vision.python.core.ops.box_list_ops.change_coordinate_frame(boxlist, window, scope=None)[source]

Change coordinate frame of the boxlist to be relative to window’s frame.

Given a window of the form [ymin, xmin, ymax, xmax], changes bounding box coordinates from boxlist to be relative to this window (e.g., the min corner maps to (0,0) and the max corner maps to (1,1)).

An example use case is data augmentation: where we are given groundtruth boxes (boxlist) and would like to randomly crop the image to some window (window). In this case we need to change the coordinate frame of each groundtruth box to be relative to this new window.

Parameters:
  • boxlist – A BoxList object holding N boxes.
  • window – A rank 1 tensor [4].
  • scope – name scope.
Returns:

Returns a BoxList object with N boxes.

easy_vision.python.core.ops.box_list_ops.clip_to_window(boxlist, window, filter_nonoverlapping=True, scope=None)[source]

Clip bounding boxes to a window.

This op clips any input bounding boxes (represented by bounding box corners) to a window, optionally filtering out boxes that do not overlap at all with the window.

Parameters:
  • boxlist – BoxList holding M_in boxes
  • window – a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] window to which the op should clip boxes.
  • filter_nonoverlapping – whether to filter out boxes that do not overlap at all with the window.
  • scope – name scope.
Returns:

a BoxList holding M_out boxes where M_out <= M_in

easy_vision.python.core.ops.box_list_ops.concatenate(boxlists, fields=None, scope=None)[source]

Concatenate list of BoxLists.

This op concatenates a list of input BoxLists into a larger BoxList. It also handles concatenation of BoxList fields as long as the field tensor shapes are equal except for the first dimension.

Parameters:
  • boxlists – list of BoxList objects
  • fields – optional list of fields to also concatenate. By default, all fields from the first BoxList in the list are included in the concatenation.
  • scope – name scope.
Returns:

a BoxList with number of boxes equal to sum([boxlist.num_boxes() for boxlist in boxlists])

Raises:

ValueError – if boxlists is invalid (i.e., is not a list, is empty, or contains non BoxList objects), or if requested fields are not contained in all boxlists

easy_vision.python.core.ops.box_list_ops.filter_field_value_equals(boxlist, field, value, scope=None)[source]

Filter to keep only boxes with field entries equal to the given value.

Parameters:
  • boxlist – BoxList holding N boxes.
  • field – field name for filtering.
  • value – scalar value.
  • scope – name scope.
Returns:

a BoxList holding M boxes where M <= N

Raises:

ValueError – if boxlist not a BoxList object or if it does not have the specified field.

easy_vision.python.core.ops.box_list_ops.filter_greater_than(boxlist, thresh, scope=None)[source]

Filter to keep only boxes with score exceeding a given threshold.

This op keeps the collection of boxes whose corresponding scores are greater than the input threshold.

TODO(jonathanhuang): Change function name to filter_scores_greater_than

Parameters:
  • boxlist – BoxList holding N boxes. Must contain a ‘scores’ field representing detection scores.
  • thresh – scalar threshold
  • scope – name scope.
Returns:

a BoxList holding M boxes where M <= N

Raises:

ValueError – if boxlist not a BoxList object or if it does not have a scores field

easy_vision.python.core.ops.box_list_ops.gather(boxlist, indices, fields=None, scope=None)[source]

Gather boxes from BoxList according to indices and return new BoxList.

By default, gather returns boxes corresponding to the input index list, as well as all additional fields stored in the boxlist (indexing into the first dimension). However one can optionally only gather from a subset of fields.

Parameters:
  • boxlist – BoxList holding N boxes
  • indices – a rank-1 tensor of type int32 / int64
  • fields – (optional) list of fields to also gather from. If None (default), all fields are gathered from. Pass an empty fields list to only gather the box coordinates.
  • scope – name scope.
Returns:

a BoxList corresponding to the subset of the input BoxList specified by indices

Return type:

subboxlist

Raises:

ValueError – if specified field is not contained in boxlist or if the indices are not of type int32

easy_vision.python.core.ops.box_list_ops.get_minimal_coverage_box(boxlist, default_box=None, scope=None)[source]

Creates a single bounding box which covers all boxes in the boxlist.

Parameters:
  • boxlist – A Boxlist.
  • default_box – A [1, 4] float32 tensor. If no boxes are present in boxlist, this default box will be returned. If None, will use a default box of [[0., 0., 1., 1.]].
  • scope – Name scope.
Returns:

A [1, 4] float32 tensor with a bounding box that tightly covers all the boxes in the box list. If the boxlist does not contain any boxes, the default box is returned.

easy_vision.python.core.ops.box_list_ops.height_width(boxlist, scope=None)[source]

Computes height and width of boxes in boxlist.

Parameters:
  • boxlist – BoxList holding N boxes
  • scope – name scope.
Returns:

Height: a tensor with shape [N] representing box heights.
Width: a tensor with shape [N] representing box widths.

easy_vision.python.core.ops.box_list_ops.intersection(boxlist1, boxlist2, scope=None)[source]

Compute pairwise intersection areas between boxes.

Parameters:
  • boxlist1 – BoxList holding N boxes
  • boxlist2 – BoxList holding M boxes
  • scope – name scope.
Returns:

a tensor with shape [N, M] representing pairwise intersections

easy_vision.python.core.ops.box_list_ops.ioa(boxlist1, boxlist2, scope=None)[source]

Computes pairwise intersection-over-area between box collections.

intersection-over-area (IOA) between two boxes box1 and box2 is defined as their intersection area over box2’s area. Note that ioa is not symmetric, that is, ioa(box1, box2) != ioa(box2, box1).

Parameters:
  • boxlist1 – BoxList holding N boxes
  • boxlist2 – BoxList holding M boxes
  • scope – name scope.
Returns:

a tensor with shape [N, M] representing pairwise ioa scores.

easy_vision.python.core.ops.box_list_ops.iou(boxlist1, boxlist2, scope=None)[source]

Computes pairwise intersection-over-union between box collections.

Parameters:
  • boxlist1 – BoxList holding N boxes
  • boxlist2 – BoxList holding M boxes
  • scope – name scope.
Returns:

a tensor with shape [N, M] representing pairwise iou scores.
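
A small pairwise-IoU sketch (TF 1.x session style, matching the tf.contrib era of this codebase):

```python
import tensorflow as tf
from easy_vision.python.core.ops import box_list, box_list_ops

boxlist1 = box_list.BoxList(
    tf.constant([[0., 0., 1., 1.]], dtype=tf.float32))
boxlist2 = box_list.BoxList(
    tf.constant([[0., 0., 1., 1.],
                 [0., 0., 0.5, 1.]], dtype=tf.float32))

iou_matrix = box_list_ops.iou(boxlist1, boxlist2)  # shape [1, 2]
with tf.Session() as sess:
  print(sess.run(iou_matrix))  # [[1.0, 0.5]]
```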

easy_vision.python.core.ops.box_list_ops.matched_intersection(boxlist1, boxlist2, scope=None)[source]

Compute intersection areas between corresponding boxes in two boxlists.

Parameters:
  • boxlist1 – BoxList holding N boxes
  • boxlist2 – BoxList holding N boxes
  • scope – name scope.
Returns:

a tensor with shape [N] representing pairwise intersections

easy_vision.python.core.ops.box_list_ops.matched_iou(boxlist1, boxlist2, scope=None)[source]

Compute intersection-over-union between corresponding boxes in boxlists.

Parameters:
  • boxlist1 – BoxList holding N boxes
  • boxlist2 – BoxList holding N boxes
  • scope – name scope.
Returns:

a tensor with shape [N] representing pairwise iou scores.

easy_vision.python.core.ops.box_list_ops.non_max_suppression(boxlist, thresh, max_output_size, scope=None)[source]

Non maximum suppression.

This op greedily selects a subset of detection bounding boxes, pruning away boxes that have high IOU (intersection over union) overlap (> thresh) with already selected boxes. Note that this only works for a single class — to apply NMS to multi-class predictions, use MultiClassNonMaxSuppression.

Parameters:
  • boxlist – BoxList holding N boxes. Must contain a ‘scores’ field representing detection scores.
  • thresh – scalar threshold
  • max_output_size – maximum number of retained boxes
  • scope – name scope.
Returns:

a BoxList holding M boxes where M <= max_output_size

Raises:

ValueError – if thresh is not in [0, 1]
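
A sketch of the call: the boxlist must carry a 'scores' field before NMS is applied, and arguments follow the signature above.

```python
import tensorflow as tf
from easy_vision.python.core.ops import box_list, box_list_ops

boxes = tf.constant([[0., 0., 1., 1.],     # kept (highest score)
                     [0., 0., 0.9, 1.],    # IOU 0.9 with box 0 -> suppressed
                     [0., 0.5, 1., 1.5]],  # IOU 1/3 with box 0 -> kept
                    dtype=tf.float32)
boxlist = box_list.BoxList(boxes)
boxlist.add_field('scores',
                  tf.constant([0.9, 0.8, 0.6], dtype=tf.float32))

nmsed = box_list_ops.non_max_suppression(
    boxlist, thresh=0.5, max_output_size=10)
```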

easy_vision.python.core.ops.box_list_ops.pad_or_clip_box_list(boxlist, num_boxes, scope=None)[source]

Pads or clips all fields of a BoxList.

Parameters:
  • boxlist – A BoxList with an arbitrary number of boxes.
  • num_boxes – First num_boxes in boxlist are kept. The fields are zero-padded if num_boxes is bigger than the actual number of boxes.
  • scope – name scope.
Returns:

BoxList with all fields padded or clipped.

easy_vision.python.core.ops.box_list_ops.prune_completely_outside_window(boxlist, window, scope=None)[source]

Prunes bounding boxes that fall completely outside of the given window.

The function clip_to_window prunes bounding boxes that fall completely outside the window, but also clips any bounding boxes that partially overflow. This function does not clip partially overflowing boxes.

Parameters:
  • boxlist – a BoxList holding M_in boxes.
  • window – a float tensor of shape [4] representing [ymin, xmin, ymax, xmax] of the window
  • scope – name scope.
Returns:

pruned_boxlist: a new BoxList with all bounding boxes partially or fully in the window.
valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes in the input tensor.

easy_vision.python.core.ops.box_list_ops.prune_non_overlapping_boxes(boxlist1, boxlist2, min_overlap=0.0, scope=None)[source]

Prunes the boxes in boxlist1 that overlap less than min_overlap with boxlist2.

For each box in boxlist1, we want its IOA with at least one of the boxes in boxlist2 to be more than min_overlap. If it is not, we remove it.

Parameters:
  • boxlist1 – BoxList holding N boxes.
  • boxlist2 – BoxList holding M boxes.
  • min_overlap – Minimum required overlap between boxes, to count them as overlapping.
  • scope – name scope.
Returns:

new_boxlist1: a pruned boxlist with size [N’, 4].
keep_inds: a tensor with shape [N’] indexing kept bounding boxes in the first input BoxList boxlist1.

easy_vision.python.core.ops.box_list_ops.prune_outside_window(boxlist, window, scope=None)[source]

Prunes bounding boxes that fall outside a given window.

This function prunes bounding boxes that even partially fall outside the given window. See also clip_to_window which only prunes bounding boxes that fall completely outside the window, and clips any bounding boxes that partially overflow.

Parameters:
  • boxlist – a BoxList holding M_in boxes.
  • window – a float tensor of shape [4] representing [ymin, xmin, ymax, xmax] of the window
  • scope – name scope.
Returns:

pruned_corners: a tensor with shape [M_out, 4] where M_out <= M_in.
valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes in the input tensor.

easy_vision.python.core.ops.box_list_ops.prune_small_boxes(boxlist, min_side, scope=None)[source]

Prunes small boxes in the boxlist which have a side smaller than min_side.

Parameters:
  • boxlist – BoxList holding N boxes.
  • min_side – Minimum width AND height of box to survive pruning.
  • scope – name scope.
Returns:

A pruned boxlist.

easy_vision.python.core.ops.box_list_ops.refine_boxes(pool_boxes, nms_iou_thresh, nms_max_detections, voting_iou_thresh=0.5)[source]

Refines a pool of boxes using non max suppression and box voting.

Parameters:
  • pool_boxes – (BoxList) A collection of boxes to be refined. pool_boxes must have a rank 1 ‘scores’ field.
  • nms_iou_thresh – (float scalar) iou threshold for non max suppression (NMS).
  • nms_max_detections – (int scalar) maximum output size for NMS.
  • voting_iou_thresh – (float scalar) iou threshold for box voting.
Returns:

BoxList of refined boxes.

Raises:

ValueError – if a) nms_iou_thresh or voting_iou_thresh is not in [0, 1]. b) pool_boxes is not a BoxList. c) pool_boxes does not have a scores field.

easy_vision.python.core.ops.box_list_ops.refine_boxes_multi_class(pool_boxes, num_classes, nms_iou_thresh, nms_max_detections, voting_iou_thresh=0.5)[source]

Refines a pool of boxes using non max suppression and box voting.

Box refinement is done independently for each class.

Parameters:
  • pool_boxes – (BoxList) A collection of boxes to be refined. pool_boxes must have a rank 1 ‘scores’ field and a rank 1 ‘classes’ field.
  • num_classes – (int scalar) Number of classes.
  • nms_iou_thresh – (float scalar) iou threshold for non max suppression (NMS).
  • nms_max_detections – (int scalar) maximum output size for NMS.
  • voting_iou_thresh – (float scalar) iou threshold for box voting.
Returns:

BoxList of refined boxes.

Raises:

ValueError – if a) nms_iou_thresh or voting_iou_thresh is not in [0, 1]. b) pool_boxes is not a BoxList. c) pool_boxes does not have a scores and classes field.

easy_vision.python.core.ops.box_list_ops.scale(boxlist, y_scale, x_scale, scope=None)[source]

Scale box coordinates in x and y dimensions.

Parameters:
  • boxlist – BoxList holding N boxes
  • y_scale – (float) scalar tensor
  • x_scale – (float) scalar tensor
  • scope – name scope.
Returns:

BoxList holding N boxes

Return type:

boxlist

easy_vision.python.core.ops.box_list_ops.select_random_box(boxlist, default_box=None, seed=None, scope=None)[source]

Selects a random bounding box from a BoxList.

Parameters:
  • boxlist – A BoxList.
  • default_box – A [1, 4] float32 tensor. If no boxes are present in boxlist, this default box will be returned. If None, will use a default box of [[-1., -1., -1., -1.]].
  • seed – Random seed.
  • scope – Name scope.
Returns:

bbox: a [1, 4] tensor with a random bounding box.
valid: a bool tensor indicating whether a valid bounding box is returned (True) or whether the default box is returned (False).

easy_vision.python.core.ops.box_list_ops.sort_by_field(boxlist, field, order=2, scope=None)[source]

Sort boxes and associated fields according to a scalar field.

A common use case is reordering the boxes according to descending scores.

Parameters:
  • boxlist – BoxList holding N boxes.
  • field – A BoxList field for sorting and reordering the BoxList.
  • order – (Optional) descend or ascend. Default is descend.
  • scope – name scope.
Returns:

A sorted BoxList with the field in the specified order.

Return type:

sorted_boxlist

Raises:
  • ValueError – if specified field does not exist
  • ValueError – if the order is not either descend or ascend
easy_vision.python.core.ops.box_list_ops.sq_dist(boxlist1, boxlist2, scope=None)[source]

Computes the pairwise squared distances between box corners.

This op treats each box as if it were a point in a 4d Euclidean space and computes pairwise squared distances.

Mathematically, we are given two matrices of box coordinates X and Y, where X(i,:) is the i’th row of X, containing the 4 numbers defining the corners of the i’th box in boxlist1. Similarly Y(j,:) corresponds to boxlist2. We compute:

Z(i,j) = ||X(i,:) - Y(j,:)||^2 = ||X(i,:)||^2 + ||Y(j,:)||^2 - 2 X(i,:)' * Y(j,:)

Parameters:
  • boxlist1 – BoxList holding N boxes
  • boxlist2 – BoxList holding M boxes
  • scope – name scope.
Returns:

a tensor with shape [N, M] representing pairwise distances

easy_vision.python.core.ops.box_list_ops.to_absolute_coordinates(boxlist, height, width, check_range=True, maximum_normalized_coordinate=1.01, scope=None)[source]

Converts normalized box coordinates to absolute pixel coordinates.

This function raises an assertion failed error when the maximum box coordinate value is larger than maximum_normalized_coordinate (in which case coordinates are already absolute).

Parameters:
  • boxlist – BoxList with coordinates in range [0, 1].
  • height – Maximum value for height of absolute box coordinates.
  • width – Maximum value for width of absolute box coordinates.
  • check_range – If True, checks if the coordinates are normalized or not.
  • maximum_normalized_coordinate – Maximum coordinate value to be considered as normalized, default to 1.01.
  • scope – name scope.
Returns:

boxlist with absolute coordinates in terms of the image size.

easy_vision.python.core.ops.box_list_ops.to_normalized_coordinates(boxlist, height, width, check_range=True, scope=None)[source]

Converts absolute box coordinates to normalized coordinates in [0, 1].

Usually one uses the dynamic shape of the image or conv-layer tensor:

boxlist = box_list_ops.to_normalized_coordinates(
    boxlist, tf.shape(images)[1], tf.shape(images)[2])

This function raises an assertion failed error at graph execution time when the maximum coordinate is smaller than 1.01 (which means that coordinates are already normalized). The value 1.01 is to deal with small rounding errors.

Parameters:
  • boxlist – BoxList with coordinates in terms of pixel-locations.
  • height – Maximum value for height of absolute box coordinates.
  • width – Maximum value for width of absolute box coordinates.
  • check_range – If True, checks if the coordinates are normalized or not.
  • scope – name scope.
Returns:

boxlist with normalized coordinates in [0, 1].

easy_vision.python.core.ops.box_list_ops.visualize_boxes_in_image(image, boxlist, normalized=False, scope=None)[source]

Overlay bounding box list on image.

Currently this visualization plots a 1 pixel thick red bounding box on top of the image. Note that tf.image.draw_bounding_boxes is essentially 1-indexed.

Parameters:
  • image – an image tensor with shape [height, width, 3]
  • boxlist – a BoxList
  • normalized – (boolean) specify whether corners are to be interpreted as absolute coordinates in image space or normalized with respect to the image size.
  • scope – name scope.
Returns:

an image tensor with shape [height, width, 3]

Return type:

image_and_boxes

easy_vision.python.core.ops.common_layers

easy_vision.python.core.ops.common_layers.conv2d_fixed_padding(inputs, filters, kernel_size, stride, rate=1, scope=None, **kwargs)[source]

Strided 2-D convolution with explicit fixed padding.

When stride > 1, then we do explicit zero-padding, followed by conv2d with ‘VALID’ padding.

Parameters:
  • inputs – A 4-D tensor of size [batch, height_in, width_in, channels].
  • filters – An integer, the number of output filters.
  • kernel_size – An int with the kernel_size of the filters.
  • stride – An integer, the output stride.
  • rate – An integer, rate for atrous convolution.
  • scope – Scope.
Returns:

A 4-D tensor of size [batch, height_out, width_out, channels] with the convolution output.

Return type:

output

easy_vision.python.core.ops.common_layers.conv_spatial_attention_module(inputs, kernel_size=7, compress=True)[source]
Convolutional Spatial Attention Module of CBAM
CBAM: Convolutional Block Attention Module https://arxiv.org/pdf/1807.06521.pdf
Parameters:
  • inputs – input tensor with shape [batch_size, height, width, channels]
  • kernel_size – kernel size of attention convolution
  • compress – do channel-wise max-pooling and average-pooling or not
Returns:

Tensor with same shape as the input.
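
As a reference, the spatial attention of CBAM can be sketched with standard ops as below; this follows the paper's formulation under the compress=True behavior described above, and is an assumption rather than this module's exact implementation.

```python
import tensorflow as tf

def spatial_attention_sketch(inputs, kernel_size=7):
  """CBAM-style spatial attention: gate each location by a sigmoid map."""
  # Channel-wise average- and max-pooling -> [batch, height, width, 2].
  avg_pool = tf.reduce_mean(inputs, axis=3, keepdims=True)
  max_pool = tf.reduce_max(inputs, axis=3, keepdims=True)
  compressed = tf.concat([avg_pool, max_pool], axis=3)
  # One-channel attention map from a kernel_size x kernel_size conv.
  attention = tf.layers.conv2d(
      compressed, filters=1, kernel_size=kernel_size,
      padding='same', activation=tf.nn.sigmoid)
  return inputs * attention  # same shape as inputs
```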

easy_vision.python.core.ops.common_layers.mish(inputs)[source]
Mish Activation Function.
mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^{x}))
Parameters:inputs – Arbitrary input tensor.
Returns:Tensor with same shape as the input.
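
Equivalently, mish can be written with standard ops; a reference sketch of the formula above:

```python
import tensorflow as tf

def mish_sketch(x):
  # mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
  return x * tf.tanh(tf.nn.softplus(x))
```
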
easy_vision.python.core.ops.common_layers.squeeze_and_excitation_2d(*args, **kwargs)[source]

Squeeze-and-excitation block; see Hu, J., Shen, L., & Sun, G. (2017). Squeeze-and-Excitation Networks. CoRR.

Parameters:
  • inputs – input tensor of size [batch_size, height, width, channels].
  • se_rate – squeeze-and-excitation reduce rate.
  • inputs_mask – input tensor valid mask of size [batch_size, height, width].
Returns:

output tensor with same shape as inputs

easy_vision.python.core.ops.common_ops

A module for helper tensorflow ops.

class easy_vision.python.core.ops.common_ops.EqualizationLossConfig(weight, exclude_prefixes)

Bases: tuple

exclude_prefixes

Alias for field number 1

weight

Alias for field number 0

easy_vision.python.core.ops.common_ops.batch_position_sensitive_crop_regions(images, boxes, crop_size, num_spatial_bins, global_pool, parallel_iterations=64)[source]

Position sensitive crop with batches of images and boxes.

This op is exactly like position_sensitive_crop_regions below but operates on batches of images and boxes. See position_sensitive_crop_regions function below for the operation applied per batch element.

Parameters:
  • images – A Tensor. Must be one of the following types: uint8, int8, int16, int32, int64, half, float32, float64. A 4-D tensor of shape [batch, image_height, image_width, depth]. Both image_height and image_width need to be positive.
  • boxes – A Tensor of type float32. A 3-D tensor of shape [batch, num_boxes, 4]. Each box is specified in normalized coordinates [y1, x1, y2, x2]. A normalized coordinate value of y is mapped to the image coordinate at y * (image_height - 1), so as the [0, 1] interval of normalized image height is mapped to [0, image_height - 1] in image height coordinates. We do allow y1 > y2, in which case the sampled crop is an up-down flipped version of the original image. The width dimension is treated similarly.
  • crop_size – See position_sensitive_crop_regions below.
  • num_spatial_bins – See position_sensitive_crop_regions below.
  • global_pool – See position_sensitive_crop_regions below.
  • parallel_iterations – Number of batch items to process in parallel.

Returns:

A 5-D tensor of shape [batch, num_boxes, K, K, crop_channels]; see position_sensitive_crop_regions below for the definitions of K and crop_channels.

easy_vision.python.core.ops.common_ops.dense_to_sparse_boxes(dense_locations, dense_num_boxes, num_classes)[source]

Converts bounding boxes from dense to sparse form.

Parameters:
  • dense_locations – a [max_num_boxes, 4] tensor in which only the first k rows are valid bounding box location coordinates, where k is the sum of elements in dense_num_boxes.
  • dense_num_boxes – a [max_num_classes] tensor indicating the counts of various bounding box classes e.g. [1, 0, 0, 2] means that the first bounding box is of class 0 and the second and third bounding boxes are of class 3. The sum of elements in this tensor is the number of valid bounding boxes.
  • num_classes – number of classes
Returns:

box_locations: a [num_boxes, 4] tensor containing only valid bounding boxes (i.e. the first num_boxes rows of dense_locations)
box_classes: a [num_boxes] tensor containing the classes of each bounding box (e.g. dense_num_boxes = [1, 0, 0, 2] => box_classes = [0, 3, 3])

easy_vision.python.core.ops.common_ops.expanded_shape(orig_shape, start_dim, num_dims)[source]

Inserts multiple ones into a shape vector.

Inserts an all-1 vector of length num_dims at position start_dim into a shape. Can be combined with tf.reshape to generalize tf.expand_dims.

Parameters:
  • orig_shape – the shape into which the all-1 vector is added (int32 vector)
  • start_dim – insertion position (int scalar)
  • num_dims – length of the inserted all-1 vector (int scalar)
Returns:

An int32 vector of length tf.size(orig_shape) + num_dims.
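
For example, combined with tf.reshape this generalizes tf.expand_dims to inserting several size-1 dimensions at once:

```python
import tensorflow as tf
from easy_vision.python.core.ops import common_ops

t = tf.ones([3, 5])
# Insert two size-1 dims at position 1: shape [3, 5] -> [3, 1, 1, 5].
new_shape = common_ops.expanded_shape(tf.shape(t), start_dim=1, num_dims=2)
expanded = tf.reshape(t, new_shape)
```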

easy_vision.python.core.ops.common_ops.filter_groundtruth_with_crowd_boxes(tensor_dict)[source]

Filters out groundtruth with boxes corresponding to crowd.

Parameters:tensor_dict – a dictionary of the following groundtruth tensors: fields.InputDataFields.groundtruth_boxes, fields.InputDataFields.groundtruth_classes, fields.InputDataFields.groundtruth_keypoints, fields.InputDataFields.groundtruth_instance_masks, fields.InputDataFields.groundtruth_is_crowd, fields.InputDataFields.groundtruth_area, fields.InputDataFields.groundtruth_label_types
Returns:a dictionary of tensors containing only the groundtruth whose boxes are not marked as crowd.
easy_vision.python.core.ops.common_ops.filter_groundtruth_with_nan_box_coordinates(tensor_dict)[source]

Filters out groundtruth with no bounding boxes.

Parameters:tensor_dict – a dictionary of the following groundtruth tensors: fields.InputDataFields.groundtruth_boxes, fields.InputDataFields.groundtruth_classes, fields.InputDataFields.groundtruth_keypoints, fields.InputDataFields.groundtruth_instance_masks, fields.InputDataFields.groundtruth_is_crowd, fields.InputDataFields.groundtruth_area, fields.InputDataFields.groundtruth_label_types
Returns:a dictionary of tensors containing only the groundtruth that have bounding boxes.
easy_vision.python.core.ops.common_ops.fixed_padding(inputs, kernel_size, rate=1)[source]

Pads the input along the spatial dimensions independently of input size.

Parameters:
  • inputs – A tensor of size [batch, height_in, width_in, channels].
  • kernel_size – The kernel to be used in the conv2d or max_pool2d operation. Should be a positive integer.
  • rate – An integer, rate for atrous convolution.
Returns:

A tensor of size [batch, height_out, width_out, channels] with the input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).

Return type:

output

easy_vision.python.core.ops.common_ops.get_pooling_rate(from_feature, to_feature)[source]

Get pooling rate from from_feature to to_feature

easy_vision.python.core.ops.common_ops.image_coordinates_to_normalized(absolute_boxes, image_shape, parallel_iterations=32)[source]

Converts a batch of boxes from image coordinates to normalized coordinates.

Parameters:
  • absolute_boxes – a float32 tensor of shape [None, num_boxes, 4] containing the boxes in image coordinates.
  • image_shape – a float32 tensor of shape [4] containing the image shape.
  • parallel_iterations – parallelism for the map_fn op.
Returns:

a float32 tensor of shape [None, num_boxes, 4] in normalized coordinates.

Return type:

normalized_boxes

easy_vision.python.core.ops.common_ops.image_mask(valid_shape, max_shape=None, dtype=tf.bool, name=None)[source]

Returns an image mask tensor.

Parameters:
  • valid_shape – integer tensor of shape (batch_size, 2 or 3), image valid shape.
  • max_shape – integer tensor of shape (2 or 3), size of the y, x dimensions of the returned tensor.
  • dtype – output type of the resulting tensor.
  • name – name of the op.

Returns:A mask tensor of shape (batch_size, max_shape[0], max_shape[1]), cast to specified dtype.
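
A sketch of what the op computes in the 2-D case, inferred from the signature (tf.sequence_mask is used here as an assumption; the actual implementation may differ):

```python
import tensorflow as tf

def image_mask_sketch(valid_shape, max_shape):
  """mask[b, y, x] = (y < valid_shape[b, 0]) & (x < valid_shape[b, 1])."""
  y_mask = tf.sequence_mask(valid_shape[:, 0], maxlen=max_shape[0])
  x_mask = tf.sequence_mask(valid_shape[:, 1], maxlen=max_shape[1])
  # Broadcast [batch, H, 1] & [batch, 1, W] -> [batch, H, W].
  return tf.logical_and(y_mask[:, :, tf.newaxis], x_mask[:, tf.newaxis, :])
```
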
easy_vision.python.core.ops.common_ops.indices_to_dense_vector(indices, size, indices_value=1.0, default_value=0, dtype=tf.float32)[source]

Creates dense vector with indices set to specific value and rest to zeros.

This function exists because it is unclear if it is safe to use tf.sparse_to_dense(indices, [size], 1, validate_indices=False) with indices which are not ordered. This function accepts a dynamic size (e.g. tf.shape(tensor)[0]).

Parameters:
  • indices – 1d Tensor with integer indices which are to be set to indices_values.
  • size – scalar with size (integer) of output Tensor.
  • indices_value – values of elements specified by indices in the output vector
  • default_value – values of other elements in the output vector.
  • dtype – data type.
Returns:

dense 1D Tensor of shape [size] with indices set to indices_values and the rest set to default_value.
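
Usage sketch:

```python
import tensorflow as tf
from easy_vision.python.core.ops import common_ops

indices = tf.constant([1, 3], dtype=tf.int32)
dense = common_ops.indices_to_dense_vector(indices, size=5)
# dense == [0., 1., 0., 1., 0.]  (indices_value=1.0, default_value=0)
```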

easy_vision.python.core.ops.common_ops.matmul_crop_and_resize(image, boxes, crop_size, scope=None)[source]

Matrix multiplication based implementation of the crop and resize op.

Extracts crops from the input image tensor and bilinearly resizes them (possibly with aspect ratio change) to a common output size specified by crop_size. This is more general than the crop_to_bounding_box op which extracts a fixed size slice from the input image and does not allow resizing or aspect ratio change.

Returns a tensor with crops from the input image at positions defined at the bounding box locations in boxes. The cropped boxes are all resized (with bilinear interpolation) to a fixed size = [crop_height, crop_width]. The result is a 5-D tensor [batch, num_boxes, crop_height, crop_width, depth].

Running time complexity: O((# channels) * (# boxes) * (crop_size)^2 * M), where M is the number of pixels of the longer edge of the image.

Note that this operation is meant to replicate the behavior of the standard tf.image.crop_and_resize operation but there are a few differences. Specifically:

  1. The extrapolation value (the values that are interpolated from outside the bounds of the image window) is always zero.
  2. Only XLA-supported operations are used (e.g., matrix multiplication).
  3. There is no box_indices argument; to run this op on multiple images, one must currently call this op independently on each image.
  4. All shapes and the crop_size parameter are assumed to be statically defined. Moreover, the number of boxes must be strictly nonzero.
Parameters:
  • image – A Tensor. Must be one of the following types: uint8, int8, int16, int32, int64, half, ‘bfloat16’, float32, float64. A 4-D tensor of shape [batch, image_height, image_width, depth]. Both image_height and image_width need to be positive.
  • boxes – A Tensor of type float32 or ‘bfloat16’. A 3-D tensor of shape [batch, num_boxes, 4]. The boxes are specified in normalized coordinates and are of the form [y1, x1, y2, x2]. A normalized coordinate value of y is mapped to the image coordinate at y * (image_height - 1), so as the [0, 1] interval of normalized image height is mapped to [0, image_height - 1] in image height coordinates. We do allow y1 > y2, in which case the sampled crop is an up-down flipped version of the original image. The width dimension is treated similarly. Normalized coordinates outside the [0, 1] range are allowed, in which case we use extrapolation_value to extrapolate the input image values.
  • crop_size – A list of two integers [crop_height, crop_width]. All cropped image patches are resized to this size. The aspect ratio of the image content is not preserved. Both crop_height and crop_width need to be positive.
  • scope – A name for the operation (optional).
Returns:

A 5-D tensor of shape [batch, num_boxes, crop_height, crop_width, depth]

Raises:
  • ValueError – if image tensor does not have shape [batch, image_height, image_width, depth] and all dimensions statically defined.
  • ValueError – if boxes tensor does not have shape [batch, num_boxes, 4] where num_boxes > 0.
  • ValueError – if crop_size is not a list of two positive integers
easy_vision.python.core.ops.common_ops.matmul_gather_on_zeroth_axis(params, indices, scope=None)[source]

Matrix multiplication based implementation of tf.gather on zeroth axis.

TODO(rathodv, jonathanhuang): enable sparse matmul option.

Parameters:
  • params – A float32 Tensor. The tensor from which to gather values. Must be at least rank 1.
  • indices – A Tensor. Must be one of the following types: int32, int64. Must be in range [0, params.shape[0])
  • scope – A name for the operation (optional).
Returns:

A Tensor. Has the same type as params. Values from params gathered from indices given by indices, with shape indices.shape + params.shape[1:].

easy_vision.python.core.ops.common_ops.merge_boxes_with_multiple_labels(boxes, classes, confidences, num_classes, quantization_bins=10000)[source]

Merges boxes with same coordinates and returns K-hot encoded classes.

Parameters:
  • boxes – A tf.float32 tensor with shape [N, 4] holding N boxes. Only normalized coordinates are allowed.
  • classes – A tf.int32 tensor with shape [N] holding class indices. The class index starts at 0.
  • confidences – A tf.float32 tensor with shape [N] holding class confidences.
  • num_classes – total number of classes to use for K-hot encoding.
  • quantization_bins – the number of bins used to quantize the box coordinate.
Returns:

merged_boxes: a tf.float32 tensor with shape [N’, 4] holding boxes, where N’ <= N.
class_encodings: a tf.int32 tensor with shape [N’, num_classes] holding K-hot encodings for the merged boxes.
confidence_encodings: a tf.float32 tensor with shape [N’, num_classes] holding encodings of confidences for the merged boxes.
merged_box_indices: a tf.int32 tensor with shape [N’] holding original indices of the boxes.

easy_vision.python.core.ops.common_ops.meshgrid(x, y)[source]

Tiles the contents of x and y into a pair of grids.

Multidimensional analog of numpy.meshgrid, giving the same behavior if x and y are vectors. Generally, this will give:

xgrid(i_1, …, i_m, j_1, …, j_n) = x(j_1, …, j_n)
ygrid(i_1, …, i_m, j_1, …, j_n) = y(i_1, …, i_m)

Keep in mind that the order of the arguments and outputs is reverse relative to the order of the indices they go into, done for compatibility with numpy. The output tensors have the same shapes. Specifically:

xgrid.get_shape() = y.get_shape().concatenate(x.get_shape())
ygrid.get_shape() = y.get_shape().concatenate(x.get_shape())

Parameters:
  • x – A tensor of arbitrary shape and rank. xgrid will contain these values varying in its last dimensions.
  • y – A tensor of arbitrary shape and rank. ygrid will contain these values varying in its first dimensions.
Returns:

A tuple of tensors (xgrid, ygrid).
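
A small vector example of the conventions above:

```python
import tensorflow as tf
from easy_vision.python.core.ops import common_ops

x = tf.constant([1, 2, 3])
y = tf.constant([10, 20])
xgrid, ygrid = common_ops.meshgrid(x, y)
# Both grids have shape [2, 3] (= y.shape concatenated with x.shape):
# xgrid == [[ 1,  2,  3], [ 1,  2,  3]]
# ygrid == [[10, 10, 10], [20, 20, 20]]
```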

easy_vision.python.core.ops.common_ops.native_crop_and_resize(image, boxes, crop_size, scope=None)[source]

Same as matmul_crop_and_resize but uses tf.image.crop_and_resize.

easy_vision.python.core.ops.common_ops.nearest_neighbor_upsampling(input_tensor, scale=None, height_scale=None, width_scale=None)[source]

Nearest neighbor upsampling implementation.

Nearest neighbor upsampling function that maps input tensor with shape [batch_size, height, width, channels] to [batch_size, height * scale , width * scale, channels]. This implementation only uses reshape and broadcasting to make it TPU compatible.

Parameters:
  • input_tensor – A float32 tensor of size [batch, height_in, width_in, channels].
  • scale – An integer multiple to scale resolution of input data in both height and width dimensions.
  • height_scale – An integer multiple to scale the height of input image. This option when provided overrides scale option.
  • width_scale – An integer multiple to scale the width of input image. This option when provided overrides scale option.
Returns:

A float32 tensor of size [batch, height_in*scale, width_in*scale, channels].

Return type:

data_up

Raises:

ValueError – If both scale and height_scale or if both scale and width_scale are None.
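
The reshape-and-broadcast trick described above can be sketched as follows; this is a reference sketch assuming statically known spatial dimensions, not necessarily the exact implementation.

```python
import tensorflow as tf

def nn_upsample_sketch(x, scale):
  """Nearest-neighbor upsampling using only reshape and broadcasting."""
  _, h, w, c = x.get_shape().as_list()
  # Expand each pixel into a scale x scale block via broadcasting.
  up = tf.reshape(x, [-1, h, 1, w, 1, c]) * tf.ones(
      [1, 1, scale, 1, scale, 1], dtype=x.dtype)
  return tf.reshape(up, [-1, h * scale, w * scale, c])
```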

easy_vision.python.core.ops.common_ops.normalize_to_target(inputs, target_norm_value, dim, epsilon=1e-07, trainable=True, scope='NormalizeToTarget', summarize=True)[source]

L2 normalizes the inputs across the specified dimension to a target norm.

This op implements the L2 Normalization layer introduced in Liu, Wei, et al. “SSD: Single Shot MultiBox Detector.” and Liu, Wei, Andrew Rabinovich, and Alexander C. Berg. “Parsenet: Looking wider to see better.” and is useful for bringing activations from multiple layers in a convnet to a standard scale.

Note that the rank of inputs must be known and the dimension to which normalization is to be applied should be statically defined.

TODO(jonathanhuang): Add option to scale by L2 norm of the entire input.

Parameters:
  • inputs – A Tensor of arbitrary size.
  • target_norm_value – A float value that specifies an initial target norm or a list of floats (whose length must be equal to the depth along the dimension to be normalized) specifying a per-dimension multiplier after normalization.
  • dim – The dimension along which the input is normalized.
  • epsilon – A small value to add to the inputs to avoid dividing by zero.
  • trainable – Whether the norm is trainable or not
  • scope – Optional scope for variable_scope.
  • summarize – Whether or not to add a tensorflow summary for the op.
Returns:

The input tensor normalized to the specified target norm.

Raises:
  • ValueError – If dim is smaller than the number of dimensions in ‘inputs’.
  • ValueError – If target_norm_value is not a float or a list of floats with length equal to the depth along the dimension to be normalized.
easy_vision.python.core.ops.common_ops.normalized_to_image_coordinates(normalized_boxes, image_shape, parallel_iterations=32)[source]

Converts a batch of boxes from normalized to image coordinates.

Parameters:
  • normalized_boxes – a float32 tensor of shape [None, num_boxes, 4] in normalized coordinates.
  • image_shape – a float32 tensor of shape [4] containing the image shape.
  • parallel_iterations – parallelism for the map_fn op.
Returns:

a float32 tensor of shape [None, num_boxes, 4] containing the boxes in image coordinates.

Return type:

absolute_boxes

easy_vision.python.core.ops.common_ops.pad_to_multiple(tensor, multiple)[source]

Returns the tensor zero padded to the specified multiple.

Appends 0s to the end of the first and second dimension (height and width) of the tensor until both dimensions are a multiple of the input argument ‘multiple’. E.g. given an input tensor of shape [1, 3, 5, 1] and an input multiple of 4, PadToMultiple will append 0s so that the resulting tensor will be of shape [1, 4, 8, 1].

Parameters:
  • tensor – rank 4 float32 tensor, where tensor -> [batch_size, height, width, channels].
  • multiple – the multiple to pad to.
Returns:

the tensor zero padded to the specified multiple.

Return type:

padded_tensor
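
Usage sketch matching the example in the docstring:

```python
import tensorflow as tf
from easy_vision.python.core.ops import common_ops

tensor = tf.ones([1, 3, 5, 1], dtype=tf.float32)
padded = common_ops.pad_to_multiple(tensor, multiple=4)
# Height 3 -> 4 and width 5 -> 8, so padded has shape [1, 4, 8, 1].
```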

easy_vision.python.core.ops.common_ops.padded_map_fn(fn, elems, dtype=None, parallel_iterations=None, back_prop=True, swap_memory=False, infer_shape=True, force_float=False, name=None)[source]

Map on the list of tensors unpacked from elems on dimension 0, with padding.

The simplest version of map_fn repeatedly applies the callable fn to a sequence of elements from first to last. The elements are made of the tensors unpacked from elems. dtype is the data type of the return value of fn. Users must provide dtype if it is different from the data type of elems.

Suppose that elems is unpacked into values, a list of tensors. The shape of the result tensor is [values.shape[0]] + fn(values[0]).shape.

This method also allows multi-arity elems and output of fn. If elems is a (possibly nested) list or tuple of tensors, then each of these tensors must have a matching first (unpack) dimension. The signature of fn may match the structure of elems. That is, if elems is (t1, [t2, t3, [t4, t5]]), then an appropriate signature for fn is: fn = lambda (t1, [t2, t3, [t4, t5]]):. Furthermore, fn may emit a different structure than its input. For example, fn may look like: fn = lambda t1: return (t1 + 1, t1 - 1). In this case, the dtype parameter is not optional: dtype must be a type or (possibly nested) tuple of types matching the output of fn.

To apply a functional operation to the nonzero elements of a SparseTensor, one of the following methods is recommended. First, if the function is expressible as TensorFlow ops, use

```python
result = SparseTensor(input.indices, fn(input.values), input.dense_shape)
```

If, however, the function is not expressible as a TensorFlow op, then use

```python
result = SparseTensor(
    input.indices, map_fn(fn, input.values), input.dense_shape)
```

instead.

When executing eagerly, map_fn does not execute in parallel even if parallel_iterations is set to a value > 1. You can still get the performance benefits of running a function in parallel by using the tf.contrib.eager.defun decorator:

```python
# Assume the function being used in map_fn is fn.
# To ensure map_fn calls fn in parallel, use the defun decorator.
@tf.contrib.eager.defun
def func(tensor):
  return tf.map_fn(fn, tensor)
```

Note that if you use the defun decorator, any non-TensorFlow Python code that you may have written in your function won’t get executed. See tf.contrib.eager.defun for more details. The recommendation would be to debug without defun but switch to defun to get the performance benefits of running map_fn in parallel.

Parameters:
  • fn – The callable to be performed. It accepts one argument, which will have the same (possibly nested) structure as elems. Its output must have the same structure as dtype if one is provided, otherwise it must have the same structure as elems.
  • elems – A tensor or (possibly nested) sequence of tensors, each of which will be unpacked along their first dimension. The nested sequence of the resulting slices will be applied to fn.
  • dtype – (optional) The output type(s) of fn. If fn returns a structure of Tensors differing from the structure of elems, then dtype is not optional and must have the same structure as the output of fn.
  • parallel_iterations – (optional) The number of iterations allowed to run in parallel. When graph building, the default value is 10. While executing eagerly, the default value is set to 1.
  • back_prop – (optional) True enables support for back propagation.
  • swap_memory – (optional) True enables GPU-CPU memory swapping.
  • infer_shape – (optional) False disables tests for consistent output shapes.
  • force_float – (optional) Force the use of float TensorArrays for GPU computation; TensorArray is only registered for float and double dtypes on GPU devices.
  • name – (optional) Name prefix for the returned tensors.
Returns:

A tensor or (possibly nested) sequence of tensors. Each tensor packs the results of applying fn to tensors unpacked from elems along the first dimension, from first to last.

Raises:
  • TypeError – if fn is not callable or the structure of the output of fn and dtype do not match, or if elems is a SparseTensor.
  • ValueError – if the lengths of the output of fn and dtype do not match.

Examples

```python
elems = np.array([1, 2, 3, 4, 5, 6])
squares = map_fn(lambda x: x * x, elems)
# squares == [1, 4, 9, 16, 25, 36]
```

```python
elems = (np.array([1, 2, 3]), np.array([-1, 1, -1]))
alternate = map_fn(lambda x: x[0] * x[1], elems, dtype=tf.int64)
# alternate == [-1, 2, -3]
```

```python
elems = np.array([1, 2, 3])
alternates = map_fn(lambda x: (x, -x), elems, dtype=(tf.int64, tf.int64))
# alternates[0] == [1, 2, 3]
# alternates[1] == [-1, -2, -3]
```

easy_vision.python.core.ops.common_ops.padded_one_hot_encoding(indices, depth, left_pad)[source]

Returns a zero padded one-hot tensor.

This function converts a sparse representation of indices (e.g., [4]) to a zero padded one-hot representation (e.g., [0, 0, 0, 0, 1] with depth = 4 and left_pad = 1). If indices is empty, the result will simply be a tensor of shape (0, depth + left_pad). If depth = 0, then this function just returns None.

Parameters:
  • indices – an integer tensor of shape [num_indices].
  • depth – depth for the one-hot tensor (integer).
  • left_pad – number of zeros to left pad the one-hot tensor with (integer).
Returns:

a tensor with shape (num_indices, depth + left_pad). Returns None if the depth is zero.

Return type:

padded_onehot

Raises:

ValueError – if indices does not have rank 1 or if left_pad or depth are either negative or non-integers.

TODO(rathodv): add runtime checks for depth and indices.

easy_vision.python.core.ops.common_ops.position_sensitive_crop_regions(image, boxes, crop_size, num_spatial_bins, global_pool)[source]

Position-sensitive crop and pool rectangular regions from a feature grid.

The output crops are split into spatial_bins_y vertical bins and spatial_bins_x horizontal bins. For each intersection of a vertical and a horizontal bin the output values are gathered by performing tf.image.crop_and_resize (bilinear resampling) on a separate subset of channels of the image. This reduces depth by a factor of (spatial_bins_y * spatial_bins_x).

When global_pool is True, this function implements a differentiable version of position-sensitive RoI pooling used in [R-FCN detection system](https://arxiv.org/abs/1605.06409).

When global_pool is False, this function implements a differentiable version of position-sensitive assembling operation used in [instance FCN](https://arxiv.org/abs/1603.08678).

Parameters:
  • image – A Tensor. Must be one of the following types: uint8, int8, int16, int32, int64, half, float32, float64. A 3-D tensor of shape [image_height, image_width, depth]. Both image_height and image_width need to be positive.
  • boxes – A Tensor of type float32. A 2-D tensor of shape [num_boxes, 4]. Each box is specified in normalized coordinates [y1, x1, y2, x2]. A normalized coordinate value of y is mapped to the image coordinate at y * (image_height - 1), so as the [0, 1] interval of normalized image height is mapped to [0, image_height - 1] in image height coordinates. We do allow y1 > y2, in which case the sampled crop is an up-down flipped version of the original image. The width dimension is treated similarly.
  • crop_size – A list of two integers [crop_height, crop_width]. All cropped image patches are resized to this size. The aspect ratio of the image content is not preserved. Both crop_height and crop_width need to be positive.
  • num_spatial_bins – A list of two integers [spatial_bins_y, spatial_bins_x]. Represents the number of position-sensitive bins in y and x directions. Both values should be >= 1. crop_height should be divisible by spatial_bins_y, and similarly for width. The number of image channels should be divisible by (spatial_bins_y * spatial_bins_x). Suggested value from R-FCN paper: [3, 3].
  • global_pool – A boolean variable. If True, we perform average global pooling on the features assembled from the position-sensitive score maps. If False, we keep the position-pooled features without global pooling over the spatial coordinates. Note that using global_pool=True is equivalent to, but more efficient than, running the function with global_pool=False and then performing global average pooling.
Returns:

A 4-D tensor of shape [num_boxes, K, K, crop_channels], where crop_channels = depth / (spatial_bins_y * spatial_bins_x), K = 1 when global_pool is True (average-pooled cropped regions), and K = crop_size when global_pool is False.

Return type:

position_sensitive_features

Raises:

ValueError – Raised in four situations: num_spatial_bins is not >= 1; num_spatial_bins does not divide crop_size; (spatial_bins_y * spatial_bins_x) does not divide depth; bin_crop_size is not square when global_pool=False, due to the constraint in function space_to_depth.

easy_vision.python.core.ops.common_ops.reduce_sum_trailing_dimensions(tensor, ndims)[source]

Computes sum across all dimensions following first ndims dimensions.

easy_vision.python.core.ops.common_ops.reframe_box_masks_to_image_masks(box_masks, boxes, image_height, image_width)[source]

Transforms the box masks back to full image masks.

Embeds masks in bounding boxes of larger masks whose shapes correspond to image shape.

Parameters:
  • box_masks – A tf.float32 tensor of size [num_masks, mask_height, mask_width].
  • boxes – A tf.float32 tensor of size [num_masks, 4] containing the box corners. Row i contains [ymin, xmin, ymax, xmax] of the box corresponding to mask i. Note that the box corners are in normalized coordinates.
  • image_height – Image height. The output mask will have the same height as the image height.
  • image_width – Image width. The output mask will have the same width as the image width.
Returns:

A tf.float32 tensor of size [num_masks, image_height, image_width].

easy_vision.python.core.ops.common_ops.replace_nan_groundtruth_label_scores_with_ones(label_scores)[source]

Replaces nan label scores with 1.0.

Parameters:label_scores – a tensor containing object annotation label scores.
Returns:a tensor where NaN label scores have been replaced by ones.
easy_vision.python.core.ops.common_ops.retain_groundtruth(tensor_dict, valid_indices)[source]

Retains groundtruth by valid indices.

Parameters:
  • tensor_dict – a dictionary of the following groundtruth tensors: fields.InputDataFields.groundtruth_boxes, fields.InputDataFields.groundtruth_classes, fields.InputDataFields.groundtruth_keypoints, fields.InputDataFields.groundtruth_instance_masks, fields.InputDataFields.groundtruth_is_crowd, fields.InputDataFields.groundtruth_area, fields.InputDataFields.groundtruth_label_types, fields.InputDataFields.groundtruth_difficult.
  • valid_indices – a tensor with valid indices for the box-level groundtruth.
Returns:

a dictionary of tensors containing only the groundtruth for valid_indices.

Raises:
  • ValueError – If the shape of valid_indices is invalid.
  • ValueError – If field fields.InputDataFields.groundtruth_boxes is not present in tensor_dict.
easy_vision.python.core.ops.common_ops.retain_groundtruth_with_positive_classes(tensor_dict)[source]

Retains only groundtruth with positive class ids.

Parameters:tensor_dict – a dictionary of the following groundtruth tensors: fields.InputDataFields.groundtruth_boxes, fields.InputDataFields.groundtruth_classes, fields.InputDataFields.groundtruth_keypoints, fields.InputDataFields.groundtruth_instance_masks, fields.InputDataFields.groundtruth_is_crowd, fields.InputDataFields.groundtruth_area, fields.InputDataFields.groundtruth_label_types, fields.InputDataFields.groundtruth_difficult.
Returns:a dictionary of tensors containing only the groundtruth with positive classes.
Raises:ValueError – If groundtruth_classes tensor is not in tensor_dict.
easy_vision.python.core.ops.common_ops.safe_div(numerator, denominator, name='safe_div')[source]

Divides two tensors element-wise, returning 0 if the denominator is <= 0.

Parameters:
  • numerator – A real Tensor.
  • denominator – A real Tensor, with dtype matching numerator.
  • name – Name for the returned op.
Returns:

0 if denominator <= 0, else numerator / denominator

easy_vision.python.core.ops.common_ops.static_or_dynamic_map_fn(fn, elems, dtype=None, parallel_iterations=32, back_prop=True)[source]

Runs map_fn as a (static) for loop when possible.

This function rewrites the map_fn as an explicit unstack input -> for loop over function calls -> stack result combination. This allows our graphs to be acyclic when the batch size is static. For comparison, see https://www.tensorflow.org/api_docs/python/tf/map_fn.

Note that static_or_dynamic_map_fn currently is not fully interchangeable with the default tf.map_fn function as it does not accept nested inputs (only Tensors or lists of Tensors). Likewise, the output of fn can only be a Tensor or list of Tensors.

TODO(jonathanhuang): make this function fully interchangeable with tf.map_fn.

Parameters:
  • fn – The callable to be performed. It accepts one argument, which will have the same structure as elems. Its output must have the same structure as elems.
  • elems – A tensor or list of tensors, each of which will be unpacked along their first dimension. The sequence of the resulting slices will be applied to fn.
  • dtype – (optional) The output type(s) of fn. If fn returns a structure of Tensors differing from the structure of elems, then dtype is not optional and must have the same structure as the output of fn.
  • parallel_iterations – (optional) number of batch items to process in parallel. This flag is only used if the native tf.map_fn is used and defaults to 32 instead of 10 (unlike the standard tf.map_fn default).
  • back_prop – (optional) True enables support for back propagation. This flag is only used if the native tf.map_fn is used.
Returns:

A tensor or sequence of tensors. Each tensor packs the results of applying fn to tensors unpacked from elems along the first dimension, from first to last.

Raises:
  • ValueError – if elems is not a Tensor or a list of Tensors.
  • ValueError – if fn does not return a Tensor or a list of Tensors.
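A minimal usage sketch: with a static first dimension the call unrolls into plain per-slice function calls (unstack, apply fn, stack) rather than a tf.while_loop.

import tensorflow as tf
from easy_vision.python.core.ops import common_ops

boxes = tf.placeholder(tf.float32, shape=[4, 100, 4])  # static batch size of 4

# Computes box areas per batch item; unrolled into 4 calls at graph build time.
areas = common_ops.static_or_dynamic_map_fn(
    lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]), elems=boxes)
# areas has shape [4, 100]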
easy_vision.python.core.ops.common_ops.time_segments_coordinates_to_normalized(absolute_boxes, clip_shape, parallel_iterations=32)[source]

Converts a batch of boxes from image coordinates to normalized coordinates.

Parameters:
  • absolute_boxes – a float32 tensor of shape [None, num_boxes, 4] containing the boxes in image coordinates.
  • clip_shape – a float32 tensor of shape [5] containing the clip shape.
  • parallel_iterations – parallelism for the map_fn op.
Returns:

a float32 tensor of shape [None, num_boxes, 4] in normalized coordinates.

Return type:

normalized_boxes

easy_vision.python.core.ops.embedding_layer

Implementation of embedding layer with shared weights.

class easy_vision.python.core.ops.embedding_layer.EmbeddingSharedWeights(vocab_size, hidden_size, init_var=None, embed_scale=True, regularizer=None)[source]

Bases: tensorflow.python.layers.base.Layer

Calculates input embeddings and pre-softmax linear with shared weights.

__init__(vocab_size, hidden_size, init_var=None, embed_scale=True, regularizer=None)[source]
build(_)[source]

Creates the variables of the layer.

call(x)[source]

Get token embeddings of x.

Parameters:x – An int64 tensor with shape [batch_size, length]
Returns:embeddings: float32 tensor with shape [batch_size, length, embedding_size]; padding: float32 tensor with shape [batch_size, length] indicating the locations of the padding tokens in x.
Return type:embeddings
linear(x)[source]

Computes logits by running x through a linear layer.

Parameters:x – A float32 tensor with shape [batch_size, length, hidden_size]
Returns:float32 tensor with shape [batch_size, length, vocab_size].
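A minimal usage sketch of the shared input/output embedding. Unpacking call() into (embeddings, padding) is an assumption taken from the Returns description above.

import tensorflow as tf
from easy_vision.python.core.ops.embedding_layer import EmbeddingSharedWeights

layer = EmbeddingSharedWeights(vocab_size=8000, hidden_size=512)
token_ids = tf.placeholder(tf.int64, shape=[None, 30])  # [batch_size, length]

# (embeddings, padding) unpacking assumed from the documented Returns.
embeddings, padding = layer(token_ids)  # [batch, 30, 512], [batch, 30]
logits = layer.linear(embeddings)       # [batch, 30, 8000], same weights reused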

easy_vision.python.core.ops.keypoint_ops

Keypoint operations.

Keypoints are represented as tensors of shape [num_instances, num_keypoints, 2], where the last dimension holds the [y, x] coordinates of each keypoint.

easy_vision.python.core.ops.keypoint_ops.change_coordinate_frame(keypoints, window, scope=None)[source]

Changes coordinate frame of the keypoints to be relative to window’s frame.

Given a window of the form [y_min, x_min, y_max, x_max], changes keypoint coordinates from keypoints of shape [num_instances, num_keypoints, 2] to be relative to this window.

An example use case is data augmentation: we are given groundtruth keypoints and would like to randomly crop the image to some window. In this case we need to change the coordinate frame of each groundtruth keypoint to be relative to this new window.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • window – a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] window we should change the coordinate frame to.
  • scope – name scope.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints
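A small worked example, assuming the documented semantics (each coordinate becomes (coord - window_min) / window_size):

import tensorflow as tf
from easy_vision.python.core.ops import keypoint_ops

keypoints = tf.constant([[[0.5, 0.5], [0.6, 0.7]]])  # [1 instance, 2 keypoints, 2]
crop_window = tf.constant([0.25, 0.25, 0.75, 0.75])  # [y_min, x_min, y_max, x_max]

# Re-express keypoints relative to the crop window:
# [0.5, 0.5] -> [0.5, 0.5] and [0.6, 0.7] -> [0.7, 0.9]
cropped_keypoints = keypoint_ops.change_coordinate_frame(keypoints, crop_window)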

easy_vision.python.core.ops.keypoint_ops.clip_to_window(keypoints, window, scope=None)[source]

Clips keypoints to a window.

This op clips any input keypoints to a window.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • window – a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] window to which the op should clip the keypoints.
  • scope – name scope.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints

easy_vision.python.core.ops.keypoint_ops.flip_horizontal(keypoints, flip_point, flip_permutation, scope=None)[source]

Flips the keypoints horizontally around the flip_point.

This operation flips the x coordinate for each keypoint around the flip_point and also permutes the keypoints in a manner specified by flip_permutation.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • flip_point – (float) scalar tensor representing the x coordinate to flip the keypoints around.
  • flip_permutation – rank 1 int32 tensor containing the keypoint flip permutation. This specifies the mapping from original keypoint indices to the flipped keypoint indices. This is used primarily for keypoints that are not reflection invariant. E.g. Suppose there are 3 keypoints representing [‘head’, ‘right_eye’, ‘left_eye’], then a logical choice for flip_permutation might be [0, 2, 1] since we want to swap the ‘left_eye’ and ‘right_eye’ after a horizontal flip.
  • scope – name scope.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints

easy_vision.python.core.ops.keypoint_ops.flip_vertical(keypoints, flip_point, flip_permutation, scope=None)[source]

Flips the keypoints vertically around the flip_point.

This operation flips the y coordinate for each keypoint around the flip_point and also permutes the keypoints in a manner specified by flip_permutation.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • flip_point – (float) scalar tensor representing the y coordinate to flip the keypoints around.
  • flip_permutation – rank 1 int32 tensor containing the keypoint flip permutation. This specifies the mapping from original keypoint indices to the flipped keypoint indices. This is used primarily for keypoints that are not reflection invariant. E.g. Suppose there are 3 keypoints representing [‘head’, ‘right_eye’, ‘left_eye’], then a logical choice for flip_permutation might be [0, 2, 1] since we want to swap the ‘left_eye’ and ‘right_eye’ after the flip.
  • scope – name scope.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints

easy_vision.python.core.ops.keypoint_ops.prune_outside_window(keypoints, window, scope=None)[source]

Prunes keypoints that fall outside a given window.

This function replaces keypoints that fall outside the given window with nan. See also clip_to_window which clips any keypoints that fall outside the given window.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • window – a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] window outside of which the op should prune the keypoints.
  • scope – name scope.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints

easy_vision.python.core.ops.keypoint_ops.rot90(keypoints, k=<tf.Tensor 'Const:0' shape=() dtype=int32>, scope=None)[source]

Rotates the keypoints counter-clockwise by 90 degrees.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • scope – name scope.
  • k – number of times the keypoints are rotated by 90 degrees.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints

easy_vision.python.core.ops.keypoint_ops.scale(keypoints, y_scale, x_scale, scope=None)[source]

Scales keypoint coordinates in x and y dimensions.

Parameters:
  • keypoints – a tensor of shape [num_instances, num_keypoints, 2]
  • y_scale – (float) scalar tensor
  • x_scale – (float) scalar tensor
  • scope – name scope.
Returns:

a tensor of shape [num_instances, num_keypoints, 2]

Return type:

new_keypoints

easy_vision.python.core.ops.keypoint_ops.to_absolute_coordinates(keypoints, height, width, check_range=True, scope=None)[source]

Converts normalized keypoint coordinates to absolute pixel coordinates.

This function raises an assertion error at graph execution time when the maximum keypoint coordinate value is larger than 1.01 (in which case the coordinates are already absolute).

Parameters:
  • keypoints – A tensor of shape [num_instances, num_keypoints, 2]
  • height – Maximum value for y coordinate of absolute keypoint coordinates.
  • width – Maximum value for x coordinate of absolute keypoint coordinates.
  • check_range – If True, checks if the coordinates are normalized or not.
  • scope – name scope.
Returns:

tensor of shape [num_instances, num_keypoints, 2] with absolute coordinates in terms of the image size.

easy_vision.python.core.ops.keypoint_ops.to_normalized_coordinates(keypoints, height, width, check_range=True, scope=None)[source]

Converts absolute keypoint coordinates to normalized coordinates in [0, 1].

Usually one uses the dynamic shape of the image or conv-layer tensor:

keypoints = keypoint_ops.to_normalized_coordinates(
    keypoints, tf.shape(images)[1], tf.shape(images)[2])

This function raises an assertion error at graph execution time when the maximum coordinate is smaller than 1.01 (which means that the coordinates are already normalized). The value 1.01 allows for small rounding errors.

Parameters:
  • keypoints – A tensor of shape [num_instances, num_keypoints, 2].
  • height – Maximum value for y coordinate of absolute keypoint coordinates.
  • width – Maximum value for x coordinate of absolute keypoint coordinates.
  • check_range – If True, checks if the coordinates are normalized.
  • scope – name scope.
Returns:

tensor of shape [num_instances, num_keypoints, 2] with normalized coordinates in [0, 1].
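A minimal round-trip sketch combining the two conversions above with dynamic image shapes:

import tensorflow as tf
from easy_vision.python.core.ops import keypoint_ops

images = tf.placeholder(tf.float32, shape=[None, None, None, 3])
absolute_keypoints = tf.constant([[[120.0, 60.0]]])  # one [y, x] keypoint in pixels

height = tf.shape(images)[1]
width = tf.shape(images)[2]
normalized = keypoint_ops.to_normalized_coordinates(absolute_keypoints, height, width)
recovered = keypoint_ops.to_absolute_coordinates(normalized, height, width)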

easy_vision.python.core.ops.normalization

Contains the normalization layer classes and their functional aliases.

easy_vision.python.core.ops.normalization.group_norm(*args, **kwargs)[source]

Functional interface for the group normalization layer.

Reference: https://arxiv.org/abs/1803.08494.

“Group Normalization”, Yuxin Wu, Kaiming He
Parameters:
  • inputs – A Tensor with at least 2 dimensions, one of which is the channels dimension. All shape dimensions except for batch must be fully defined.
  • groups – Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must be commensurate with the number of channels in inputs.
  • channels_axis – An integer. Specifies the index of the channels axis, which will be broken into groups whose statistics are computed across the reduction axes. Must be mutually exclusive with reduction_axes. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
  • reduction_axes – Tuple of integers. Specifies dimensions over which statistics will be accumulated. Must be mutually exclusive with channels_axis. Statistics will not be accumulated across axes not specified in reduction_axes nor channels_axis. Preferred usage is to specify negative integers to be agnostic to whether a batch dimension is included. Some sample usage cases: NHWC format: channels_axis=-1, reduction_axes=[-3, -2]; NCHW format: channels_axis=-3, reduction_axes=[-2, -1].
  • center – If True, add offset of beta to normalized tensor. If False, beta is ignored.
  • scale – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
  • epsilon – Small float added to variance to avoid dividing by zero.
  • activation_fn – Activation function, default set to None to skip it and maintain a linear activation.
  • param_initializers – Optional initializers for beta, gamma, moving mean and moving variance.
  • reuse – Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
  • variables_collections – Optional collections for the variables.
  • outputs_collections – Collections to add the outputs.
  • trainable – If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
  • scope – Optional scope for variable_scope.
  • mean_close_to_zero – The mean of the input before ReLU will be close to zero when batch size >= 4k for ResNet-50 on TPU. If True, use nn.sufficient_statistics and nn.normalize_moments to calculate the variance; this is the same behavior as fused=True in batch normalization. If False, use nn.moments to calculate the variance. When the mean is close to zero, like 1e-4, using the mean to calculate the variance may give poor results due to repeated roundoff error and denormalization. When the mean is large, like 1e2, sum(input^2) is so large that only the high-order digits of the elements are accumulated; thus, using sum((input - mean)^2)/n to calculate the variance has better accuracy than (sum(input^2)/n - mean^2).
Returns:

A Tensor representing the output of the operation.

Raises:
  • ValueError – If the rank of inputs is undefined.
  • ValueError – If rank or channels dimension of inputs is undefined.
  • ValueError – If number of groups is not commensurate with number of channels.
  • ValueError – If reduction_axes or channels_axis are out of bounds.
  • ValueError – If reduction_axes are not mutually exclusive with channels_axis.
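A minimal usage sketch for NHWC inputs, following the sample axis settings above:

import tensorflow as tf
from easy_vision.python.core.ops import normalization

inputs = tf.placeholder(tf.float32, shape=[None, 32, 32, 64])  # NHWC

# 64 channels split into 8 groups of 8; statistics are computed per group
# over the spatial axes. Negative axes stay agnostic to the batch dimension.
outputs = normalization.group_norm(
    inputs, groups=8, channels_axis=-1, reduction_axes=[-3, -2])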

easy_vision.python.core.ops.post_processing

Post-processing operations on detected boxes.

easy_vision.python.core.ops.post_processing.batch_multiclass_non_max_suppression(boxes, scores, score_thresh, iou_thresh, max_size_per_class, max_total_size=0, clip_window=None, change_coordinate_frame=False, class_agnostic=False, num_valid_boxes=None, masks=None, additional_fields=None, scope=None, parallel_iterations=32)[source]

Multi-class version of non maximum suppression that operates on a batch.

This op is similar to multiclass_non_max_suppression but operates on a batch of boxes and scores. See documentation for multiclass_non_max_suppression for details.

Parameters:
  • boxes – A [batch_size, num_anchors, q, 4] float32 tensor containing detections. If q is 1, the same boxes are used for all classes; otherwise, if q is equal to the number of classes, class-specific boxes are used.
  • scores – A [batch_size, num_anchors, num_classes] float32 tensor containing the scores for each of the num_anchors detections.
  • score_thresh – scalar threshold for score (low scoring boxes are removed).
  • iou_thresh – scalar threshold for IOU (new boxes that have high IOU overlap with previously selected boxes are removed).
  • max_size_per_class – maximum number of retained boxes per class.
  • max_total_size – maximum number of boxes retained over all classes. By default returns all boxes retained after capping boxes per class.
  • clip_window – A float32 tensor of shape [batch_size, 4] where each entry is of the form [y_min, x_min, y_max, x_max] representing the window to clip boxes to before performing non-max suppression. This argument can also be a tensor of shape [4], in which case the same clip window is applied to all images in the batch. If clip_window is None, all boxes are used to perform non-max suppression.
  • change_coordinate_frame – Whether to normalize coordinates after clipping relative to clip_window (this can only be set to True if a clip_window is provided)
  • num_valid_boxes – (optional) a Tensor of type int32. A 1-D tensor of shape [batch_size] representing the number of valid boxes to be considered for each image in the batch. This parameter allows for ignoring zero paddings.
  • masks – (optional) a [batch_size, num_anchors, q, mask_height, mask_width] float32 tensor containing box masks. q can be either number of classes or 1 depending on whether a separate mask is predicted per class.
  • additional_fields – (optional) If not None, a dictionary that maps keys to tensors whose dimensions are [batch_size, num_anchors, …].
  • scope – tf scope name.
  • parallel_iterations – (optional) number of batch items to process in parallel.
Returns:

'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor containing the non-max suppressed boxes.

'nmsed_scores': A [batch_size, max_detections] float32 tensor containing the scores for the boxes.

'nmsed_classes': A [batch_size, max_detections] float32 tensor containing the class for boxes.

'nmsed_masks': (optional) a [batch_size, max_detections, mask_height, mask_width] float32 tensor containing masks for each selected box. This is set to None if input masks is None.

'nmsed_additional_fields': (optional) a dictionary of [batch_size, max_detections, …] float32 tensors corresponding to the tensors specified in the input additional_fields. This is not returned if input additional_fields is None.

'num_detections': A [batch_size] int32 tensor indicating the number of valid detections per batch item. Only the top num_detections[i] entries in nms_boxes[i], nms_scores[i] and nms_class[i] are valid. The rest of the entries are zero paddings.

Raises:

ValueError – if q in boxes.shape is not 1 or not equal to number of classes as inferred from scores.shape.
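A minimal usage sketch; the six-element return ordering is assumed from the Returns list above.

import tensorflow as tf
from easy_vision.python.core.ops import post_processing

boxes = tf.placeholder(tf.float32, [2, 1000, 1, 4])  # q = 1: boxes shared by classes
scores = tf.placeholder(tf.float32, [2, 1000, 20])   # 20 classes

(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
 nmsed_additional_fields, num_detections) = \
    post_processing.batch_multiclass_non_max_suppression(
        boxes, scores, score_thresh=0.05, iou_thresh=0.5,
        max_size_per_class=100, max_total_size=100)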

easy_vision.python.core.ops.post_processing.batch_multiclass_ts_non_max_suppression(boxes, scores, score_thresh, iou_thresh, max_size_per_class, max_total_size=0, clip_window=None, change_coordinate_frame=False, num_valid_boxes=None, masks=None, additional_fields=None, scope=None, parallel_iterations=32)[source]

Multi-class version of non maximum suppression that operates on a batch.

This op is similar to multiclass_non_max_suppression but operates on a batch of boxes and scores. See documentation for multiclass_non_max_suppression for details.

Parameters:
  • boxes – A [batch_size, num_anchors, q, 4] float32 tensor containing detections. If q is 1, the same boxes are used for all classes; otherwise, if q is equal to the number of classes, class-specific boxes are used.
  • scores – A [batch_size, num_anchors, num_classes] float32 tensor containing the scores for each of the num_anchors detections.
  • score_thresh – scalar threshold for score (low scoring boxes are removed).
  • iou_thresh – scalar threshold for IOU (new boxes that have high IOU overlap with previously selected boxes are removed).
  • max_size_per_class – maximum number of retained boxes per class.
  • max_total_size – maximum number of boxes retained over all classes. By default returns all boxes retained after capping boxes per class.
  • clip_window – A float32 tensor of shape [batch_size, 4] where each entry is of the form [y_min, x_min, y_max, x_max] representing the window to clip boxes to before performing non-max suppression. This argument can also be a tensor of shape [4], in which case the same clip window is applied to all images in the batch. If clip_window is None, all boxes are used to perform non-max suppression.
  • change_coordinate_frame – Whether to normalize coordinates after clipping relative to clip_window (this can only be set to True if a clip_window is provided)
  • num_valid_boxes – (optional) a Tensor of type int32. A 1-D tensor of shape [batch_size] representing the number of valid boxes to be considered for each image in the batch. This parameter allows for ignoring zero paddings.
  • masks – (optional) a [batch_size, num_anchors, q, mask_height, mask_width] float32 tensor containing box masks. q can be either number of classes or 1 depending on whether a separate mask is predicted per class.
  • additional_fields – (optional) If not None, a dictionary that maps keys to tensors whose dimensions are [batch_size, num_anchors, …].
  • scope – tf scope name.
  • parallel_iterations – (optional) number of batch items to process in parallel.
Returns:

'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor containing the non-max suppressed boxes.

'nmsed_scores': A [batch_size, max_detections] float32 tensor containing the scores for the boxes.

'nmsed_classes': A [batch_size, max_detections] float32 tensor containing the class for boxes.

'nmsed_masks': (optional) a [batch_size, max_detections, mask_height, mask_width] float32 tensor containing masks for each selected box. This is set to None if input masks is None.

'nmsed_additional_fields': (optional) a dictionary of [batch_size, max_detections, …] float32 tensors corresponding to the tensors specified in the input additional_fields. This is not returned if input additional_fields is None.

'num_detections': A [batch_size] int32 tensor indicating the number of valid detections per batch item. Only the top num_detections[i] entries in nms_boxes[i], nms_scores[i] and nms_class[i] are valid. The rest of the entries are zero paddings.

Raises:

ValueError – if q in boxes.shape is not 1 or not equal to number of classes as inferred from scores.shape.

easy_vision.python.core.ops.post_processing.class_agnostic_non_max_suppression(boxes, scores, score_thresh, iou_thresh, max_classes_per_detection=1, max_total_size=0, clip_window=None, change_coordinate_frame=False, masks=None, boundaries=None, use_partitioned_nms=False, additional_fields=None, soft_nms_sigma=0.0, scope=None)[source]

Class-agnostic version of non maximum suppression.

This op greedily selects a subset of detection bounding boxes, pruning away boxes that have high IOU (intersection over union) overlap (> thresh) with already selected boxes. It operates on all the boxes using max scores across all classes for which scores are provided (via the scores field of the input box_list), pruning boxes with score less than a provided threshold prior to applying NMS. Please note that this operation is performed in a class-agnostic way, therefore any background classes should be removed prior to calling this function. Selected boxes are guaranteed to be sorted in decreasing order by score (but the sort is not guaranteed to be stable).

Parameters:
  • boxes – A [k, q, 4] float32 tensor containing k detections. q can be either number of classes or 1 depending on whether a separate box is predicted per class.
  • scores – A [k, num_classes] float32 tensor containing the scores for each of the k detections. The scores have to be non-negative when pad_to_max_output_size is True.
  • score_thresh – scalar threshold for score (low scoring boxes are removed).
  • iou_thresh – scalar threshold for IOU (new boxes that have high IOU overlap with previously selected boxes are removed).
  • max_classes_per_detection – maximum number of retained classes per detection box in class-agnostic NMS.
  • max_total_size – maximum number of boxes retained over all classes. By default returns all boxes retained after capping boxes per class.
  • clip_window – A float32 tensor of the form [y_min, x_min, y_max, x_max] representing the window to clip and normalize boxes to before performing non-max suppression.
  • change_coordinate_frame – Whether to normalize coordinates after clipping relative to clip_window (this can only be set to True if a clip_window is provided)
  • masks – (optional) a [k, q, mask_height, mask_width] float32 tensor containing box masks. q can be either number of classes or 1 depending on whether a separate mask is predicted per class.
  • boundaries – (optional) a [k, q, boundary_height, boundary_width] float32 tensor containing box boundaries. q can be either number of classes or 1 depending on whether a separate boundary is predicted per class.
  • use_partitioned_nms – If true, use partitioned version of non_max_suppression.
  • additional_fields – (optional) If not None, a dictionary that maps keys to tensors whose first dimensions are all of size k. After non-maximum suppression, all tensors corresponding to the selected boxes will be added to resulting BoxList.
  • soft_nms_sigma – A scalar float representing the Soft NMS sigma parameter; see Bodla et al., https://arxiv.org/abs/1704.04503. When soft_nms_sigma=0.0 (the default), we fall back to standard (hard) NMS. Soft NMS is currently only supported when pad_to_max_output_size is False.
  • scope – name scope.
Returns:

A tuple of sorted_boxes and num_valid_nms_boxes. sorted_boxes is a BoxList holding M boxes with a rank-1 scores field representing corresponding scores for each box, with scores sorted in decreasing order, and a rank-1 classes field representing a class label for each box. num_valid_nms_boxes is a 0-D integer tensor representing the number of valid elements in the BoxList, with the valid elements appearing first.

Raises:

ValueError – if iou_thresh is not in [0, 1] or if input boxlist does not have a valid scores field or if non-zero soft_nms_sigma is provided when pad_to_max_output_size is True.

easy_vision.python.core.ops.post_processing.multiclass_non_max_suppression(boxes, scores, score_thresh, iou_thresh, max_size_per_class, max_total_size=0, clip_window=None, change_coordinate_frame=False, masks=None, boundaries=None, additional_fields=None, scope=None)[source]

Multi-class version of non maximum suppression.

This op greedily selects a subset of detection bounding boxes, pruning away boxes that have high IOU (intersection over union) overlap (> thresh) with already selected boxes. It operates independently for each class for which scores are provided (via the scores field of the input box_list), pruning boxes with score less than a provided threshold prior to applying NMS.

Please note that this operation is performed on all classes, therefore any background classes should be removed prior to calling this function.

Parameters:
  • boxes – A [k, q, 4] float32 tensor containing k detections. q can be either number of classes or 1 depending on whether a separate box is predicted per class.
  • scores – A [k, num_classes] float32 tensor containing the scores for each of the k detections.
  • score_thresh – scalar threshold for score (low scoring boxes are removed).
  • iou_thresh – scalar threshold for IOU (new boxes that have high IOU overlap with previously selected boxes are removed).
  • max_size_per_class – maximum number of retained boxes per class.
  • max_total_size – maximum number of boxes retained over all classes. By default returns all boxes retained after capping boxes per class.
  • clip_window – A float32 tensor of the form [y_min, x_min, y_max, x_max] representing the window to clip and normalize boxes to before performing non-max suppression.
  • change_coordinate_frame – Whether to normalize coordinates after clipping relative to clip_window (this can only be set to True if a clip_window is provided)
  • masks – (optional) a [k, q, mask_height, mask_width] float32 tensor containing box masks. q can be either number of classes or 1 depending on whether a separate mask is predicted per class.
  • boundaries – (optional) a [k, q, boundary_height, boundary_width] float32 tensor containing box boundaries. q can be either number of classes or 1 depending on whether a separate boundary is predicted per class.
  • additional_fields – (optional) If not None, a dictionary that maps keys to tensors whose first dimensions are all of size k. After non-maximum suppression, all tensors corresponding to the selected boxes will be added to resulting BoxList.
  • scope – name scope.
Returns:

a BoxList holding M boxes with a rank-1 scores field representing corresponding scores for each box, with scores sorted in decreasing order, and a rank-1 classes field representing a class label for each box.

Raises:

ValueError – if iou_thresh is not in [0, 1] or if input boxlist does not have a valid scores field.

easy_vision.python.core.ops.post_processing.multiclass_ts_non_max_suppression(segments, scores, score_thresh, iou_thresh, max_size_per_class, max_total_size=0, clip_window=None, change_coordinate_frame=False, masks=None, boundaries=None, additional_fields=None, scope=None)[source]

Multi-class version of non maximum suppression.

Parameters:
  • segments – A [k, q, 2] float32 tensor containing k detections. q can be either number of classes or 1 depending on whether a separate segment is predicted per class.
  • scores – A [k, num_classes] float32 tensor containing the scores for each of the k detections.
  • score_thresh – scalar threshold for score (low scoring boxes are removed).
  • iou_thresh – scalar threshold for IOU (new boxes that have high IOU overlap with previously selected boxes are removed).
  • max_size_per_class – maximum number of retained boxes per class.
  • max_total_size – maximum number of boxes retained over all classes. By default returns all boxes retained after capping boxes per class.
  • clip_window – A float32 tensor of the form [y_min, x_min, y_max, x_max] representing the window to clip and normalize boxes to before performing non-max suppression.
  • change_coordinate_frame – Whether to normalize coordinates after clipping relative to clip_window (this can only be set to True if a clip_window is provided)
  • masks – (optional) a [k, q, mask_height, mask_width] float32 tensor containing box masks. q can be either number of classes or 1 depending on whether a separate mask is predicted per class.
  • boundaries – (optional) a [k, q, boundary_height, boundary_width] float32 tensor containing box boundaries. q can be either number of classes or 1 depending on whether a separate boundary is predicted per class.
  • additional_fields – (optional) If not None, a dictionary that maps keys to tensors whose first dimensions are all of size k. After non-maximum suppression, all tensors corresponding to the selected boxes will be added to resulting BoxList.
  • scope – name scope.
Returns:

a BoxList holding M boxes with a rank-1 scores field representing corresponding scores for each box, with scores sorted in decreasing order, and a rank-1 classes field representing a class label for each box.

Raises:

ValueError – if iou_thresh is not in [0, 1] or if input boxlist does not have a valid scores field.

easy_vision.python.core.ops.region_similarity_calculator

Region Similarity Calculators for BoxLists.

Region Similarity Calculators compare a pairwise measure of similarity between the boxes in two BoxLists.

class easy_vision.python.core.ops.region_similarity_calculator.IoaSimilarity[source]

Bases: easy_vision.python.core.ops.region_similarity_calculator.RegionSimilarityCalculator

Class to compute similarity based on Intersection over Area (IOA) metric.

This class computes pairwise similarity between two BoxLists based on their pairwise intersections divided by the areas of the boxes in the second BoxList.

class easy_vision.python.core.ops.region_similarity_calculator.IouSimilarity[source]

Bases: easy_vision.python.core.ops.region_similarity_calculator.RegionSimilarityCalculator

Class to compute similarity based on Intersection over Union (IOU) metric.

This class computes pairwise similarity between two BoxLists based on IOU.

class easy_vision.python.core.ops.region_similarity_calculator.NegSqDistSimilarity[source]

Bases: easy_vision.python.core.ops.region_similarity_calculator.RegionSimilarityCalculator

Class to compute similarity based on the negative squared distance metric.

This class computes pairwise similarity between two BoxLists based on the negative squared distance metric.

class easy_vision.python.core.ops.region_similarity_calculator.RegionSimilarityCalculator[source]

Bases: object

Abstract base class for region similarity calculator.

compare(boxlist1, boxlist2, scope=None)[source]

Computes matrix of pairwise similarity between BoxLists.

This op (to be overridden) computes a measure of pairwise similarity between the boxes in the given BoxLists. Higher values indicate more similarity.

Note that this method simply measures similarity and does not explicitly perform a matching.

Parameters:
  • boxlist1 – BoxList holding N boxes.
  • boxlist2 – BoxList holding M boxes.
  • scope – Op scope name. Defaults to ‘Compare’ if None.
Returns:

a (float32) tensor of shape [N, M] with pairwise similarity score.
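As a sketch of the intended extension point, a toy calculator that scores similarity by the negative distance between box centers. It assumes BoxList exposes get_center_coordinates_and_sizes (as TimeSegmentList does below); treat this as an illustration, not part of the library.

import tensorflow as tf
from easy_vision.python.core.ops import region_similarity_calculator as rsc

class CenterDistanceSimilarity(rsc.RegionSimilarityCalculator):
  """Toy calculator: negative Euclidean distance between box centers."""

  def compare(self, boxlist1, boxlist2, scope=None):
    with tf.name_scope(scope, 'Compare'):
      ycenter1, xcenter1, _, _ = boxlist1.get_center_coordinates_and_sizes()
      ycenter2, xcenter2, _, _ = boxlist2.get_center_coordinates_and_sizes()
      dy = tf.expand_dims(ycenter1, 1) - tf.expand_dims(ycenter2, 0)  # [N, M]
      dx = tf.expand_dims(xcenter1, 1) - tf.expand_dims(xcenter2, 0)  # [N, M]
      return -tf.sqrt(dy * dy + dx * dx)  # less negative = more similar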

class easy_vision.python.core.ops.region_similarity_calculator.TSIouSimilarity[source]

Bases: easy_vision.python.core.ops.region_similarity_calculator.RegionSimilarityCalculator

Class to compute similarity based on temporal segments Intersection over Union (IOU) metric.

This class computes pairwise similarity between two BoxLists based on IOU.

class easy_vision.python.core.ops.region_similarity_calculator.YOLOIouSimilarity[source]

Bases: easy_vision.python.core.ops.region_similarity_calculator.RegionSimilarityCalculator

Class to compute point-wise similarity based on Intersection over Union (IOU) metric.

This class computes pairwise similarity between two BoxLists based on IOU.

easy_vision.python.core.ops.rnn_ops

easy_vision.python.core.ops.rnn_ops.build_dynamic_rnn(inputs, sequence_length, cell_config, encoder_type, num_layers, num_residual_layers, is_training, time_major=True, dtype=tf.float32)[source]

Create a dynamic rnn.

Parameters:
  • inputs – [time, batch, units] if time_major
  • sequence_length – inputs valid length
  • encoder_type – bi | uni | none
  • cell_config – type config of each cell.
  • num_layers – number of cells.
  • num_residual_layers – Number of residual layers from top to bottom. For example, if num_layers=4 and num_residual_layers=2, the last 2 RNN cells in the returned list will be wrapped with ResidualWrapper.
  • is_training – whether the mode is tf.contrib.learn.TRAIN
  • time_major – whether time is the first axis
  • dtype – data type
Returns:

The rnn output and the tuple RNN cell’s state.

easy_vision.python.core.ops.rnn_ops.create_rnn_cell(cell_config, num_layers, num_residual_layers, is_training)[source]

Create multi-layer RNN cell.

Parameters:
  • cell_config – type pb2 config representing the cell type.
  • num_layers – number of cells.
  • num_residual_layers – Number of residual layers from top to bottom. For example, if num_layers=4 and num_residual_layers=2, the last 2 RNN cells in the returned list will be wrapped with ResidualWrapper.
  • is_training – whether the mode is tf.contrib.learn.TRAIN
Returns:

An RNNCell instance.
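The num_residual_layers convention can be illustrated with plain tf.contrib.rnn cells; this sketch mirrors the documented behavior (wrap the last num_residual_layers cells with ResidualWrapper) and is not the module's implementation.

import tensorflow as tf

def make_cell(num_units, num_layers, num_residual_layers):
  cells = []
  for i in range(num_layers):
    cell = tf.contrib.rnn.LSTMCell(num_units)
    # Only the last `num_residual_layers` cells get a residual connection.
    if i >= num_layers - num_residual_layers:
      cell = tf.contrib.rnn.ResidualWrapper(cell)
    cells.append(cell)
  return tf.contrib.rnn.MultiRNNCell(cells)

cell = make_cell(num_units=256, num_layers=4, num_residual_layers=2)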

easy_vision.python.core.ops.shape_utils

Utils used to manipulate tensor shapes.

easy_vision.python.core.ops.shape_utils.assert_box_normalized(boxes, maximum_normalized_coordinate=1.1)[source]

Asserts the input box tensor is normalized.

Parameters:
  • boxes – a tensor of shape [N, 4] where N is the number of boxes.
  • maximum_normalized_coordinate – Maximum coordinate value to be considered as normalized, default to 1.1.
Returns:

a tf.Assert op which fails when the input box tensor is not normalized.

Raises:

ValueError – When the input box tensor is not normalized.

easy_vision.python.core.ops.shape_utils.assert_shape_equal(shape_a, shape_b)[source]

Asserts that shape_a and shape_b are equal.

If the shapes are static, raises a ValueError when the shapes mismatch.

If the shapes are dynamic, raises a tf InvalidArgumentError when the shapes mismatch.

Parameters:
  • shape_a – a list containing shape of the first tensor.
  • shape_b – a list containing shape of the second tensor.
Returns:

Either a tf.no_op() when the shapes are static, or a tf.assert_equal() op when the shapes are dynamic.

Raises:

ValueError – When shapes are both static and unequal.

easy_vision.python.core.ops.shape_utils.assert_shape_equal_along_first_dimension(shape_a, shape_b)[source]

Asserts that shape_a and shape_b are the same along the 0th-dimension.

If the shapes are static, raises a ValueError when the shapes mismatch.

If the shapes are dynamic, raises a tf InvalidArgumentError when the shapes mismatch.

Parameters:
  • shape_a – a list containing shape of the first tensor.
  • shape_b – a list containing shape of the second tensor.
Returns:

Either a tf.no_op() when the shapes are static, or a tf.assert_equal() op when the shapes are dynamic.

Raises:

ValueError – When shapes are both static and unequal.

easy_vision.python.core.ops.shape_utils.check_min_image_dim(min_dim, image_tensor)[source]

Checks that the image width/height are greater than some number.

This function is used to check that the width and height of an image are above a certain value. If the image shape is static, this function will perform the check at graph construction time. Otherwise, if the image shape varies, an Assertion control dependency will be added to the graph.

Parameters:
  • min_dim – The minimum number of pixels along the width and height of the image.
  • image_tensor – The image tensor to check size for.
Returns:

If image_tensor has dynamic size, return image_tensor with an Assert control dependency. Otherwise returns image_tensor.

Raises:

ValueError – if image_tensor’s width or height is smaller than min_dim.

easy_vision.python.core.ops.shape_utils.clip_tensor(t, length)[source]

Clips the input tensor along the first dimension up to the length.

Parameters:
  • t – the input tensor, assuming the rank is at least 1.
  • length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after clipping, assuming length <= t.shape[0].
Returns:

the clipped tensor, whose first dimension is length. If length is an integer, the first dimension of clipped_t is set to length statically.

Return type:

clipped_t

easy_vision.python.core.ops.shape_utils.combined_static_and_dynamic_shape(tensor)[source]

Returns a list containing static and dynamic values for the dimensions.

Returns a list of static and dynamic values for shape dimensions. This is useful to preserve static shapes when available in reshape operation.

Parameters:tensor – A tensor of any type.
Returns:A list of size tensor.shape.ndims, where each entry is an integer (if the dimension is static) or a scalar tensor (if dynamic).
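A minimal sketch of the common pattern this utility enables, mixing static and dynamic dimensions in a reshape:

import tensorflow as tf
from easy_vision.python.core.ops import shape_utils

tensor = tf.placeholder(tf.float32, shape=[None, 28, 28, 3])
shape = shape_utils.combined_static_and_dynamic_shape(tensor)
# shape == [<scalar int32 tensor>, 28, 28, 3]: dynamic batch, static rest.
flattened = tf.reshape(tensor, [shape[0], shape[1] * shape[2] * shape[3]])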
easy_vision.python.core.ops.shape_utils.merge_shape(t, shape_list)[source]

Merge static shape info into a tensor.

Parameters:
  • t – the input tensor, assuming the rank is at least 1.
  • shape_list – a list of shapes with the same length as t.get_shape().
Returns:the tensor t with shape updated
easy_vision.python.core.ops.shape_utils.pad_nd(tensor, output_shape)[source]

Pad given tensor to the output shape.

Parameters:
  • tensor – Input tensor to pad or clip.
  • output_shape – A list of integers / scalar tensors (or None for dynamic dim) representing the size to pad or clip each dimension of the input tensor.
Returns:

Input tensor padded to the output shape.

easy_vision.python.core.ops.shape_utils.pad_or_clip_nd(tensor, output_shape)[source]

Pad or Clip given tensor to the output shape.

Parameters:
  • tensor – Input tensor to pad or clip.
  • output_shape – A list of integers / scalar tensors (or None for dynamic dim) representing the size to pad or clip each dimension of the input tensor.
Returns:

Input tensor padded and clipped to the output shape.

easy_vision.python.core.ops.shape_utils.pad_or_clip_tensor(t, length)[source]

Pad or clip the input tensor along the first dimension.

Parameters:
  • t – the input tensor, assuming the rank is at least 1.
  • length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after processing.
Returns:

the processed tensor, whose first dimension is length. If length is an integer, the first dimension of the processed tensor is set to length statically.

Return type:

processed_t

easy_vision.python.core.ops.shape_utils.pad_tensor(t, length)[source]

Pads the input tensor with 0s along the first dimension up to the length.

Parameters:
  • t – the input tensor, assuming the rank is at least 1.
  • length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after padding, assuming length <= t.shape[0].
Returns:

the padded tensor, whose first dimension is length. If length is an integer, the first dimension of padded_t is set to length statically.

Return type:

padded_t
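A minimal sketch of the pad/clip family above, producing a statically fixed number of rows:

import tensorflow as tf
from easy_vision.python.core.ops import shape_utils

boxes = tf.placeholder(tf.float32, shape=[None, 4])

# Shorter inputs are zero-padded, longer inputs are clipped. With an integer
# length the result has a static first dimension of 100.
fixed_boxes = shape_utils.pad_or_clip_tensor(boxes, 100)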

easy_vision.python.core.ops.static_shape

Helper functions to access TensorShape values.

The rank 4 tensor_shape must be of the form [batch_size, height, width, depth].

easy_vision.python.core.ops.static_shape.get_batch_size(tensor_shape)[source]

Returns batch size from the tensor shape.

Parameters:tensor_shape – A rank 4 TensorShape.
Returns:An integer representing the batch size of the tensor.
easy_vision.python.core.ops.static_shape.get_depth(tensor_shape)[source]

Returns depth from the tensor shape.

Parameters:tensor_shape – A rank 4 TensorShape.
Returns:An integer representing the depth of the tensor.
easy_vision.python.core.ops.static_shape.get_height(tensor_shape)[source]

Returns height from the tensor shape.

Parameters:tensor_shape – A rank 4 TensorShape.
Returns:An integer representing the height of the tensor.
easy_vision.python.core.ops.static_shape.get_width(tensor_shape)[source]

Returns width from the tensor shape.

Parameters:tensor_shape – A rank 4 TensorShape.
Returns:An integer representing the width of the tensor.

easy_vision.python.core.ops.target_assigner

Base target assigner module.

The job of a TargetAssigner is, for a given set of anchors (bounding boxes) and groundtruth detections (bounding boxes), to assign classification and regression targets to each anchor as well as weights to each anchor (specifying, e.g., which anchors should not contribute to training loss).

It assigns classification/regression targets by performing the following steps:
  1. Computing pairwise similarity between anchors and groundtruth boxes using a provided RegionSimilarityCalculator
  2. Computing a matching based on the similarity matrix using a provided Matcher
  3. Assigning regression targets based on the matching and a provided BoxCoder
  4. Assigning classification targets based on the matching and groundtruth labels

Note that TargetAssigners only operate on detections from a single image at a time, so any logic for applying a TargetAssigner to multiple images must be handled externally.

class easy_vision.python.core.ops.target_assigner.TargetAssigner(similarity_calc, matcher, box_coder, negative_class_weight=1.0)[source]

Bases: object

Target assigner to compute classification and regression targets.

__init__(similarity_calc, matcher, box_coder, negative_class_weight=1.0)[source]

Construct Object Detection Target Assigner.

Parameters:
  • similarity_calc – a RegionSimilarityCalculator
  • matcher – a core.Matcher used to match groundtruth to anchors.
  • box_coder – a core.BoxCoder used to encode matching groundtruth boxes with respect to anchors.
  • negative_class_weight – classification weight to be associated to negative anchors (default: 1.0). The weight must be in [0., 1.].
Raises:

ValueError – if similarity_calc is not a RegionSimilarityCalculator or if matcher is not a Matcher or if box_coder is not a BoxCoder

assign(anchors, groundtruth_boxes, groundtruth_labels=None, unmatched_class_label=None, groundtruth_weights=None)[source]

Assign classification and regression targets to each anchor.

For a given set of anchors and groundtruth detections, match anchors to groundtruth_boxes and assign classification and regression targets to each anchor as well as weights based on the resulting match (specifying, e.g., which anchors should not contribute to training loss).

Anchors that are not matched to anything are given a classification target of self._unmatched_cls_target which can be specified via the constructor.

Parameters:
  • anchors – a BoxList representing N anchors
  • groundtruth_boxes – a BoxList representing M groundtruth boxes
  • groundtruth_labels – a tensor of shape [M, d_1, … d_k] with labels for each of the ground_truth boxes. The subshape [d_1, … d_k] can be empty (corresponding to scalar inputs). When set to None, groundtruth_labels assumes a binary problem where all ground_truth boxes get a positive label (of 1).
  • unmatched_class_label – a float32 tensor with shape [d_1, d_2, …, d_k] which is consistent with the classification target for each anchor (and can be empty for scalar targets). This shape must thus be compatible with the groundtruth labels that are passed to the “assign” function (which have shape [num_gt_boxes, d_1, d_2, …, d_k]). If set to None, unmatched_cls_target is set to be [0] for each anchor.
  • groundtruth_weights – a float tensor of shape [M] indicating the weight to assign to all anchors match to a particular groundtruth box. The weights must be in [0., 1.]. If None, all weights are set to 1. Generally no groundtruth boxes with zero weight match to any anchors as matchers are aware of groundtruth weights. Additionally, cls_weights and reg_weights are calculated using groundtruth weights as an added safety.
Returns:

cls_targets: a float32 tensor with shape [num_anchors, d_1, d_2 … d_k], where the subshape [d_1, …, d_k] is compatible with groundtruth_labels which has shape [num_gt_boxes, d_1, d_2, … d_k].

cls_weights: a float32 tensor with shape [num_anchors]

reg_targets: a float32 tensor with shape [num_anchors, box_code_dimension]

reg_weights: a float32 tensor with shape [num_anchors]

match: a matcher.Match object encoding the match between anchors and groundtruth boxes, with rows corresponding to groundtruth boxes and columns corresponding to anchors.

Raises:

ValueError – if anchors or groundtruth_boxes are not of type box_list.BoxList

box_coder
get_box_coder()[source]

Get BoxCoder of this TargetAssigner.

Returns:BoxCoder object.
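A minimal end-to-end sketch. The ‘FasterRCNN’ reference string and the five-element return ordering of assign() are assumptions, taken from the factory documented below and the Returns description above.

import tensorflow as tf
from easy_vision.python.core.ops import box_list, target_assigner

anchors = box_list.BoxList(tf.constant([[0.0, 0.0, 0.5, 0.5],
                                        [0.5, 0.5, 1.0, 1.0]]))
groundtruth = box_list.BoxList(tf.constant([[0.1, 0.1, 0.4, 0.4]]))

# Reference string assumed; valid values are defined by create_target_assigner.
assigner = target_assigner.create_target_assigner('FasterRCNN', stage='proposal')
cls_targets, cls_weights, reg_targets, reg_weights, match = assigner.assign(
    anchors, groundtruth)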
easy_vision.python.core.ops.target_assigner.batch_assign_targets(target_assigner, anchors_batch, gt_box_batch, gt_class_targets_batch=None, unmatched_class_label=None, gt_weights_batch=None)[source]

Batched assignment of classification and regression targets.

Parameters:
  • target_assigner – a target assigner.
  • anchors_batch – BoxList representing N box anchors or list of BoxList objects with length batch_size representing anchor sets.
  • gt_box_batch – a list of BoxList objects with length batch_size representing groundtruth boxes for each image in the batch
  • gt_class_targets_batch – a list of tensors with length batch_size, where each tensor has shape [num_gt_boxes_i, classification_target_size] and num_gt_boxes_i is the number of boxes in the ith boxlist of gt_box_batch.
  • unmatched_class_label – a float32 tensor with shape [d_1, d_2, …, d_k] which is consistent with the classification target for each anchor (and can be empty for scalar targets). This shape must thus be compatible with the groundtruth labels that are passed to the “assign” function (which have shape [num_gt_boxes, d_1, d_2, …, d_k]).
  • gt_weights_batch – A list of 1-D tf.float32 tensors of shape [num_boxes] containing weights for groundtruth boxes.
Returns:

batch_cls_targets: a tensor with shape [batch_size, num_anchors, num_classes]

batch_cls_weights: a tensor with shape [batch_size, num_anchors, num_classes]

batch_reg_targets: a tensor with shape [batch_size, num_anchors, box_code_dimension]

batch_reg_weights: a tensor with shape [batch_size, num_anchors]

match_list: a list of matcher.Match objects encoding the match between anchors and groundtruth boxes for each image of the batch, with rows of the Match objects corresponding to groundtruth boxes and columns corresponding to anchors.

Raises:

ValueError – if input list lengths are inconsistent; it is required that batch_size == len(gt_box_batch) == len(gt_class_targets_batch), and batch_size == len(anchors_batch) unless anchors_batch is a single BoxList.

easy_vision.python.core.ops.target_assigner.create_target_assigner(reference, stage=None, negative_class_weight=1.0, use_matmul_gather=False)[source]

Factory function for creating standard target assigners.

Parameters:
  • reference – string referencing the type of TargetAssigner.
  • stage – string denoting stage: {proposal, detection}.
  • negative_class_weight – classification weight to be associated to negative anchors (default: 1.0)
  • use_matmul_gather – whether to use matrix multiplication based gather which are better suited for TPUs.
Returns:

desired target assigner.

Return type:

TargetAssigner

Raises:

ValueError – if combination reference+stage is invalid.

easy_vision.python.core.ops.text_net_utils

easy_vision.python.core.ops.text_net_utils.basic_2d(*args, **kwargs)[source]

Resnet unit with two convolution layers

Parameters:
  • inputs – input tensor with shape [batch_size, height, width, channel]
  • filters – the number of filters in the convolution
  • kernel_size – A tuple of 2 integers, specifying the height and width of the 2D convolution window.
  • stride – A tuple of 2 integers, specifying the strides of the convolution along the height and width.
  • dilation_rate – A tuple of 2 integers, specifying the dilation rate to use for dilated convolution.
  • se_rate – squeeze and excitation rate.
  • valid_mask – same shape as input, specifying the valid part of input tensor.
  • scope – variable scope
  • outputs_collections – Collection to add the ResNet unit output.
Returns:

output tensor and output valid mask

easy_vision.python.core.ops.text_net_utils.bottleneck_2d(*args, **kwargs)[source]

Resnet unit with bottleneck style

Parameters:
  • inputs – input tensor with shape [batch_size, height, width, channel]
  • filters – the number of filters in the convolution
  • kernel_size – A tuple of 2 integers, specifying the height and width of the 2D convolution window.
  • stride – A tuple of 2 integers, specifying the strides of the convolution along the height and width.
  • dilation_rate – A tuple of 2 integers, specifying the dilation rate to use for dilated convolution.
  • se_rate – squeeze and excitation rate.
  • valid_mask – same shape as input, specifying the valid part of input tensor.
  • scope – variable scope
  • outputs_collections – Collection to add the ResNet unit output.
Returns:

output tensor and output valid mask

easy_vision.python.core.ops.text_net_utils.net_arg_scope(weight_decay=0.0001, norm_type=2, norm_epsilon=1e-05, norm_scale=True, batch_norm_trainable=False, batch_norm_decay=0.997, group_norm_groups=32, activation_fn=<function relu>, outputs_collections=None)[source]

Defines the default ResNet arg scope.

Parameters:
  • weight_decay – The weight decay to use for regularizing the model.
  • norm_type – normalization layer type.
  • norm_epsilon – Small constant to prevent division by zero when normalizing activations by their variance in normalization.
  • norm_scale – If True, uses an explicit gamma multiplier to scale the activations in normalization layer.
  • batch_norm_trainable – batch_norm in training mode or not
  • batch_norm_decay – The moving average decay when estimating layer activation statistics in batch normalization.
  • group_norm_groups – Divide the channels into this number of groups over which normalization statistics are computed
  • activation_fn – The activation function which is used in ResNet.
  • outputs_collections – scope of end_points collection.
Returns:

An arg_scope to use for the resnet models.

easy_vision.python.core.ops.text_net_utils.reshape_height_to_channel(input, time_major=True)[source]

Reshape input tensor height axis into channel axis

easy_vision.python.core.ops.text_net_utils.reshape_height_to_time(input, time_major=True)[source]

Reshape input tensor height axis into time axis

easy_vision.python.core.ops.text_net_utils.residual_block_2d(inputs, filters, kernels, init_stride=(2, 2), init_dilation=(1, 1), se_rate=None, valid_mask=None, block_type='bottleneck', scope=None)[source]

Conv 2D Residual Block. Supports SE and group convolution.

Parameters:
  • inputs – input tensor with shape [batch_size, height, width, channel]
  • filters – the number of filters in the convolution
  • kernels – A tuple of 2 integers, specifying the height and width of the 2D convolution window.
  • init_stride – A tuple of 2 integers, specifying the strides of the first unit along the height and width.
  • init_dilation – A tuple of 2 integers, specifying the dilation rate of the first unit to use for dilated convolution.
  • se_rate – squeeze and excitation rate.
  • valid_mask – same shape as input, specifying the valid part of input tensor.
  • block_type – specify the resnet unit type bottleneck or basic.
  • scope – variable scope
Returns:

output tensor and output valid mask
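A sketch of building one block under the default arg scope. Wrapping net_arg_scope with slim.arg_scope follows the usual slim pattern and is an assumption, as is leaving valid_mask unset.

import tensorflow as tf
from tensorflow.contrib import slim
from easy_vision.python.core.ops import text_net_utils

inputs = tf.placeholder(tf.float32, shape=[8, 32, 256, 64])

# slim.arg_scope wrapping assumed from the usual slim usage pattern.
with slim.arg_scope(text_net_utils.net_arg_scope(weight_decay=1e-4)):
  outputs, out_valid_mask = text_net_utils.residual_block_2d(
      inputs, filters=128, kernels=(3, 3), init_stride=(2, 2),
      block_type='bottleneck', scope='block1')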

easy_vision.python.core.ops.transform_ops

easy_vision.python.core.ops.transform_ops.ThinPlateSpline2(U, source, target, instance_ind, out_size)[source]

Thin Plate Spline Spatial Transformer Layer. TPS control points are arranged in arbitrary positions given by source.

Parameters:
  • U – float Tensor [num_batch, height, width, num_channels]. Input tensor.
  • source – float Tensor [num_instance, num_point, 2]. The source positions of the control points.
  • target – float Tensor [num_instance, num_point, 2]. The target positions of the control points.
  • instance_ind – A 1-D tensor of shape [num_instance] with int32 values in [0, num_batch). The value of instance_ind[i] specifies the image that the i-th box refers to.
  • out_size – tuple of two integers (height, width). The size of the output of the network.
References:
  1. Spatial Transformer Network implemented by TensorFlow.
  2. Thin Plate Spline Spatial Transformer Network with regular grids.
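A usage sketch with the documented shapes, assuming a 3x3 grid of control points per instance (all values illustrative):

    import tensorflow as tf
    from easy_vision.python.core.ops import transform_ops

    U = tf.placeholder(tf.float32, [2, 64, 64, 3])      # num_batch = 2
    source = tf.placeholder(tf.float32, [4, 9, 2])      # 4 instances, 9 points
    target = tf.placeholder(tf.float32, [4, 9, 2])
    instance_ind = tf.constant([0, 0, 1, 1], tf.int32)  # instance -> image

    # Warp each instance so its source control points move to the target
    # positions; the output presumably has spatial size out_size.
    warped = transform_ops.ThinPlateSpline2(
        U, source, target, instance_ind, out_size=(32, 32))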

easy_vision.python.core.ops.ts_list

class easy_vision.python.core.ops.ts_list.TimeSegmentList(segments)[source]

Bases: easy_vision.python.core.ops.box_list.BoxList

__init__(segments)[source]

Constructs temporal segments collection.

Parameters:segments – a tensor of shape [N, 2] representing segment corners
Raises:ValueError – if invalid dimensions for segment data or if segment data is not in float32 format.
get()[source]

Convenience function for accessing time segment coordinates.

Returns:a tensor with shape [N, 2] representing time segment coordinates.
get_center_coordinates_and_sizes(scope=None)[source]

Computes the center coordinates and length of the segments.

Parameters:scope – name scope of the function.
Returns:a list of 2 1-D tensors [center, length].
get_extra_fields()[source]

Returns all non-segment fields (i.e., everything not named ‘segments’).

num_boxes()[source]

Returns number of time segments held in collection.

Returns:a tensor representing the number of time segments held in the collection.
num_boxes_static()[source]

Returns number of time segments held in collection.

This number is inferred at graph construction time rather than run-time.

Returns:
Number of time segments held in collection (integer) or None if this is not
inferrable at graph construction time.
set(segments)[source]

Convenience function for setting segment coordinates.

Parameters:segments – a tensor of shape [N, 2] representing segment corners
Raises:ValueError – if invalid dimensions for segment data
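A short construction sketch using the methods documented above (segment values illustrative):

    import tensorflow as tf
    from easy_vision.python.core.ops.ts_list import TimeSegmentList

    # Three segments, each a [t_min, t_max] pair in float32.
    segments = tf.constant([[0.0, 1.5], [2.0, 4.0], [3.5, 6.0]], tf.float32)
    tslist = TimeSegmentList(segments)

    centers, lengths = tslist.get_center_coordinates_and_sizes()
    n = tslist.num_boxes_static()   # 3, inferred at graph construction time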

easy_vision.python.core.ops.ts_list_ops

Time Segment List operations.

Example ts operations that are supported:
  • length: compute time segments length
  • iou: pairwise intersection-over-union scores

Whenever ts_list_ops functions output a TimeSegmentList, the fields of the incoming TimeSegmentList are retained unless documented otherwise.

easy_vision.python.core.ops.ts_list_ops.change_ts_coordinate_frame(tsList, window, scope=None)[source]

Change coordinate frame of the TimeSegmentList to be relative to window’s frame.

Parameters:
  • tsList – TimeSegmentList holding M_in time segments.
  • window – a rank-1 tensor of shape [2] representing the [t_min, t_max] window.
  • scope – name scope.
Returns:

a TimeSegmentList holding M_in time segments with coordinates relative to window.
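By analogy with the corresponding box_list_ops function, the transformation is presumably t' = (t - window[0]) / (window[1] - window[0]); a standalone sketch of that arithmetic, not necessarily the library's exact implementation:

    import tensorflow as tf

    def change_ts_frame_sketch(segments, window):
      # segments: [N, 2]; window: [2] = [t_min, t_max] of the new frame.
      win_length = window[1] - window[0]
      return (segments - window[0]) / win_length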

easy_vision.python.core.ops.ts_list_ops.clip_ts_to_window(tsList, window, filter_nonoverlapping=True, scope=None)[source]

Clip time segments to a window.

Parameters:
  • tsList – TimeSegmentList holding M_in time segments
  • window – a tensor of shape [2] representing the [t_min, t_max] window to which the op should clip time segments.
  • filter_nonoverlapping – whether to filter out time segments that do not overlap at all with the window.
  • scope – name scope.
Returns:

a TimeSegmentList holding M_out time segments where M_out <= M_in
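A standalone sketch of the clipping arithmetic under the documented semantics (the real op operates on a TimeSegmentList rather than a raw tensor):

    import tensorflow as tf

    def clip_ts_sketch(segments, window, filter_nonoverlapping=True):
      # segments: [N, 2]; window: [2] = [t_min, t_max].
      t_min = tf.clip_by_value(segments[:, 0], window[0], window[1])
      t_max = tf.clip_by_value(segments[:, 1], window[0], window[1])
      clipped = tf.stack([t_min, t_max], axis=1)
      if filter_nonoverlapping:
        # Drop segments whose clipped length collapsed to zero.
        keep = tf.reshape(tf.where(t_max - t_min > 0.0), [-1])
        clipped = tf.gather(clipped, keep)
      return clipped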

easy_vision.python.core.ops.ts_list_ops.length(ts, scope=None)[source]

Computes length of temporal segments.

Parameters:
  • ts – TimeSegmentList holding N temporal segments
  • scope – name scope.
Returns:

a tensor with shape [N] representing time segment length.

easy_vision.python.core.ops.ts_list_ops.pad_or_clip_ts_list(tslist, num_ts, scope=None)[source]

Pads or clips all fields of a TimeSegmentList.

Parameters:
  • tslist – A TimeSegmentList with an arbitrary number of time segments.
  • num_ts – the first num_ts time segments in tslist are kept. The fields are zero-padded if num_ts is bigger than the actual number of time segments.
  • scope – name scope.
Returns:

TimeSegmentList with all fields padded or clipped.

easy_vision.python.core.ops.ts_list_ops.prune_ts_outside_window(tsList, window, scope=None)[source]

Prunes time segments that fall outside a given window.

Parameters:
  • tsList – TimeSegmentList holding M_in time segments
  • window – a float tensor of shape [2] representing [tmin, tmax] of the window
  • scope – name scope.
Returns:
  • pruned_corners – a tensor with shape [M_out, 2] where M_out <= M_in.
  • valid_indices – a tensor with shape [M_out] indexing the valid time segments in the input tensor.

easy_vision.python.core.ops.ts_list_ops.to_absolute_ts_coordinates(tsList, length, check_range=True, maximum_normalized_coordinate=1.01, scope=None)[source]

Converts normalized ts coordinates to absolute pixel coordinates.

Parameters:
  • tsList – TimeSegmentList with coordinates in range [0, 1].
  • length – Maximum value for length of absolute ts coordinates.
  • check_range – If True, checks if the coordinates are normalized or not.
  • maximum_normalized_coordinate – Maximum coordinate value to be considered as normalized, default to 1.01.
  • scope – name scope.
Returns:

TimeSegmentList with absolute coordinates in terms of the video length.

easy_vision.python.core.ops.ts_list_ops.to_normalized_ts_coordinates(tsList, length, check_range=True, scope=None)[source]

Converts absolute ts coordinates to normalized coordinates in [0, 1].

Parameters:
  • tsList – TimeSegmentList with coordinates in terms of pixel-locations.
  • length – Maximum value for length of absolute ts coordinates.
  • check_range – If True, checks if the coordinates are normalized or not.
  • scope – name scope.
Returns:

TimeSegmentList with normalized coordinates in [0, 1].
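The two conversions are inverses of each other: normalizing presumably divides the coordinates by the video length, and the absolute conversion multiplies by it. A sketch of the underlying scaling on raw tensors:

    import tensorflow as tf

    def to_normalized_sketch(segments, length):
      # segments: [N, 2] in absolute units; length: scalar video length.
      return segments / length

    def to_absolute_sketch(segments, length):
      # Inverse: segments in [0, 1] scaled back to absolute units.
      return segments * length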

easy_vision.python.core.ops.ts_list_ops.ts_boolean_mask(tslist, indicator, fields=None, scope=None)[source]

Select time segments from tslist according to indicator and return a new TimeSegmentList.

Parameters:
  • tslist – TimeSegmentList holding N time segments
  • indicator – a rank-1 boolean tensor
  • fields – (optional) list of fields to also gather from. If None (default), all fields are gathered from. Pass an empty fields list to only gather the ts coordinates.
  • scope – name scope.
Returns:

a TimeSegmentList corresponding to the subset of the input TimeSegmentList

specified by indicator

Return type:

subtslist

Raises:

ValueError – if indicator is not a rank-1 boolean tensor.

easy_vision.python.core.ops.ts_list_ops.ts_concatenate(tslists, fields=None, scope=None)[source]

Concatenate list of TimeSegmentLists.

Parameters:
  • tslists – list of TimeSegmentList objects
  • fields – optional list of fields to also concatenate. By default, all fields from the first tsList in the list are included in the concatenation.
  • scope – name scope.
Returns:

a TimeSegmentList with number of time segments equal to

sum([tslist.num_boxes() for tslist in tslists])

Raises:

ValueError – if tslists is invalid (i.e., is not a list, is empty, or contains non TimeSegmentList objects), or if requested fields are not contained in all tslists

easy_vision.python.core.ops.ts_list_ops.ts_filter_greater_than(tslist, thresh, scope=None)[source]

Filter to keep only time segments with score exceeding a given threshold.

Parameters:
  • tslist – TimeSegmentList holding N segments. Must contain a ‘scores’ field representing detection scores.
  • thresh – scalar threshold
  • scope – name scope.
Returns:

a TimeSegmentList holding M time segments where M <= N

Raises:

ValueError – if tslist is not a TimeSegmentList object or if it does not have a scores field

easy_vision.python.core.ops.ts_list_ops.ts_gather(tslist, indices, fields=None, scope=None)[source]

Gather time segments from a TimeSegmentList according to indices and return a new TimeSegmentList.

Parameters:
  • tslist – TimeSegmentList holding N time segments
  • indices – a rank-1 tensor of type int32 / int64
  • fields – (optional) list of fields to also gather from. If None (default), all fields are gathered from. Pass an empty fields list to only gather the ts coordinates.
  • scope – name scope.
Returns:

a TimeSegmentList corresponding to the subset of the input TimeSegmentList specified by indices

Return type:

subtslist

Raises:

ValueError – if specified field is not contained in tslist or if the indices are not of type int32

easy_vision.python.core.ops.ts_list_ops.ts_intersection(ts1, ts2, scope=None)[source]

Compute pairwise intersection lengths between temporal segments.

Parameters:
  • ts1 – TimeSegmentList holding N temporal segments
  • ts2 – TimeSegmentList holding M temporal segments
  • scope – name scope.
Returns:

a tensor with shape [N, M] representing pairwise intersections

easy_vision.python.core.ops.ts_list_ops.ts_iou(tsList1, tsList2, scope=None)[source]

Computes pairwise intersection-over-union between time segment collections.

Parameters:
  • tsList1 – TimeSegmentList holding N time segments
  • tsList2 – TimeSegmentList holding M time segments
  • scope – name scope.
Returns:

a tensor with shape [N, M] representing pairwise iou scores.
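For segments [a0, a1] and [b0, b1], intersection = max(0, min(a1, b1) - max(a0, b0)) and union = (a1 - a0) + (b1 - b0) - intersection. A standalone sketch of the pairwise computation on raw tensors, not necessarily the library's exact implementation:

    import tensorflow as tf

    def pairwise_ts_iou_sketch(ts1, ts2):
      # ts1: [N, 2], ts2: [M, 2], columns [t_min, t_max].
      t_min1, t_max1 = tf.split(ts1, 2, axis=1)        # each [N, 1]
      t_min2, t_max2 = tf.split(ts2, 2, axis=1)        # each [M, 1]
      # Broadcast [N, 1] against [1, M] to cover all pairs.
      inter = tf.maximum(
          0.0,
          tf.minimum(t_max1, tf.transpose(t_max2)) -
          tf.maximum(t_min1, tf.transpose(t_min2)))    # [N, M]
      union = (t_max1 - t_min1) + tf.transpose(t_max2 - t_min2) - inter
      # Zero-intersection pairs get IOU 0, avoiding division by zero unions.
      return tf.where(tf.equal(inter, 0.0),
                      tf.zeros_like(inter), inter / union)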

easy_vision.python.core.ops.ts_list_ops.ts_scale(tslist, t_scale, scope=None)[source]

Scale time segment coordinates in the t dimension.

Parameters:
  • tslist – TimeSegmentList holding N time segments
  • t_scale – (float) scalar tensor
  • scope – name scope.
Returns:

TimeSegmentList holding N time segments

Return type:

tslist

easy_vision.python.core.ops.ts_list_ops.ts_sort_by_field(tslist, field, order=2, scope=None)[source]

Sort segments and associated fields according to a scalar field.

A common use case is reordering the segments according to descending scores.

Parameters:
  • tslist – TimeSegmentList holding N segments.
  • field – A TimeSegmentList field for sorting and reordering the TimeSegmentList.
  • order – (Optional) descend or ascend. Default is descend.
  • scope – name scope.
Returns:

A sorted TimeSegmentList with the field in the specified order.

Return type:

sorted_tslist

Raises:
  • ValueError – if specified field does not exist
  • ValueError – if the order is not either descend or ascend