easy_vision.python.core.preprocessing
easy_vision.python.core.preprocessing.autoaugment
AutoAugment and RandAugment policies for enhanced image preprocessing.
AutoAugment Reference: https://arxiv.org/abs/1805.09501
RandAugment Reference: https://arxiv.org/abs/1909.13719
-
easy_vision.python.core.preprocessing.autoaugment.autocontrast(image)
Implements the Autocontrast function from PIL using TF ops.
Parameters: image – A 3D uint8 tensor.
Returns: The image after autocontrast has been applied to it, of type uint8.
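As a rough illustration of what autocontrast does, the sketch below stretches a single channel so that its darkest value maps to 0 and its brightest to 255. It is a plain-Python stand-in operating on a flat list, not the TF implementation; the helper name autocontrast_channel, the truncating cast, and the handling of a flat channel are assumptions.

```python
def autocontrast_channel(values):
    """Stretch one channel so min -> 0 and max -> 255 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a flat channel has no contrast to stretch, so keep it.
        return list(values)
    # int() truncates, mimicking a float -> uint8 cast.
    return [int((v - lo) * 255 / (hi - lo)) for v in values]


print(autocontrast_channel([50, 100, 150]))  # [0, 127, 255]
```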
-
easy_vision.python.core.preprocessing.autoaugment.blend(image1, image2, factor)
Blend image1 and image2 using ‘factor’.
Factor can be above 0.0. A value of 0.0 means only image1 is used. A value of 1.0 means only image2 is used. A value between 0.0 and 1.0 means we linearly interpolate the pixel values between the two images. A value greater than 1.0 “extrapolates” the difference between the two pixel values, and we clip the results to values between 0 and 255.
Parameters: - image1 – An image Tensor of type uint8.
- image2 – An image Tensor of type uint8.
- factor – A floating point value above 0.0.
Returns: A blended image Tensor of type uint8.
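The interpolation and extrapolation behaviour described for factor can be sketched per pixel as follows. This is a minimal plain-Python illustration on flat lists of pixel values, not the TF implementation; the truncating cast back to an integer is an assumption.

```python
def blend(image1, image2, factor):
    """Blend two equally sized lists of 0..255 pixel values."""
    blended = []
    for p1, p2 in zip(image1, image2):
        # factor=0 gives p1, factor=1 gives p2, factor>1 extrapolates past p2.
        value = p1 + factor * (p2 - p1)
        # Clip to the valid uint8 range before casting.
        value = max(0.0, min(255.0, value))
        blended.append(int(value))  # assumption: truncating cast
    return blended


print(blend([0, 100], [200, 100], 0.5))  # [100, 100]
print(blend([0, 100], [200, 100], 2.0))  # [255, 100]
```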
-
easy_vision.python.core.preprocessing.autoaugment.brightness(image, factor)
Equivalent of PIL Brightness.
-
easy_vision.python.core.preprocessing.autoaugment.build_and_apply_nas_policy(policies, image, augmentation_hparams)
Build a policy from the given policies and apply it to image.
Parameters: - policies – list of lists of tuples in the form (func, prob, level), where func is the string name of the augmentation function, prob is the probability of applying the func operation, and level is the input argument for func.
- image – tf.Tensor that the resulting policy will be applied to.
- augmentation_hparams – Hparams associated with the NAS learned policy.
Returns: A version of image that now has data augmentation applied to it based on the policies passed into the function.
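The (func, prob, level) policy structure can be illustrated with a small plain-Python sketch. The two operations, the OPS lookup table, and apply_random_sub_policy are hypothetical stand-ins working on flat pixel lists; the library's own function names and level semantics may differ.

```python
import random


def invert(pixels, _level):
    return [255 - p for p in pixels]


def brighten(pixels, level):
    return [min(255, p + level) for p in pixels]


# Hypothetical name -> function lookup used to resolve the string in each tuple.
OPS = {"invert": invert, "brighten": brighten}


def apply_random_sub_policy(policies, pixels, rng):
    """Pick one sub-policy at random and apply each op with its probability."""
    sub_policy = rng.choice(policies)
    for func_name, prob, level in sub_policy:
        if rng.random() < prob:
            pixels = OPS[func_name](pixels, level)
    return pixels


policies = [
    [("invert", 1.0, 0), ("brighten", 1.0, 10)],  # always applies both ops
    [("brighten", 0.0, 50)],                       # never applies its op
]
result = apply_random_sub_policy(policies, [0, 100, 250], random.Random(0))
print(result)
```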
-
easy_vision.python.core.preprocessing.autoaugment.color(image, factor)
Equivalent of PIL Color.
-
easy_vision.python.core.preprocessing.autoaugment.contrast(image, factor)
Equivalent of PIL Contrast.
-
easy_vision.python.core.preprocessing.autoaugment.cutout(image, pad_size, replace=0)
Apply cutout (https://arxiv.org/abs/1708.04552) to image.
This operation applies a (2*pad_size x 2*pad_size) mask to a random location within image. The pixels under the mask are filled with the value replace. The location where the mask is applied is chosen uniformly at random over the whole image.
Parameters: - image – An image Tensor of type uint8.
- pad_size – Specifies how big the zero mask that will be generated is that is applied to the image. The mask will be of size (2*pad_size x 2*pad_size).
- replace – What pixel value to fill in the image in the area that has the cutout mask applied to it.
Returns: An image Tensor that is of type uint8.
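A minimal sketch of the cutout mask on a 2-D list of pixel values is shown below. It assumes the mask is centred on a uniformly sampled pixel and clipped at the image border; cutout_at is a hypothetical helper that takes the centre explicitly so the behaviour is easy to inspect.

```python
import random


def cutout_at(image, center_row, center_col, pad_size, replace=0):
    """Fill a (2*pad_size x 2*pad_size) window around the centre with replace."""
    height, width = len(image), len(image[0])
    # Clip the window to the image bounds.
    top, bottom = max(0, center_row - pad_size), min(height, center_row + pad_size)
    left, right = max(0, center_col - pad_size), min(width, center_col + pad_size)
    out = [row[:] for row in image]  # leave the input untouched
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = replace
    return out


def cutout(image, pad_size, replace=0, rng=random):
    """Choose the mask centre uniformly over the whole image."""
    center_row = rng.randrange(len(image))
    center_col = rng.randrange(len(image[0]))
    return cutout_at(image, center_row, center_col, pad_size, replace)


image = [[1] * 6 for _ in range(6)]
for row in cutout_at(image, 3, 3, pad_size=1):
    print(row)
```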
-
easy_vision.python.core.preprocessing.autoaugment.distort_image_with_autoaugment(image, augmentation_name)
Applies the AutoAugment policy to image.
AutoAugment is from the paper: https://arxiv.org/abs/1805.09501.
Parameters: - image – Tensor of shape [height, width, 3] representing an image.
- augmentation_name – The name of the AutoAugment policy to use. The available options are v0 and test. v0 is the policy used for all of the results in the paper and was found to achieve the best results on the COCO dataset. v1, v2 and v3 are additional good policies found on the COCO dataset that have slight variation in what operations were used during the search procedure along with how many operations are applied in parallel to a single image (2 vs 3).
Returns: A tuple containing the augmented versions of image.
-
easy_vision.python.core.preprocessing.autoaugment.distort_image_with_randaugment(image, num_layers, magnitude)
Applies the RandAugment policy to image.
RandAugment is from the paper https://arxiv.org/abs/1909.13719.
Parameters: - image – Tensor of shape [height, width, 3] representing an image.
- num_layers – Integer, the number of augmentation transformations to apply sequentially to an image. Represented as (N) in the paper. Usually best values will be in the range [1, 3].
- magnitude – Integer, shared magnitude across all augmentation operations. Represented as (M) in the paper. Usually best values are in the range [5, 30].
Returns: The augmented version of image.
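The roles of num_layers (N) and magnitude (M) can be sketched as a short loop: N operations are drawn at random and each is applied with the shared magnitude M. The three operations below are hypothetical stand-ins on flat pixel lists, and uniform selection is an assumption; the library's operation set and sampling details may differ.

```python
import random


def brighten(pixels, magnitude):
    return [min(255, p + magnitude) for p in pixels]


def darken(pixels, magnitude):
    return [max(0, p - magnitude) for p in pixels]


def invert(pixels, _magnitude):
    return [255 - p for p in pixels]


# Hypothetical operation pool; the real policy uses image transformations.
AVAILABLE_OPS = [brighten, darken, invert]


def rand_augment(pixels, num_layers, magnitude, rng):
    """Apply num_layers randomly chosen ops in sequence, all with one magnitude."""
    for _ in range(num_layers):
        op = rng.choice(AVAILABLE_OPS)
        pixels = op(pixels, magnitude)
    return pixels


print(rand_augment([0, 128, 255], num_layers=2, magnitude=10, rng=random.Random(0)))
```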
-
easy_vision.python.core.preprocessing.autoaugment.equalize(image)
Implements the Equalize function from PIL using TF ops.
-
easy_vision.python.core.preprocessing.autoaugment.policy_v0()
AutoAugment policy that was used in the AutoAugment paper.
-
easy_vision.python.core.preprocessing.autoaugment.policy_vtest()
AutoAugment test policy for debugging.
-
easy_vision.python.core.preprocessing.autoaugment.posterize(image, bits)
Equivalent of PIL Posterize.
-
easy_vision.python.core.preprocessing.autoaugment.rotate(image, degrees, replace)
Rotates the image by degrees either clockwise or counterclockwise.
Parameters: - image – An image Tensor of type uint8.
- degrees – Float, a scalar angle in degrees to rotate all images by. If degrees is positive, the image will be rotated clockwise; otherwise, it will be rotated counterclockwise.
- replace – A one or three value 1D tensor to fill empty pixels caused by the rotate operation.
Returns: The rotated version of image.
-
easy_vision.python.core.preprocessing.autoaugment.select_and_apply_random_policy(policies, image)
Select a random policy from policies and apply it to image.
-
easy_vision.python.core.preprocessing.autoaugment.sharpness(image, factor)
Implements the Sharpness function from PIL using TF ops.
-
easy_vision.python.core.preprocessing.autoaugment.shear_x(image, level, replace)
Equivalent of PIL Shearing in X dimension.
-
easy_vision.python.core.preprocessing.autoaugment.shear_y(image, level, replace)
Equivalent of PIL Shearing in Y dimension.
-
easy_vision.python.core.preprocessing.autoaugment.solarize_add(image, addition=0, threshold=128)
-
easy_vision.python.core.preprocessing.autoaugment.translate_x(image, pixels, replace)
Equivalent of PIL Translate in X dimension.
-
easy_vision.python.core.preprocessing.autoaugment.translate_y(image, pixels, replace)
Equivalent of PIL Translate in Y dimension.
-
easy_vision.python.core.preprocessing.autoaugment.unwrap(image, replace)
Unwraps an image produced by wrap.
At every spatial position where the last channel is 0, the remaining three channels at that position are grayed (set to 128). Operations like translate and shear on a wrapped Tensor will leave 0s in empty locations. Some transformations look at the intensity of values to do preprocessing, and we want these empty pixels to assume the ‘average’ value, rather than pure black.
Parameters: - image – A 3D Image Tensor with 4 channels.
- replace – A one or three value 1D tensor to fill empty pixels.
Returns: A 3D image Tensor with 3 channels.
Return type: image
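The wrap/unwrap idea can be sketched on a flat list of pixels: wrapping appends a marker channel of 1, a geometric transform leaves all-zero pixels (marker 0) in empty locations, and unwrapping replaces exactly those pixels with the fill value. This is a plain-Python illustration that assumes a three-value replace; it is not the TF implementation.

```python
def wrap(pixels):
    """Append a marker channel of 1 to every [r, g, b] pixel."""
    return [list(p) + [1] for p in pixels]


def unwrap(wrapped_pixels, replace):
    """Drop the marker channel, filling pixels whose marker is 0 with replace."""
    out = []
    for *rgb, marker in wrapped_pixels:
        out.append(list(replace) if marker == 0 else rgb)
    return out


wrapped = wrap([[10, 20, 30], [40, 50, 60]])
# Simulate a translate/shear that left the second position empty.
wrapped[1] = [0, 0, 0, 0]
print(unwrap(wrapped, replace=[128, 128, 128]))  # [[10, 20, 30], [128, 128, 128]]
```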
easy_vision.python.core.preprocessing.cifarnet_preprocessing
Provides utilities to preprocess images in CIFAR-10.
-
easy_vision.python.core.preprocessing.cifarnet_preprocessing.preprocess_for_eval(image, output_height, output_width, add_image_summaries=True)
Preprocesses the given image for evaluation.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- add_image_summaries – Enable image summaries.
Returns: A preprocessed image.
-
easy_vision.python.core.preprocessing.cifarnet_preprocessing.preprocess_for_train(image, output_height, output_width, padding=4, add_image_summaries=True)
Preprocesses the given image for training.
Note that the actual resizing scale is sampled from [resize_size_min, resize_size_max].
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- padding – The amount of padding before and after each dimension of the image.
- add_image_summaries – Enable image summaries.
Returns: A preprocessed image.
-
easy_vision.python.core.preprocessing.cifarnet_preprocessing.preprocess_image(image, output_height, output_width, is_training=False, add_image_summaries=True)
Preprocesses the given image.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- is_training – True if we’re preprocessing the image for training and False otherwise.
- add_image_summaries – Enable image summaries.
Returns: A preprocessed image.
easy_vision.python.core.preprocessing.classification_preprocess
-
easy_vision.python.core.preprocessing.classification_preprocess.cifarnet_preprocessing(image, output_height, output_width, is_training, add_image_summaries=False)
Preprocesses the given image.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- is_training – True if we’re preprocessing the image for training and False otherwise.
- add_image_summaries – Enable image summaries.
Returns: A preprocessed image.
-
easy_vision.python.core.preprocessing.classification_preprocess.classification_auto_augment(image)
Applies the AutoAugment policy to image.
AutoAugment is from the paper: https://arxiv.org/abs/1805.09501.
Parameters: image – Tensor of shape [height, width, 3] representing an image.
Returns: The augmented version of image.
-
easy_vision.python.core.preprocessing.classification_preprocess.classification_central_crop(image, central_crop_fraction=0.875)
Crops the central region of the image with an area fraction given by central_crop_fraction.
-
easy_vision.python.core.preprocessing.classification_preprocess.classification_random_augment(image, num_layers=2, magnitude=10)
Applies the RandAugment policy to image.
RandAugment is from the paper https://arxiv.org/abs/1909.13719.
Parameters: - image – Tensor of shape [height, width, 3] representing an image.
- num_layers – Integer, the number of augmentation transformations to apply sequentially to an image. Represented as (N) in the paper. Usually best values will be in the range [1, 3].
- magnitude – Integer, shared magnitude across all augmentation operations. Represented as (M) in the paper. Usually best values are in the range [5, 30].
Returns: The augmented version of image.
-
easy_vision.python.core.preprocessing.classification_preprocess.classification_random_crop(image, min_aspect_ratio=0.75, max_aspect_ratio=1.33, min_area=0.1, max_area=1.0)
Randomly crops the image.
Given the input image this op randomly crops a subimage. Given a user-provided set of input constraints, the crop window is resampled until it satisfies these constraints. If within 100 trials it is unable to find a valid crop, the original image is returned.
Parameters: - min_aspect_ratio – allowed min range for aspect ratio of cropped image.
- max_aspect_ratio – allowed max range for aspect ratio of cropped image.
- min_area – allowed min area ratio between cropped image and the original image.
- max_area – allowed max area ratio between cropped image and the original image.
Returns: Image shape will be [new_height, new_width, channels].
Return type: image
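The resample-until-valid behaviour can be sketched as rejection sampling over crop windows. The helper below is a hypothetical plain-Python illustration working only on image dimensions; it assumes aspect ratio means width / height and that the fallback after the trial budget is the full image, as described above.

```python
import math
import random


def sample_crop_window(height, width, min_aspect_ratio=0.75, max_aspect_ratio=1.33,
                       min_area=0.1, max_area=1.0, max_trials=100, rng=random):
    """Return (top, left, crop_height, crop_width) for a random valid crop."""
    for _ in range(max_trials):
        area = rng.uniform(min_area, max_area) * height * width
        aspect = rng.uniform(min_aspect_ratio, max_aspect_ratio)  # width / height
        crop_w = int(round(math.sqrt(area * aspect)))
        crop_h = int(round(math.sqrt(area / aspect)))
        # Reject windows that do not fit inside the image and try again.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # No valid window found within the trial budget: keep the original image.
    return 0, 0, height, width


print(sample_crop_window(100, 80, rng=random.Random(0)))
```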
-
easy_vision.python.core.preprocessing.classification_preprocess.efficientnet_preprocessing(image, model_name, output_height, output_width, is_training=False, augment_name='', randaug_num_layers=2, randaug_magnitude=10)
Pre-process one image for training or evaluation.
Parameters: - image – 3-D Tensor [height, width, channels] with the image.
- model_name – EfficientNet model name. If model_name is not empty, output_height and output_width will use the default values for the model.
- output_height – integer, image expected height.
- output_width – integer, image expected width.
- is_training – Boolean. If true it would transform an image for train, otherwise it would transform it for evaluation.
- augment_name – string that is the name of the augmentation method to apply to the image: autoaugment if AutoAugment is to be used, or randaugment if RandAugment is to be used. If the value is empty, no augmentation method will be applied. See autoaugment.py for more details.
- randaug_num_layers – ‘int’, if RandAug is used, what should the number of layers be. See autoaugment.py for detailed description.
- randaug_magnitude – ‘int’, if RandAug is used, what should the magnitude be. See autoaugment.py for detailed description.
-
easy_vision.python.core.preprocessing.classification_preprocess.inception_preprocessing(image, output_height, output_width, is_training=False, add_image_summaries=False, central_crop_fraction=None)
Pre-process one image for training or evaluation.
Parameters: - image – 3-D Tensor [height, width, channels] with the image.
- output_height – integer, image expected height.
- output_width – integer, image expected width.
- is_training – Boolean. If true it would transform an image for train, otherwise it would transform it for evaluation.
- bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].
- fast_mode – Optional boolean, if True avoids slower transformations.
- add_image_summaries – Enable image summaries.
- central_crop_fraction – Optional Float, fraction of the image to crop.
Returns: 3-D float Tensor containing an appropriately scaled image
Raises: ValueError
– if user does not provide bounding box
-
easy_vision.python.core.preprocessing.classification_preprocess.lenet_preprocessing(image, output_height, output_width, is_training)
Preprocesses the given image.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- is_training – True if we’re preprocessing the image for training and False otherwise.
Returns: A preprocessed image.
-
easy_vision.python.core.preprocessing.classification_preprocess.vgg_preprocessing(image, output_height, output_width, is_training=False, resize_side_min=256, resize_side_max=512)
Preprocesses the given image.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- is_training – True if we’re preprocessing the image for training and False otherwise.
- resize_side_min – The lower bound for the smallest side of the image for aspect-preserving resizing. If is_training is False, then this value is used for rescaling.
- resize_side_max – The upper bound for the smallest side of the image for aspect-preserving resizing. If is_training is False, this value is ignored. Otherwise, the resize side is sampled from [resize_side_min, resize_side_max].
Returns: A preprocessed image.
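How resize_side_min and resize_side_max interact with is_training can be sketched as a size computation. The helper below is a hypothetical illustration that returns only the resized dimensions; the uniform integer sampling and the rounding are assumptions, and the subsequent crop to output_height x output_width is omitted.

```python
import random


def aspect_preserving_size(height, width, is_training,
                           resize_side_min=256, resize_side_max=512, rng=random):
    """Return (new_height, new_width) after aspect-preserving resizing."""
    if is_training:
        # Training: sample the target length of the smallest side.
        target_side = rng.randint(resize_side_min, resize_side_max)
    else:
        # Evaluation: always rescale the smallest side to resize_side_min.
        target_side = resize_side_min
    scale = target_side / min(height, width)
    return round(height * scale), round(width * scale)


print(aspect_preserving_size(300, 600, is_training=False))  # (256, 512)
```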
easy_vision.python.core.preprocessing.common_preprocess
Preprocess images and bounding boxes for detection.
We perform two sets of operations in preprocessing stage: (a) operations that are applied to both training and testing data, (b) operations that are applied only to training data for the purpose of data augmentation.
A preprocessing function receives a set of inputs, e.g. an image and bounding boxes, performs an operation on them, and returns them. Some examples are: randomly cropping the image, randomly mirroring the image, randomly changing the brightness, contrast, hue and randomly jittering the bounding boxes.
The preprocess function receives a tensor_dict which is a dictionary that maps different field names to their tensors. For example, tensor_dict[InputDataFields.image] holds the image tensor. The image is a rank 4 tensor: [1, height, width, channels] with dtype=tf.float32. The groundtruth_boxes is a rank 2 tensor: [N, 4] where in each row there is a box with [ymin xmin ymax xmax]. Boxes are in normalized coordinates meaning their coordinate values range in [0, 1].
To preprocess multiple images with the same operations in cases where nondeterministic operations are used, a preprocessor_cache.PreprocessorCache object can be passed into the preprocess function or individual operations. All nondeterministic operations except random_jitter_boxes support caching. For example, let tensor_dict{1,2,3,4,5} be copies of the same inputs, and let preprocess_options contain nondeterministic operation(s) excluding random_jitter_boxes.
    cache1 = preprocessor_cache.PreprocessorCache()
    cache2 = preprocessor_cache.PreprocessorCache()
    a = preprocess(tensor_dict1, preprocess_options, preprocess_vars_cache=cache1)
    b = preprocess(tensor_dict2, preprocess_options, preprocess_vars_cache=cache1)
    c = preprocess(tensor_dict3, preprocess_options, preprocess_vars_cache=cache2)
    d = preprocess(tensor_dict4, preprocess_options, preprocess_vars_cache=cache2)
    e = preprocess(tensor_dict5, preprocess_options)
Then corresponding tensors of object pairs (a,b) and (c,d) are guaranteed to be equal element-wise, but the equality of any other object pair cannot be determined.
Important Note: In tensor_dict, images is a rank 4 tensor, but preprocessing functions receive a rank 3 tensor for processing the image. Thus, inside the preprocess function we squeeze the image to become a rank 3 tensor and then we pass it to the functions. At the end of the preprocess we expand the image back to rank 4.
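The caching guarantee described above (calls that share a cache see the same random draws) can be illustrated with a toy cache. DrawCache and random_brightness_delta are hypothetical names invented for this sketch; they are not the interface of preprocessor_cache.PreprocessorCache.

```python
import random


class DrawCache:
    """Hypothetical stand-in that remembers random draws by key."""

    def __init__(self):
        self._draws = {}

    def get_or_draw(self, key, draw_fn):
        # Draw once per key; later calls reuse the stored value.
        if key not in self._draws:
            self._draws[key] = draw_fn()
        return self._draws[key]


def random_brightness_delta(max_delta, rng, cache=None):
    """Sample a brightness delta, reusing a cached draw when a cache is given."""
    def draw():
        return rng.uniform(-max_delta, max_delta)

    if cache is None:
        return draw()
    return cache.get_or_draw("brightness_delta", draw)


rng = random.Random(0)
cache = DrawCache()
a = random_brightness_delta(0.2, rng, cache)
b = random_brightness_delta(0.2, rng, cache)
print(a == b)  # True: both calls share the cached draw
```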
-
easy_vision.python.core.preprocessing.common_preprocess.image_to_float(image)
Used in Faster R-CNN. Casts image pixel values to float.
Parameters: image – input image, which might be in tf.uint8 or some other format.
Returns: image in tf.float32 format.
Return type: image
-
easy_vision.python.core.preprocessing.common_preprocess.letter_box_image(image, boxes, aspect_ratio=1.0, pad_value=0.5)
Pads the image in letterbox style.
Given the input image and its bounding boxes, this op pads the short edge of the image to fit the target aspect_ratio.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes with shape [num_instances, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- aspect_ratio – target image aspect ratio.
- pad_value – constant value to pad
Returns: Image shape will be [new_height, new_width, channels].
boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
Return type: image
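The geometry of letterboxing can be sketched without touching pixel data: compute the padded size, then re-normalize the boxes against it. The helper below is a hypothetical illustration; it assumes aspect ratio means width / height and that the padding is split evenly on both sides, neither of which is stated above.

```python
def letter_box_geometry(height, width, boxes, aspect_ratio=1.0):
    """Return (new_height, new_width, new_boxes) for a letterboxed image."""
    if width / height < aspect_ratio:
        # Image is too narrow: pad the width.
        new_height, new_width = height, int(round(height * aspect_ratio))
    else:
        # Image is too wide (or already matches): pad the height.
        new_height, new_width = int(round(width / aspect_ratio)), width
    pad_top = (new_height - height) // 2    # assumption: centred padding
    pad_left = (new_width - width) // 2
    new_boxes = []
    for ymin, xmin, ymax, xmax in boxes:
        # Convert to pixels, shift by the padding, re-normalize to the new size.
        new_boxes.append([
            (ymin * height + pad_top) / new_height,
            (xmin * width + pad_left) / new_width,
            (ymax * height + pad_top) / new_height,
            (xmax * width + pad_left) / new_width,
        ])
    return new_height, new_width, new_boxes


print(letter_box_geometry(100, 50, [[0.0, 0.0, 1.0, 1.0]]))
# (100, 100, [[0.0, 0.25, 1.0, 0.75]])
```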
-
easy_vision.python.core.preprocessing.common_preprocess.normalize_image(image, original_minval, original_maxval, target_minval, target_maxval)
Normalizes pixel values in the image.
Moves the pixel values from the current [original_minval, original_maxval] range to the [target_minval, target_maxval] range.
Parameters: - image – rank 3 float32 tensor containing 1 image -> [height, width, channels].
- original_minval – current image minimum value.
- original_maxval – current image maximum value.
- target_minval – target image minimum value.
- target_maxval – target image maximum value.
Returns: image which is the same shape as input image.
Return type: image
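The range mapping is a linear rescale, sketched here on a flat list of values. This is a plain-Python illustration of the arithmetic, with the hypothetical name normalize_values, not the TF implementation.

```python
def normalize_values(values, original_minval, original_maxval,
                     target_minval, target_maxval):
    """Linearly map values from the original range to the target range."""
    scale = (target_maxval - target_minval) / (original_maxval - original_minval)
    return [(v - original_minval) * scale + target_minval for v in values]


# Map 8-bit style values in [0, 255] to [-1, 1].
print(normalize_values([0.0, 127.5, 255.0], 0.0, 255.0, -1.0, 1.0))
```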
-
easy_vision.python.core.preprocessing.common_preprocess.one_hot_encoding(labels, num_classes=None)
One-hot encodes the multiclass labels.
Example usage:
    labels = tf.constant([1, 4], dtype=tf.int32)
    one_hot = one_hot_encoding(labels, num_classes=5)
    one_hot.eval()  # evaluates to [0, 1, 0, 0, 1]
Parameters: - labels – A tensor of shape [None] corresponding to the labels.
- num_classes – Number of classes in the dataset.
Returns: a tensor of shape [num_classes] corresponding to the one hot encoding of the labels.
Return type: onehot_labels
Raises: AssertionError
– if num_classes is not specified.
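The encoding in the usage example sets one position per label in a single vector of length num_classes. A plain-Python sketch of that behaviour follows; one_hot_encode_labels is a hypothetical name, chosen to avoid confusion with the library function.

```python
def one_hot_encode_labels(labels, num_classes=None):
    """Return a length-num_classes vector with a 1 at every label index."""
    if num_classes is None:
        raise AssertionError("num_classes must be specified")
    encoded = [0] * num_classes
    for label in labels:
        encoded[label] = 1
    return encoded


print(one_hot_encode_labels([1, 4], num_classes=5))  # [0, 1, 0, 0, 1]
```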
-
easy_vision.python.core.preprocessing.common_preprocess.random_adjust_brightness(image, max_delta=0.2, seed=None, preprocess_vars_cache=None)
Randomly adjusts brightness.
Makes sure the output image is still between 0 and 255.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- max_delta – how much to change the brightness. A value between [0, 1).
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
-
easy_vision.python.core.preprocessing.common_preprocess.random_adjust_contrast(image, min_delta=0.8, max_delta=1.25, seed=None, preprocess_vars_cache=None)
Randomly adjusts contrast.
Makes sure the output image is still between 0 and 255.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- min_delta – see max_delta.
- max_delta – how much to change the contrast. Contrast will change with a value between min_delta and max_delta. This value will be multiplied to the current contrast of the image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
-
easy_vision.python.core.preprocessing.common_preprocess.random_adjust_hue(image, max_delta=0.02, seed=None, preprocess_vars_cache=None)
Randomly adjusts hue.
Makes sure the output image is still between 0 and 255.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- max_delta – change hue randomly with a value between 0 and max_delta.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
-
easy_vision.python.core.preprocessing.common_preprocess.random_adjust_saturation(image, min_delta=0.8, max_delta=1.25, seed=None, preprocess_vars_cache=None)
Randomly adjusts saturation.
Makes sure the output image is still between 0 and 255.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- min_delta – see max_delta.
- max_delta – how much to change the saturation. Saturation will change with a value between min_delta and max_delta. This value will be multiplied to the current saturation of the image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
-
easy_vision.python.core.preprocessing.common_preprocess.random_black_patches(image, max_black_patches=10, probability=0.5, size_to_image_ratio=0.1, random_seed=None, preprocess_vars_cache=None)
Randomly adds some black patches to the image.
This op adds up to max_black_patches square black patches of a fixed size to the image where size is specified via the size_to_image_ratio parameter.
Parameters: - image – rank 3 float32 tensor containing 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- max_black_patches – number of times that the function tries to add a black box to the image.
- probability – at each try, what is the chance of adding a box.
- size_to_image_ratio – Determines the ratio of the size of the black patches to the size of the image. box_size = size_to_image_ratio * min(image_width, image_height)
- random_seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image
-
easy_vision.python.core.preprocessing.common_preprocess.random_crop_image(image, boxes, labels, label_scores=None, multiclass_scores=None, masks=None, keypoints=None, min_object_covered=1.0, aspect_ratio_range=(0.75, 1.33), area_range=(0.1, 1.0), overlap_thresh=0.3, random_coef=0.0, seed=None, preprocess_vars_cache=None)
Randomly crops the image.
Given the input image and its bounding boxes, this op randomly crops a subimage. Given a user-provided set of input constraints, the crop window is resampled until it satisfies these constraints. If within 100 trials it is unable to find a valid crop, the original image is returned. See the Parameters section for a description of the input constraints. Both input boxes and returned boxes are in normalized form (i.e., they lie in the unit square [0, 1]). This function will return the original image with probability random_coef.
Note: boxes will be clipped to the crop. Keypoint coordinates that are outside the crop will be set to NaN, which is consistent with the original keypoint encoding for non-existing keypoints.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes with shape [num_instances, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – (optional) float32 tensor of shape [num_instances] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- aspect_ratio_range – allowed range for aspect ratio of cropped image.
- area_range – allowed range for area ratio between cropped image and the original image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: Image shape will be [new_height, new_width, channels].
boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
labels: new labels.
If label_scores, multiclass_scores, masks, or keypoints is not None, the function also returns:
- label_scores: rank 1 float32 tensor with shape [num_instances].
- multiclass_scores: rank 2 float32 tensor with shape [num_instances, num_classes].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Return type: image
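The box handling described above (drop boxes with too little overlap, clip the rest to the crop, keep normalized coordinates) can be sketched as follows. The helper crop_boxes is hypothetical, and measuring overlap as the fraction of each box's own area that lies inside the crop is an assumption about what overlap_thresh compares against.

```python
def crop_boxes(boxes, crop, overlap_thresh=0.3):
    """Filter, clip, and re-normalize [ymin, xmin, ymax, xmax] boxes to a crop."""
    cy0, cx0, cy1, cx1 = crop
    kept = []
    for ymin, xmin, ymax, xmax in boxes:
        # Intersection of the box with the crop window.
        iy0, ix0 = max(ymin, cy0), max(xmin, cx0)
        iy1, ix1 = min(ymax, cy1), min(xmax, cx1)
        inter = max(0.0, iy1 - iy0) * max(0.0, ix1 - ix0)
        area = (ymax - ymin) * (xmax - xmin)
        # Assumption: overlap is intersection area over the box's own area.
        if area <= 0 or inter / area < overlap_thresh:
            continue
        # Express the clipped box relative to the crop window.
        kept.append([
            (iy0 - cy0) / (cy1 - cy0),
            (ix0 - cx0) / (cx1 - cx0),
            (iy1 - cy0) / (cy1 - cy0),
            (ix1 - cx0) / (cx1 - cx0),
        ])
    return kept


boxes = [[0.25, 0.25, 0.75, 0.75], [0.1, 0.1, 0.4, 0.4]]
# The first box is dropped (only a quarter of it lies in the crop);
# the second is kept and rescaled to the crop's coordinate frame.
print(crop_boxes(boxes, crop=[0.0, 0.0, 0.5, 0.5]))
```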
-
easy_vision.python.core.preprocessing.common_preprocess.random_crop_pad_image(image, boxes, labels, label_scores=None, multiclass_scores=None, min_object_covered=1.0, aspect_ratio_range=(0.75, 1.33), area_range=(0.1, 1.0), overlap_thresh=0.3, random_coef=0.0, min_padded_size_ratio=(1.0, 1.0), max_padded_size_ratio=(2.0, 2.0), pad_color=None, seed=None, preprocess_vars_cache=None)
Randomly crops and pads the image.
Given an input image and its bounding boxes, this op first randomly crops the image and then randomly pads the image with background values. Parameters min_padded_size_ratio and max_padded_size_ratio, determine the range of the final output image size. Specifically, the final image size will have a size in the range of min_padded_size_ratio * tf.shape(image) and max_padded_size_ratio * tf.shape(image). Note that these ratios are with respect to the size of the original image, so we can’t capture the same effect easily by independently applying RandomCropImage followed by RandomPadImage.
Parameters: - image – rank 3 float32 tensor containing 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – rank 1 float32 containing the label scores.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- aspect_ratio_range – allowed range for aspect ratio of cropped image.
- area_range – allowed range for area ratio between cropped image and the original image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- min_padded_size_ratio – min ratio of padded image height and width to the input image’s height and width.
- max_padded_size_ratio – max ratio of padded image height and width to the input image’s height and width.
- pad_color – padding color. A rank 1 tensor of [3] with dtype=tf.float32. if set as None, it will be set to average color of the randomly cropped image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: padded image.
padded_boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
cropped_labels: cropped labels.
If label_scores is not None, also returns cropped_label_scores: cropped label scores.
If multiclass_scores is not None, also returns cropped_multiclass_scores: cropped multiclass scores.
Return type: padded_image
-
easy_vision.python.core.preprocessing.common_preprocess.random_crop_to_aspect_ratio(image, boxes, labels, label_scores=None, multiclass_scores=None, masks=None, keypoints=None, aspect_ratio=1.0, overlap_thresh=0.3, seed=None, preprocess_vars_cache=None)
Randomly crops an image to the specified aspect ratio.
Randomly crops a portion of the image such that the crop is of the specified aspect ratio, and the crop is as large as possible. If the specified aspect ratio is larger than the aspect ratio of the image, this op will randomly remove rows from the top and bottom of the image. If the specified aspect ratio is less than the aspect ratio of the image, this op will randomly remove cols from the left and right of the image. If the specified aspect ratio is the same as the aspect ratio of the image, this op will return the image.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – (optional) float32 tensor of shape [num_instances] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- aspect_ratio – the aspect ratio of cropped image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same rank as input image.
- boxes: boxes which are the same rank as input boxes. Boxes are in normalized form.
- labels: new labels.
If label_scores, masks, keypoints, or multiclass_scores is not None, the function also returns:
- label_scores: rank 1 float32 tensor with shape [num_instances].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
- multiclass_scores: rank 2 float32 tensor with shape [num_instances, num_classes].
Raises: AssertionError
– If image is not a 3D tensor.
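The crop geometry described for random_crop_to_aspect_ratio can be sketched in plain Python. This is only an illustration, not the library's implementation: it assumes aspect_ratio means width / height (which matches the row and column behaviour described above), and the helper name crop_window_for_aspect_ratio and its rng argument are hypothetical.

```python
import random


def crop_window_for_aspect_ratio(height, width, aspect_ratio, rng=None):
    """Return (offset_y, offset_x, crop_height, crop_width) in pixels.

    Hypothetical helper; assumes aspect_ratio = width / height.
    """
    rng = rng or random.Random()
    image_ratio = width / height
    if aspect_ratio > image_ratio:
        # Target is wider than the image: keep every column, drop rows.
        crop_height, crop_width = int(round(width / aspect_ratio)), width
    elif aspect_ratio < image_ratio:
        # Target is narrower than the image: keep every row, drop columns.
        crop_height, crop_width = height, int(round(height * aspect_ratio))
    else:
        crop_height, crop_width = height, width
    # Place the largest possible crop at a random position along the
    # axis that was shortened; the other offset is always 0.
    offset_y = rng.randint(0, height - crop_height)
    offset_x = rng.randint(0, width - crop_width)
    return offset_y, offset_x, crop_height, crop_width
```

For a 600 x 800 image and aspect_ratio=2.0 the crop is 400 x 800, so only the vertical offset is random.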
-
easy_vision.python.core.preprocessing.common_preprocess.
random_distort_color
(image, color_ordering=0, fast_mode=False, preprocess_vars_cache=None)[source]¶ Randomly distorts color.
Randomly distorts color using a combination of brightness, hue, contrast and saturation changes. Makes sure the output image is still between 0 and 255.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- color_ordering – Python int, a type of distortion (valid values: 0, 1).
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
Raises: AssertionError
– if color_ordering is not in {0, 1}.
-
easy_vision.python.core.preprocessing.common_preprocess.
random_horizontal_flip
(image, boxes=None, masks=None, keypoints=None, flow_clip=None, keypoint_flip_permutation=None, seed=None, preprocess_vars_cache=None)[source]¶ Randomly flips the image and detections horizontally.
The probability of flipping the image is 50%.
Parameters: - image – rank 3 float32 tensor with shape [height, width, channels].
- boxes – (optional) rank 2 float32 tensor with shape [N, 4] containing the bounding boxes. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- keypoint_flip_permutation – rank 1 int32 tensor containing the keypoint flip permutation.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same shape as input image.
If boxes, masks, keypoints, and keypoint_flip_permutation are not None, the function also returns the following tensors.
- boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Raises: AssertionError
– if keypoints are provided but keypoint_flip_permutation is not.
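A minimal sketch of how a horizontal flip acts on normalized boxes and keypoints, written with plain lists rather than tensors. The helper names are hypothetical, and the keypoint example handles a single instance for brevity. The permutation step shows why keypoint_flip_permutation is required whenever keypoints are passed: after mirroring, a "left" keypoint sits where the "right" one should be, so indices must be swapped.

```python
def flip_boxes_horizontally(boxes):
    """Mirror [ymin, xmin, ymax, xmax] boxes about the vertical centre line."""
    # xmin and xmax swap roles so that xmin <= xmax still holds afterwards.
    return [[ymin, 1.0 - xmax, ymax, 1.0 - xmin]
            for ymin, xmin, ymax, xmax in boxes]


def flip_keypoints_horizontally(keypoints, flip_permutation):
    """Mirror the [y, x] keypoints of one instance, then reorder them.

    flip_permutation[i] is the index of the keypoint that takes slot i
    after the flip (for example, left eye <-> right eye).
    """
    mirrored = [[y, 1.0 - x] for y, x in keypoints]
    return [mirrored[i] for i in flip_permutation]
```

With keypoints ordered [left_eye, right_eye], a permutation of [1, 0] keeps slot 0 pointing at the eye that appears on the left of the flipped image.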
-
easy_vision.python.core.preprocessing.common_preprocess.
random_image_scale
(image, masks=None, min_scale_ratio=0.5, max_scale_ratio=2.0, seed=None, preprocess_vars_cache=None)[source]¶ Scales the image size.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels].
- masks – (optional) rank 3 float32 tensor containing masks with size [height, width, num_masks]. The value is set to None if there are no masks.
- min_scale_ratio – minimum scaling ratio.
- max_scale_ratio – maximum scaling ratio.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same rank as input image.
- masks: If masks is not None, resized masks which are the same rank as input masks will be returned.
-
easy_vision.python.core.preprocessing.common_preprocess.
random_jitter_aspect_ratio
(image, masks=None, seed=None, preprocess_vars_cache=None, min_jitter_coef=0.8, max_jitter_coef=1.2, method=0, align_corners=False)[source]¶ Resizes images to the given height and width.
Parameters: - image – A 3D tensor of shape [height, width, channels]
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
- min_jitter_coef – min image_width/image_height jitter ratio
- max_jitter_coef – max image_width/image_height jitter ratio
- method – (optional) interpolation method used in resizing. Defaults to BILINEAR.
- align_corners – bool. If true, exactly align all 4 corners of the input and output. Defaults to False.
Returns: - image: A tensor of size [new_height, new_width, channels].
- masks: If masks is not None, also outputs masks. A 3D tensor of shape [num_instances, new_height, new_width].
-
easy_vision.python.core.preprocessing.common_preprocess.
random_jitter_boxes
(boxes, ratio=0.05, seed=None)[source]¶ Randomly jitter boxes in image.
Parameters: - boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- ratio – The ratio of the box width and height that the corners can jitter. For example if the width is 100 pixels and ratio is 0.05, the corners can jitter up to 5 pixels in the x direction.
- seed – random seed.
Returns: boxes which is the same shape as input boxes.
Return type: boxes
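The ratio parameter can be illustrated with a small plain-Python sketch. It assumes each corner coordinate is shifted by an independent uniform offset of at most ratio times the box height (for y) or width (for x), and that results are clipped to [0, 1]; the clipping, the helper name, and the rng argument are assumptions for illustration rather than documented behaviour.

```python
import random


def jitter_boxes(boxes, ratio=0.05, rng=None):
    """Jitter the corners of normalized [ymin, xmin, ymax, xmax] boxes."""
    rng = rng or random.Random()
    jittered = []
    for ymin, xmin, ymax, xmax in boxes:
        # Largest allowed shift scales with the size of the box itself.
        max_dy = ratio * (ymax - ymin)
        max_dx = ratio * (xmax - xmin)
        corners = [
            ymin + rng.uniform(-max_dy, max_dy),
            xmin + rng.uniform(-max_dx, max_dx),
            ymax + rng.uniform(-max_dy, max_dy),
            xmax + rng.uniform(-max_dx, max_dx),
        ]
        # Assumption: keep coordinates inside the normalized image frame.
        jittered.append([min(max(c, 0.0), 1.0) for c in corners])
    return jittered
```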
-
easy_vision.python.core.preprocessing.common_preprocess.
random_pad_image
(image, boxes, min_ratio=None, max_ratio=None, pad_color=None, seed=None, preprocess_vars_cache=None)[source]¶ Randomly pads the image.
This function randomly pads the image with zeros. The final size of the padded image will be between min_image_size and max_image_size. if min_image_size is smaller than the input image size, min_image_size will be set to the input image size. The same for max_image_size. The input image will be located at a uniformly random location inside the padded image. The relative location of the boxes to the original image will remain the same.
Parameters: - image – rank 3 float32 tensor containing 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- min_ratio – a tensor of size [min_height_ratio, min_width_ratio], type tf.float32. If passed as None, will be set to image size [1.0, 1.0].
- max_ratio – a tensor of size [max_height_ratio, max_width_ratio], type tf.float32. If passed as None, will be set to twice the image [2, 2].
- pad_color – padding color. A rank 1 tensor of [3] with dtype=tf.float32. if set as None, it will be set to average color of the input image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: Image shape will be [new_height, new_width, channels].
- boxes: boxes which are the same rank as input boxes. Boxes are in normalized form.
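Because boxes are normalized, they have to be re-expressed relative to the larger padded canvas even though their position relative to the original pixels is unchanged. The sketch below shows that arithmetic with plain lists; the helper name and its arguments (sizes and offset in pixels) are hypothetical.

```python
def renormalize_boxes_after_padding(boxes, image_size, padded_size, offset):
    """Map normalized boxes from the original image to the padded image.

    image_size and padded_size are (height, width) in pixels; offset is
    the (y, x) pixel position of the original image inside the padding.
    """
    height, width = image_size
    padded_height, padded_width = padded_size
    offset_y, offset_x = offset
    return [
        [
            (ymin * height + offset_y) / padded_height,
            (xmin * width + offset_x) / padded_width,
            (ymax * height + offset_y) / padded_height,
            (xmax * width + offset_x) / padded_width,
        ]
        for ymin, xmin, ymax, xmax in boxes
    ]
```

A box covering a whole 100 x 100 image that is centred in a 200 x 200 canvas becomes [0.25, 0.25, 0.75, 0.75].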
-
easy_vision.python.core.preprocessing.common_preprocess.
random_pad_to_aspect_ratio
(image, boxes, masks=None, keypoints=None, aspect_ratio=1.0, min_padded_size_ratio=(1.0, 1.0), max_padded_size_ratio=(2.0, 2.0), seed=None, preprocess_vars_cache=None)[source]¶ Randomly zero pads an image to the specified aspect ratio.
Pads the image so that the resulting image will have the specified aspect ratio without scaling less than the min_padded_size_ratio or more than the max_padded_size_ratio. If the min_padded_size_ratio or max_padded_size_ratio is lower than what is possible to maintain the aspect ratio, then this method will use the least padding to achieve the specified aspect ratio.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- aspect_ratio – aspect ratio of the final image.
- min_padded_size_ratio – min ratio of padded image height and width to the input image’s height and width.
- max_padded_size_ratio – max ratio of padded image height and width to the input image’s height and width.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same rank as input image.
- boxes: boxes which are the same rank as input boxes. Boxes are in normalized form.
- labels: new labels.
If masks or keypoints is not None, the function also returns:
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Raises: AssertionError
– If image is not a 3D tensor.
-
easy_vision.python.core.preprocessing.common_preprocess.
random_pixel_value_scale
(image, minval=0.9, maxval=1.1, seed=None, preprocess_vars_cache=None)[source]¶ Scales each value in the pixels of the image.
This function scales each pixel independently of the others. For each value in the image tensor, it draws a random number between minval and maxval and multiplies the value by it.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- minval – lower ratio of scaling pixel values.
- maxval – upper ratio of scaling pixel values.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
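As a rough illustration of per-value scaling, the sketch below multiplies every channel value by its own random factor. It works on nested lists instead of tensors, and the final clip to [0, 255] is an assumption made here to keep values in the documented input range; the helper name and rng argument are hypothetical.

```python
import random


def scale_pixel_values(image, minval=0.9, maxval=1.1, rng=None):
    """Scale each value of a [height][width][channels] image independently."""
    rng = rng or random.Random()
    return [
        [
            # A fresh factor is drawn for every single channel value.
            [min(max(v * rng.uniform(minval, maxval), 0.0), 255.0) for v in pixel]
            for pixel in row
        ]
        for row in image
    ]
```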
-
easy_vision.python.core.preprocessing.common_preprocess.
random_resize_image
(image, masks=None, seed=None, preprocess_vars_cache=None, new_heights=(600, ), new_widths=(1024, ), method=0, align_corners=False)[source]¶ Resizes images to the given height and width.
Parameters: - image – A 3D tensor of shape [height, width, channels]
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
- new_heights – (optional) (tuple or list) desired heights of the image.
- new_widths – (optional) (tuple or list) desired widths of the image.
- method – (optional) interpolation method used in resizing. Defaults to BILINEAR.
- align_corners – bool. If true, exactly align all 4 corners of the input and output. Defaults to False.
Returns: Note that the position of the resized_image_shape changes based on whether masks are present.
- resized_image: A tensor of size [new_height, new_width, channels].
- resized_masks: If masks is not None, also outputs masks. A 3D tensor of shape [num_instances, new_height, new_width].
- resized_image_shape: A 1D tensor of shape [3] containing the shape of the resized image.
-
easy_vision.python.core.preprocessing.common_preprocess.
random_resize_method
(image, target_size, preprocess_vars_cache=None)[source]¶ Uses a random resize method to resize the image to target size.
Parameters: - image – a rank 3 tensor.
- target_size – a list of [target_height, target_width]
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: resized image.
-
easy_vision.python.core.preprocessing.common_preprocess.
random_resize_to_range
(image, masks=None, seed=None, preprocess_vars_cache=None, min_sizes=[], max_sizes=[], method=0, align_corners=False)[source]¶ Randomly resizes the image to one of the scales specified by min_sizes/max_sizes.
Each of the scales is selected with equal probability.
Parameters: - image – rank 3 float32 tensor with shape [height, width, channels].
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
- min_sizes – min lengths of the short edge of the image
- max_sizes – max lengths of the long edge of the image
- method – image resize method
- align_corners – whether to align corners; passed to tf.image.resize
Returns: - image: image resized.
If masks is not None, the function also returns:
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
-
easy_vision.python.core.preprocessing.common_preprocess.
random_rgb_to_gray
(image, probability=0.1, seed=None, preprocess_vars_cache=None)[source]¶ Changes the image from RGB to Grayscale with the given probability.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 255].
- probability – the probability of returning a grayscale image. The probability should be a number between [0, 1].
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: image which is the same shape as input image.
Return type: image
-
easy_vision.python.core.preprocessing.common_preprocess.
random_rotation
(image, boxes=None, masks=None, keypoints=None, seed=None, preprocess_vars_cache=None, min_angle=-10, max_angle=10, use_keypoints_calc_boxes=False)[source]¶ Randomly rotates the image and detections counter-clockwise by an angle in the range [min_angle, max_angle] degrees.
Parameters: - image – rank 3 float32 tensor with shape [height, width, channels].
- boxes – (optional) rank 2 float32 tensor with shape [N, 4] containing the bounding boxes. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
- min_angle – min angle of rotation range
- max_angle – max angle of rotation range
- use_keypoints_calc_boxes – if True, boxes will be bounding box of keypoints
Returns: - image: image which is the same shape as input image.
If boxes, masks, and keypoints are not None, the function also returns the following tensors.
- boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
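To make the effect of a rotation on keypoints concrete, here is a hedged sketch that rotates one normalized (y, x) point about the image centre. It assumes the rotation pivot is the image centre and that the output canvas keeps the input size, neither of which is stated above; the helper name is hypothetical. Points may land outside [0, 1] after rotation, which this sketch does not handle.

```python
import math


def rotate_point_ccw(y, x, angle_degrees, height, width):
    """Rotate a normalized (y, x) point counter-clockwise about the centre.

    "Counter-clockwise" is as seen on screen, where the y axis points down.
    """
    theta = math.radians(angle_degrees)
    # Work in pixels so that non-square images rotate without distortion.
    dy = (y - 0.5) * height
    dx = (x - 0.5) * width
    new_dx = dx * math.cos(theta) + dy * math.sin(theta)
    new_dy = -dx * math.sin(theta) + dy * math.cos(theta)
    return new_dy / height + 0.5, new_dx / width + 0.5
```

For a square image, rotating the midpoint of the right edge by 90 degrees moves it to the midpoint of the top edge.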
-
easy_vision.python.core.preprocessing.common_preprocess.
random_rotation90
(image, boxes=None, masks=None, keypoints=None, texts_direction=None, seed=None, preprocess_vars_cache=None)[source]¶ Randomly rotates the image and detections 90 degrees counter-clockwise.
The probability of rotating the image is 50%. This can be combined with random_horizontal_flip and random_vertical_flip to produce an output with a uniform distribution of the eight possible 90 degree rotation / reflection combinations.
Parameters: - image – rank 3 float32 tensor with shape [height, width, channels].
- boxes – (optional) rank 2 float32 tensor with shape [N, 4] containing the bounding boxes. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- texts_direction – (optional) rank 2 int32 tensor with shape [num_instances, 1]
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same shape as input image.
If boxes, masks, and keypoints are not None, the function also returns the following tensors.
- boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
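For a 90 degree counter-clockwise rotation, the box update reduces to a coordinate swap: a point at normalized (y, x) moves to (1 - x, y). The sketch below applies that to [ymin, xmin, ymax, xmax] boxes using plain lists; the helper name is hypothetical and the code is an illustration of the geometry, not the library's implementation.

```python
def rot90_boxes_ccw(boxes):
    """Rotate normalized [ymin, xmin, ymax, xmax] boxes 90 degrees CCW."""
    # The old right edge (xmax) becomes the new top edge (ymin), and the
    # old top edge (ymin) becomes the new left edge (xmin).
    return [[1.0 - xmax, ymin, 1.0 - xmin, ymax]
            for ymin, xmin, ymax, xmax in boxes]
```

A box in the top-left corner ends up in the bottom-left corner, and applying the function four times returns the original box.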
-
easy_vision.python.core.preprocessing.common_preprocess.
random_vertical_flip
(image, boxes=None, masks=None, keypoints=None, keypoint_flip_permutation=None, seed=None, preprocess_vars_cache=None)[source]¶ Randomly flips the image and detections vertically.
The probability of flipping the image is 50%.
Parameters: - image – rank 3 float32 tensor with shape [height, width, channels].
- boxes – (optional) rank 2 float32 tensor with shape [N, 4] containing the bounding boxes. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- keypoint_flip_permutation – rank 1 int32 tensor containing the keypoint flip permutation.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same shape as input image.
If boxes, masks, keypoints, and keypoint_flip_permutation are not None, the function also returns the following tensors.
- boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Raises: AssertionError
– if keypoints are provided but keypoint_flip_permutation is not.
-
easy_vision.python.core.preprocessing.common_preprocess.
resize_image
(image, masks=None, new_height=600, new_width=1024, method=0, align_corners=False)[source]¶ Resizes images to the given height and width.
Parameters: - image – A 3D tensor of shape [height, width, channels]
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- new_height – (optional) (scalar) desired height of the image.
- new_width – (optional) (scalar) desired width of the image.
- method – (optional) interpolation method used in resizing. Defaults to BILINEAR.
- align_corners – bool. If true, exactly align all 4 corners of the input and output. Defaults to False.
Returns: Note that the position of the resized_image_shape changes based on whether masks are present.
- resized_image: A tensor of size [new_height, new_width, channels].
- resized_masks: If masks is not None, also outputs masks. A 3D tensor of shape [num_instances, new_height, new_width].
- resized_image_shape: A 1D tensor of shape [3] containing the shape of the resized image.
-
easy_vision.python.core.preprocessing.common_preprocess.
resize_image_with_fixed_height
(image, new_height=32, method=0, align_corners=False)[source]¶ Resizes images to the given height and keep ratio.
Parameters: - image – A 3D tensor of shape [height, width, channels]
- new_height – (optional) (scalar) desired height of the image.
- method – (optional) interpolation method used in resizing. Defaults to BILINEAR.
- align_corners – bool. If true, exactly align all 4 corners of the input and output. Defaults to False.
Returns: - resized_image: A tensor of size [new_height, new_width, channels].
- resized_image_shape: A 1D tensor of shape [3] containing the shape of the resized image.
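Keeping the ratio while fixing the height means the new width is derived from the original width-to-height ratio. A minimal sketch of that size computation follows; the rounding rule and the helper name are assumptions, since the exact rounding used by the library is not documented here.

```python
def fixed_height_output_size(height, width, new_height=32):
    """Return (new_height, new_width) preserving the width / height ratio."""
    # Assumption: round to the nearest pixel and never go below 1 pixel.
    new_width = max(1, int(round(width * new_height / height)))
    return new_height, new_width
```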
-
easy_vision.python.core.preprocessing.common_preprocess.
resize_to_min_dimension
(image, masks=None, min_dimension=600)[source]¶ Resizes image and masks given the min size maintaining the aspect ratio.
If one of the image dimensions is smaller than min_dimension, it will scale the image such that its smallest dimension is equal to min_dimension. Otherwise, will keep the image size as is.
Parameters: - image – a tensor of size [height, width, channels].
- masks – (optional) a tensor of size [num_instances, height, width].
- min_dimension – minimum image dimension.
Returns: Note that the position of the resized_image_shape changes based on whether masks are present.
- resized_image: A tensor of size [new_height, new_width, channels].
- resized_masks: If masks is not None, also outputs masks. A 3D tensor of shape [num_instances, new_height, new_width].
- resized_image_shape: A 1D tensor of shape [3] containing the shape of the resized image.
Raises: AssertionError
– if the image is not a 3D tensor.
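The rule described for resize_to_min_dimension (upscale only when the smaller side falls short) can be written as a small size calculation. This is a sketch with a hypothetical helper name, and it assumes rounding to the nearest pixel.

```python
def min_dimension_output_size(height, width, min_dimension=600):
    """Return (new_height, new_width) after enforcing a minimum side length."""
    smallest = min(height, width)
    if smallest >= min_dimension:
        # Both sides are already large enough: leave the size untouched.
        return height, width
    scale = min_dimension / smallest
    return int(round(height * scale)), int(round(width * scale))
```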
-
easy_vision.python.core.preprocessing.common_preprocess.
resize_to_range
(image, masks=None, min_dimension=None, max_dimension=None, method=0, align_corners=False, pad_to_max_dimension=False)[source]¶ Resizes an image so its dimensions are within the provided value.
The output size can be described by two cases:
1. If the image can be rescaled so its minimum dimension is equal to the provided value without the other dimension exceeding max_dimension, then do so.
2. Otherwise, resize so the largest dimension is equal to max_dimension.
Parameters: - image – A 3D tensor of shape [height, width, channels]
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- min_dimension – (optional) (scalar) desired size of the smaller image dimension.
- max_dimension – (optional) (scalar) maximum allowed size of the larger image dimension.
- method – (optional) interpolation method used in resizing. Defaults to BILINEAR.
- align_corners – bool. If true, exactly align all 4 corners of the input and output. Defaults to False.
- pad_to_max_dimension – Whether to resize the image and pad it with zeros so the resulting image is of the spatial size [max_dimension, max_dimension]. If masks are included they are padded similarly.
Returns: Note that the position of the resized_image_shape changes based on whether masks are present.
- resized_image: A 3D tensor of shape [new_height, new_width, channels], where the image has been resized (with bilinear interpolation) so that min(new_height, new_width) == min_dimension or max(new_height, new_width) == max_dimension.
- resized_masks: If masks is not None, also outputs masks. A 3D tensor of shape [num_instances, new_height, new_width].
- resized_image_shape: A 1D tensor of shape [3] containing shape of the resized image.
Raises: AssertionError
– if the image is not a 3D tensor.
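The two cases for resize_to_range amount to picking one scale factor. The sketch below computes the output size only, without pad_to_max_dimension; the helper name and the rounding to the nearest pixel are assumptions for illustration.

```python
def range_output_size(height, width, min_dimension, max_dimension):
    """Return (new_height, new_width) for the two documented cases."""
    small, large = min(height, width), max(height, width)
    # Case 1: bring the smaller side to min_dimension.
    scale = min_dimension / small
    if large * scale > max_dimension:
        # Case 2: the larger side would overshoot, so cap it instead.
        scale = max_dimension / large
    return int(round(height * scale)), int(round(width * scale))
```

A 480 x 640 image with min_dimension=600 and max_dimension=1024 falls under case 1, while a 300 x 1200 image falls under case 2 because doubling it would make the long side 2400.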
-
easy_vision.python.core.preprocessing.common_preprocess.
retain_boxes_above_threshold
(boxes, labels, label_scores, multiclass_scores=None, masks=None, keypoints=None, threshold=0.0)[source]¶ Retains boxes whose label score is above a given threshold.
If the label score for a box is missing (represented by NaN), the box is retained. The boxes that don’t pass the threshold will not appear in the returned tensor.
Parameters: - boxes – float32 tensor of shape [num_instance, 4] representing boxes location in normalized coordinates.
- labels – rank 1 int32 tensor of shape [num_instance] containing the object classes.
- label_scores – float32 tensor of shape [num_instance] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- threshold – scalar python float.
Returns: - retained_boxes: [num_retained_instance, 4]
- retained_labels: [num_retained_instance]
- retained_label_scores: [num_retained_instance]
If multiclass_scores, masks, or keypoints are not None, the function also returns:
- retained_multiclass_scores: [num_retained_instance, num_classes]
- retained_masks: [num_retained_instance, height, width]
- retained_keypoints: [num_retained_instance, num_keypoints, 2]
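The filtering rule, including the special treatment of missing (NaN) scores, can be sketched with plain lists. The helper name is hypothetical, and the optional multiclass_scores, masks, and keypoints inputs are left out for brevity; they would be filtered with the same index list.

```python
import math


def retain_above_threshold(boxes, labels, label_scores, threshold=0.0):
    """Keep entries whose score is NaN (missing) or strictly above threshold."""
    keep = [
        i for i, score in enumerate(label_scores)
        if math.isnan(score) or score > threshold
    ]
    return (
        [boxes[i] for i in keep],
        [labels[i] for i in keep],
        [label_scores[i] for i in keep],
    )
```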
-
easy_vision.python.core.preprocessing.common_preprocess.
rgb_to_gray
(image)[source]¶ Converts a 3 channel RGB image to a 1 channel grayscale image.
Parameters: image – Rank 3 float32 tensor containing 1 image -> [height, width, 3] with pixel values varying between [0, 1]. Returns: A single channel grayscale image -> [height, width, 1]. Return type: image
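The shape change from [height, width, 3] to [height, width, 1] is shown below with nested lists. The channel weights are an assumption: the sketch uses the common luma weights 0.2989, 0.5870, 0.1140, which this reference does not confirm, and the helper name is hypothetical.

```python
# Assumed luma weights for R, G, B; not confirmed by this reference.
GRAY_WEIGHTS = (0.2989, 0.5870, 0.1140)


def rgb_to_gray_sketch(image):
    """Convert a [height][width][3] image to [height][width][1]."""
    wr, wg, wb = GRAY_WEIGHTS
    # Each pixel keeps a trailing channel axis of length 1.
    return [[[wr * r + wg * g + wb * b] for r, g, b in row] for row in image]
```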
-
easy_vision.python.core.preprocessing.common_preprocess.
scale_boxes_to_pixel_coordinates
(image, boxes, keypoints=None)[source]¶ Scales boxes from normalized to pixel coordinates.
Parameters: - image – A 3D float32 tensor of shape [height, width, channels].
- boxes – A 2D float32 tensor of shape [num_boxes, 4] containing the bounding boxes in normalized coordinates. Each row is of the form [ymin, xmin, ymax, xmax].
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
Returns: - image: unchanged input image.
- scaled_boxes: a 2D float32 tensor of shape [num_boxes, 4] containing the bounding boxes in pixel coordinates.
- scaled_keypoints: a 3D float32 tensor with shape [num_instances, num_keypoints, 2] containing the keypoints in pixel coordinates.
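Converting from normalized to pixel coordinates multiplies y values by the image height and x values by the image width. A minimal list-based sketch, with a hypothetical helper name:

```python
def boxes_to_pixel_coordinates(boxes, height, width):
    """Scale normalized [ymin, xmin, ymax, xmax] boxes to pixel units."""
    return [[ymin * height, xmin * width, ymax * height, xmax * width]
            for ymin, xmin, ymax, xmax in boxes]
```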
-
easy_vision.python.core.preprocessing.common_preprocess.
subtract_channel_mean
(image, means=None)[source]¶ Normalizes an image by subtracting a mean from each channel.
Parameters: - image – A 3D tensor of shape [height, width, channels]
- means – float list containing a mean for each channel
Returns: a tensor of shape [height, width, channels]
Return type: normalized_images
Raises: AssertionError
– if image is not a 3D tensor or if the number of means is not equal to the number of channels.
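Per-channel mean subtraction is sketched below on a nested-list image of shape [height][width][channels]. The helper name is hypothetical, and the explicit AssertionError mirrors the documented error for a mismatched number of means.

```python
def subtract_means(image, means):
    """Subtract means[c] from channel c of every pixel."""
    channels = len(image[0][0])
    if len(means) != channels:
        raise AssertionError("len(means) must equal the number of channels")
    return [[[v - m for v, m in zip(pixel, means)] for pixel in row]
            for row in image]
```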
easy_vision.python.core.preprocessing.deeplab_preprocess¶
-
easy_vision.python.core.preprocessing.deeplab_preprocess.
deeplab_random_crop
(image, mask, crop_size)[source]¶
easy_vision.python.core.preprocessing.efficientnet_preprocessing¶
efficientnet preprocessing.
-
easy_vision.python.core.preprocessing.efficientnet_preprocessing.
preprocess_for_eval
(image, use_bfloat16, height=224, width=224)[source]¶ Preprocesses the given image for evaluation.
Parameters: - image – Tensor representing an image binary of arbitrary size.
- use_bfloat16 – bool for whether to use bfloat16.
- height – integer, image expected height.
- width – integer, image expected width.
Returns: A preprocessed image Tensor.
-
easy_vision.python.core.preprocessing.efficientnet_preprocessing.
preprocess_for_train
(image, use_bfloat16, height=224, width=224, augment_name=None, randaug_num_layers=None, randaug_magnitude=None)[source]¶ Preprocesses the given image for training.
Parameters: - image – Tensor representing an image binary of arbitrary size.
- use_bfloat16 – bool for whether to use bfloat16.
- height – integer, image expected height.
- width – integer, image expected width.
- augment_name – string that is the name of the augmentation method to apply to the image: autoaugment if AutoAugment is to be used, or randaugment if RandAugment is to be used. If the value is None, no augmentation method will be applied. See autoaugment.py for more details.
- randaug_num_layers – int, the number of layers to use if RandAugment is used. See autoaugment.py for a detailed description.
- randaug_magnitude – int, the magnitude to use if RandAugment is used. See autoaugment.py for a detailed description.
Returns: A preprocessed image Tensor.
-
easy_vision.python.core.preprocessing.efficientnet_preprocessing.
preprocess_image
(image, is_training=False, use_bfloat16=False, model_name='', height=224, width=224, augment_name=None, randaug_num_layers=None, randaug_magnitude=None)[source]¶ Preprocesses the given image.
Parameters: - image – Tensor representing an image binary of arbitrary size.
- is_training – bool for whether the preprocessing is for training.
- use_bfloat16 – bool for whether to use bfloat16.
- model_name – EfficientNet model name.
- height – integer, image expected height.
- width – integer, image expected width.
- augment_name – string naming the augmentation method to apply to the image: autoaugment if AutoAugment is to be used, or randaugment if RandAugment is to be used. If the value is None, no augmentation method is applied. See autoaugment.py for more details.
- randaug_num_layers – int, the number of layers to use if RandAugment is used. See autoaugment.py for a detailed description.
- randaug_magnitude – int, the magnitude to use if RandAugment is used. See autoaugment.py for a detailed description.
Returns: A preprocessed image Tensor with value range of [0, 255].
easy_vision.python.core.preprocessing.inception_preprocessing¶
Provides utilities to preprocess images for the Inception networks.
-
easy_vision.python.core.preprocessing.inception_preprocessing.
apply_with_random_selector
(x, func, num_cases)[source]¶ Computes func(x, sel), with sel sampled from [0…num_cases-1].
Parameters: - x – input Tensor.
- func – Python function to apply.
- num_cases – Python int32, number of cases to sample sel from.
Returns: The result of func(x, sel), where func receives the value of the selector as a python integer, but sel is sampled dynamically.
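The selector pattern can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, not the library's implementation: the documented function samples sel dynamically, whereas this version samples eagerly with Python's random module. The helper scale_by_case and the rng parameter are hypothetical names introduced only for illustration.

```python
import random


def apply_with_random_selector(x, func, num_cases, rng=random):
    """Sample sel uniformly from [0, num_cases - 1] and return func(x, sel)."""
    sel = rng.randrange(num_cases)
    return func(x, sel)


def scale_by_case(x, sel):
    # Hypothetical per-case function: case k multiplies x by (k + 1).
    return x * (sel + 1)


result = apply_with_random_selector(
    10, scale_by_case, num_cases=4, rng=random.Random(0))
```

Whatever case is drawn, func always receives a plain Python integer, which is the property the Returns description emphasizes.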
-
easy_vision.python.core.preprocessing.inception_preprocessing.
distort_color
(image, color_ordering=0, fast_mode=True, scope=None)[source]¶ Distort the color of a Tensor image.
Color distortions are non-commutative, so the ordering of the color ops matters. Ideally we would randomly permute the ordering of the color ops. Rather than adding that level of complication, we select a distinct ordering of color ops for each preprocessing thread.
Parameters: - image – 3-D Tensor containing single image in [0, 1].
- color_ordering – Python int, a type of distortion (valid values: 0-3).
- fast_mode – Avoids slower ops (random_hue and random_contrast)
- scope – Optional scope for name_scope.
Returns: 3-D Tensor color-distorted image on range [0, 1]
Raises: ValueError
– if color_ordering not in [0, 3]
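To see why the ordering of color ops matters, consider a single pixel value in [0, 1]. This is a hedged plain-Python sketch: adjust_brightness and adjust_contrast below are simplified stand-ins written for this example (an additive shift and a scaling around an assumed mean of 0.5), not the TensorFlow ops that distort_color calls.

```python
def clip01(value):
    """Clamp a value to the [0, 1] range."""
    return min(1.0, max(0.0, value))


def adjust_brightness(value, delta):
    # Simplified stand-in: add a constant, then clip.
    return clip01(value + delta)


def adjust_contrast(value, factor, mean=0.5):
    # Simplified stand-in: scale the distance from an assumed mean, then clip.
    return clip01((value - mean) * factor + mean)


pixel = 0.2
brightness_then_contrast = adjust_contrast(adjust_brightness(pixel, 0.25), 2.0)
contrast_then_brightness = adjust_brightness(adjust_contrast(pixel, 2.0), 0.25)
```

The two orderings give different results for the same pixel, which is why each value of color_ordering corresponds to a distinct distortion.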
-
easy_vision.python.core.preprocessing.inception_preprocessing.
distorted_bounding_box_crop
(image, bbox, min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), max_attempts=100, scope=None)[source]¶ Generates cropped_image using one of the bboxes, randomly distorted.
See tf.image.sample_distorted_bounding_box for more documentation.
Parameters: - image – 3-D Tensor of image (it will be converted to floats in [0, 1]).
- bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the whole image.
- min_object_covered – An optional float. Defaults to 0.1. The cropped area of the image must contain at least this fraction of any bounding box supplied.
- aspect_ratio_range – An optional list of floats. The cropped area of the image must have an aspect ratio = width / height within this range.
- area_range – An optional list of floats. The cropped area of the image must contain a fraction of the supplied image within this range.
- max_attempts – An optional int. Number of attempts at generating a cropped region of the image of the specified constraints. After max_attempts failures, return the entire image.
- scope – Optional scope for name_scope.
Returns: A tuple, a 3-D Tensor cropped_image and the distorted bbox
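The interaction of aspect_ratio_range, area_range, and max_attempts can be sketched as simple rejection sampling. This is an illustrative plain-Python approximation, not the algorithm used by tf.image.sample_distorted_bounding_box; it ignores min_object_covered, and the function name sample_crop and the rng parameter are assumptions made for this example.

```python
import math
import random


def sample_crop(height, width, aspect_ratio_range=(0.75, 1.33),
                area_range=(0.05, 1.0), max_attempts=100, rng=None):
    """Return (y, x, crop_height, crop_width) for a random crop window."""
    rng = rng or random.Random()
    for _ in range(max_attempts):
        area = rng.uniform(*area_range) * height * width
        ratio = rng.uniform(*aspect_ratio_range)  # width / height
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            y = rng.randint(0, height - crop_h)
            x = rng.randint(0, width - crop_w)
            return y, x, crop_h, crop_w
    # After max_attempts failures, fall back to the entire image.
    return 0, 0, height, width


y, x, crop_h, crop_w = sample_crop(100, 200, rng=random.Random(1))
```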
-
easy_vision.python.core.preprocessing.inception_preprocessing.
preprocess_for_eval
(image, height, width, central_fraction=0.875, scope=None)[source]¶ Prepare one image for evaluation.
If height and width are specified it would output an image with that size by applying resize_bilinear.
If central_fraction is specified it would crop the central fraction of the input image.
Parameters: - image – 3-D Tensor of image. If dtype is tf.float32 then the range should be [0, 1], otherwise it would be converted to tf.float32 assuming that the range is [0, MAX], where MAX is the largest positive representable number for the int(8/16/32) data type (see tf.image.convert_image_dtype for details).
- height – integer
- width – integer
- central_fraction – Optional Float, fraction of the image to crop.
- scope – Optional scope for name_scope.
Returns: 3-D float Tensor of prepared image.
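As a rough illustration of what central_fraction=0.875 means, the crop window can be computed from the image size alone. This is a sketch under the assumption that the crop is centered and symmetric; central_crop_window is a hypothetical helper, not part of this module.

```python
def central_crop_window(height, width, central_fraction=0.875):
    """Return (y0, x0, crop_height, crop_width) of a centered crop."""
    if not 0.0 < central_fraction <= 1.0:
        raise ValueError("central_fraction must be within (0, 1]")
    # Remove the same margin from both sides of each dimension.
    y0 = int((height - height * central_fraction) / 2)
    x0 = int((width - width * central_fraction) / 2)
    return y0, x0, height - 2 * y0, width - 2 * x0


window = central_crop_window(320, 480)
```

For a 320 x 480 input this keeps a 280 x 420 region starting at row 20 and column 30; the result would then be resized to the requested height and width.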
-
easy_vision.python.core.preprocessing.inception_preprocessing.
preprocess_for_train
(image, height, width, bbox, fast_mode=True, scope=None, add_image_summaries=True)[source]¶ Distort one image for training a network.
Distorting images provides a useful technique for augmenting the data set during training in order to make the network invariant to aspects of the image that do not affect the label.
Additionally it would create image_summaries to display the different transformations applied to the image.
Parameters: - image – 3-D Tensor of image. Pixel values must be in [0, 255].
- height – integer
- width – integer
- bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].
- fast_mode – Optional boolean, if True avoids slower transformations (i.e. bi-cubic resizing, random_hue or random_contrast).
- scope – Optional scope for name_scope.
- add_image_summaries – Enable image summaries.
Returns: 3-D float Tensor of distorted image used for training with range [-1, 1].
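The final [-1, 1] range can be reached from a [0, 1] float image with a shift and a scale. The snippet below assumes the common "subtract 0.5, multiply by 2" convention; the source only states the output range, so treat the exact formula and the helper name to_signed_unit_range as assumptions.

```python
def to_signed_unit_range(value):
    """Map a float pixel value from [0, 1] to [-1, 1]."""
    return (value - 0.5) * 2.0


row = [0.0, 0.25, 0.5, 1.0]
scaled = [to_signed_unit_range(v) for v in row]
```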
-
easy_vision.python.core.preprocessing.inception_preprocessing.
preprocess_image
(image, height, width, is_training=False, bbox=None, fast_mode=True, add_image_summaries=True, central_crop_fraction=None)[source]¶ Pre-process one image for training or evaluation.
Parameters: - image – 3-D Tensor [height, width, channels] with the image. If dtype is tf.float32 then the range should be [0, 1], otherwise it would be converted to tf.float32 assuming that the range is [0, MAX], where MAX is the largest positive representable number for the int(8/16/32) data type (see tf.image.convert_image_dtype for details).
- height – integer, image expected height.
- width – integer, image expected width.
- is_training – Boolean. If true it would transform an image for train, otherwise it would transform it for evaluation.
- bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].
- fast_mode – Optional boolean, if True avoids slower transformations.
- add_image_summaries – Enable image summaries.
- central_crop_fraction – Optional Float, fraction of the image to crop.
Returns: 3-D float Tensor containing an appropriately scaled image
Raises: ValueError
– if user does not provide bounding box
easy_vision.python.core.preprocessing.lenet_preprocessing¶
Provides utilities for preprocessing.
-
easy_vision.python.core.preprocessing.lenet_preprocessing.
preprocess_image
(image, output_height, output_width, is_training)[source]¶ Preprocesses the given image.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- is_training – True if we’re preprocessing the image for training and False otherwise.
Returns: A preprocessed image.
easy_vision.python.core.preprocessing.preprocessing_factory¶
Contains a factory for building various models.
-
easy_vision.python.core.preprocessing.preprocessing_factory.
get_preprocessing
(name, is_training=False)[source]¶ Returns preprocessing_fn(image, height, width, **kwargs).
Parameters: - name – The name of the preprocessing function.
- is_training – True if the model is being used for training and False otherwise.
Returns: A function that preprocesses a single image (pre-batch). It has the following signature:
image = preprocessing_fn(image, output_height, output_width, …).
Return type: preprocessing_fn
Raises: ValueError
– If Preprocessing name is not recognized.
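A factory of this kind is often a name-to-function lookup that binds is_training into the returned callable. The sketch below shows that pattern in plain Python; the registry contents and the _lenet_like stand-in are hypothetical and do not reflect which names the real factory accepts.

```python
def _lenet_like(image, output_height, output_width, is_training=False, **kwargs):
    # Hypothetical stand-in for a real preprocessing function.
    return {"image": image,
            "size": (output_height, output_width),
            "training": is_training}


# Hypothetical registry; the real factory defines its own set of names.
_REGISTRY = {"lenet": _lenet_like}


def get_preprocessing(name, is_training=False):
    """Return preprocessing_fn(image, output_height, output_width, **kwargs)."""
    if name not in _REGISTRY:
        raise ValueError("Preprocessing name [%s] was not recognized" % name)
    func = _REGISTRY[name]

    def preprocessing_fn(image, output_height, output_width, **kwargs):
        return func(image, output_height, output_width,
                    is_training=is_training, **kwargs)

    return preprocessing_fn


preprocessing_fn = get_preprocessing("lenet", is_training=True)
example = preprocessing_fn("raw-image", 28, 28)
```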
easy_vision.python.core.preprocessing.preprocessor¶
-
easy_vision.python.core.preprocessing.preprocessor.
get_or_create_preprocess_rand_vars
(generator_func, function_id, preprocess_vars_cache, key='')[source]¶ Returns a tensor stored in preprocess_vars_cache or using generator_func.
If the tensor was previously generated and appears in the PreprocessorCache, the previously generated tensor will be returned. Otherwise, a new tensor is generated using generator_func and stored in the cache.
Parameters: - generator_func – A 0-argument function that generates a tensor.
- function_id – identifier for the preprocessing function used.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
- key – identifier for the variable stored.
Returns: The generated tensor.
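The lookup-or-generate behaviour described above can be sketched in a few lines. This is an illustrative version using plain Python values instead of tensors; DictCache is a hypothetical stand-in for PreprocessorCache, and get_or_create_rand_var mirrors the documented behaviour rather than reproducing the library code.

```python
import random


class DictCache:
    """Hypothetical stand-in for PreprocessorCache."""

    def __init__(self):
        self._store = {}

    def get(self, function_id, key):
        return self._store.get((function_id, key))

    def update(self, function_id, key, value):
        self._store[(function_id, key)] = value


def get_or_create_rand_var(generator_func, function_id, cache, key=""):
    """Return the cached value if present, else generate and store it."""
    if cache is None:
        return generator_func()
    value = cache.get(function_id, key)
    if value is None:
        value = generator_func()
        cache.update(function_id, key, value)
    return value


cache = DictCache()
first = get_or_create_rand_var(random.random, "horizontal_flip", cache)
second = get_or_create_rand_var(random.random, "horizontal_flip", cache)
```

Because both calls share the same cache, the second call reuses the value drawn by the first, which is what makes repeated augmentation deterministic.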
-
easy_vision.python.core.preprocessing.preprocessor.
preprocess
(tensor_dict, preprocess_options, preprocess_vars_cache=None)[source]¶ Preprocess images and bounding boxes.
Various types of preprocessing (to be implemented) based on the preprocess_options dictionary e.g. “crop image” (affects image and possibly boxes), “white balance image” (affects only image), etc. If self._options is None, no preprocessing is done.
Parameters: - tensor_dict – dictionary that contains images, boxes, and can contain other things as well. images -> rank 3 float32 tensor containing 1 image -> [height, width, 3], with pixel values varying between [0, 1]. boxes -> rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form, meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- preprocess_options – It is a list of tuples, where each tuple contains a function and a dictionary that contains arguments and their values.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: which contains the preprocessed images, bounding boxes, etc.
Return type: tensor_dict
Raises: AssertionError
– (a) If the arguments that a function needs do not exist in tensor_dict. (b) If image in tensor_dict is not rank 4.
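The shape of preprocess_options, a list of (function, kwargs) tuples, can be illustrated with a simplified driver. This sketch assumes every function takes the image as its first argument and returns a new image; the real preprocess() also routes boxes and other entries of tensor_dict to each function. Images are nested Python lists here, and scale_pixels and flip_left_right are hypothetical helpers.

```python
def scale_pixels(image, factor):
    # Hypothetical op: multiply each pixel and cap at 1.0.
    return [[min(1.0, p * factor) for p in row] for row in image]


def flip_left_right(image):
    # Hypothetical op: mirror each row.
    return [list(reversed(row)) for row in image]


def preprocess(tensor_dict, preprocess_options):
    """Apply each (func, kwargs) pair to the image entry, in order."""
    if not preprocess_options:
        return tensor_dict
    result = dict(tensor_dict)
    for func, kwargs in preprocess_options:
        result["image"] = func(result["image"], **kwargs)
    return result


tensor_dict = {"image": [[0.1, 0.2], [0.3, 0.4]]}
options = [(scale_pixels, {"factor": 2.0}), (flip_left_right, {})]
output = preprocess(tensor_dict, options)
```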
easy_vision.python.core.preprocessing.preprocessor_cache¶
Records previous preprocessing operations and allows them to be repeated.
Used with core.preprocessor. Passing a PreprocessorCache into individual data augmentation functions or the general preprocess() function will store all randomly generated variables in the PreprocessorCache. When a preprocessor function is called multiple times with the same PreprocessorCache object, that function will perform the same augmentation on all calls.
-
class
easy_vision.python.core.preprocessing.preprocessor_cache.
PreprocessorCache
[source]¶ Bases:
object
Dictionary wrapper storing random variables generated during preprocessing.
-
ADD_BLACK_PATCH
= 'add_black_patch'¶
-
ADJUST_BRIGHTNESS
= 'adjust_brightness'¶
-
ADJUST_CONTRAST
= 'adjust_contrast'¶
-
ADJUST_HUE
= 'adjust_hue'¶
-
ADJUST_SATURATION
= 'adjust_saturation'¶
-
BLACK_PATCHES
= 'black_patches'¶
-
CROP_IMAGE
= 'crop_image'¶
-
CROP_TO_ASPECT_RATIO
= 'crop_to_aspect_ratio'¶
-
DISTORT_COLOR
= 'distort_color'¶
-
HORIZONTAL_FLIP
= 'horizontal_flip'¶
-
IMAGE_SCALE
= 'image_scale'¶
-
JITTER_ASPECT_RATIO
= 'jitter_aspect_ratio'¶
-
PAD_IMAGE
= 'pad_image'¶
-
PAD_TO_ASPECT_RATIO
= 'pad_to_aspect_ratio'¶
-
PIXEL_VALUE_SCALE
= 'pixel_value_scale'¶
-
RANDOM_RESIZE_IMAGE
= 'random_resize_image'¶
-
RANDOM_RESIZE_TO_RANGE
= 'random_resize_to_range'¶
-
RESIZE_METHOD
= 'resize_method'¶
-
RGB_TO_GRAY
= 'rgb_to_gray'¶
-
ROTATION
= 'rotation'¶
-
ROTATION90
= 'rotation90'¶
-
SELECTOR
= 'selector'¶
-
SELECTOR_TUPLES
= 'selector_tuples'¶
-
SSD_CROP_PAD_SELECTOR_ID
= 'ssd_crop_pad_selector_id'¶
-
SSD_CROP_SELECTOR_ID
= 'ssd_crop_selector_id'¶
-
STRICT_CROP_IMAGE
= 'strict_crop_image'¶
-
VERTICAL_FLIP
= 'vertical_flip'¶
-
get
(function_id, key)[source]¶ Gets stored value given a function id and key.
Parameters: - function_id – identifier for the preprocessing function used.
- key – identifier for the variable stored.
Returns: the corresponding value, expected to be a tensor or nested structure of tensors.
Return type: value
Raises: ValueError
– if function_id is not one of the valid function ids.
-
update
(function_id, key, value)[source]¶ Adds a value to the dictionary.
Parameters: - function_id – identifier for the preprocessing function used.
- key – identifier for the variable stored.
- value – the value to store, expected to be a tensor or nested structure of tensors.
Raises: ValueError
– if function_id is not one of the valid function ids.
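The get/update contract, including the ValueError on unknown ids, can be sketched as a small dictionary wrapper. This is an assumption-laden illustration: PreprocessorCacheSketch is a hypothetical class, it lists only two of the function ids shown above, and its internal layout is a guess rather than the library's.

```python
from collections import defaultdict


class PreprocessorCacheSketch:
    """Illustrative dictionary wrapper keyed by function id, then by key."""

    HORIZONTAL_FLIP = "horizontal_flip"
    ROTATION90 = "rotation90"
    # Only a subset of the ids listed above, for brevity.
    _VALID_FNS = (HORIZONTAL_FLIP, ROTATION90)

    def __init__(self):
        self._history = defaultdict(dict)

    def _check(self, function_id):
        if function_id not in self._VALID_FNS:
            raise ValueError("Function id not recognized: %s" % function_id)

    def get(self, function_id, key):
        self._check(function_id)
        return self._history[function_id].get(key)

    def update(self, function_id, key, value):
        self._check(function_id)
        self._history[function_id][key] = value


sketch = PreprocessorCacheSketch()
sketch.update(PreprocessorCacheSketch.HORIZONTAL_FLIP, "", 0.73)
stored = sketch.get(PreprocessorCacheSketch.HORIZONTAL_FLIP, "")
```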
-
easy_vision.python.core.preprocessing.ssd_preprocess¶
-
easy_vision.python.core.preprocessing.ssd_preprocess.
ssd_random_crop
(image, boxes, labels, label_scores=None, multiclass_scores=None, masks=None, keypoints=None, min_object_covered=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), aspect_ratio_range=((0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0)), area_range=((0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0)), overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), random_coef=(0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15), seed=None, preprocess_vars_cache=None)[source]¶ Random crop preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector. For further information on random crop preprocessing refer to RandomCrop function above.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – rank 1 float32 tensor containing the scores.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- aspect_ratio_range – allowed range for aspect ratio of cropped image.
- area_range – allowed range for area ratio between cropped image and the original image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same rank as input image.
- boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
- labels: new labels.
If label_scores, multiclass_scores, masks, or keypoints is not None, the function also returns:
- label_scores: rank 1 float32 tensor with shape [num_instances].
- multiclass_scores: rank 2 float32 tensor with shape [num_instances, num_classes].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Return type: image
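The tuple-valued defaults suggest that one parameter set is selected per call. The sketch below assumes the i-th entries of each tuple form one configuration and that one index is drawn uniformly at random; that reading, and the helper name pick_crop_config, are assumptions rather than documented behaviour.

```python
import random

MIN_OBJECT_COVERED = (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0)
OVERLAP_THRESH = (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0)
RANDOM_COEF = (0.15,) * 7


def pick_crop_config(rng):
    """Pick one index and gather the matching entry from each tuple."""
    index = rng.randrange(len(MIN_OBJECT_COVERED))
    return {
        "index": index,
        "min_object_covered": MIN_OBJECT_COVERED[index],
        "overlap_thresh": OVERLAP_THRESH[index],
        # With probability random_coef the original image is kept.
        "keep_original": rng.random() < RANDOM_COEF[index],
    }


config = pick_crop_config(random.Random(0))
```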
-
easy_vision.python.core.preprocessing.ssd_preprocess.
ssd_random_crop_fixed_aspect_ratio
(image, boxes, labels, label_scores=None, multiclass_scores=None, masks=None, keypoints=None, min_object_covered=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), aspect_ratio=1.0, area_range=((0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0)), overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), random_coef=(0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15), seed=None, preprocess_vars_cache=None)[source]¶ Random crop preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector. For further information on random crop preprocessing refer to RandomCrop function above.
The only difference is that the aspect ratio of the crops is fixed.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – (optional) float32 tensor of shape [num_instances] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- aspect_ratio – aspect ratio of the cropped image.
- area_range – allowed range for area ratio between cropped image and the original image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same rank as input image.
- boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
- labels: new labels.
If multiclass_scores, masks, or keypoints is not None, the function also returns:
- multiclass_scores: rank 2 float32 tensor with shape [num_instances, num_classes].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Return type: image
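How overlap_thresh and normalized boxes interact with a crop window can be sketched as follows. This example assumes overlap is measured as the fraction of each box's area that falls inside the window, and that surviving boxes are clipped and re-expressed relative to the window; both choices, and the helper name crop_boxes, are assumptions for illustration.

```python
def crop_boxes(boxes, window, overlap_thresh=0.3):
    """Keep boxes that overlap the window enough, in window coordinates.

    boxes and window use normalized [ymin, xmin, ymax, xmax] coordinates.
    """
    wy0, wx0, wy1, wx1 = window
    win_h, win_w = wy1 - wy0, wx1 - wx0
    kept = []
    for ymin, xmin, ymax, xmax in boxes:
        iy0, ix0 = max(ymin, wy0), max(xmin, wx0)
        iy1, ix1 = min(ymax, wy1), min(xmax, wx1)
        inter = max(0.0, iy1 - iy0) * max(0.0, ix1 - ix0)
        area = (ymax - ymin) * (xmax - xmin)
        if area <= 0 or inter / area < overlap_thresh:
            continue  # Box is mostly outside the crop window.
        kept.append(((iy0 - wy0) / win_h, (ix0 - wx0) / win_w,
                     (iy1 - wy0) / win_h, (ix1 - wx0) / win_w))
    return kept


boxes = [(0.0, 0.0, 0.25, 0.25), (0.75, 0.75, 1.0, 1.0)]
kept = crop_boxes(boxes, window=(0.0, 0.0, 0.5, 0.5))
```

The first box lies fully inside the top-left quarter and is rescaled to cover half of the crop in each dimension; the second box does not intersect the window and is dropped.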
-
easy_vision.python.core.preprocessing.ssd_preprocess.
ssd_random_crop_pad
(image, boxes, labels, label_scores=None, multiclass_scores=None, min_object_covered=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0), aspect_ratio_range=((0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0)), area_range=((0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0)), overlap_thresh=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0), random_coef=(0.15, 0.15, 0.15, 0.15, 0.15, 0.15), min_padded_size_ratio=((1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)), max_padded_size_ratio=((2.0, 2.0), (2.0, 2.0), (2.0, 2.0), (2.0, 2.0), (2.0, 2.0), (2.0, 2.0)), pad_color=(None, None, None, None, None, None), seed=None, preprocess_vars_cache=None)[source]¶ Random crop preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector. For further information on random crop preprocessing refer to RandomCrop function above.
Parameters: - image – rank 3 float32 tensor containing 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – float32 tensor of shape [num_instances] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- aspect_ratio_range – allowed range for aspect ratio of cropped image.
- area_range – allowed range for area ratio between cropped image and the original image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- min_padded_size_ratio – min ratio of padded image height and width to the input image’s height and width.
- max_padded_size_ratio – max ratio of padded image height and width to the input image’s height and width.
- pad_color – padding color. A rank 1 tensor of [3] with dtype=tf.float32. if set as None, it will be set to average color of the randomly cropped image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image shape will be [new_height, new_width, channels].
- boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
- new_labels: new labels.
- new_label_scores: new label scores.
Return type: image
-
easy_vision.python.core.preprocessing.ssd_preprocess.
ssd_random_crop_pad_fixed_aspect_ratio
(image, boxes, labels, label_scores=None, multiclass_scores=None, masks=None, keypoints=None, min_object_covered=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), aspect_ratio=1.0, aspect_ratio_range=((0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0), (0.5, 2.0)), area_range=((0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0), (0.1, 1.0)), overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), random_coef=(0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15), min_padded_size_ratio=(1.0, 1.0), max_padded_size_ratio=(2.0, 2.0), seed=None, preprocess_vars_cache=None)[source]¶ Random crop and pad preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector. For further information on random crop preprocessing refer to RandomCrop function above.
The only difference is that after the initial crop, images are zero-padded to a fixed aspect ratio instead of being resized to that aspect ratio.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes -> [N, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – (optional) float32 tensor of shape [num_instances] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- aspect_ratio – the final aspect ratio to pad to.
- aspect_ratio_range – allowed range for aspect ratio of cropped image.
- area_range – allowed range for area ratio between cropped image and the original image.
- overlap_thresh – minimum overlap thresh with new cropped image to keep the box.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- min_padded_size_ratio – min ratio of padded image height and width to the input image’s height and width.
- max_padded_size_ratio – max ratio of padded image height and width to the input image’s height and width.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image which is the same rank as input image.
- boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
- labels: new labels.
If multiclass_scores, masks, or keypoints is not None, the function also returns:
- multiclass_scores: rank 2 with shape [num_instances, num_classes].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
Return type: image
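Padding to a fixed aspect ratio, rather than resizing, only ever grows one dimension. The following sketch computes the padded size, assuming aspect_ratio is width / height; padded_size_for_aspect_ratio is a hypothetical helper and says nothing about where the library places the original image inside the padded canvas.

```python
def padded_size_for_aspect_ratio(height, width, aspect_ratio=1.0):
    """Return (new_height, new_width) after padding to width / height."""
    if width / height < aspect_ratio:
        # Image is too narrow: pad the width.
        return height, int(round(height * aspect_ratio))
    # Image is too wide (or already matches): pad the height.
    return int(round(width / aspect_ratio)), width


square = padded_size_for_aspect_ratio(300, 400, aspect_ratio=1.0)
wide = padded_size_for_aspect_ratio(300, 400, aspect_ratio=2.0)
```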
easy_vision.python.core.preprocessing.text_preprocess¶
-
easy_vision.python.core.preprocessing.text_preprocess.
random_crop_text_image
(image, boxes, labels, label_scores=None, multiclass_scores=None, masks=None, keypoints=None, texts=None, texts_ids=None, texts_direction=None, min_object_covered=1.0, min_aspect_ratio=0.2, max_aspect_ratio=5, min_area=0.1, max_area=1.0, random_coef=0.1, seed=None, preprocess_vars_cache=None)[source]¶ Randomly crops the image.
Given the input image and its bounding boxes, this op randomly crops a subimage. Given a user-provided set of input constraints, the crop window is resampled until it satisfies these constraints. If within 100 trials it is unable to find a valid crop, the original image is returned. See the Parameters section for a description of the input constraints. Both input boxes and returned boxes are in normalized form (i.e., they lie in the unit square [0, 1]). This function will return the original image with probability random_coef.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- boxes – rank 2 float32 tensor containing the bounding boxes with shape [num_instances, 4]. Boxes are in normalized form meaning their coordinates vary between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax].
- labels – rank 1 int32 tensor containing the object classes.
- label_scores – (optional) float32 tensor of shape [num_instances] representing the score for each box.
- multiclass_scores – (optional) float32 tensor of shape [num_instances, num_classes] representing the score for each box for each class.
- masks – (optional) rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks. The masks are of the same height, width as the input image.
- keypoints – (optional) rank 3 float32 tensor with shape [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized coordinates.
- texts – (optional) rank 1 string tensor with shape [num_instances]
- texts_ids – (optional) rank 1 string tensor with shape [num_instances, max_text_length]
- texts_direction – (optional) rank 1 int32 tensor with shape [num_instances]
- min_object_covered – the cropped image must cover at least this fraction of at least one of the input bounding boxes.
- min_aspect_ratio – allowed min range for aspect ratio of cropped image.
- max_aspect_ratio – allowed max range for aspect ratio of cropped image.
- min_area – allowed min area ratio between cropped image and the original image.
- max_area – allowed max area ratio between cropped image and the original image.
- random_coef – a random coefficient that defines the chance of getting the original image. If random_coef is 0, we will always get the cropped image, and if it is 1.0, we will always get the original image.
- seed – random seed.
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image shape will be [new_height, new_width, channels].
- boxes: boxes which is the same rank as input boxes. Boxes are in normalized form.
- labels: new labels.
If label_scores, multiclass_scores, masks, or keypoints is not None, the function also returns:
- label_scores: rank 1 float32 tensor with shape [num_instances].
- multiclass_scores: rank 2 float32 tensor with shape [num_instances, num_classes].
- masks: rank 3 float32 tensor with shape [num_instances, height, width] containing instance masks.
- keypoints: rank 3 float32 tensor with shape [num_instances, num_keypoints, 2].
- texts: rank 1 string tensor with shape [num_instances].
Return type: image
-
easy_vision.python.core.preprocessing.text_preprocess.
random_crop_text_region
(image, text_keypoints, seed=None, preprocess_vars_cache=None)[source]¶ Randomly crops a text region image; the bounding text keypoints must lie inside the crop window. Used for text recognition and text rectification.
Parameters: - image – rank 3 float32 tensor contains 1 image -> [height, width, channels] with pixel values varying between [0, 1].
- text_keypoints – rank 2 float32 tensor with shape [num_keypoints, 2]
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image shape will be [new_height, new_width, channels].
- text_keypoints: rank 2 float32 tensor with shape [num_keypoints, 2].
Return type: image
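The constraint that all text keypoints stay inside the crop window can be met by sampling each window edge between the image border and the keypoints' extent. This sketch assumes keypoints are normalized (y, x) pairs in [0, 1], which the source does not state; sample_window_containing is a hypothetical helper, not the library routine.

```python
import random


def sample_window_containing(keypoints, rng):
    """Sample (ymin, xmin, ymax, xmax) that encloses every keypoint."""
    ys = [y for y, _ in keypoints]
    xs = [x for _, x in keypoints]
    # Each edge is drawn between the image border and the keypoint extent.
    ymin = rng.uniform(0.0, min(ys))
    xmin = rng.uniform(0.0, min(xs))
    ymax = rng.uniform(max(ys), 1.0)
    xmax = rng.uniform(max(xs), 1.0)
    return ymin, xmin, ymax, xmax


keypoints = [(0.3, 0.2), (0.3, 0.8), (0.6, 0.8), (0.6, 0.2)]
window = sample_window_containing(keypoints, random.Random(0))
```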
-
easy_vision.python.core.preprocessing.text_preprocess.
random_rotate_text_region
(image, text_keypoints, text_direction=None, min_angle=-10, max_angle=10, rot90=True, seed=None, preprocess_vars_cache=None)[source]¶ Randomly rotates the text region image counter-clockwise. Used for text recognition and text rectification.
Parameters: - image – rank 3 float32 tensor with shape [height, width, channels].
- text_keypoints – rank 2 float32 tensor with shape [num_keypoints, 2]
- text_direction – (optional) float32 scalar tensor
- min_angle – min angle of rotation range
- max_angle – max angle of rotation range
- rot90 – whether to randomly rotate the image by 90 degrees.
- seed – random seed
- preprocess_vars_cache – PreprocessorCache object that records previously performed augmentations. Updated in-place. If this function is called multiple times with the same non-null cache, it will perform deterministically.
Returns: - image: image shape will be [new_height, new_width, channels].
- text_keypoints: rank 2 float32 tensor with shape [num_keypoints, 2].
- text_direction: (optional) float32 scalar tensor.
Return type: image
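When the image is rotated, the keypoints have to follow. The sketch below rotates points counter-clockwise (as seen on screen) about a center, assuming pixel coordinates given as (x, y) with the y axis pointing down; the coordinate order, the units, and the helper name rotate_keypoints_ccw are assumptions, since the source only gives the tensor shape.

```python
import math


def rotate_keypoints_ccw(keypoints, angle_degrees, center):
    """Rotate (x, y) points counter-clockwise on screen about center."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy
        # With y pointing down, a visual CCW turn flips the sine terms.
        rotated.append((cx + dx * cos_t + dy * sin_t,
                        cy - dx * sin_t + dy * cos_t))
    return rotated


moved = rotate_keypoints_ccw([(7.0, 5.0)], 90, center=(5.0, 5.0))[0]
```

A point two pixels to the right of the center ends up two pixels above it after a 90 degree turn.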
easy_vision.python.core.preprocessing.vgg_preprocessing¶
Provides utilities to preprocess images.
The preprocessing steps for VGG were introduced in the following technical report:
Very Deep Convolutional Networks For Large-Scale Image Recognition
Karen Simonyan and Andrew Zisserman
arXiv technical report, 2015
PDF: http://arxiv.org/pdf/1409.1556.pdf
ILSVRC 2014 Slides: http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf
CC-BY-4.0
More information can be obtained from the VGG website: www.robots.ox.ac.uk/~vgg/research/very_deep/
-
easy_vision.python.core.preprocessing.vgg_preprocessing.
preprocess_for_eval
(image, output_height, output_width, resize_side)[source]¶ Preprocesses the given image for evaluation.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- resize_side – The smallest side of the image for aspect-preserving resizing.
Returns: A preprocessed image.
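The geometry behind an aspect-preserving resize followed by a crop to the output size can be sketched in plain Python. The helper names are hypothetical, and the use of a central crop at evaluation time is an assumption rather than something stated above. The sketch only computes sizes and offsets and does not touch pixel data.

```python
def smallest_size_at_least(height, width, smallest_side):
    """Return (new_height, new_width) so the shorter side equals smallest_side."""
    scale = smallest_side / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


def central_crop_offsets(height, width, crop_height, crop_width):
    """Return the (offset_y, offset_x) of a centered crop window."""
    return (height - crop_height) // 2, (width - crop_width) // 2


# A 480x640 image resized with resize_side=256, then cropped to 224x224.
new_h, new_w = smallest_size_at_least(480, 640, 256)
off_y, off_x = central_crop_offsets(new_h, new_w, 224, 224)
```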
-
easy_vision.python.core.preprocessing.vgg_preprocessing.
preprocess_for_train
(image, output_height, output_width, resize_side_min=256, resize_side_max=512)[source]¶ Preprocesses the given image for training.
Note that the actual resizing scale is sampled from [resize_side_min, resize_side_max].
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- resize_side_min – The lower bound for the smallest side of the image for aspect-preserving resizing.
- resize_side_max – The upper bound for the smallest side of the image for aspect-preserving resizing.
Returns: A preprocessed image.
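The training-time sampling can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper name `sample_train_geometry` is hypothetical, the range is treated as inclusive at both ends, and the crop position is assumed to be drawn uniformly at random.

```python
import random


def sample_train_geometry(height, width, output_height, output_width,
                          resize_side_min=256, resize_side_max=512, rng=None):
    """Sample a resize side, the resized shape, and a random crop offset."""
    rng = rng or random.Random()
    # Draw the target length of the shorter side (both bounds inclusive).
    side = rng.randint(resize_side_min, resize_side_max)
    # Aspect-preserving resize: scale both sides by the same factor.
    scale = side / min(height, width)
    new_h = int(round(height * scale))
    new_w = int(round(width * scale))
    # Assumed random crop: pick any top-left corner that keeps the
    # output window fully inside the resized image.
    off_y = rng.randint(0, new_h - output_height)
    off_x = rng.randint(0, new_w - output_width)
    return side, (new_h, new_w), (off_y, off_x)
```

The sketch requires the output size to be no larger than `resize_side_min`; otherwise the crop window might not fit inside the resized image.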
-
easy_vision.python.core.preprocessing.vgg_preprocessing.
preprocess_image
(image, output_height, output_width, is_training=False, resize_side_min=256, resize_side_max=512)[source]¶ Preprocesses the given image.
Parameters: - image – A Tensor representing an image of arbitrary size.
- output_height – The height of the image after preprocessing.
- output_width – The width of the image after preprocessing.
- is_training – True if we’re preprocessing the image for training and False otherwise.
- resize_side_min – The lower bound for the smallest side of the image for aspect-preserving resizing. If is_training is False, then this value is used for rescaling.
- resize_side_max – The upper bound for the smallest side of the image for aspect-preserving resizing. If is_training is False, this value is ignored. Otherwise, the resize side is sampled from [resize_side_min, resize_side_max].
Returns: A preprocessed image.
easy_vision.python.core.preprocessing.video_preprocess¶
-
easy_vision.python.core.preprocessing.video_preprocess.
action_detection_preprocessing
(video, length, crop_size, frame_height=128, frame_width=171, pixel_means=[114.7748, 107.7354, 99.475], norm_values=[38.7568578, 37.88248729, 40.02898126], is_flip=True, is_random_crop=True, seed=None)[source]¶
Parameters: - video – list of frames, string
- length – input length
- frame_height – target frame height
- frame_width – target frame width
- crop_size – crop size of each frame
- pixel_means – Pixel mean values (RGB order) as a (1, 1, 3) array
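The role of `pixel_means` and `norm_values` is not spelled out above. A common convention, assumed here and not confirmed by the source, is per-channel standardization: subtract the mean, then divide by the normalization value. The helper name `normalize_pixel` is hypothetical.

```python
# Default values copied from the function signature (RGB order).
PIXEL_MEANS = [114.7748, 107.7354, 99.475]
NORM_VALUES = [38.7568578, 37.88248729, 40.02898126]


def normalize_pixel(rgb, means=PIXEL_MEANS, norms=NORM_VALUES):
    """Standardize one RGB pixel channel by channel.

    Assumption: norm_values act as per-channel divisors after mean
    subtraction; the library may apply them differently.
    """
    return [(value - mean) / norm
            for value, mean, norm in zip(rgb, means, norms)]
```

Under this assumption, a pixel equal to the channel means maps to zero in every channel.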
-
easy_vision.python.core.preprocessing.video_preprocess.
central_crop
(tensor, target_size, name=None)[source]¶
-
easy_vision.python.core.preprocessing.video_preprocess.
kinetics_preprocessing
(clip, sample_duration=16, input_c=3, initial_scale=1, n_scales=5, scale_step=0.84089641525, train_crop='corner', sample_size=112, n_samples_for_each_video=1, is_spatial_transform=True, is_training=True)[source]¶ Preprocessing method for the Kinetics dataset.
Parameters: - clip – clip of video
- sample_duration – video length of each sample, default 16.
- input_c – channel of each frame
- initial_scale – the initial scale for multiscale cropping
- n_scales – the number of scales for multiscale cropping
- scale_step – the scale step for multiscale cropping
- train_crop – the cropping method, one of [‘random’, ‘corner’, ‘center’]
- sample_size – the crop size
- n_samples_for_each_video – number of clips for each video
- is_spatial_transform (callable, optional) – spatial transform or not, e.g. transforms.RandomCrop
Returns: processed clip
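The three multiscale parameters most plausibly define a geometric sequence of crop scales. The sketch below assumes that interpretation, which the source does not confirm; the helper name `multiscale_crop_scales` is hypothetical. With the defaults, the step is approximately 2 to the power of -1/4, so the fifth scale is about half the first.

```python
def multiscale_crop_scales(initial_scale=1.0, n_scales=5,
                           scale_step=0.84089641525):
    """Return n_scales crop scales, each scale_step times the previous one.

    Assumption: scales form a geometric sequence from initial_scale.
    """
    return [initial_scale * scale_step ** i for i in range(n_scales)]


def crop_side(frame_height, frame_width, scale):
    """Side length of a square crop relative to the shorter frame side."""
    return int(min(frame_height, frame_width) * scale)


scales = multiscale_crop_scales()
```

A crop taken at one of these scales would then typically be resized to `sample_size`; that last step is also an assumption here.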
-
easy_vision.python.core.preprocessing.video_preprocess.
temporal_center_crop
(video, frame_num, flow_encoded=None, sample_duration=16, sample_stride=1)[source]¶
-
easy_vision.python.core.preprocessing.video_preprocess.
temporal_random_crop
(video, frame_num, flow_encoded=None, sample_duration=16, sample_stride=1)[source]¶
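Neither temporal crop function is described above, so the following is only a sketch of how `sample_duration` and `sample_stride` might select frame indices. It assumes a centered or random start position and assumes that videos shorter than the requested span wrap around to the beginning. The helper names are hypothetical and the library's handling of short videos may differ.

```python
import random


def _indices_from(start, frame_num, sample_duration, sample_stride):
    # Assumption: wrap around (loop) when the clip runs past the last frame.
    return [(start + i * sample_stride) % frame_num
            for i in range(sample_duration)]


def temporal_center_indices(frame_num, sample_duration=16, sample_stride=1):
    """Pick sample_duration frame indices centered in the video."""
    span = (sample_duration - 1) * sample_stride + 1  # frames covered
    start = max(0, (frame_num - span) // 2)
    return _indices_from(start, frame_num, sample_duration, sample_stride)


def temporal_random_indices(frame_num, sample_duration=16, sample_stride=1,
                            rng=None):
    """Pick sample_duration frame indices starting at a random position."""
    rng = rng or random.Random()
    span = (sample_duration - 1) * sample_stride + 1
    start = rng.randint(0, max(0, frame_num - span))
    return _indices_from(start, frame_num, sample_duration, sample_stride)
```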