easy_vision.python.dataset_tools
easy_vision.python.dataset_tools.custom_generator_example
class easy_vision.python.dataset_tools.custom_generator_example.CustomGenerator(data_config, start_cls_id=0)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    classmethod create_class(name)
easy_vision.python.dataset_tools.data_reader

class easy_vision.python.dataset_tools.data_reader.DataReader(oss_config=None)[source]
    Bases: object

    A wrapper class that reads data from a local path, an HTTP URL (with retry on failure), or an OSS path.

    HTTP_MAX_NUM_IMG_READ_TRY = 10
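DataReader's read methods are not documented here, so the following is only a minimal sketch of the retry-on-failure pattern the class implements for HTTP URLs. The function name read_with_retry and the injectable fetch parameter are assumptions for illustration, not the actual easy_vision API.

```python
import time
import urllib.request

# Mirrors the DataReader constant above; retry count for HTTP reads.
HTTP_MAX_NUM_IMG_READ_TRY = 10

def read_with_retry(url, max_tries=HTTP_MAX_NUM_IMG_READ_TRY, fetch=None):
    """Return the binary content of url, retrying up to max_tries times.

    `fetch` is injectable for testing; by default it performs a real HTTP GET.
    This is an illustrative sketch, not easy_vision's DataReader code.
    """
    fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
    last_err = None
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except Exception as err:  # transient failure: back off briefly, then retry
            last_err = err
            time.sleep(0.1 * (attempt + 1))
    raise IOError('failed to read %s after %d tries: %s' % (url, max_tries, last_err))
```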
easy_vision.python.dataset_tools.dataset_info

class easy_vision.python.dataset_tools.dataset_info.ClassificationDatasetInfo[source]
    Bases: easy_vision.python.dataset_tools.dataset_info.DatasetInfo
class easy_vision.python.dataset_tools.dataset_info.DataInfoFieldType[source]
    Bases: object

    CONST = 'const'
    COUNT = 'count'
    FILE = 'file'
    STATISTIC = 'statistic'
class easy_vision.python.dataset_tools.dataset_info.DataInfoFields[source]
    Bases: object

    aspect_ratio = 'aspect_ratio'
    bbox_area = 'bbox_area'
    bbox_aspect_ratio = 'bbox_aspect_ratio'
    char_dict_path = 'char_dict_path'
    label_map_path = 'label_map_path'
    num_classes = 'num_classes'
    num_images = 'num_images'
    num_keypoints = 'num_keypoints'
class easy_vision.python.dataset_tools.dataset_info.DatasetInfo(fields_info=None)[source]
    Bases: object

    add_single(single_info)[source]
        Add a single image's info and label info to the dataset; the single_info keys are the same as the tfrecord keys.

    add_statistic_field(name, value)[source]
        Add one dataset statistic field, such as aspect_ratio.

    load(info_path_pattern)[source]
        Load dataset info from JSON files.
        Parameters: info_path_pattern – str or list, glob pattern for the info files

    update_const_field(name, field_info)[source]
        Update one dataset const field, such as num_classes.

    update_count_field(name, field_info)[source]
        Add one dataset count field, such as num_images.
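The difference between const and count fields above (a const is overwritten, a count accumulates across calls) can be sketched as follows. This is an illustrative reimplementation with a hypothetical get accessor, not the actual DatasetInfo code.

```python
# Illustrative sketch of the const/count field semantics described above;
# not the actual easy_vision DatasetInfo implementation.
class DatasetInfoSketch(object):
    def __init__(self):
        self._fields = {}

    def update_const_field(self, name, value):
        # const fields (e.g. num_classes) hold a single fixed value
        self._fields[name] = value

    def update_count_field(self, name, value):
        # count fields (e.g. num_images) accumulate across shards
        self._fields[name] = self._fields.get(name, 0) + value

    def get(self, name):
        return self._fields.get(name)
```

For example, calling update_count_field('num_images', …) once per shard yields the total image count, while repeated update_const_field calls simply keep the latest value.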
class easy_vision.python.dataset_tools.dataset_info.DetectionDatasetInfo[source]
    Bases: easy_vision.python.dataset_tools.dataset_info.ClassificationDatasetInfo

class easy_vision.python.dataset_tools.dataset_info.SegmentationDatasetInfo[source]
    Bases: easy_vision.python.dataset_tools.dataset_info.ClassificationDatasetInfo

class easy_vision.python.dataset_tools.dataset_info.TextDetectionDatasetInfo[source]
    Bases: easy_vision.python.dataset_tools.dataset_info.DetectionDatasetInfo

class easy_vision.python.dataset_tools.dataset_info.TextEnd2EndDatasetInfo[source]
    Bases: easy_vision.python.dataset_tools.dataset_info.DetectionDatasetInfo
easy_vision.python.dataset_tools.pai_converter

class easy_vision.python.dataset_tools.pai_converter.PaiConverter(data_config)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    classmethod create_class(name)

    split_train_test(input_path, output_dir, test_ratio)[source]
        Split the input into train and test files according to test_ratio.
        Parameters:
            input_path – string, the file to split
            output_dir – output directory to save the split files
            test_ratio – number of test samples / total number of samples
        Returns:
            train_input_path – train data file path
            test_input_path – test data file path

class easy_vision.python.dataset_tools.pai_converter.PaiRow(row, task_id=None)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow
easy_vision.python.dataset_tools.qince_converter

class easy_vision.python.dataset_tools.qince_converter.QinceConverter(data_config)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    Implements the MultiProcConverter methods for the qince format:
        _check_lines(self, file_path) – check the input file for errors
        _load_input(self, input_path) – load rows from input_path, wrap each with QinceRow, and push them into self._input_queue
        _save_for_fix(self, file_name) – save the error data in self._remark_map to file_name for correction
    CREATE_TASK_URL = 'https://qince.taobao.com/label/createCutout.htm?template_type=103&device=2'

    classmethod create_class(name)

    split_train_test(input_path, output_dir, test_ratio)[source]
        Split the input into train and test files according to test_ratio.
        Parameters:
            input_path – string, the file to split
            output_dir – output directory to save the split files
            test_ratio – number of test samples / total number of samples
        Returns:
            train_input_path – train data file path
            test_input_path – test data file path
class easy_vision.python.dataset_tools.qince_converter.QinceRow(row, separator=None)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow

    Parses a qince row to get the image URL and answer. Column layout: 0. data ID, 1. raw data, 2. answer, 3. annotator, 4. annotation time, 5. question type, 6. quality-checked flag, 7. quality checker. The following DataRow methods are reimplemented: iter_objs(self), get_cls(self).

    ANWS = 2
    CHECK = 6
    CHECK_PERSON = 7
    DRECTION_MAP = {u'底部朝上': 2, u'底部朝下': 0, u'底部朝右': 3, u'底部朝左': 1}  (keys mean "bottom facing up/down/right/left")
    ID = 0
    PERSON = 3
    QUESTION_TYPE = 5
    RAW_DATA = 1
    TIME = 4
    URL_PATTERN = '^(https?|oss|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]$'
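URL_PATTERN accepts http(s), oss, and file schemes and rejects others. A small validation helper using the pattern exactly as given above (the is_valid_url wrapper is illustrative, not part of the QinceRow API):

```python
import re

# The URL_PATTERN constant from QinceRow above, used to validate image URLs.
URL_PATTERN = '^(https?|oss|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]$'

def is_valid_url(url):
    """Return True if url matches QinceRow's URL_PATTERN."""
    return re.match(URL_PATTERN, url) is not None
```

Note the final character class omits `?`, `!`, `.` and similar punctuation, so URLs ending in a trailing dot or question mark are rejected.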
easy_vision.python.dataset_tools.tf_record_converter

class easy_vision.python.dataset_tools.tf_record_converter.DataRow(row)[source]
    Bases: object

    Base class for extracting tf record information from the input; QinceRow is one subclass. All subclasses must implement:
        get_img_url(self)
        get_anws(self)
    For classification tasks, one more method:
        get_cls(self)
    For detection tasks, one more method:
        iter_objs(self)
    Other tasks extend this set as needed.
    get_anws()[source]
        Must be implemented for all tasks if data correction is necessary.
        Returns: any type of data if you use the easy-vision generators; a JSON object should be returned.

    get_cls()[source]
        Must be implemented for the CLASSIFICATION task.
        Returns: a list of strings, each class_name or class_name + separator + class_description.

    get_text()[source]
        Must be implemented for the TEXT_RECOGNITION task when using text_recognition style data.
        Returns: string, the text content.

    get_url_data(url)[source]
        Get binary data read from url.
        Parameters: url – the data url
        Returns: the binary data of url

    iter_objs()[source]
        Must be implemented for the DETECTION task.
        Returns: an iterator yielding (class_name, bounding box). The label may be class_name or class_name + separator + class_description; the bounding box is an array of numbers [left_x, top_y, right_x, bottom_y].

    iter_objs_with_texts()[source]
        Must be implemented for the TEXT_END2END task, or for the TEXT_RECOGNITION task when using text_end2end style data.
        Returns: an iterator yielding (class_name, corners, text_content, direction_id, difficult).
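A minimal classification-style subclass following the contract above might look like the sketch below. DataRowBase stands in for the real tf_record_converter.DataRow, and the tab-separated "image_url<TAB>class_name" row layout is an assumption for illustration only.

```python
# Sketch of the DataRow contract for a classification task.
# DataRowBase is a stand-in for tf_record_converter.DataRow; the
# tab-separated field layout is assumed, not taken from easy_vision.
class DataRowBase(object):
    def __init__(self, row):
        self._row = row

class TabSeparatedRow(DataRowBase):
    """Parses lines of the form: image_url<TAB>class_name."""

    def get_img_url(self):
        return self._row.split('\t')[0]

    def get_anws(self):
        # the answer, returned as a JSON-style object per get_anws() above
        return {'class': self._row.split('\t')[1]}

    def get_cls(self):
        # CLASSIFICATION task: return a list of class names
        return [self._row.split('\t')[1]]
```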
class easy_vision.python.dataset_tools.tf_record_converter.DictReduceType[source]
    Bases: enum.Enum

    ADD = 0
    UPDATE = 1
class easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter(data_config)[source]
    Bases: object

    Base class for parallel converters for all kinds of input format, such as the qince format. Subclasses must implement the following methods:
        _check_lines(self, input_path)
        _load_input(self, input_path)
        _save_for_fix(self, file_name) – optional; implement it if you want to save mislabeled data for later fixing
        split_train_test(self, input_path, output_dir, test_ratio)

    __init__(data_config)[source]
        Init a parallel convert object.
        Parameters:
            data_config – a DataConfig object that specifies the params for tf record generation
            proc_num – number of processes used for parallel conversion
    classmethod create_class(name)

    parallel_convert(input_path, output_prefix, output_label_path, output_chardict_path)[source]
        Create object detection tf records using qince-marked data.
        Parameters:
            input_path – .csv format, exported from qince
            output_prefix – output path to save the tfrecord
            output_label_path – output path to save the label_map, a protos.string_int_label_map_pb2 text format file
            output_chardict_path – output path to save the char_dict
        Returns: None
        Raises: ValueError, if input_path does not exist
    reduce_dicts(dicts, output_dir, dict_name, op=<DictReduceType.UPDATE: 1>)[source]
        Gather dicts from all hosts.
        Parameters:
            dicts – dict type data
            output_dir – the save dir, string
            dict_name – the dict name, a unique tag for this type of dict
            op – defaults to UPDATE; if ADD, values are added elementwise
        Returns: the dict gathered from all hosts
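The two DictReduceType semantics amount to a plain reduction over per-host dicts: UPDATE lets later dicts overwrite earlier keys, while ADD sums values elementwise. The sketch below illustrates only this merge logic; the real reduce_dicts also synchronizes through output_dir across hosts.

```python
import enum

# Mirrors tf_record_converter.DictReduceType for this self-contained sketch.
class DictReduceType(enum.Enum):
    ADD = 0
    UPDATE = 1

def reduce_dicts_sketch(dicts, op=DictReduceType.UPDATE):
    """Merge a list of per-host dicts into one.

    UPDATE: later dicts overwrite earlier keys.
    ADD: values for the same key are summed elementwise.
    """
    result = {}
    for d in dicts:
        if op is DictReduceType.UPDATE:
            result.update(d)
        else:
            for key, value in d.items():
                result[key] = result.get(key, 0) + value
    return result
```

ADD is the natural choice for count-style dicts (e.g. per-class sample counts gathered from each worker), while UPDATE suits dicts where every host holds the same mapping.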
    split_train_test(input_path, output_dir, test_ratio)[source]
        Split the input into train and test files according to test_ratio.
        Parameters:
            input_path – string, the file to split
            output_dir – output directory to save the split files
            test_ratio – number of test samples / total number of samples
        Returns:
            train_input_path – train data file path
            test_input_path – test data file path
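The split itself is a partition of the input lines so that roughly test_ratio of them land in the test set. A deterministic sketch of that partition (the real method additionally writes the two resulting files under output_dir; the seeded shuffle is an assumption for reproducibility):

```python
import random

def split_train_test_sketch(lines, test_ratio, seed=0):
    """Partition lines into (train, test) so that approximately
    len(test) / len(lines) == test_ratio.

    Illustrative sketch only; the real split_train_test works on files.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```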
easy_vision.python.dataset_tools.tf_record_generator

class easy_vision.python.dataset_tools.tf_record_generator.ClassificationGenerator(data_config, start_cls_id=0)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=0)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.DetectionGenerator(data_config, start_cls_id=1)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=1)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.SegmentationGenerator(data_config, start_cls_id=0, use_polygon=False)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=0, use_polygon=False)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator(data_config, start_cls_id=0)[source]
    Bases: object

    Base generator class for all tasks: classification, detection, and so on. Subclasses must implement:
        _to_tf_example(self, img, anws, url)

    ERROR_CLASS = -2
    IGNORE_CLASS = -3
    INVALID_CLASS = -1

    __init__(data_config, start_cls_id=0)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
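Class-id encoding with start_cls_id and the sentinel values above can be sketched as follows. The helper names are hypothetical; this only illustrates why detection tasks usually start from 1 (leaving 0 free for the background class) while classification starts from 0.

```python
# Sentinel ids, matching the TFRecordGenerator constants above.
INVALID_CLASS = -1
ERROR_CLASS = -2
IGNORE_CLASS = -3

def build_class_map(class_names, start_cls_id=0):
    """Encode class names to consecutive ids starting at start_cls_id.

    Detection tasks typically pass start_cls_id=1 so that id 0 stays
    reserved for the background class. Illustrative sketch only.
    """
    return {name: start_cls_id + i
            for i, name in enumerate(sorted(set(class_names)))}

def encode(class_map, name):
    """Look up a class id; unknown names map to INVALID_CLASS."""
    return class_map.get(name, INVALID_CLASS)
```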
class easy_vision.python.dataset_tools.tf_record_generator.TextDetectionGenerator(data_config, start_cls_id=1)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=1)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.TextEnd2EndGenerator(data_config, start_cls_id=1)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=1)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.TextRecognitionGenerator(data_config)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.TextRectificationGenerator(data_config)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance

    classmethod create_class(name)
class easy_vision.python.dataset_tools.tf_record_generator.VideoClassificationGenerator(data_config, start_cls_id=0)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=0)[source]
        Init from data_config.
        Parameters:
            data_config – a DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
easy_vision.python.dataset_tools.tf_record_writer

class easy_vision.python.dataset_tools.tf_record_writer.SingleWriter(input_que, output_prefix, line_num, gen_proc_num)[source]

    num_good_sam
easy_vision.python.dataset_tools.video_converter

class easy_vision.python.dataset_tools.video_converter.VideoClassificationConverter(data_config)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    Implements the methods for converting a text line file into the tfrecord required by video classification models.

    classmethod create_class(name)
class
easy_vision.python.dataset_tools.video_converter.
VideoClassificationRow
(row)[source]¶ Bases:
easy_vision.python.dataset_tools.tf_record_converter.DataRow
parse an image_path mask_path line to get image url and anwser The following methods of DataRow are reimplemented:
get_mask
easy_vision.python.dataset_tools.voc_converter

class easy_vision.python.dataset_tools.voc_converter.VocConverter(data_config)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    Implements the MultiProcConverter methods for the VOC format.

    classmethod create_class(name)
class easy_vision.python.dataset_tools.voc_converter.VocDataRow(row)[source]
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow

    Parses an "img_path xml_path" line to get the image url and answer.