easy_vision.python.dataset_tools

easy_vision.python.dataset_tools.custom_generator_example

class easy_vision.python.dataset_tools.custom_generator_example.CustomGenerator(data_config, start_cls_id=0)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    classmethod create_class(name)
easy_vision.python.dataset_tools.data_reader

class easy_vision.python.dataset_tools.data_reader.DataReader(oss_config=None)
    Bases: object

    A wrapper class that can read data from a local path, an HTTP URL (with retry in case of failure) or an OSS path.

    HTTP_MAX_NUM_IMG_READ_TRY = 10
easy_vision.python.dataset_tools.dataset_info

class easy_vision.python.dataset_tools.dataset_info.ClassificationDatasetInfo
    Bases: easy_vision.python.dataset_tools.dataset_info.DatasetInfo

class easy_vision.python.dataset_tools.dataset_info.DataInfoFieldType
    Bases: object

    CONST = 'const'
    COUNT = 'count'
    FILE = 'file'
    STATISTIC = 'statistic'
class easy_vision.python.dataset_tools.dataset_info.DataInfoFields
    Bases: object

    aspect_ratio = 'aspect_ratio'
    bbox_area = 'bbox_area'
    bbox_aspect_ratio = 'bbox_aspect_ratio'
    char_dict_path = 'char_dict_path'
    label_map_path = 'label_map_path'
    num_classes = 'num_classes'
    num_images = 'num_images'
    num_keypoints = 'num_keypoints'
class easy_vision.python.dataset_tools.dataset_info.DatasetInfo(fields_info=None)
    Bases: object

    add_single(single_info)
        Add the info of a single image and its label to the dataset; the keys of single_info are the same as the tfrecord keys.

    add_statistic_field(name, value)
        Add one dataset statistic field, e.g. aspect_ratio.

    load(info_path_pattern)
        Load dataset info from JSON files.
        Parameters: info_path_pattern – str or list, glob pattern of the info files

    update_const_field(name, field_info)
        Update one dataset const field, e.g. num_classes.

    update_count_field(name, field_info)
        Add one dataset count field, e.g. num_images.
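Example usage (a minimal sketch; it assumes the package is importable, that DetectionDatasetInfo can be constructed without arguments as its signature suggests, and the concrete keys, values and info-file glob pattern are illustrative only):

    from easy_vision.python.dataset_tools.dataset_info import (
        DataInfoFields, DetectionDatasetInfo)

    info = DetectionDatasetInfo()

    # const fields hold fixed dataset properties such as num_classes
    info.update_const_field(DataInfoFields.num_classes, 20)

    # count fields are accumulated, e.g. the number of converted images
    info.update_count_field(DataInfoFields.num_images, 1)

    # statistic fields collect per-image statistics such as aspect ratio
    info.add_statistic_field(DataInfoFields.aspect_ratio, 4.0 / 3.0)

    # per-image info uses the same keys as the generated tfrecords
    info.add_single({'image/width': 800, 'image/height': 600})

    # info written by a previous run can be reloaded from a glob pattern
    info.load('data/train_tfrecord_*_info.json')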
class easy_vision.python.dataset_tools.dataset_info.DetectionDatasetInfo
    Bases: easy_vision.python.dataset_tools.dataset_info.ClassificationDatasetInfo

class easy_vision.python.dataset_tools.dataset_info.SegmentationDatasetInfo
    Bases: easy_vision.python.dataset_tools.dataset_info.ClassificationDatasetInfo

class easy_vision.python.dataset_tools.dataset_info.TextDetectionDatasetInfo
    Bases: easy_vision.python.dataset_tools.dataset_info.DetectionDatasetInfo

class easy_vision.python.dataset_tools.dataset_info.TextEnd2EndDatasetInfo
    Bases: easy_vision.python.dataset_tools.dataset_info.DetectionDatasetInfo
easy_vision.python.dataset_tools.pai_converter

class easy_vision.python.dataset_tools.pai_converter.PaiConverter(data_config)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    classmethod create_class(name)

    split_train_test(input_path, output_dir, test_ratio)
        Split the input into train and test files according to test_ratio.
        Parameters:
            input_path – string, file to split
            output_dir – output directory to save the split files
            test_ratio – number of test samples / number of total samples
        Returns:
            train_input_path – train data file path
            test_input_path – test data file path

class easy_vision.python.dataset_tools.pai_converter.PaiRow(row, task_id=None)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow
easy_vision.python.dataset_tools.qince_converter

class easy_vision.python.dataset_tools.qince_converter.QinceConverter(data_config)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    Implements the MultiProcConverter methods for the Qince format:
        _check_lines(self, file_path)   # check whether the input file has any errors
        _load_input(self, input_path)   # load rows from input_path, wrap each in a QinceRow and push it into self._input_queue
        _save_for_fix(self, file_name)  # save the error data collected in self._remark_map to file_name for correction

    CREATE_TASK_URL = 'https://qince.taobao.com/label/createCutout.htm?template_type=103&device=2'

    classmethod create_class(name)

    split_train_test(input_path, output_dir, test_ratio)
        Split the input into train and test files according to test_ratio.
        Parameters:
            input_path – string, file to split
            output_dir – output directory to save the split files
            test_ratio – number of test samples / number of total samples
        Returns:
            train_input_path – train data file path
            test_input_path – test data file path

class easy_vision.python.dataset_tools.qince_converter.QinceRow(row, separator=None)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow

    Parse a Qince row to get the image URL and the answer. Column layout:
        0. data ID, 1. raw data, 2. answer, 3. annotator, 4. annotation time, 5. question type, 6. quality-checked flag, 7. quality checker
    The following DataRow methods are reimplemented: iter_objs(self), get_cls(self)

    ANWS = 2
    CHECK = 6
    CHECK_PERSON = 7
    DRECTION_MAP = {u'\u5e95\u90e8\u671d\u4e0a': 2, u'\u5e95\u90e8\u671d\u4e0b': 0, u'\u5e95\u90e8\u671d\u53f3': 3, u'\u5e95\u90e8\u671d\u5de6': 1}  (keys are Chinese direction labels: bottom facing up = 2, down = 0, right = 3, left = 1)
    ID = 0
    PERSON = 3
    QUESTION_TYPE = 5
    RAW_DATA = 1
    TIME = 4
    URL_PATTERN = '^(https?|oss|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]$'
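The column-index constants let a row's fields be addressed without magic numbers. A small sketch (the tab separator and the sample content below are assumptions for illustration, not part of the documented format):

    from easy_vision.python.dataset_tools.qince_converter import QinceRow

    # one annotation row as exported from Qince (content made up for illustration)
    line = 'id-001\thttp://example.com/a.jpg\t{"objects": []}\talice\t2020-01-01\t103\t1\tbob'
    fields = line.rstrip('\n').split('\t')

    image_url = fields[QinceRow.RAW_DATA]   # column 1: raw data (image url)
    answer = fields[QinceRow.ANWS]          # column 2: annotation answer
    checked = fields[QinceRow.CHECK]        # column 6: quality-check flag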
easy_vision.python.dataset_tools.tf_record_converter

class easy_vision.python.dataset_tools.tf_record_converter.DataRow(row)
    Bases: object

    Base class for extracting tf record information from the input; QinceRow is one subclass. All subclasses must implement:
        get_img_url(self)
        get_anws(self)
    For classification tasks, one more method: get_cls(self)
    For detection tasks, one more method: iter_objs(self)
    For other tasks, the interface is to be extended.

    get_anws()
        Must be implemented for all tasks if data correction is necessary.
        Returns: data of any type; if you want to use the easy-vision generators, a JSON object should be returned.

    get_cls()
        Must be implemented for the CLASSIFICATION task.
        Returns: a list of strings, each being class_name or class_name + separator + class_description.

    get_text()
        Must be implemented for the TEXT_RECOGNITION task when using text_recognition style data.
        Returns: string, the text content.

    get_url_data(url)
        Get the binary data read from url.
        Parameters: url – data url
        Returns: binary data of url

    iter_objs()
        Must be implemented for the DETECTION task.
        Returns: an iterator; enumerate it to get (class_name, bounding box). The label can be class_name or class_name + separator + class_description; the bounding box is an array of numbers [left_x, top_y, right_x, bottom_y].

    iter_objs_with_texts()
        Must be implemented for the TEXT_END2END task, or for the TEXT_RECOGNITION task when using text_end2end style data.
        Returns: an iterator; enumerate it to get (class_name, corners, text_content, direction_id, difficult).
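To feed a new annotation format into the converters, a DataRow subclass only needs to implement the methods listed above for its task type. A sketch for a detection-style row, assuming a homegrown JSON-lines format (the JSON field names url/objects/class/bbox are hypothetical, not part of easy_vision):

    import json

    from easy_vision.python.dataset_tools.tf_record_converter import DataRow


    class JsonDetectionRow(DataRow):
      """Parses one JSON line of the form
      {"url": ..., "objects": [{"class": ..., "bbox": [x1, y1, x2, y2]}, ...]}."""

      def __init__(self, row):
        super(JsonDetectionRow, self).__init__(row)
        self._data = json.loads(row)

      def get_img_url(self):
        # required for all tasks: where to fetch the image from
        return self._data['url']

      def get_anws(self):
        # a JSON object works with the easy-vision generators
        return self._data['objects']

      def iter_objs(self):
        # required for DETECTION: yield (class_name, [left_x, top_y, right_x, bottom_y])
        for obj in self._data['objects']:
          yield obj['class'], obj['bbox']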
class easy_vision.python.dataset_tools.tf_record_converter.DictReduceType
    Bases: enum.Enum

    ADD = 0
    UPDATE = 1
class easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter(data_config)
    Bases: object

    Base class for parallel converters of all kinds of input formats, such as the Qince format. Subclasses must implement the following methods:
        _check_lines(self, input_path)
        _load_input(self, input_path)
        _save_for_fix(self, file_name)   # optional, if you want to save incorrectly labeled data for fixing
        split_train_test(self, input_path, output_dir, test_ratio)

    __init__(data_config)
        Init a parallel convert object.
        Parameters:
            data_config – a DataConfig object that specifies the params for tf record generation
            proc_num – number of processes used for the parallel conversion

    classmethod create_class(name)

    parallel_convert(input_path, output_prefix, output_label_path, output_chardict_path)
        Create object detection tf records from Qince annotated data.
        Parameters:
            input_path – .csv format, exported from Qince
            output_prefix – output path to save the tfrecords
            output_label_path – output path to save the label_map, a protos.string_int_label_map_pb2 text format file
            output_chardict_path – output path to save the char_dict
        Returns: None
        Raises: ValueError, if input_path does not exist

    reduce_dicts(dicts, output_dir, dict_name, op=<DictReduceType.UPDATE: 1>)
        Gather dicts from all hosts.
        Parameters:
            dicts – dict type data
            output_dir – save dir, string
            dict_name – dict name, a unique tag for this type of dict
            op – defaults to UPDATE; if ADD, add elementwise
        Returns: the dict gathered from all hosts

    split_train_test(input_path, output_dir, test_ratio)
        Split the input into train and test files according to test_ratio.
        Parameters:
            input_path – string, file to split
            output_dir – output directory to save the split files
            test_ratio – number of test samples / number of total samples
        Returns:
            train_input_path – train data file path
            test_input_path – test data file path
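A sketch of how these pieces fit together: a concrete MultiProcConverter subclass is built from a data config, the input is split into train/test files, and each part is converted in parallel. How the DataConfig object itself is constructed is not covered on this page, so it is left as a placeholder below, and the file paths are illustrative:

    from easy_vision.python.dataset_tools.voc_converter import VocConverter

    data_config = ...  # a DataConfig instance, built/parsed elsewhere (placeholder)

    converter = VocConverter(data_config)

    # split the raw file list into train/test parts with a 10% test ratio
    train_path, test_path = converter.split_train_test(
        'data/voc_file_list.txt', 'data/split', test_ratio=0.1)

    # convert each part; label map and char dict are written as side outputs
    for part, prefix in [(train_path, 'data/tfrecord/train'),
                         (test_path, 'data/tfrecord/test')]:
      converter.parallel_convert(
          input_path=part,
          output_prefix=prefix,
          output_label_path='data/label_map.pbtxt',
          output_chardict_path='data/char_dict.txt')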
easy_vision.python.dataset_tools.tf_record_generator

class easy_vision.python.dataset_tools.tf_record_generator.ClassificationGenerator(data_config, start_cls_id=0)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=0)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.DetectionGenerator(data_config, start_cls_id=1)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=1)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.SegmentationGenerator(data_config, start_cls_id=0, use_polygon=False)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=0, use_polygon=False)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator(data_config, start_cls_id=0)
    Bases: object

    Base converter class for all tasks: classification, detection and so on. Subclasses must implement:
        _to_tf_example(self, img, anws, url)

    ERROR_CLASS = -2
    IGNORE_CLASS = -3
    INVALID_CLASS = -1

    __init__(data_config, start_cls_id=0)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
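A new task can reuse the generator machinery by subclassing TFRecordGenerator and implementing _to_tf_example. A minimal sketch under explicit assumptions: img is taken to be the encoded image bytes, anws is whatever the matching DataRow.get_anws() returned, the method is assumed to return a tf.train.Example, and the feature keys chosen here are illustrative:

    import tensorflow as tf

    from easy_vision.python.dataset_tools.tf_record_generator import TFRecordGenerator


    class MyTaskGenerator(TFRecordGenerator):
      """Generator for a custom task; only _to_tf_example is task specific."""

      def _to_tf_example(self, img, anws, url):
        # img:  encoded image bytes fetched for this sample (assumption)
        # anws: parsed answer returned by DataRow.get_anws()
        # url:  source url of the image, kept for tracing
        feature = {
            'image/encoded': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[img])),
            'image/source_id': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[url.encode('utf-8')])),
            # task-specific label features derived from anws would go here
        }
        return tf.train.Example(features=tf.train.Features(feature=feature))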
class easy_vision.python.dataset_tools.tf_record_generator.TextDetectionGenerator(data_config, start_cls_id=1)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=1)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.TextEnd2EndGenerator(data_config, start_cls_id=1)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=1)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.TextRecognitionGenerator(data_config)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.TextRectificationGenerator(data_config)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance

    classmethod create_class(name)

class easy_vision.python.dataset_tools.tf_record_generator.VideoClassificationGenerator(data_config, start_cls_id=0)
    Bases: easy_vision.python.dataset_tools.tf_record_generator.TFRecordGenerator

    __init__(data_config, start_cls_id=0)
        Init from data_config.
        Parameters:
            data_config – DataConfig instance
            start_cls_id – encode classes starting from start_cls_id; for classification tasks usually 0, for detection tasks usually 1

    classmethod create_class(name)
easy_vision.python.dataset_tools.tf_record_writer

class easy_vision.python.dataset_tools.tf_record_writer.SingleWriter(input_que, output_prefix, line_num, gen_proc_num)

    num_good_sam
easy_vision.python.dataset_tools.video_converter

class easy_vision.python.dataset_tools.video_converter.VideoClassificationConverter(data_config)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    Implements the methods for converting a text line file into the tfrecords required by image segmentation models.

    classmethod create_class(name)

class easy_vision.python.dataset_tools.video_converter.VideoClassificationRow(row)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow

    Parse an "image_path mask_path" line to get the image url and the answer. The following DataRow method is reimplemented: get_mask
easy_vision.python.dataset_tools.voc_converter

class easy_vision.python.dataset_tools.voc_converter.VocConverter(data_config)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.MultiProcConverter

    Implements the MultiProcConverter methods for the VOC format.

    classmethod create_class(name)

class easy_vision.python.dataset_tools.voc_converter.VocDataRow(row)
    Bases: easy_vision.python.dataset_tools.tf_record_converter.DataRow

    Parse an "img_path xml_path" line to get the image url and the answer.