# Feature Configuration

The rank stage calls the scoring model service. Before that call, user and item feature data must be fetched from a feature store. In some cases the fetched features need further processing, such as feature engineering, deriving new features from existing ones, or combining existing features. This processing is implemented by FeatureOp.

### Hologres

A concrete example illustrates the details.

In the configuration below, live_feed is the scene name. FeatureConfs can hold several scenes; this example configures only one.

FeatureLoadConfs defines how features are fetched. It is a list, so several loading steps can be configured. Each step consists of fetch logic and processing logic: FeatureDaoConf defines how features are fetched, and Features defines how they are transformed.

FeatureDaoConf specifies where the features are stored and how to fetch them.

* AdapterType: data source type. Currently supported: hologres, redis, tablestore
* HologresName: name of the configured hologres instance; the name can be found in HologresConfs
* FeatureKey: the value used to look up feature data in the table. FeatureKey states which field of the user or item the lookup value comes from. For example, `user:uid` takes the uid property of the user, and `item:pair_id` takes the pair_id property of the item.
* UserFeatureKeyName: the column name in the table; this column is matched against the FeatureKey value
* HologresTableName: table name
* UserSelectFields: list of columns to fetch for user features; `*` fetches all columns of the table
* ItemSelectFields: list of columns to fetch for item features; `*` fetches all columns of the table
* FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data.

Fetching user features can be thought of as `SELECT ${UserSelectFields} FROM ${HologresTableName} WHERE ${UserFeatureKeyName} = ${FeatureKey}`.

UserSelectFields and ItemSelectFields follow the features used in the training samples. Before the model is called, the same features as in the sample data must be constructed.

Features are fetched based on data that already exists, usually uid or itemid, written as user:uid and item:id respectively. If another field is used, that field must already exist in Properties. In the example, item:pair_id and item:matchmaker_id work because pair_id and matchmaker_id are present in the item's Properties.

AsynLoadFeature = true means that the entries of FeatureLoadConfs are called asynchronously and concurrently, which can reduce RT (response time). If the fetch logic of the FeatureLoadConfs entries is independent of each other, setting AsynLoadFeature to true improves fetch performance.

```json
"FeatureConfs": {
    "live_feed": {
        "AsynLoadFeature": true,
        "FeatureLoadConfs": [
            {
                "FeatureDaoConf": {
                    "AdapterType": "hologres",
                    "HologresName": "holo-pai",
                    "FeatureKey": "user:uid",
                    "UserFeatureKeyName": "uid",
                    "HologresTableName": "recom_user_features_processed_holo_online",
                    "UserSelectFields": "rids_count,sex,alladdfriendnum,allpayrosenum,getgiftnum7d,friendnum7d,talknum7d,start_age,end_age,start_height,end_height,lowest_education,lowest_salary,height,wealth,age,living_condition,education,headstatus,marriage,professionid,provinceid,role,salary,socialtag,facevalue",
                    "FeatureStore": "user"
                },
                "Features": []
            },
            {
                "FeatureDaoConf": {
                    "AdapterType": "hologres",
                    "HologresName": "holo-pai",
                    "ItemFeatureKeyName": "uid",
                    "FeatureKey": "item:pair_id",
                    "HologresTableName": "recom_user_features_processed_holo_online",
                    "ItemSelectFields": "uid, rids_count as rids2_count,sex as guestsex,alladdfriendnum as alladdfriendnum2,allpayrosenum as allpayrosenum2, getgiftnum7d as getgiftnum7d2,friendnum7d as friendnum7d2,talknum7d as talknum7d2,start_age as start_age2,end_age as end_age2,start_height as start_height2,end_height as end_height2,lowest_education as lowest_education2,lowest_salary as lowest_salary2,height as height2,wealth as wealth2,age as age2,living_condition as living_condition2,education as education2,headstatus as headstatus2,marriage as marriage2,professionid as professionid2,provinceid as provinceid2,role as role2,salary as salary2,socialtag as socialtag2,facevalue as facevalue2",
                    "FeatureStore": "item"
                },
                "Features": []
            },
            {
                "FeatureDaoConf": {
                    "AdapterType": "hologres",
                    "HologresName": "holo-pai",
                    "ItemFeatureKeyName": "cupid_id",
                    "FeatureKey": "item:matchmaker_id",
                    "HologresTableName": "recom_red_features_processed_holo_online",
                    "ItemSelectFields": "cupid_id, cupid_id as redid,sex as redsex,role as role1,good_num as good_num1,mid_num as mid_num1,bad_num as bad_num1,total_access as total_access1,duration as duration1,jubaohongniangshu as jubaohongniangshu1,jubaohongniangzongshu as jubaohongniangzongshu1",
                    "FeatureStore": "item"
                },
                "Features": []
            }
        ]
    }
}
```

> *Note: when fetching item features, the first value in ItemSelectFields must be the value configured in ItemFeatureKeyName. Items form a list, and during concurrent fetching ItemFeatureKeyName is used to match each fetched row to its item.*

After features are fetched, they can be processed further, for example to generate new features or to preprocess existing ones.

```json
"Features": [
    {
        "FeatureType": "raw_feature",
        "FeatureName": "article_id",
        "FeatureSource": "item:id",
        "FeatureStore": "item"
    },
    {
        "FeatureType": "raw_feature",
        "FeatureName": "item_elapse_time",
        "FeatureSource": "item:item_ctime",
        "Normalizer": "time_ln",
        "RemoveFeatureSource": true,
        "FeatureStore": "item"
    }
]
```

* FeatureType: the FeatureOp name. raw_feature derives a new feature directly from a raw feature
* FeatureName: name of the newly generated feature
* FeatureSource: the raw (source) feature
* FeatureStore: which object the new feature is stored in, user or item
* Normalizer: optional. Name of the normalization operation; currently only time_ln
* RemoveFeatureSource: optional. Whether to delete the source feature

### Redis

Next, a Redis example.

Redis stores features as key-value pairs, where the value has the form "key1:value1,key2:value2".

- AdapterType: the data source, here redis
- RedisName: name of the redis data source; it can be found in RedisConfs
- RedisPrefix: key prefix
- FeatureKey: the value used to build the redis key. FeatureKey states which field of the user or item the lookup value comes from. For example, `user:uid` takes the uid property of the user, and `item:pair_id` takes the pair_id property of the item.
- FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data.

```json
"FeatureConfs": {
    "home_feed": {
        "AsynLoadFeature": true,
        "FeatureLoadConfs": [
            {
                "FeatureDaoConf": {
                    "AdapterType": "redis",
                    "RedisName": "user_redis",
                    "RedisPrefix": "UF_V2_",
                    "FeatureKey": "user:uid",
                    "FeatureStore": "user"
                },
                "Features": []
            },
            {
                "FeatureDaoConf": {
                    "AdapterType": "redis",
                    "RedisName": "item_redis",
                    "RedisPrefix": "IF_V2_FM_",
                    "FeatureKey": "item:id",
                    "FeatureStore": "item"
                },
                "Features": [
                    {
                        "FeatureType": "raw_feature",
                        "FeatureName": "article_id",
                        "FeatureSource": "item:id",
                        "FeatureStore": "item"
                    },
                    {
                        "FeatureType": "raw_feature",
                        "FeatureName": "item_elapse_time",
                        "FeatureSource": "item:item_ctime",
                        "Normalizer": "time_ln",
                        "RemoveFeatureSource": true,
                        "FeatureStore": "item"
                    }
                ]
            }
        ]
    }
}
```

The example above also includes Features transformations.

- FeatureType: the FeatureOp name. raw_feature derives a new feature directly from a raw feature
- FeatureName: name of the newly generated feature
- FeatureSource: the raw (source) feature
- FeatureStore: which object the new feature is stored in, user or item
- Normalizer: optional. Name of the normalization operation
- RemoveFeatureSource: optional. Whether to delete the source feature

### OTS (tablestore)

The OTS configuration is similar to the Hologres one. A sample configuration:

```json
"FeatureConfs": {
    "home_feed": {
        "AsynLoadFeature": true,
        "FeatureLoadConfs": [
            {
                "FeatureDaoConf": {
                    "AdapterType": "tablestore",
                    "TableStoreName": "",
                    "FeatureKey": "user:uid",
                    "UserFeatureKeyName": "uid",
                    "TableStoreTableName": "",
                    "UserSelectFields": "",
                    "FeatureStore": "user"
                },
                "Features": []
            },
            {
                "FeatureDaoConf": {
                    "AdapterType": "tablestore",
                    "TableStoreName": "",
                    "FeatureKey": "item:id",
                    "ItemFeatureKeyName": "item_id",
                    "TableStoreTableName": "",
                    "ItemSelectFields": "",
                    "FeatureStore": "item"
                },
                "Features": []
            }
        ]
    }
}
```

* AdapterType: data source type, fixed to tablestore here
* TableStoreName: name of the configured tablestore instance; the name can be found in TableStoreConfs
* FeatureKey: the value used to look up feature data in the table. FeatureKey states which field of the user or item the lookup value comes from.
* UserFeatureKeyName: the column name in the table; this column is matched against the FeatureKey value
* TableStoreTableName: table name
* UserSelectFields: list of columns to fetch for user features; `*` fetches all columns of the table. Required when FeatureStore = user
* ItemSelectFields: list of columns to fetch for item features; `*` fetches all columns of the table. Required when FeatureStore = item
* FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data.

### FeatureStore (feature platform)

```json
{
    "FeatureConfs": {
        "rank_v1": {
            "AsynLoadFeature": true,
            "FeatureLoadConfs": [
                {
                    "FeatureDaoConf": {
                        "AdapterType": "featurestore",
                        "FeatureStoreName": "pairec-fs",
                        "FeatureKey": "user:uid",
                        "FeatureStoreModelName": "rank_v1",
                        "FeatureStoreEntityName": "user",
                        "FeatureStore": "user"
                    }
                }
            ]
        }
    }
}
```

* AdapterType: data source type, fixed to featurestore here
* FeatureStoreName: name of the configured featurestore instance; the name can be found in FeatureStoreConfs
* FeatureKey: the value used to fetch feature data from the feature platform. It is used as the join_id value of the entity named by FeatureStoreEntityName when looking up data
* FeatureStoreModelName: the model name in the feature platform
* FeatureStoreEntityName: the entity name in the feature platform
* FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data.

## Feature Transformations

As mentioned above, Features can perform feature transformations. Here are a few commonly used examples.

The day_h and week_day features are used often. Both are generated in real time.

```json
{
    "FeatureType": "new_feature",
    "FeatureName": "day_h",
    "Normalizer": "hour_in_day",
    "FeatureStore": "user"
}
```

```json
{
    "FeatureType": "new_feature",
    "FeatureName": "week_day",
    "Normalizer": "weekday",
    "FeatureStore": "user"
}
```

* FeatureType = new_feature: generates a new feature
* FeatureName: feature name

Generating a random number. Random numbers are sometimes needed for probabilistic decisions. The configuration below generates the rand_int_v feature, with values in the range [0, 100).

```json
{
    "FeatureType": "new_feature",
    "FeatureName": "rand_int_v",
    "Normalizer": "random",
    "FeatureStore": "user"
}
```

Generating a fixed value. The configuration below generates a feature named alg with the value ALRC.

```json
{
    "FeatureType": "new_feature",
    "FeatureStore": "user",
    "Normalizer": "const_value",
    "FeatureValue": "ALRC",
    "FeatureName": "alg"
}
```

Generating a feature from an expression. The configuration below generates a boolean feature, is_retarget, by checking whether recall_name is in the given array. The value of a boolean feature is represented as 1 or 0.

```json
{
    "FeatureType": "new_feature",
    "FeatureStore": "item",
    "FeatureSource": "item:recall_name",
    "Normalizer": "expression",
    "Expression": "recall_name in ('retarget_u2i','realtime_retarget_click')",
    "FeatureName": "is_retarget"
}
```

* Expression: the expression. For the expression rules, see [https://github.com/Knetic/govaluate](https://github.com/Knetic/govaluate)
* FeatureSource: where the feature value comes from. item:recall_name means the value comes from the item's recall_name feature. If Expression references several item properties, FeatureSource can be left unset, in which case all of the item's properties are passed to Expression for evaluation.