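Feature Configuration

The rank stage calls a scoring model service. Before that call, the user or item feature data has to be fetched from a feature storage source. In some cases the fetched features need further processing, such as feature engineering, deriving new features from existing ones, or combining existing features. This processing is implemented by FeatureOp.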

Hologres

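A concrete example illustrates the configuration in detail.

In the configuration below, live_feed is the scene name. FeatureConfs supports multiple scenes; this example configures only one.

FeatureLoadConfs defines the feature loading logic. It is a list, so multiple loading steps can be configured. Each step has a loading part and a processing part: FeatureDaoConf defines how features are fetched, and Features defines how they are transformed.

FeatureDaoConf specifies where the features are stored and how to fetch them.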

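  • AdapterType: the data source type. Currently hologres, redis, and tablestore are supported.

  • HologresName: the name of the configured Hologres instance, as defined in HologresConfs.

  • FeatureKey: the value used to look up feature data in the table. FeatureKey indicates which field of the user or item supplies the lookup value. For example, user:uid takes the user's uid attribute, and item:pair_id takes the item's pair_id attribute.

  • UserFeatureKeyName: the name of the table column that is matched against the FeatureKey value.

  • HologresTableName: the table name.

  • UserSelectFields: the list of fields to fetch for user features. * fetches all fields of the table.

  • ItemSelectFields: the list of fields to fetch for item features. * fetches all fields of the table.

  • FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map that holds the feature data.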

Fetching user features can be understood as the following query:

SELECT ${UserSelectFields} FROM ${HologresTableName} WHERE ${UserFeatureKeyName} = ${FeatureKey}
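The query template can be sketched in code as follows. This is only an illustration of how the FeatureDaoConf fields map onto the query; the function name build_user_feature_query is hypothetical, and the PostgreSQL-style $1 placeholder is an assumption about how the lookup value would be bound.

```python
def build_user_feature_query(dao_conf):
    """Build a parameterized query from a user-side FeatureDaoConf (sketch)."""
    sql = "SELECT {fields} FROM {table} WHERE {key_column} = $1".format(
        fields=dao_conf["UserSelectFields"],
        table=dao_conf["HologresTableName"],
        key_column=dao_conf["UserFeatureKeyName"],
    )
    # The value bound to $1 is resolved from FeatureKey (for example user:uid).
    return sql


dao_conf = {
    "FeatureKey": "user:uid",
    "UserFeatureKeyName": "uid",
    "HologresTableName": "recom_user_features_processed_holo_online",
    "UserSelectFields": "sex,age,height",
}
print(build_user_feature_query(dao_conf))
```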

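UserSelectFields and ItemSelectFields are determined by the features in the training samples. Before calling the model, you need to construct the same features that the sample data contains.

Features are looked up using data that is already available, usually the uid or the item id, written as user:uid and item:id respectively. To look up by any other field, that field must exist in Properties. In the example, item:pair_id and item:matchmaker_id work because pair_id and matchmaker_id are stored in the item's Properties.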
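The lookup rule can be illustrated with a small sketch. The dictionary layout for user and item ("id" plus "properties") and the function name resolve_feature_key are assumptions made for this example, not the engine's actual data model.

```python
def resolve_feature_key(feature_key, user, item):
    """Return the lookup value named by a FeatureKey such as 'user:uid'."""
    source, sep, field = feature_key.partition(":")
    if not sep or source not in ("user", "item"):
        raise ValueError("FeatureKey must look like user:<field> or item:<field>")
    target = user if source == "user" else item
    builtin_id = "uid" if source == "user" else "id"
    if field == builtin_id:
        # user:uid and item:id refer to the object's own identifier.
        return target["id"]
    # Any other field has to be present in Properties.
    if field not in target["properties"]:
        raise KeyError("%s is not in the %s Properties" % (field, source))
    return target["properties"][field]


user = {"id": "u1001", "properties": {}}
item = {"id": "i42", "properties": {"pair_id": "u2002", "matchmaker_id": "c7"}}
print(resolve_feature_key("user:uid", user, item))
print(resolve_feature_key("item:pair_id", user, item))
```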

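AsynLoadFeature = true makes the code call the FeatureLoadConfs steps asynchronously and concurrently, which reduces response time (RT). If the steps in FeatureLoadConfs are independent of each other, setting AsynLoadFeature to true improves loading performance.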
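The effect of this switch can be sketched with a thread pool. The function load_features and the load_one callback are hypothetical stand-ins for the engine's loading code; the sketch only shows that independent steps can run concurrently while the results keep their configured order.

```python
from concurrent.futures import ThreadPoolExecutor


def load_features(load_confs, load_one, asyn_load_feature):
    """Run every FeatureLoadConfs step, concurrently when the flag is set."""
    if not asyn_load_feature:
        # Sequential: total latency is the sum of all steps.
        return [load_one(conf) for conf in load_confs]
    # Concurrent: total latency approaches that of the slowest step.
    with ThreadPoolExecutor(max_workers=max(1, len(load_confs))) as pool:
        # pool.map returns results in the same order as load_confs.
        return list(pool.map(load_one, load_confs))


confs = [{"FeatureKey": "user:uid"}, {"FeatureKey": "item:pair_id"}]
print(load_features(confs, lambda conf: conf["FeatureKey"], True))
```

Running the steps concurrently is only safe when no step reads a feature that another step writes, which is why the steps must be independent.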

 "FeatureConfs" :{
        "live_feed" :{
           "AsynLoadFeature" : true,
           "FeatureLoadConfs": [
                {
                    "FeatureDaoConf": {
                        "AdapterType": "hologres",
                        "HologresName": "holo-pai",
                        "FeatureKey": "user:uid",
                        "UserFeatureKeyName" :"uid",
                        "HologresTableName": "recom_user_features_processed_holo_online",
                        "UserSelectFields":"rids_count,sex,alladdfriendnum,allpayrosenum,getgiftnum7d,friendnum7d,talknum7d,start_age,end_age,start_height,end_height,lowest_education,lowest_salary,height,wealth,age,living_condition,education,headstatus,marriage,professionid,provinceid,role,salary,socialtag,facevalue",
                        "FeatureStore":"user"
                    },
                    "Features" :[
                    ]
                },
                {
                    "FeatureDaoConf": {
                        "AdapterType": "hologres",
                        "HologresName": "holo-pai",
                        "ItemFeatureKeyName" :"uid",
                        "FeatureKey": "item:pair_id",
                        "HologresTableName": "recom_user_features_processed_holo_online",
                        "ItemSelectFields":"uid, rids_count as rids2_count,sex as guestsex,alladdfriendnum as alladdfriendnum2,allpayrosenum as allpayrosenum2, getgiftnum7d as getgiftnum7d2,friendnum7d as friendnum7d2,talknum7d as talknum7d2,start_age as start_age2,end_age as end_age2,start_height as start_height2,end_height as end_height2,lowest_education as lowest_education2,lowest_salary as lowest_salary2,height as height2,wealth as wealth2,age as age2,living_condition as living_condition2,education as education2,headstatus as headstatus2,marriage as marriage2,professionid as professionid2,provinceid as provinceid2,role as role2,salary as salary2,socialtag as socialtag2,facevalue as facevalue2",
                        "FeatureStore":"item"
                    },
                    "Features" :[
                    ]
                },
                {
                    "FeatureDaoConf": {
                        "AdapterType": "hologres",
                        "HologresName": "holo-pai",
                        "ItemFeatureKeyName" :"cupid_id",
                        "FeatureKey": "item:matchmaker_id",
                        "HologresTableName": "recom_red_features_processed_holo_online",
                        "ItemSelectFields":"cupid_id, cupid_id as redid,sex as redsex,role as role1,good_num as good_num1,mid_num as mid_num1,bad_num as bad_num1,total_access as total_access1,duration as duration1,jubaohongniangshu as jubaohongniangshu1,jubaohongniangzongshu as jubaohongniangzongshu1",
                        "FeatureStore":"item"
                    },
                    "Features" :[
                    ]
                }
           ]
        }
   }

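Note: when fetching item features, the first value in ItemSelectFields must be the field configured in ItemFeatureKeyName. The items form a list, and when the data is fetched concurrently, ItemFeatureKeyName is used to match each returned row to its item.

After the features are fetched, they can be processed further, for example to generate new features or to preprocess existing ones.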
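A minimal sketch of why the key column has to come first: the rows come back as a batch, and the first column is what ties each row to an item. The function attach_item_features and the dictionary layout of an item are assumptions for illustration only.

```python
def attach_item_features(items, lookup_field, columns, rows):
    """Copy each fetched row into the Properties of the matching item."""
    # The first selected column is the ItemFeatureKeyName value, so index by it.
    rows_by_key = {str(row[0]): row for row in rows}
    for item in items:
        key = item["properties"].get(lookup_field)
        row = rows_by_key.get(str(key)) if key is not None else None
        if row is None:
            continue  # no feature row exists for this item
        item["properties"].update(zip(columns, row))
    return items


items = [
    {"id": "i1", "properties": {"pair_id": 7}},
    {"id": "i2", "properties": {"pair_id": 9}},
]
columns = ["uid", "guestsex", "age2"]
rows = [(9, 1, 31), (7, 0, 28)]  # row order need not follow item order
attach_item_features(items, "pair_id", columns, rows)
print(items[0]["properties"])
```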

 "Features":[
        {
            "FeatureType":"raw_feature",
            "FeatureName":"article_id",
            "FeatureSource":"item:id",
            "FeatureStore":"item"
        },
        {
            "FeatureType":"raw_feature",
            "FeatureName":"item_elapse_time",
            "FeatureSource":"item:item_ctime",
            "Normalizer":"time_ln",
            "RemoveFeatureSource":true,
            "FeatureStore":"item"
        }
    ]
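  • FeatureType: the FeatureOp name. raw_feature generates a new feature directly from a raw feature.

  • FeatureName: the name of the newly generated feature.

  • FeatureSource: the raw feature.

  • FeatureStore: the object in which the new feature is stored, user or item.

  • Normalizer: optional. The name of the normalization operation. Currently only time_ln is available.

  • RemoveFeatureSource: optional. Whether to delete the raw feature.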
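The fields above can be read as a small copy-and-normalize step, sketched below. The flat properties dictionary and the function apply_raw_feature are hypothetical. In particular, the formula used for time_ln here (natural log of one plus the seconds elapsed since the timestamp) is an assumption made so the example runs; the source does not define what time_ln computes.

```python
import math
import time


def time_ln(value, now=None):
    """ASSUMED definition: ln(1 + seconds elapsed since the given timestamp)."""
    now = time.time() if now is None else now
    return math.log1p(max(0.0, now - float(value)))


NORMALIZERS = {"time_ln": time_ln}


def apply_raw_feature(conf, properties):
    """Derive FeatureName from FeatureSource inside one properties map."""
    source_field = conf["FeatureSource"].split(":", 1)[1]
    if source_field not in properties:
        return properties  # nothing to derive from
    value = properties[source_field]
    normalizer = conf.get("Normalizer")
    if normalizer:
        value = NORMALIZERS[normalizer](value)
    if conf.get("RemoveFeatureSource"):
        del properties[source_field]
    properties[conf["FeatureName"]] = value
    return properties


props = {"id": "a1", "item_ctime": time.time() - 100}
apply_raw_feature({"FeatureType": "raw_feature", "FeatureName": "article_id",
                   "FeatureSource": "item:id"}, props)
apply_raw_feature({"FeatureType": "raw_feature", "FeatureName": "item_elapse_time",
                   "FeatureSource": "item:item_ctime", "Normalizer": "time_ln",
                   "RemoveFeatureSource": True}, props)
print(sorted(props))
```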

Redis

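Next, a Redis example.

Redis stores features as key-value pairs, where the value has the form "key1:value1,key2:value2".

  • AdapterType: the data source type, here redis.

  • RedisName: the name of the Redis data source, as defined in RedisConfs.

  • RedisPrefix: the key prefix.

  • FeatureKey: the value used to build the Redis key. FeatureKey indicates which field of the user or item supplies the value. For example, user:uid takes the user's uid attribute, and item:pair_id takes the item's pair_id attribute.

  • FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map that holds the feature data.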
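The key construction and value format can be sketched as follows. Building the key by concatenating RedisPrefix and the FeatureKey value is an assumption based on the field descriptions, and both function names are hypothetical. The sketch keeps values as strings because the source does not say how they are typed.

```python
def build_redis_key(redis_prefix, key_value):
    """Assumed key layout: RedisPrefix immediately followed by the lookup value."""
    return "%s%s" % (redis_prefix, key_value)


def parse_feature_value(raw):
    """Parse 'key1:value1,key2:value2' into a dict of feature name to value."""
    features = {}
    for pair in raw.split(","):
        name, sep, value = pair.partition(":")
        if not sep:
            continue  # skip empty or malformed segments
        features[name.strip()] = value.strip()
    return features


print(build_redis_key("UF_V2_", "1001"))
print(parse_feature_value("age:25,sex:1,city:hz"))
```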

 "FeatureConfs" :{
        "home_feed" :{
           "AsynLoadFeature" : true,
           "FeatureLoadConfs": [
                {
                    "FeatureDaoConf": {
                        "AdapterType": "redis",
                        "RedisName": "user_redis",
                        "RedisPrefix": "UF_V2_",
                        "FeatureKey": "user:uid",
                        "FeatureStore":"user"
                    },
                    "Features" :[]
                },
                {
                    "FeatureDaoConf": {
                        "AdapterType": "redis",
                        "RedisName": "item_redis",
                        "RedisPrefix": "IF_V2_FM_",
                        "FeatureKey": "item:id",
                        "FeatureStore":"item"
                    },
                    "Features" :[
                        {
                            "FeatureType": "raw_feature",
                            "FeatureName" : "article_id",
                            "FeatureSource" : "item:id",
                            "FeatureStore":"item"
                        },
                        {
                            "FeatureType": "raw_feature",
                            "FeatureName" : "item_elapse_time",
                            "FeatureSource" : "item:item_ctime",
                            "Normalizer": "time_ln",
                            "RemoveFeatureSource" : true,
                            "FeatureStore":"item"
                        }
                    ]
                }
           ]
        }
   }

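The example above also includes Features transformation operations.

  • FeatureType: the FeatureOp name. raw_feature generates a new feature directly from a raw feature.

  • FeatureName: the name of the newly generated feature.

  • FeatureSource: the raw feature.

  • FeatureStore: the object in which the new feature is stored, user or item.

  • Normalizer: optional. The name of the normalization operation.

  • RemoveFeatureSource: optional. Whether to delete the raw feature.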

OTS (tablestore)

The OTS configuration is similar to the Hologres one. A sample configuration follows:

"FeatureConfs" :{
        "home_feed" :{
           "AsynLoadFeature" : true,
           "FeatureLoadConfs": [
                {
                    "FeatureDaoConf": {
                        "AdapterType": "tablestore",
                        "TableStoreName": "",
                        "FeatureKey": "user:uid",
                        "UserFeatureKeyName" :"uid",
                        "TableStoreTableName" : "",
                        "UserSelectFields":"",
                        "FeatureStore":"user"
                    },
                    "Features" :[]
                },
                {
                    "FeatureDaoConf": {
                         "AdapterType": "tablestore",
                        "TableStoreName": "",
                        "FeatureKey": "item:id",
                        "ItemFeatureKeyName" :"item_id",
                        "TableStoreTableName" : "",
                        "ItemSelectFields":"",
                        "FeatureStore":"item"
                    },
                    "Features" :[]
                }
           ]
        }
   }
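  • AdapterType: the data source type, here the fixed value tablestore.

  • TableStoreName: the name of the configured tablestore instance, as defined in TableStoreConfs.

  • FeatureKey: the value used to look up feature data in the table. FeatureKey indicates which field of the user or item supplies the lookup value.

  • UserFeatureKeyName: the name of the table column that is matched against the FeatureKey value.

  • TableStoreTableName: the table name.

  • UserSelectFields: the list of fields to fetch for user features. * fetches all fields of the table. Required when FeatureStore = user.

  • ItemSelectFields: the list of fields to fetch for item features. * fetches all fields of the table. Required when FeatureStore = item.

  • FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map that holds the feature data.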

FeatureStore (feature platform)

{
 "FeatureConfs" :{
     "rank_v1": {
         "AsynLoadFeature": true,
         "FeatureLoadConfs": [
         {
             "FeatureDaoConf": {
                 "AdapterType": "featurestore",
                 "FeatureStoreName": "pairec-fs",
                 "FeatureKey": "user:uid",
                 "FeatureStoreModelName": "rank_v1",
                 "FeatureStoreEntityName": "user",
                 "FeatureStore": "user"
            }
         }
         ]
     }
 }
}
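  • AdapterType: the data source type, here the fixed value featurestore.

  • FeatureStoreName: the name of the configured featurestore instance, as defined in FeatureStoreConfs.

  • FeatureKey: the value used to fetch feature data from the feature platform. It is used as the join_id value of the entity named by FeatureStoreEntityName.

  • FeatureStoreModelName: the model name in the feature platform.

  • FeatureStoreEntityName: the entity name in the feature platform.

  • FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map that holds the feature data.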

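Feature Transformation

As mentioned above, Features can perform feature transformations. Here are several commonly used examples.

The day_h and week_day features are used frequently. Both are generated in real time.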

{
    "FeatureType": "new_feature",
    "FeatureName": "day_h",
    "Normalizer": "hour_in_day",
    "FeatureStore": "user"
}
{
    "FeatureType": "new_feature",
    "FeatureName": "week_day",
    "Normalizer": "weekday",
    "FeatureStore": "user"
}
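  • FeatureType = new_feature: generates a new feature.

  • FeatureName: the feature name.

Generating a random number. A random number is sometimes needed for probability-based decisions. The configuration below generates the rand_int_v feature, with values in the range [0, 100).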
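As a rough illustration of the two time features, the sketch below derives both from the current time. The function names are hypothetical, and the weekday numbering (Monday = 0, as in Python's datetime.weekday) is an assumption; the engine may number the days differently.

```python
from datetime import datetime


def hour_in_day(now):
    """Hour of the day, 0 to 23."""
    return now.hour


def weekday(now):
    """Day of the week; ASSUMED numbering with Monday = 0."""
    return now.weekday()


def add_time_features(properties, now=None):
    """Write day_h and week_day into a properties map at request time."""
    now = now or datetime.now()
    properties["day_h"] = hour_in_day(now)
    properties["week_day"] = weekday(now)
    return properties


print(add_time_features({}))
```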

{
    "FeatureType": "new_feature",
    "FeatureName": "rand_int_v",
    "Normalizer": "random",
    "FeatureStore": "user"
}

Generating a fixed value. The configuration below generates a feature named alg with the value ALRC.

{
    "FeatureType": "new_feature",
    "FeatureStore": "user",
    "Normalizer": "const_value",
    "FeatureValue": "ALRC",
    "FeatureName": "alg"
}
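The random and const_value configurations can be read as the sketch below. The function apply_new_feature is hypothetical; it assumes the random value is an integer, as the name rand_int_v suggests, drawn from [0, 100).

```python
import random


def apply_new_feature(conf, properties, rng=random):
    """Generate a feature that does not depend on any existing feature."""
    normalizer = conf["Normalizer"]
    if normalizer == "random":
        value = rng.randrange(100)  # integer in [0, 100)
    elif normalizer == "const_value":
        value = conf["FeatureValue"]
    else:
        raise ValueError("unsupported Normalizer in this sketch: %s" % normalizer)
    properties[conf["FeatureName"]] = value
    return properties


props = {}
apply_new_feature({"FeatureType": "new_feature", "FeatureName": "rand_int_v",
                   "Normalizer": "random", "FeatureStore": "user"}, props)
apply_new_feature({"FeatureType": "new_feature", "FeatureName": "alg",
                   "Normalizer": "const_value", "FeatureValue": "ALRC",
                   "FeatureStore": "user"}, props)
print(props)
```

A random feature like this can then gate a branch with a fixed probability, for example by comparing rand_int_v against a threshold.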

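Generating a feature from an expression. The configuration below generates a boolean feature, is_retarget, by checking whether recall_name is in the given array. The value of a boolean feature is actually represented as 1 or 0.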

{
    "FeatureType": "new_feature",
    "FeatureStore": "item",
    "FeatureSource": "item:recall_name",
    "Normalizer": "expression",
    "Expression": "recall_name in ('retarget_u2i','realtime_retarget_click')",
    "FeatureName": "is_retarget"
}
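  • Expression: the expression. For the expression syntax, see https://github.com/Knetic/govaluate

  • FeatureSource: where the feature value comes from. item:recall_name means the value comes from the item's recall_name feature. If the Expression uses several item attributes, FeatureSource can be left unset, in which case all of the item's attribute values are passed into the Expression for evaluation.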
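The parameter-passing rule for FeatureSource can be sketched as below. This does not use govaluate; a plain Python callable stands in for the parsed Expression, and the function apply_expression_feature is hypothetical. The point is which values reach the expression and how a boolean result becomes 1 or 0.

```python
def apply_expression_feature(conf, properties, predicate):
    """Evaluate a boolean expression and store the result as 1 or 0."""
    source = conf.get("FeatureSource")
    if source:
        # Only the named attribute is passed to the expression.
        field = source.split(":", 1)[1]
        params = {field: properties.get(field)}
    else:
        # No FeatureSource: every attribute is available to the expression.
        params = dict(properties)
    properties[conf["FeatureName"]] = 1 if predicate(params) else 0
    return properties


conf = {
    "FeatureType": "new_feature",
    "FeatureStore": "item",
    "FeatureSource": "item:recall_name",
    "Normalizer": "expression",
    "Expression": "recall_name in ('retarget_u2i','realtime_retarget_click')",
    "FeatureName": "is_retarget",
}


def in_retarget(params):
    # Stand-in for the Expression string above.
    return params["recall_name"] in ("retarget_u2i", "realtime_retarget_click")


item_props = {"recall_name": "retarget_u2i", "score": 0.8}
print(apply_expression_feature(conf, item_props, in_retarget)["is_retarget"])
```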