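User Feature Prefetching

Background

In some scenarios, user features are fetched as soon as a recommendation request is received, before the request enters the recommendation flow. There are two cases:

  • Blocking fetch: the user features are used in the recall or filter stage, so they must be available before the rest of the flow can run.

  • Prefetch: for performance, the user features are fetched asynchronously and are only used later, in coarse ranking or fine ranking.

Prefetching user features also needs to be controllable through A/B experiment parameters.

User feature loading is configured in the same way as ordinary feature loading.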
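The difference between the two modes can be sketched in a few lines of Python. This only illustrates the control flow and is not PAI-Rec code: load_user_features, recall and rank are hypothetical stand-ins, and the thread pool is an assumed way of running the fetch in the background.

```python
from concurrent.futures import ThreadPoolExecutor


def load_user_features(uid):
    """Hypothetical stand-in for a lookup in a feature store."""
    return {"uid": uid, "age_bucket": "25-34"}


def recall(user_features=None):
    """Hypothetical recall step; returns candidate item ids."""
    return ["item_1", "item_2", "item_3"]


def rank(items, user_features):
    """Hypothetical ranking step; here it only pairs each item with the uid."""
    return [(item, user_features["uid"]) for item in items]


def recommend_blocking(uid):
    # Blocking: the features must be present before recall/filter starts.
    user_features = load_user_features(uid)
    items = recall(user_features)
    return rank(items, user_features)


def recommend_prefetch(uid, pool):
    # Prefetch: start the fetch, run recall meanwhile, wait only at ranking.
    future = pool.submit(load_user_features, uid)
    items = recall()
    return rank(items, future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    blocking_result = recommend_blocking("u1")
    prefetch_result = recommend_prefetch("u1", pool)
```

Both paths produce the same ranking; they differ only in when the request waits for the feature lookup.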

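Blocking User Feature Fetch

Use UserFeatureConfs for this. Like FeatureConfs, it supports per-scene configuration; home_feed below is a scene name, and the layout is otherwise the same as a FeatureConfs entry. UserFeatureConfs runs before the whole recommendation flow and is blocking: the recommendation flow starts only after the features have been loaded.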

{
  "UserFeatureConfs": {
    "home_feed": {
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            "AdapterType": "hologres",
            "HologresName": "pairec-holo",
            "FeatureKey": "user:uid",
            "UserFeatureKeyName": "client_str",
            "HologresTableName": "dwd_ali_user_all_feature_v2_holo",
            "UserSelectFields": "*",
            "FeatureStore": "user"
          },
          "Features": [
            {
              "FeatureType": "new_feature",
              "FeatureName": "day_h",
              "Normalizer": "hour_in_day",
              "FeatureStore": "user"
            },
            {
              "FeatureType": "new_feature",
              "FeatureName": "week_day",
              "Normalizer": "weekday",
              "FeatureStore": "user"
            }
          ]
        },
        {
          "FeatureDaoConf": {
            "AdapterType": "be",
            "BeName": "be-pairec",
            "BizName": "shihuo_sequence_feature",
            "FeatureKey": "user:uid",
            "UserFeatureKeyName": "user_id",
            "ItemFeatureKeyName": "item_id",
            "BeItemFeatureKeyName": "item_id",
            "TimestampFeatureKeyName": "timestamp",
            "BeTimestampFeatureKeyName": "event_time",
            "BeEventFeatureKeyName": "event_type",
            "FeatureType": "sequence_feature",
            "NoUsePlayTimeField": true,
            "SequencePlayTime": "",
            "SequenceLength": 50,
            "SequenceDelim": ";",
            "SequenceDimFields": "",
            "SequenceName": "click_50_seq",
            "FeatureStore": "user",
            "SequenceEvent": "click"
          },
          "Features": []
        }
      ]
    }
  }
}
   

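Non-blocking User Feature Fetch

To fetch user features without blocking the recommendation flow, set FeatureAsyncLoad to true in the FeatureDaoConf.

The configuration below fetches two groups of features: the columns of dwd_ali_user_all_feature_v2_holo and the click_50_seq sequence feature. The dwd_ali_user_all_feature_v2_holo group does not set FeatureAsyncLoad, so it is fetched in blocking mode. The click_50_seq group sets FeatureAsyncLoad to true, so it is fetched asynchronously.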

{
  "UserFeatureConfs": {
    "home_feed": {
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            "AdapterType": "hologres",
            "HologresName": "pairec-holo",
            "FeatureKey": "user:uid",
            "UserFeatureKeyName": "client_str",
            "HologresTableName": "dwd_ali_user_all_feature_v2_holo",
            "UserSelectFields": "*",
            "FeatureStore": "user"
          },
          "Features": [
            {
              "FeatureType": "new_feature",
              "FeatureName": "day_h",
              "Normalizer": "hour_in_day",
              "FeatureStore": "user"
            },
            {
              "FeatureType": "new_feature",
              "FeatureName": "week_day",
              "Normalizer": "weekday",
              "FeatureStore": "user"
            }
          ]
        },
        {
          "FeatureDaoConf": {
            "AdapterType": "be",
            "FeatureAsyncLoad": true,
            "BeName": "be-pairec",
            "BizName": "shihuo_sequence_feature",
            "FeatureKey": "user:uid",
            "UserFeatureKeyName": "user_id",
            "ItemFeatureKeyName": "item_id",
            "BeItemFeatureKeyName": "item_id",
            "TimestampFeatureKeyName": "timestamp",
            "BeTimestampFeatureKeyName": "event_time",
            "BeEventFeatureKeyName": "event_type",
            "FeatureType": "sequence_feature",
            "NoUsePlayTimeField": true,
            "SequencePlayTime": "",
            "SequenceLength": 50,
            "SequenceDelim": ";",
            "SequenceDimFields": "",
            "SequenceName": "click_50_seq",
            "FeatureStore": "user",
            "SequenceEvent": "click"
          },
          "Features": []
        }
      ]
    }
  }
}
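As a quick way to see which groups block the request, the sketch below walks an abbreviated copy of the configuration above and splits the groups by FeatureAsyncLoad. The helper split_by_load_mode is hypothetical and not part of the engine; it simply treats a missing FeatureAsyncLoad as false, as described above.

```python
import json

# Abbreviated copy of the configuration above; most fields are left out.
CONFIG = json.loads("""
{
  "UserFeatureConfs": {
    "home_feed": {
      "FeatureLoadConfs": [
        {"FeatureDaoConf": {"AdapterType": "hologres",
                            "HologresTableName": "dwd_ali_user_all_feature_v2_holo"}},
        {"FeatureDaoConf": {"AdapterType": "be",
                            "FeatureAsyncLoad": true,
                            "SequenceName": "click_50_seq"}}
      ]
    }
  }
}
""")


def split_by_load_mode(scene_conf):
    """Return (blocking, non_blocking) labels for one scene's load confs."""
    blocking, non_blocking = [], []
    for load_conf in scene_conf["FeatureLoadConfs"]:
        dao = load_conf["FeatureDaoConf"]
        # Label each group by its table or sequence name, for display only.
        label = dao.get("HologresTableName") or dao.get("SequenceName")
        # A missing FeatureAsyncLoad is treated as false, i.e. blocking.
        if dao.get("FeatureAsyncLoad", False):
            non_blocking.append(label)
        else:
            blocking.append(label)
    return blocking, non_blocking


blocking, non_blocking = split_by_load_mode(CONFIG["UserFeatureConfs"]["home_feed"])
print("blocking:", blocking)
print("non-blocking:", non_blocking)
```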

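Prefetching User Features into a Cache

For performance, user features can be fetched asynchronously before the recommendation flow runs, and the results held temporarily. They are copied into the user features only if a later stage actually uses them.

As described above, setting FeatureAsyncLoad to true makes the fetch asynchronous.

The configuration below adds "CacheFeaturesName": "test". The fetched features are stored in a cache map named test instead of being written into the user features, and they must be referenced explicitly when they are needed later. Each feature group can use its own CacheFeaturesName.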

{
  "UserFeatureConfs": {
    "home_feed": {
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            "AdapterType": "hologres",
            "HologresName": "pairec-holo",
            "FeatureKey": "user:uid",
            "UserFeatureKeyName": "client_str",
            "FeatureAsyncLoad": true,
            "HologresTableName": "dwd_ali_user_all_feature_v2_holo",
            "UserSelectFields": "*",
            "CacheFeaturesName": "test",
            "FeatureStore": "user"
          },
          "Features": [
            {
              "FeatureType": "new_feature",
              "FeatureName": "day_h",
              "Normalizer": "hour_in_day",
              "FeatureStore": "user"
            },
            {
              "FeatureType": "new_feature",
              "FeatureName": "week_day",
              "Normalizer": "weekday",
              "FeatureStore": "user"
            }
          ]
        },

      
        {
          "FeatureDaoConf": {
            "AdapterType": "be",
            "FeatureAsyncLoad": true,
            "BeName": "be-pairec",
            "BizName": "shihuo_sequence_feature",
            "FeatureKey": "user:uid",
            "UserFeatureKeyName": "user_id",
            "ItemFeatureKeyName": "item_id",
            "BeItemFeatureKeyName": "item_id",
            "TimestampFeatureKeyName": "timestamp",
            "BeTimestampFeatureKeyName": "event_time",
            "BeEventFeatureKeyName": "event_type",
            "FeatureType": "sequence_feature",
            "NoUsePlayTimeField": true,
            "SequencePlayTime": "",
            "SequenceLength": 50,
            "SequenceDelim": ";",
            "SequenceDimFields": "",
            "SequenceName": "click_50_seq",
            "CacheFeaturesName": "test",
            "FeatureStore": "user",
            "SequenceEvent": "click"
          },
          "Features": []
        }
      ]
    }
  }
}

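Referencing Cached User Features

To use prefetched user features in coarse ranking or fine ranking, configure the reference in FeatureConfs.

A reference does not need to be loaded concurrently, so set AsynLoadFeature to false.

LoadFromCacheFeaturesName specifies which cache map to read from. It must match the CacheFeaturesName used when prefetching.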

{
    "FeatureConfs":{
        "home_feed_rebuild_v22":{
            "AsynLoadFeature":false,
            "FeatureLoadConfs":[
                {
                    "FeatureDaoConf":{
                        "LoadFromCacheFeaturesName":"test",
                        "FeatureStore":"user"
                    }
                },
                {
                    "FeatureDaoConf":{
                        "LoadFromCacheFeaturesName":"test1",
                        "FeatureStore":"user"
                    }
                }
            ]
        }
    }
}
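The interplay of CacheFeaturesName and LoadFromCacheFeaturesName can be pictured as a two-step process: prefetched results are parked in a named cache map, and a later reference copies them into the user features. The sketch below is a simplified model under that reading, not the engine's implementation. The User class, the helper functions and the sample feature values are all hypothetical, and treating an unknown cache name as a no-op is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


class User:
    def __init__(self, uid):
        self.uid = uid
        self.features = {}  # features visible to the ranking models
        self.cache = {}     # cache name -> prefetched feature dict


def prefetch_into_cache(user, groups, pool):
    """Start every loader in the background; groups is a list of (cache_name, loader)."""
    return [(name, pool.submit(loader, user.uid)) for name, loader in groups]


def collect(user, futures):
    """Wait for the background loads and park the results in the cache maps."""
    for name, future in futures:
        user.cache.setdefault(name, {}).update(future.result())


def load_from_cache(user, cache_name):
    """Copy one cache map into the user features (assumed no-op for unknown names)."""
    user.features.update(user.cache.get(cache_name, {}))


# Two feature groups writing to the same cache map, with made-up values.
groups = [
    ("test", lambda uid: {"age_bucket": "25-34"}),
    ("test", lambda uid: {"click_50_seq": "i1;i2"}),
]

user = User("u1")
with ThreadPoolExecutor(max_workers=2) as pool:
    collect(user, prefetch_into_cache(user, groups, pool))

features_before = dict(user.features)  # still empty: nothing has been referenced yet
load_from_cache(user, "test")          # the ranking stage references the cache map
```

Until load_from_cache runs, the prefetched values live only in user.cache, which is the point of the cache map: a stage that never references them pays nothing for merging them.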

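Experiment Parameter Support

As with features.scene.name, set the experiment parameter user_features.scene.name to choose which scene's user feature configuration is loaded.

Multi-stage Feature Loading

A single recommendation request uses user features in several stages: in the recall stage (for example DSSM or MIND recall) and in the scoring stages, coarse ranking (generalrank) and fine ranking (rank). When each stage runs its own experiments and those experiments need different features, a single user_features.scene.name cannot express this well. Instead, the user feature preload step loads features according to what each stage needs, so that each stage only has to care about its own scene name. To enable multi-stage loading, set the following in the experiment configuration: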

{
  "user_features.multistage.on": true
}

The parameter names for the three stages are:

  • user_features.stage.recall.scene.name

  • user_features.stage.generalrank.scene.name

  • user_features.stage.rank.scene.name

Feature loading logic that is shared by several stages can be placed under:

  • user_features.stage.base.scene.name
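One way to read these parameters is as a lookup that decides which UserFeatureConfs scenes to preload for a request. The function below is a sketch of that reading, not the engine's code. It assumes that the request's own scene is used when no parameter is set, that stages without a parameter are skipped, and that the base scene is loaded first; the scene names in the usage lines are illustrative values.

```python
STAGES = ("recall", "generalrank", "rank")


def user_feature_scenes(params, default_scene):
    """Return the UserFeatureConfs scene names to preload for one request."""
    if not params.get("user_features.multistage.on", False):
        # Single-scene mode: one parameter selects the scene.
        return [params.get("user_features.scene.name", default_scene)]
    keys = ["user_features.stage.base.scene.name"]
    keys += [f"user_features.stage.{stage}.scene.name" for stage in STAGES]
    # Keep only the stages that actually set a scene name, in a fixed order.
    return [params[key] for key in keys if key in params]


single = user_feature_scenes({"user_features.scene.name": "home_feed_v2"}, "home_feed")
multi = user_feature_scenes(
    {
        "user_features.multistage.on": True,
        "user_features.stage.base.scene.name": "home_feed_base",
        "user_features.stage.rank.scene.name": "home_feed_rank_v1",
    },
    "home_feed",
)
print(single)
print(multi)
```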

As an example, suppose the rank stage has two experiments that need different user features. The engine configuration could look like this:

{
  // UserFeatureConfs configuration
  "UserFeatureConfs": {
    "home_feed_base": {  // common features
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            // feature loading details
            "FeatureAsyncLoad": true,
            "CacheFeaturesName": "home_feed_base",
            "FeatureStore": "user"
          },
          "Features": []
        }
      ]
    },
    "home_feed_rank_v1": {
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            // feature loading details
            "FeatureAsyncLoad": true,
            "CacheFeaturesName": "home_feed_rank_v1",
            "FeatureStore": "user"
          },
          "Features": []
        },
        {
          "FeatureDaoConf": {
            // feature loading details
            "FeatureAsyncLoad": true,
            "CacheFeaturesName": "home_feed_rank_v1",
            "FeatureStore": "user"
          },
          "Features": []
        }
      ]
    },
    
    "home_feed_rank_v2": {
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            // feature loading details
            "FeatureAsyncLoad": true,
            "CacheFeaturesName": "home_feed_rank_v2",
            "FeatureStore": "user"
          },
          "Features": []
        },
        {
          "FeatureDaoConf": {
            // feature loading details
            "FeatureAsyncLoad": true,
            "CacheFeaturesName": "home_feed_rank_v2",
            "FeatureStore": "user"
          },
          "Features": []
        }
      ]
    }
  },
  // FeatureConfs configuration
      "FeatureConfs":{
        "home_feed_rank_v1":{
            "AsynLoadFeature":false,
            "FeatureLoadConfs":[
                {
                    "FeatureDaoConf":{
                        "LoadFromCacheFeaturesName":"home_feed_rank_v1",
                        "FeatureStore":"user"
                    }
                },
                {
                    "FeatureDaoConf":{
                        "LoadFromCacheFeaturesName":"home_feed_base",
                        "FeatureStore":"user"
                    }
                }
            ]
        },
        "home_feed_rank_v2":{
            "AsynLoadFeature":false,
            "FeatureLoadConfs":[
                {
                    "FeatureDaoConf":{
                        "LoadFromCacheFeaturesName":"home_feed_rank_v2",
                        "FeatureStore":"user"
                    }
                },
                {
                    "FeatureDaoConf":{
                        "LoadFromCacheFeaturesName":"home_feed_base",
                        "FeatureStore":"user"
                    }
                }
            ]
        }
    }
}

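In UserFeatureConfs, home_feed_rank_v1 and home_feed_rank_v2 preload different features, and home_feed_base preloads the common features. FeatureConfs describes how features are loaded at ranking time; here it references features in the user cache. The home_feed_rank_v1 and home_feed_rank_v2 entries each reference their own cached features, and both also reference the common features cached under home_feed_base.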
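Because the two sections are linked only by name, a mistyped cache name is easy to miss. The following sketch checks that every LoadFromCacheFeaturesName in FeatureConfs has a matching CacheFeaturesName in UserFeatureConfs. It is a hypothetical helper written for this page, not a tool shipped with the engine, and CONFIG is an abbreviated copy of the configuration above.

```python
def dao(**fields):
    """Build a minimal load conf holding only the given FeatureDaoConf fields."""
    return {"FeatureDaoConf": fields}


# Abbreviated copy of the configuration above; only the cache names are kept.
CONFIG = {
    "UserFeatureConfs": {
        "home_feed_base": {"FeatureLoadConfs": [dao(CacheFeaturesName="home_feed_base")]},
        "home_feed_rank_v1": {"FeatureLoadConfs": [dao(CacheFeaturesName="home_feed_rank_v1")]},
        "home_feed_rank_v2": {"FeatureLoadConfs": [dao(CacheFeaturesName="home_feed_rank_v2")]},
    },
    "FeatureConfs": {
        "home_feed_rank_v1": {"FeatureLoadConfs": [
            dao(LoadFromCacheFeaturesName="home_feed_rank_v1"),
            dao(LoadFromCacheFeaturesName="home_feed_base"),
        ]},
        "home_feed_rank_v2": {"FeatureLoadConfs": [
            dao(LoadFromCacheFeaturesName="home_feed_rank_v2"),
            dao(LoadFromCacheFeaturesName="home_feed_base"),
        ]},
    },
}


def cache_names_written(user_feature_confs):
    """Collect every CacheFeaturesName that some prefetch group writes to."""
    names = set()
    for scene_conf in user_feature_confs.values():
        for load_conf in scene_conf["FeatureLoadConfs"]:
            name = load_conf["FeatureDaoConf"].get("CacheFeaturesName")
            if name:
                names.add(name)
    return names


def unmatched_references(config):
    """Return (scene, cache_name) pairs that reference a cache nobody writes."""
    written = cache_names_written(config.get("UserFeatureConfs", {}))
    missing = []
    for scene, scene_conf in config.get("FeatureConfs", {}).items():
        for load_conf in scene_conf["FeatureLoadConfs"]:
            ref = load_conf["FeatureDaoConf"].get("LoadFromCacheFeaturesName")
            if ref and ref not in written:
                missing.append((scene, ref))
    return missing


missing = unmatched_references(CONFIG)
print(missing)
```

An empty list means every reference resolves to a cache map that some prefetch group fills.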

The rank experiment group can be configured as:

{
  "user_features.multistage.on": true,
  "user_features.stage.base.scene.name": "home_feed_base"
}

Rank experiment 1 is configured as:

{
  "user_features.stage.rank.scene.name": "home_feed_rank_v1",
  "features.scene.name": "home_feed_rank_v1"
}

Rank experiment 2 is configured as:

{
  "user_features.stage.rank.scene.name": "home_feed_rank_v2",
  "features.scene.name": "home_feed_rank_v2"
}