User Feature Prefetching
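Background
In some scenarios, when a recommendation request arrives, the engine first prefetches the user features and only then enters the recommendation flow.
Blocking fetch: user features are used in the recall or filtering stage, so they must be fetched before the rest of the flow can proceed.
Asynchronous prefetch: if user features are only needed later, in coarse ranking or fine ranking, they can be fetched asynchronously for better performance.
User feature prefetching also supports A/B experiment parameters.
User features are configured in the same way as ordinary feature loading.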
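Fetching User Features (Blocking)
Configure this with UserFeatureConfs. Like FeatureConfs, it supports multiple scenes; here home_feed is the scene name, and the structure is the same as FeatureConfs. UserFeatureConfs runs before the whole recommendation flow and is blocking: the recommendation flow starts only after the features have been loaded. The scene-level AsynLoadFeature flag only controls whether the FeatureLoadConfs entries are fetched concurrently with one another; it does not make the fetch non-blocking.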
{
"UserFeatureConfs": {
"home_feed": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "hologres",
"HologresName": "pairec-holo",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "client_str",
"HologresTableName": "dwd_ali_user_all_feature_v2_holo",
"UserSelectFields": "*",
"FeatureStore": "user"
},
"Features": [
{
"FeatureType": "new_feature",
"FeatureName": "day_h",
"Normalizer": "hour_in_day",
"FeatureStore": "user"
},
{
"FeatureType": "new_feature",
"FeatureName": "week_day",
"Normalizer": "weekday",
"FeatureStore": "user"
}
]
},
{
"FeatureDaoConf": {
"AdapterType": "be",
"BeName": "be-pairec",
"BizName": "shihuo_sequence_feature",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "user_id",
"ItemFeatureKeyName": "item_id",
"BeItemFeatureKeyName": "item_id",
"TimestampFeatureKeyName": "timestamp",
"BeTimestampFeatureKeyName": "event_time",
"BeEventFeatureKeyName": "event_type",
"FeatureType": "sequence_feature",
"NoUsePlayTimeField": true,
"SequencePlayTime": "",
"SequenceLength": 50,
"SequenceDelim": ";",
"SequenceDimFields": "",
"SequenceName": "click_50_seq",
"FeatureStore": "user",
"SequenceEvent": "click"
},
"Features": []
}
]
}
}
}
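To make the blocking behavior concrete, here is a minimal Go sketch. FeatureLoader and LoadUserFeaturesBlocking are hypothetical names, not the engine's API: each loader stands in for one FeatureLoadConfs entry, the concurrent flag mirrors the scene-level AsynLoadFeature, and the function returns only after every loader has finished, which is the point at which the recommendation flow would start. The feature values in main are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical sketch: not the engine's API.

// FeatureLoader stands in for one FeatureLoadConfs entry (for example a
// Hologres table or a BE sequence query) and returns the fetched features.
type FeatureLoader func(uid string) map[string]any

// LoadUserFeaturesBlocking runs every loader and returns only after all of
// them have finished. concurrent mirrors "AsynLoadFeature": true, i.e. the
// loaders run in parallel goroutines; otherwise they run one after another.
func LoadUserFeaturesBlocking(uid string, loaders []FeatureLoader, concurrent bool) map[string]any {
	features := make(map[string]any)
	var mu sync.Mutex
	merge := func(m map[string]any) {
		mu.Lock()
		defer mu.Unlock()
		for k, v := range m {
			features[k] = v
		}
	}

	if !concurrent {
		for _, load := range loaders {
			merge(load(uid))
		}
		return features
	}

	var wg sync.WaitGroup
	for _, load := range loaders {
		wg.Add(1)
		go func(load FeatureLoader) {
			defer wg.Done()
			merge(load(uid))
		}(load)
	}
	wg.Wait() // block until every loader has finished
	return features
}

func main() {
	holo := func(string) map[string]any { return map[string]any{"day_h": 14, "week_day": 3} }
	be := func(string) map[string]any { return map[string]any{"click_50_seq": "101;102;103"} }

	features := LoadUserFeaturesBlocking("u1", []FeatureLoader{holo, be}, true)
	// Only at this point would the recall and filtering stages start.
	fmt.Println(len(features)) // 3
}
```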
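Fetching User Features (Non-Blocking)
To fetch user features without blocking the recommendation flow, set FeatureAsyncLoad to true in the FeatureDaoConf.
The configuration below fetches two groups of features: the features in dwd_ali_user_all_feature_v2_holo and the click_50_seq sequence feature. The dwd_ali_user_all_feature_v2_holo group does not set FeatureAsyncLoad, so it is fetched in blocking mode; the click_50_seq group sets FeatureAsyncLoad=true, so it is fetched asynchronously.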
{
"UserFeatureConfs": {
"home_feed": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "hologres",
"HologresName": "pairec-holo",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "client_str",
"HologresTableName": "dwd_ali_user_all_feature_v2_holo",
"UserSelectFields": "*",
"FeatureStore": "user"
},
"Features": [
{
"FeatureType": "new_feature",
"FeatureName": "day_h",
"Normalizer": "hour_in_day",
"FeatureStore": "user"
},
{
"FeatureType": "new_feature",
"FeatureName": "week_day",
"Normalizer": "weekday",
"FeatureStore": "user"
}
]
},
{
"FeatureDaoConf": {
"AdapterType": "be",
"FeatureAsyncLoad": true,
"BeName": "be-pairec",
"BizName": "shihuo_sequence_feature",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "user_id",
"ItemFeatureKeyName": "item_id",
"BeItemFeatureKeyName": "item_id",
"TimestampFeatureKeyName": "timestamp",
"BeTimestampFeatureKeyName": "event_time",
"BeEventFeatureKeyName": "event_type",
"FeatureType": "sequence_feature",
"NoUsePlayTimeField": true,
"SequencePlayTime": "",
"SequenceLength": 50,
"SequenceDelim": ";",
"SequenceDimFields": "",
"SequenceName": "click_50_seq",
"FeatureStore": "user",
"SequenceEvent": "click"
},
"Features": []
}
]
}
}
}
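A minimal Go sketch of mixing blocking and asynchronous groups, assuming (this is not stated in the configuration above) that a later stage waits for the asynchronous groups before reading their features. LoadGroup, Prefetch and the returned wait function are hypothetical: Async mirrors FeatureAsyncLoad, blocking groups are loaded before Prefetch returns, and asynchronous groups keep running in the background.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical sketch: not the engine's API.

// FeatureLoader stands in for the fetch described by one FeatureDaoConf.
type FeatureLoader func(uid string) map[string]any

// LoadGroup models one FeatureLoadConfs entry; Async mirrors
// "FeatureAsyncLoad": true in its FeatureDaoConf.
type LoadGroup struct {
	Async bool
	Load  FeatureLoader
}

// Prefetch loads the blocking groups before returning and starts the async
// groups in the background. It returns a snapshot of the blocking features
// and a wait function that blocks until the async groups finish and then
// returns blocking and async features together.
func Prefetch(uid string, groups []LoadGroup) (map[string]any, func() map[string]any) {
	blocking := make(map[string]any)
	async := make(map[string]any)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, g := range groups {
		if !g.Async {
			for k, v := range g.Load(uid) { // loaded before the flow continues
				blocking[k] = v
			}
			continue
		}
		wg.Add(1)
		go func(load FeatureLoader) {
			defer wg.Done()
			result := load(uid)
			mu.Lock()
			defer mu.Unlock()
			for k, v := range result {
				async[k] = v
			}
		}(g.Load)
	}

	snapshot := make(map[string]any, len(blocking))
	for k, v := range blocking {
		snapshot[k] = v
	}
	wait := func() map[string]any {
		wg.Wait() // all async writes happen before Wait returns
		all := make(map[string]any, len(blocking)+len(async))
		for k, v := range blocking {
			all[k] = v
		}
		for k, v := range async {
			all[k] = v
		}
		return all
	}
	return snapshot, wait
}

func main() {
	holo := LoadGroup{Async: false, Load: func(string) map[string]any { return map[string]any{"day_h": 14} }}
	seq := LoadGroup{Async: true, Load: func(string) map[string]any { return map[string]any{"click_50_seq": "101;102"} }}

	now, wait := Prefetch("u1", []LoadGroup{holo, seq})
	fmt.Println(len(now)) // 1: recall can start with the blocking features
	all := wait()         // e.g. before generalrank or rank
	fmt.Println(len(all)) // 2
}
```

Keeping the asynchronous results in a separate map until wait returns avoids a data race on the snapshot that the early stages read.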
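Prefetching User Features into a Cache
For performance, user features can be fetched asynchronously before the recommendation flow runs and held temporarily; they are copied into the user features only if a later stage actually uses them.
As described above, setting FeatureAsyncLoad = true makes a group load asynchronously.
The configuration below adds "CacheFeaturesName": "test". The fetched features are stored in a cache map named test instead of being written directly into the user features, and a later stage must reference them explicitly before they are used. Each feature group can use its own CacheFeaturesName.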
{
"UserFeatureConfs": {
"home_feed": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "hologres",
"HologresName": "pairec-holo",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "client_str",
"FeatureAsyncLoad": true,
"HologresTableName": "dwd_ali_user_all_feature_v2_holo",
"UserSelectFields": "*",
"CacheFeaturesName" :"test",
"FeatureStore": "user"
},
"Features": [
{
"FeatureType": "new_feature",
"FeatureName": "day_h",
"Normalizer": "hour_in_day",
"FeatureStore": "user"
},
{
"FeatureType": "new_feature",
"FeatureName": "week_day",
"Normalizer": "weekday",
"FeatureStore": "user"
}
]
},
{
"FeatureDaoConf": {
"AdapterType": "be",
"FeatureAsyncLoad": true,
"BeName": "be-pairec",
"BizName": "shihuo_sequence_feature",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "user_id",
"ItemFeatureKeyName": "item_id",
"BeItemFeatureKeyName": "item_id",
"TimestampFeatureKeyName": "timestamp",
"BeTimestampFeatureKeyName": "event_time",
"BeEventFeatureKeyName": "event_type",
"FeatureType": "sequence_feature",
"NoUsePlayTimeField": true,
"SequencePlayTime": "",
"SequenceLength": 50,
"SequenceDelim": ";",
"SequenceDimFields": "",
"SequenceName": "click_50_seq",
"CacheFeaturesName" :"test",
"FeatureStore": "user",
"SequenceEvent": "click"
},
"Features": []
}
]
}
}
}
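The cache behavior can be sketched in Go as follows. FeatureCache, PrefetchToCache and the map-based storage are hypothetical stand-ins, not the engine's implementation: each asynchronously loaded group writes into the cache map named by its CacheFeaturesName, several groups may share one name (as both groups share test above), and the user feature map itself is left untouched.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical sketch: these types are not the engine's API.

// FeatureLoader stands in for one FeatureLoadConfs entry.
type FeatureLoader func(uid string) map[string]any

// FeatureCache models the per-request cache maps keyed by CacheFeaturesName.
type FeatureCache struct {
	mu   sync.Mutex
	maps map[string]map[string]any
}

func NewFeatureCache() *FeatureCache {
	return &FeatureCache{maps: make(map[string]map[string]any)}
}

// Put merges features into the cache map called name.
func (c *FeatureCache) Put(name string, features map[string]any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	m, ok := c.maps[name]
	if !ok {
		m = make(map[string]any)
		c.maps[name] = m
	}
	for k, v := range features {
		m[k] = v
	}
}

// Get returns a copy of the cache map called name, or nil if nothing was cached.
func (c *FeatureCache) Get(name string) map[string]any {
	c.mu.Lock()
	defer c.mu.Unlock()
	m, ok := c.maps[name]
	if !ok {
		return nil
	}
	out := make(map[string]any, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// PrefetchToCache loads every group asynchronously and stores the results
// under cacheName instead of in the user features. The returned WaitGroup
// lets a later stage wait for the loads to finish.
func PrefetchToCache(uid, cacheName string, loaders []FeatureLoader, cache *FeatureCache) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for _, load := range loaders {
		wg.Add(1)
		go func(load FeatureLoader) {
			defer wg.Done()
			cache.Put(cacheName, load(uid))
		}(load)
	}
	return wg
}

func main() {
	cache := NewFeatureCache()
	userFeatures := map[string]any{} // not written by the prefetch
	holo := func(string) map[string]any { return map[string]any{"day_h": 14} }
	seq := func(string) map[string]any { return map[string]any{"click_50_seq": "101;102"} }

	wg := PrefetchToCache("u1", "test", []FeatureLoader{holo, seq}, cache)
	wg.Wait()
	fmt.Println(len(userFeatures), len(cache.Get("test"))) // 0 2
}
```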
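Referencing Cached User Features
To use the preloaded user features in coarse or fine ranking, configure the features to reference in FeatureConfs.
Referencing cached features does not require concurrent fetching, so set AsynLoadFeature = false.
LoadFromCacheFeaturesName specifies which cache map to read from; it must match the CacheFeaturesName used when prefetching. The example below reads from two cache maps, test and test1, so each must be populated by a prefetch group with the matching CacheFeaturesName.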
{
"FeatureConfs":{
"home_feed_rebuild_v22":{
"AsynLoadFeature":false,
"FeatureLoadConfs":[
{
"FeatureDaoConf":{
"LoadFromCacheFeaturesName":"test",
"FeatureStore":"user"
}
},
{
"FeatureDaoConf":{
"LoadFromCacheFeaturesName":"test1",
"FeatureStore":"user"
}
}
]
}
}
}
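A hypothetical Go sketch of what the FeatureConfs entries above do at ranking time: for each LoadFromCacheFeaturesName, copy that cache map into the user features, one entry after another (AsynLoadFeature: false). ApplyCachedFeatures is an illustrative name, and returning the names that have no cache map is a design choice of this sketch, not documented engine behavior.

```go
package main

import "fmt"

// Hypothetical sketch: not the engine's API.

// ApplyCachedFeatures copies the features cached under each name in names
// into userFeatures, mirroring FeatureDaoConf entries that only set
// LoadFromCacheFeaturesName. Entries are applied sequentially, since copying
// from memory needs no concurrency. Names without a cache map are returned.
func ApplyCachedFeatures(userFeatures map[string]any, cache map[string]map[string]any, names []string) (missing []string) {
	for _, name := range names {
		cached, ok := cache[name]
		if !ok {
			missing = append(missing, name)
			continue
		}
		for k, v := range cached {
			userFeatures[k] = v
		}
	}
	return missing
}

func main() {
	cache := map[string]map[string]any{
		"test": {"day_h": 14, "click_50_seq": "101;102"},
	}
	user := map[string]any{"uid": "u1"}
	missing := ApplyCachedFeatures(user, cache, []string{"test", "test1"})
	fmt.Println(len(user), missing) // 3 [test1]
}
```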
Experiment Parameter Support
As with features.scene.name, set the experiment parameter user_features.scene.name to load a different set of user features in an experiment.
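A small Go sketch of how a scene name might be chosen from experiment parameters. UserFeatureScene is hypothetical, and the fallback to the request's own scene when user_features.scene.name is unset is an assumption, not documented behavior; home_feed_v2 is an invented scene name.

```go
package main

import "fmt"

// Hypothetical sketch: not the engine's API.

// UserFeatureScene picks which UserFeatureConfs entry to load for a request.
// Assumption: if user_features.scene.name is not a non-empty string in the
// experiment parameters, the request's own scene name is used.
func UserFeatureScene(requestScene string, expParams map[string]any) string {
	if name, ok := expParams["user_features.scene.name"].(string); ok && name != "" {
		return name
	}
	return requestScene
}

func main() {
	fmt.Println(UserFeatureScene("home_feed", nil)) // home_feed
	fmt.Println(UserFeatureScene("home_feed", map[string]any{
		"user_features.scene.name": "home_feed_v2",
	})) // home_feed_v2
}
```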
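Multi-Stage Feature Loading
A single recommendation request uses user features in several stages: in recall (for example, DSSM or MIND recall) and in scoring, that is, coarse ranking (generalrank) and fine ranking (rank). If each stage runs its own experiments and those experiments need different features, a single user_features.scene.name cannot express this well. Instead, the user-feature preloading phase should load features according to each stage's needs, so that each stage only has to care about its own scene name. To enable multi-stage feature loading, set the following in the experiment configuration: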
{
"user_features.multistage.on": true
}
The scene-name parameters for the three stages are:
user_features.stage.recall.scene.name
user_features.stage.generalrank.scene.name
user_features.stage.rank.scene.name
Feature-loading logic shared by several stages can be placed under:
user_features.stage.base.scene.name
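The following Go sketch shows one way to turn these parameters into the list of UserFeatureConfs scenes to preload. ScenesToPreload is hypothetical; the base, recall, generalrank, rank ordering, skipping duplicates, and falling back to user_features.scene.name when user_features.multistage.on is not true are assumptions made for illustration. home_feed_recall_v1 is an invented scene name.

```go
package main

import "fmt"

// Hypothetical sketch: not the engine's API.

// stageKeys lists the parameters read when user_features.multistage.on is
// true; base holds loading logic shared by several stages.
var stageKeys = []string{
	"user_features.stage.base.scene.name",
	"user_features.stage.recall.scene.name",
	"user_features.stage.generalrank.scene.name",
	"user_features.stage.rank.scene.name",
}

// ScenesToPreload returns the UserFeatureConfs scene names to preload, in
// stage order and without duplicates. When multistage loading is off it
// falls back to the single user_features.scene.name parameter.
func ScenesToPreload(params map[string]any) []string {
	if on, _ := params["user_features.multistage.on"].(bool); !on {
		if name, ok := params["user_features.scene.name"].(string); ok && name != "" {
			return []string{name}
		}
		return nil
	}
	seen := make(map[string]bool)
	var scenes []string
	for _, key := range stageKeys {
		name, ok := params[key].(string)
		if !ok || name == "" || seen[name] {
			continue
		}
		seen[name] = true
		scenes = append(scenes, name)
	}
	return scenes
}

func main() {
	params := map[string]any{
		"user_features.multistage.on":           true,
		"user_features.stage.base.scene.name":   "home_feed_base",
		"user_features.stage.recall.scene.name": "home_feed_recall_v1",
		"user_features.stage.rank.scene.name":   "home_feed_rank_v1",
	}
	fmt.Println(ScenesToPreload(params)) // [home_feed_base home_feed_recall_v1 home_feed_rank_v1]
}
```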
For example, suppose the rank stage runs two experiments that need different user features. The engine configuration could look like this:
{
// UserFeatureConfs configuration
"UserFeatureConfs": {
"home_feed_base": { // features shared across stages
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"FeatureAsyncLoad": true,
"CacheFeaturesName" :"home_feed_base",
"FeatureStore": "user"
},
"Features": []
}
]
},
"home_feed_rank_v1": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
// specific feature-loading settings
"FeatureAsyncLoad": true,
"CacheFeaturesName" :"home_feed_rank_v1",
"FeatureStore": "user"
},
"Features": []
},
{
"FeatureDaoConf": {
// specific feature-loading settings
"FeatureAsyncLoad": true,
"CacheFeaturesName" :"home_feed_rank_v1",
"FeatureStore": "user"
},
"Features": []
}
]
},
"home_feed_rank_v2": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
// specific feature-loading settings
"FeatureAsyncLoad": true,
"CacheFeaturesName" :"home_feed_rank_v2",
"FeatureStore": "user"
},
"Features": []
},
{
"FeatureDaoConf": {
// specific feature-loading settings
"FeatureAsyncLoad": true,
"CacheFeaturesName" :"home_feed_rank_v2",
"FeatureStore": "user"
},
"Features": []
}
]
}
},
// FeatureConfs configuration
"FeatureConfs":{
"home_feed_rank_v1":{
"AsynLoadFeature":false,
"FeatureLoadConfs":[
{
"FeatureDaoConf":{
"LoadFromCacheFeaturesName":"home_feed_rank_v1",
"FeatureStore":"user"
}
},
{
"FeatureDaoConf":{
"LoadFromCacheFeaturesName":"home_feed_base",
"FeatureStore":"user"
}
}
]
},
"home_feed_rank_v2":{
"AsynLoadFeature":false,
"FeatureLoadConfs":[
{
"FeatureDaoConf":{
"LoadFromCacheFeaturesName":"home_feed_rank_v2",
"FeatureStore":"user"
}
},
{
"FeatureDaoConf":{
"LoadFromCacheFeaturesName":"home_feed_base",
"FeatureStore":"user"
}
}
]
}
}
}
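In UserFeatureConfs, home_feed_rank_v1 and home_feed_rank_v2 preload different features, and home_feed_base loads the common features. FeatureConfs defines how features are loaded; here each scene references features from the user cache: home_feed_rank_v1 and home_feed_rank_v2 each read their own cache, and both also read the shared home_feed_base cache.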
In the rank experiment group, the group configuration can be:
{
"user_features.multistage.on": true,
"user_features.stage.base.scene.name": "home_feed_base"
}
Rank experiment 1 is configured as:
{
"user_features.stage.rank.scene.name": "home_feed_rank_v1",
"features.scene.name" :"home_feed_rank_v1"
}
Rank experiment 2 is configured as:
{
"user_features.stage.rank.scene.name": "home_feed_rank_v2",
"features.scene.name" :"home_feed_rank_v2"
}
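To check the example end to end, the Go sketch below merges the experiment-group parameters with each experiment's parameters and reports, for the rank stage, which UserFeatureConfs scenes are preloaded and which FeatureConfs scene is used. MergeParams and RankPlan are hypothetical helpers, and the rule that an experiment's parameters are layered over its group's parameters is an assumption of the sketch.

```go
package main

import "fmt"

// Hypothetical sketch: not the engine's API.

// MergeParams layers parameter maps; later maps override earlier ones.
// Assumption: experiment parameters are layered over group parameters.
func MergeParams(layers ...map[string]any) map[string]any {
	merged := make(map[string]any)
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

// RankPlan reports which UserFeatureConfs scenes to preload for the rank
// stage (base first, then rank) and which FeatureConfs scene rank uses.
func RankPlan(params map[string]any) (preload []string, featureScene string) {
	for _, key := range []string{
		"user_features.stage.base.scene.name",
		"user_features.stage.rank.scene.name",
	} {
		if name, ok := params[key].(string); ok && name != "" {
			preload = append(preload, name)
		}
	}
	featureScene, _ = params["features.scene.name"].(string)
	return preload, featureScene
}

func main() {
	group := map[string]any{
		"user_features.multistage.on":         true,
		"user_features.stage.base.scene.name": "home_feed_base",
	}
	exp1 := map[string]any{
		"user_features.stage.rank.scene.name": "home_feed_rank_v1",
		"features.scene.name":                 "home_feed_rank_v1",
	}
	exp2 := map[string]any{
		"user_features.stage.rank.scene.name": "home_feed_rank_v2",
		"features.scene.name":                 "home_feed_rank_v2",
	}
	for _, exp := range []map[string]any{exp1, exp2} {
		preload, scene := RankPlan(MergeParams(group, exp))
		fmt.Println(preload, scene)
	}
	// [home_feed_base home_feed_rank_v1] home_feed_rank_v1
	// [home_feed_base home_feed_rank_v2] home_feed_rank_v2
}
```

Both experiments preload home_feed_base, while each loads and references only its own rank-specific scene.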