Generative Long-term User Interest Modeling for Click-Through Rate Prediction
Jiangli Shao, Kaifu Zheng, Hao Fang, Huimu Ye, Zhiwei Liu, Bo Zhang, Shu Han, Xingxing Wang
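AI summary: This paper proposes the GenLI model, which combines an interest generation module, a behavior retrieval module and an interest fusion module to improve the accuracy and efficiency of CTR prediction. It addresses two problems of traditional methods: incomplete long-term interest modeling and low efficiency.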
Modeling long-term user interests from massive historical user behaviors improves click-through rate (CTR) prediction in advertising and recommendation systems. A two-stage framework is widely adopted. First, a general search unit (GSU) retrieves the top-$k$ behaviors most relevant to the target item. Then an exact search unit (ESU) generates interest features via tailored attention. However, current target-centered GSUs ignore other latent user interests, which leads to incomplete and biased interest features. Moreover, the matching-based retrieval in GSUs depends on pairwise similarity scores between the target item and every historical behavior. This becomes time-consuming for online serving as user behaviors keep growing, and it also overlooks the interaction information among user behaviors. To address these problems, we propose a Generative Long-term user Interest model, named GenLI, for CTR prediction. GenLI consists of an interest generation module (IGM), a behavior retrieval module (BRM) and an interest fusion module (IFM). The IGM generates multiple interest distributions that capture different aspects of real-time user interests. It is target-independent and incorporates interaction information among behaviors, ensuring complete and diverse interest features. The BRM selects related behaviors via a simple lookup operation, reducing the time complexity of weighting each behavior to $O(1)$. Finally, the IFM uses carefully designed gating mechanisms to generate interest features. Owing to this generative process, GenLI improves the diversity of user interests and avoids complex matching-based behavior retrieval. The result is a better balance between accuracy and efficiency for CTR prediction.
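To make the efficiency contrast concrete, here is a minimal sketch comparing target-centered, matching-based retrieval with a lookup-based alternative. It is a toy illustration under stated assumptions, not the authors' implementation. The function names `matching_based_topk` and `lookup_based_topk` are hypothetical, and the dot product is an assumed similarity score. The sketch also assumes that each behavior carries a discrete cluster ID and that each interest distribution is a probability vector over those IDs. The distributions are hard-coded here rather than generated by an IGM.

```python
from typing import List, Sequence


def matching_based_topk(target: Sequence[float],
                        behaviors: List[Sequence[float]], k: int) -> List[int]:
    """Target-centered, matching-based retrieval: score every historical
    behavior against the target item embedding (dot product, an assumption)
    and keep the top-k. Each score costs O(d) for embedding dimension d."""
    scores = [sum(t * b for t, b in zip(target, emb)) for emb in behaviors]
    order = sorted(range(len(behaviors)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def lookup_based_topk(interest_dists: List[List[float]],
                      behavior_clusters: List[int], k: int) -> List[List[int]]:
    """Hypothetical lookup-based retrieval: each interest distribution gives a
    probability per cluster ID, so weighting one behavior under one interest
    is a single list index, O(1). No target item is needed, and each interest
    retrieves its own top-k behaviors."""
    selected = []
    for dist in interest_dists:
        weights = [dist[c] for c in behavior_clusters]  # O(1) lookup per behavior
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        selected.append(order[:k])
    return selected


# Toy data (hypothetical): 6 behaviors with 2-d embeddings and cluster IDs 0..2.
behavior_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.2, 0.1]]
behavior_clusters = [0, 0, 1, 1, 2, 2]
target = [1.0, 0.0]
# Two assumed interest distributions over the 3 clusters.
interest_dists = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]

if __name__ == "__main__":
    print(matching_based_topk(target, behavior_embs, k=2))           # [0, 1]
    print(lookup_based_topk(interest_dists, behavior_clusters, k=2))  # [[0, 1], [2, 3]]
```

In this toy run, the target-centered retrieval returns only cluster-0 behaviors that resemble the target. The two interest distributions retrieve behaviors from different clusters, which loosely mirrors the diversity argument above. Generating each distribution still has its own cost, but that cost does not grow with the number of behaviors.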