Journal of East China Normal University (Natural Science), 2018, Vol. 2018, Issue (3): 77-87. doi: 10.3969/j.issn.1000-5641.2018.03.009

• Computer Science

Video click-through rate prediction algorithm based on feature engineering

匡俊1, 唐卫红2, 陈雷慧1, 陈辉3, 曾炜3, 董启民4, 高明1   

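  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062;
    2. Shanghai Agricultural Technology Extension and Service Center, Shanghai 201103;
    3. Shenzhen Tencent Computer System Co. Ltd., Beijing 100080;
    4. Vocational and Technical Education Center of Linxi County, Linxi, Inner Mongolia 025250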
  • Received: 2017-05-19 Online: 2018-05-25 Published: 2018-05-29
  • Corresponding author: DONG Qi-min, male, first-grade secondary school teacher; research interest: information processing technology. E-mail: 418976195@qq.com
  • About the author: KUANG Jun, male, master's student; research interests: user behavior analysis and click-through rate prediction. E-mail: 15001830063@163.com
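  • Funding:
    National Key Research and Development Program of China (2016YFB1000905); Key Project of the NSFC-Guangdong Joint Fund (U1401256); National Natural Science Foundation of China (61672234, 61502236, 61472321)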

Algorithm for video click-through rate prediction

KUANG Jun1, TANG Wei-hong2, CHEN Lei-hui1, CHEN Hui3, ZENG Wei3, DONG Qi-min4, GAO Ming1   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China;
    2. Shanghai Agricultural Technology Extension and Service Center, Shanghai 201103, China;
    3. Shenzhen Tencent Computer System Co. Ltd., Beijing 100080, China;
    4. Vocational and Technical Education Center of Linxi County, Linxi, Inner Mongolia 025250, China
  • Received:2017-05-19 Online:2018-05-25 Published:2018-05-29

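Abstract: Click-through rate (CTR) prediction plays an important role in video recommendation systems. A video recommendation system can adjust the order in which videos are delivered according to the predicted CTR, thereby raising users' actual click-through rate. In CTR prediction, the massive volume and the imbalance of the data generally keep prediction accuracy low. To address these problems, this paper combines feature engineering with machine learning and effectively improves the performance of existing video CTR prediction algorithms. First, feature engineering is used to extract features from the raw data, and methods such as matrix factorization are used to generate cross features. Then, CTR prediction models are built on logistic regression, factorization machines, and gradient boosting decision trees combined with logistic regression (GBDT-LR), respectively. Experimental results show that the models based on factorization machines and on GBDT-LR achieve higher prediction accuracy than the model based on logistic regression, and that crossing user features with video features improves CTR prediction accuracy.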
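The abstract states only that methods "such as matrix factorization" generate the cross features. The sketch below illustrates one plausible reading, which is our assumption and is not confirmed by the paper: factorize a sparse user-video click log with stochastic gradient descent, then use the inner product of a user's and a video's latent vectors as a single user-video cross feature. The function names `factorize` and `cross_feature`, the hyperparameters, and the toy log are all hypothetical.

```python
import random


def factorize(clicks, n_users, n_videos, k=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit user factors P and video factors Q by SGD so that the inner
    product P[u].Q[v] approximates each observed label r (1 = clicked,
    0 = shown but not clicked), with an L2 penalty on the factors."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_videos)]
    for _ in range(epochs):
        for u, v, r in clicks:
            err = r - sum(P[u][f] * Q[v][f] for f in range(k))
            for f in range(k):
                pu, qv = P[u][f], Q[v][f]
                # SGD step along the negative gradient of the squared
                # error plus L2 penalty (constant factor 2 folded into lr).
                P[u][f] += lr * (err * qv - reg * pu)
                Q[v][f] += lr * (err * pu - reg * qv)
    return P, Q


def cross_feature(P, Q, u, v):
    """Hypothetical user-video cross feature: inner product of latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[v]))


# Toy impression log of (user_id, video_id, clicked).
clicks = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (2, 0, 1), (2, 1, 0)]
P, Q = factorize(clicks, n_users=3, n_videos=2)
for u in range(3):
    print(u, [round(cross_feature(P, Q, u, v), 2) for v in range(2)])
```

In this toy log users 0 and 2 clicked video 0 and user 1 clicked video 1, so the learned cross feature should score those pairs above the unclicked ones. In a full pipeline, such a value could be appended to the user and video features fed to the LR, FM, or GBDT-LR model.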

Keywords: click-through rate prediction, feature engineering, factorization machine, gradient boosting decision tree

Abstract: Click-through rate (CTR) prediction plays an important role in video recommendation systems. A video recommendation system can use the predicted CTR to decide which videos to suggest to users and in what order, so that users are more likely to click the videos the platform recommends. However, because the data in some applications are massive and highly imbalanced, CTR prediction accuracy can be very low. To improve performance, this paper proposes an approach that combines feature engineering with machine learning techniques. In the first stage, feature engineering is used to extract user, video, and cross (combinational) features from the raw dataset, with methods such as matrix factorization used to generate the cross features. In the second stage, CTR is predicted with three supervised models: logistic regression (LR), a factorization machine (FM), and a gradient boosting decision tree combined with logistic regression (GBDT-LR). Experimental results show that the FM and GBDT-LR models achieve higher prediction accuracy than the LR model, and that crossing user features with video features improves CTR prediction accuracy.
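To show what a factorization machine adds over logistic regression, here is a minimal sketch assuming the standard second-order FM formulation (Rendle, 2010) with a sigmoid output for CTR; the abstract does not give the paper's feature set, latent dimension, or training procedure, so none of those are reproduced here. Each feature i has a linear weight w_i and a latent vector v_i, and every pair of active features interacts through <v_i, v_j>, which lets the model capture user-video crosses without enumerating them. The pairwise sum is computed in O(kn) time with the identity sum_{i<j} <v_i, v_j> x_i x_j = 1/2 sum_f [(sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2]. The weights, one-hot layout, and function names `fm_score` and `fm_ctr` are hypothetical.

```python
import math


def fm_score(x, w0, w, V):
    """Second-order factorization machine score (logit).

    x  : sparse feature vector as {feature_index: value}
    w0 : global bias
    w  : linear weight per feature
    V  : latent vector (length k) per feature
    """
    # Linear part, identical to logistic regression.
    linear = w0 + sum(w[i] * xi for i, xi in x.items())
    # Pairwise part: sum_{i<j} <V[i], V[j]> x_i x_j, computed per latent
    # dimension f as 0.5 * ((sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2).
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_ctr(x, w0, w, V):
    """Map the FM logit to a click probability with the sigmoid."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


# Hypothetical one-hot layout: features 0-2 are user IDs, 3-5 are video IDs.
w0 = -1.0
w = [0.1, 0.2, -0.3, 0.05, 0.4, -0.2]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5],
     [0.2, 0.2], [-0.4, 0.1], [0.3, 0.3]]
x = {1: 1.0, 4: 1.0}  # user feature 1 paired with video feature 4
print(round(fm_score(x, w0, w, V), 6))  # -0.53
print(fm_ctr(x, w0, w, V))
```

Here the linear part contributes -0.4 and the single user-video interaction <V[1], V[4]> contributes -0.13. Dropping the pairwise term recovers plain logistic regression, which cannot express such an interaction unless the cross feature is engineered by hand.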

Key words: click-through rate prediction, feature engineering, factorization machine, gradient boosting decision tree
