Click-through rate (CTR) prediction plays an important role in video recommendation systems: a system can reorder the videos it serves according to the predicted CTR and thereby raise the rate at which users actually click. Because click data are massive and highly imbalanced, prediction accuracy is generally low. To address this, this paper combines feature engineering with machine learning to improve the performance of existing video CTR prediction algorithms. First, feature engineering extracts user and video features from the raw data and generates cross features using methods such as matrix factorization. Then, CTR prediction models are built on logistic regression, a factorization machine, and a gradient boosting decision tree combined with logistic regression. Experimental results show that the factorization machine model and the gradient boosting decision tree combined with logistic regression model both predict more accurately than the logistic regression model, and that crossing user features with video features improves the accuracy of CTR prediction.
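To make the factorization machine idea concrete, the following is a minimal sketch of how a second-order factorization machine scores one user-video pair, using the standard formulation in which each feature has a linear weight and a latent vector, and every pair of active features interacts through the inner product of their latent vectors. It is not the paper's implementation: the function names (`fm_score`, `predict_ctr`), the four-feature one-hot layout, and all parameter values are assumptions chosen for illustration, and no training step is shown.

```python
import math


def fm_score(x, w0, w, V):
    """Raw second-order factorization machine score for one feature vector.

    x  : list of n feature values (here: one-hot user and video indicators)
    w0 : global bias
    w  : list of n linear weights
    V  : n rows of k latent factors, one row per feature
    """
    n = len(x)
    k = len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Pairwise term sum_{i<j} <V[i], V[j]> * x[i] * x[j], computed in O(n*k)
    # with the identity 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def predict_ctr(x, w0, w, V):
    """Map the raw score to a click probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


# Hypothetical toy setup: features 0-1 are two users, features 2-3 are two videos.
x = [1, 0, 1, 0]                      # user 0 is shown video 0
w0 = -2.0                             # illustrative bias (clicks are rare)
w = [0.1, 0.2, 0.3, 0.4]              # illustrative linear weights
V = [[1.0, 0.5], [0.0, 1.0], [0.5, 1.0], [2.0, 0.0]]  # illustrative factors, k = 2

score = fm_score(x, w0, w, V)
ctr = predict_ctr(x, w0, w, V)
```

In this toy setup the only active pair is user 0 and video 0, so the pairwise term reduces to the inner product of their two latent vectors. This is the sense in which a factorization machine learns user-video cross features implicitly rather than requiring them to be enumerated by hand.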