Computer Science

Research on an advertising click-through rate prediction model based on feature optimization

  • HE Xiaojuan
  • GUO Xinshun
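  • School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China

Received date: 2019-08-01

Online published: 2020-07-20

Funding

Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (18YJC630205)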

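Abstract

To address the high-dimensional sparsity of Internet advertising data, and building on existing theory and techniques for click-through rate (CTR) prediction, this paper proposes an online advertising feature extraction model, CNN+, in which a convolutional neural network (CNN) is built on a gradient boosting decision tree (GBDT). CNN+ not only extracts deep, high-order features from raw data but also overcomes the difficulty convolutional neural networks have in extracting features from sparse, high-dimensional inputs. Experiments on real-world data show that the features extracted by CNN+ are more effective than those obtained with two other feature extraction methods, principal component analysis (PCA) and GBDT.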
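The abstract states only that CNN+ places a CNN on top of a GBDT to cope with sparse, high-dimensional inputs; it does not give the exact wiring. One common way to connect the two, popularized by Facebook's CTR work, is to encode each sample by the leaf it reaches in every boosted tree and pass that compact one-hot code to a downstream model. The NumPy sketch below assumes CNN+ follows that pattern and is not the authors' implementation: the helper names (`fit_gbdt_stumps`, `gbdt_leaf_features`, `conv1d_features`), the depth-1 trees, the random untrained convolution filters, and the synthetic click data are all illustrative assumptions.

```python
# Hypothetical sketch of a GBDT -> CNN feature pipeline; NOT the authors' CNN+ code.
# Assumptions: depth-1 trees (stumps), random untrained filters, synthetic data.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_gbdt_stumps(X, y, n_trees=8, lr=0.3):
    """Gradient boosting with decision stumps under logistic loss.

    Returns a list of (feature, threshold, left_value, right_value) tuples.
    """
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f = np.full(len(y), np.log(p0 / (1 - p0)))  # start from the base log-odds
    trees = []
    for _ in range(n_trees):
        p = sigmoid(f)
        g = y - p          # negative gradient of the log loss
        h = p * (1 - p)    # second derivative, used for Newton leaf values
        best = None
        for j in range(X.shape[1]):
            # every unique value except the largest keeps both sides non-empty
            for t in np.unique(X[:, j])[:-1]:
                left = X[:, j] <= t
                sse = (((g[left] - g[left].mean()) ** 2).sum()
                       + ((g[~left] - g[~left].mean()) ** 2).sum())
                if best is None or sse < best[0]:
                    best = (sse, j, t)
        if best is None:  # no feature has two distinct values
            break
        _, j, t = best
        left = X[:, j] <= t
        v_left = g[left].sum() / h[left].sum()
        v_right = g[~left].sum() / h[~left].sum()
        f += lr * np.where(left, v_left, v_right)
        trees.append((j, t, v_left, v_right))
    return trees


def gbdt_leaf_features(X, trees):
    """One-hot code of the leaf (0 = left, 1 = right) reached in each tree."""
    Z = np.zeros((X.shape[0], 2 * len(trees)))
    for m, (j, t, _, _) in enumerate(trees):
        leaf = (X[:, j] > t).astype(int)
        Z[np.arange(X.shape[0]), 2 * m + leaf] = 1.0
    return Z


def conv1d_features(Z, n_filters=4, width=3, seed=0):
    """One 1-D convolution layer + ReLU + global max pooling (forward pass only)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_filters, width))  # random, untrained filters
    n, d = Z.shape
    # sliding windows over the leaf code: shape (n, d - width + 1, width)
    windows = np.stack([Z[:, i:i + width] for i in range(d - width + 1)], axis=1)
    conv = windows @ W.T                      # (n, positions, n_filters)
    return np.maximum(conv, 0.0).max(axis=1)  # ReLU, then max over positions


# Synthetic sparse binary "ad" features; a click happens if feature 0 or 3 is on.
rng = np.random.default_rng(42)
X = (rng.random((200, 20)) < 0.1).astype(float)
y = ((X[:, 0] + X[:, 3]) > 0).astype(float)

trees = fit_gbdt_stumps(X, y)
Z = gbdt_leaf_features(X, trees)   # dense leaf code, 2 columns per tree
F = conv1d_features(Z)             # one pooled feature per filter
print(Z.shape, F.shape)
```

In a full model the convolution filters would be trained jointly with a click-prediction output layer, and deeper trees would give richer leaf codes; stumps and fixed filters only keep the example short enough to read in one pass.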

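Cite this article

HE Xiaojuan, GUO Xinshun. Research on an advertising click-through rate prediction model based on feature optimization [J]. Journal of East China Normal University (Natural Science), 2020(4): 147-155. DOI: 10.3969/j.issn.1000-5641.201921007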
