Journal of East China Normal University (Natural Science) ›› 2020, Vol. 2020 ›› Issue (4): 147-155. doi: 10.3969/j.issn.1000-5641.201921007

• Computer Science •

Research on an advertising click-through rate prediction model based on feature optimization

HE Xiaojuan (贺小娟), GUO Xinshun (郭新顺)

  1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
  • Received: 2019-08-01 Published: 2020-07-20
  • Corresponding author: GUO Xinshun, male, professor, graduate supervisor; research interests: data analysis and computer applications. E-mail: gxs@suibe.edu.cn
  • Funding:
    Ministry of Education Humanities and Social Sciences Youth Fund (18YJC630205)

Abstract: Internet advertising data are high-dimensional and sparse. Building on existing theory and techniques for click-through rate (CTR) prediction, this paper proposes CNN+ (CNN Based on GBDT), an online advertising feature extraction model in which a convolutional neural network (CNN) is based on a gradient boosting decision tree (GBDT). CNN+ extracts deep, high-order features from raw data and also addresses the difficulty that CNNs have in extracting features from sparse, high-dimensional inputs. Experiments on real datasets show that the features extracted by CNN+ are more effective than those produced by two other feature extraction methods, principal component analysis (PCA) and GBDT.

Key words: advertising click-through rate prediction, gradient boosting decision tree (GBDT), convolutional neural networks (CNN), feature learning
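
The abstract does not say how the GBDT stage hands its output to the CNN. As a rough illustration only, the sketch below shows leaf-index encoding, a common way to turn raw inputs into a GBDT-derived feature vector that a downstream model could consume. Everything in it is an assumption made for illustration and not taken from the paper: the `Node` class, the `leaf_index` and `gbdt_encode` helpers, and the two hand-written toy trees that stand in for a trained ensemble.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node of a hypothetical, already-fitted decision tree."""
    feature: Optional[int] = None    # index of the feature tested (None for a leaf)
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: Optional[int] = None    # position of this leaf within its tree


def leaf_index(tree: Node, x: List[float]) -> int:
    """Walk one tree from the root and return the id of the leaf that x reaches."""
    node = tree
    while node.leaf_id is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.leaf_id


def count_leaves(tree: Node) -> int:
    """Count the leaves of a tree so each one-hot block has the right width."""
    if tree.leaf_id is not None:
        return 1
    return count_leaves(tree.left) + count_leaves(tree.right)


def gbdt_encode(trees: List[Node], x: List[float]) -> List[int]:
    """Concatenate one one-hot block per tree, marking the leaf that x reaches."""
    encoded: List[int] = []
    for tree in trees:
        block = [0] * count_leaves(tree)
        block[leaf_index(tree, x)] = 1
        encoded.extend(block)
    return encoded


# Two hand-written toy trees standing in for a trained GBDT ensemble (assumption).
tree_a = Node(feature=0, threshold=0.5,
              left=Node(leaf_id=0),
              right=Node(feature=1, threshold=2.0,
                         left=Node(leaf_id=1), right=Node(leaf_id=2)))
tree_b = Node(feature=2, threshold=0.0,
              left=Node(leaf_id=0), right=Node(leaf_id=1))

sample = [1.0, 3.5, -1.0]
print(gbdt_encode([tree_a, tree_b], sample))  # [0, 0, 1, 1, 0]
```

Each tree contributes one block whose width equals its number of leaves, so the encoded vector has a fixed length regardless of how sparse the raw input is. Whether CNN+ uses this encoding, or some other coupling between the two stages, is not stated in the abstract.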
