Journal of East China Normal University (Natural Science), 2017, Vol. 2017, Issue (5): 138-153. doi: 10.3969/j.issn.1000-5641.2017.05.013

• User Behavior Analysis •

Modeling multi-dimensional user preference based on the latent variable model

WANG Shan-lei1, YUE Kun1, WU Hao1, TIAN Kai-lin2

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China;
    2. Library of Southwest Forestry University, Kunming 650224, China
  • Received: 2017-05-01 Online: 2017-09-25 Published: 2017-09-25
  • Corresponding author: YUE Kun, male, Ph.D., professor, doctoral supervisor; research interests: massive data analysis and knowledge discovery. E-mail: kyue@ynu.edu.cn
  • About the author: WANG Shan-lei, male, master's student; research interests: massive data analysis and knowledge discovery. E-mail: 407773704@qq.com
  • Funding:
    National Natural Science Foundation of China (61472345, 61562090); Ministry of Education "Cloud-Data Fusion, Science-Education Innovation" Fund (2017B000); Second-batch "Yunling Scholar" Training Program (C6153001); Key Project of the Applied Basic Research Program of Yunnan Province (2014FA023); Youth Talent Cultivation Program of Yunnan University (WX173602); China Postdoctoral Science Foundation (2016M592721)

Abstract: Building a user preference model from user behavior data is an important foundation for personalized services, rating prediction, user behavior targeting, and related tasks. This paper starts from users' rating data. It uses multiple latent variables to describe a user's preferences on multiple dimensions of the rated items. It adopts a Bayesian network with multiple latent variables (the latent variable model, for short) as the basic knowledge framework for representing user preference. Constraints on model construction are first derived from the specific meanings of user preference and of the latent variables. A constraint-based modeling method is then proposed: model parameters are computed with a constrained EM algorithm, and the model structure is learned with a constrained SEM (structural EM) algorithm. With multiple latent variables, model construction generates a large amount of intermediate data, which sharply increases computational complexity. To address this, the method is implemented on the Spark computing framework. Experiments on the MovieLens dataset show that the proposed method is effective.

Key words: rating data, multi-dimensional preference, latent variable, Bayesian network, Spark
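The constrained-EM parameter step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name below is hypothetical. The sketch assumes a toy model in which one latent preference variable L with K states, for a single rating dimension, is the only parent of each observed rating. The constraint shown is also an assumption: after every M-step, latent states are relabeled so that the state index increases with the mean expected rating, giving the states a "low to high preference" meaning. The E-step is written as a per-user `user_stats` function plus an element-wise `add_stats` combiner. In a PySpark version, these would presumably be the functions passed to `RDD.map` and `RDD.reduce`; here `functools.reduce` stands in for the cluster.

```python
import math
import random
from functools import reduce

# Minimal sketch, NOT the paper's implementation. Assumed toy model: one
# hypothetical latent preference variable L (K states) for a single rating
# dimension is the only parent of each observed rating R_j (values 1..V,
# None = unrated). The names and the ordering constraint are illustrative.

def init_params(K, m, V, seed=0):
    """Uniform P(L) and random, strictly positive P(R_j | L) tables."""
    rng = random.Random(seed)
    prior = [1.0 / K] * K
    cond = []  # cond[j][k][v - 1] = P(R_j = v | L = k)
    for _ in range(m):
        rows = []
        for _ in range(K):
            w = [rng.random() + 0.1 for _ in range(V)]
            total = sum(w)
            rows.append([x / total for x in w])
        cond.append(rows)
    return prior, cond

def user_posterior(ratings, prior, cond):
    """P(L = k | observed ratings) and log P(observed ratings), in log space."""
    logs = []
    for k, pk in enumerate(prior):
        lp = math.log(pk)
        for j, r in enumerate(ratings):
            if r is not None:
                lp += math.log(cond[j][k][r - 1])
        logs.append(lp)
    top = max(logs)
    w = [math.exp(x - top) for x in logs]
    z = sum(w)
    return [x / z for x in w], top + math.log(z)

def user_stats(ratings, prior, cond, V):
    """'Map' step: one user's expected counts plus log-likelihood."""
    post, ll = user_posterior(ratings, prior, cond)
    counts = [[[0.0] * V for _ in prior] for _ in ratings]
    for j, r in enumerate(ratings):
        if r is not None:
            for k, pk in enumerate(post):
                counts[j][k][r - 1] += pk
    return post, counts, ll

def add_stats(a, b):
    """'Reduce' step: element-wise sum of two partial statistics."""
    counts = [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ja, jb)]
              for ja, jb in zip(a[1], b[1])]
    return [x + y for x, y in zip(a[0], b[0])], counts, a[2] + b[2]

def m_step(stats, V, alpha):
    """MAP M-step with add-alpha smoothing (alpha > 0 keeps every log finite)."""
    nk, counts, _ = stats
    prior = [(n + alpha) / (sum(nk) + len(nk) * alpha) for n in nk]
    cond = [[[(c + alpha) / (sum(row) + V * alpha) for c in row] for row in item]
            for item in counts]
    return prior, cond

def mean_rating(cond, k):
    """Average over items of E[R_j | L = k]."""
    return sum(sum((v + 1) * p for v, p in enumerate(item[k])) for item in cond) / len(cond)

def order_states(prior, cond):
    """Hypothetical constraint: relabel states so index grows with mean rating."""
    perm = sorted(range(len(prior)), key=lambda k: mean_rating(cond, k))
    return [prior[k] for k in perm], [[item[k] for k in perm] for item in cond]

def log_prior(prior, cond, alpha):
    """Log Dirichlet prior term (up to a constant) matching the smoothing."""
    s = sum(map(math.log, prior)) + sum(math.log(p) for item in cond for row in item for p in row)
    return alpha * s

def fit(data, K, V, iters=30, alpha=1.0, seed=0):
    prior, cond = init_params(K, len(data[0]), V, seed)
    history = []  # penalized log-likelihood, non-decreasing under MAP-EM
    for _ in range(iters):
        stats = reduce(add_stats, (user_stats(u, prior, cond, V) for u in data))
        history.append(stats[2] + log_prior(prior, cond, alpha))
        prior, cond = order_states(*m_step(stats, V, alpha))
    return prior, cond, history

if __name__ == "__main__":
    # Toy data: 6 users x 3 items of one dimension, 1..5 scale, None = unrated.
    data = [[5, 4, None], [4, 5, 5], [5, None, 4], [1, 2, 1], [2, 1, None], [None, 1, 2]]
    prior, cond, history = fit(data, K=2, V=5)
    print("P(L):", [round(p, 3) for p in prior])
    print("mean ratings per state:", [round(mean_rating(cond, k), 2) for k in range(2)])
```

The add-alpha smoothing makes this MAP-EM rather than plain maximum-likelihood EM. It is used so that no probability reaches zero on sparse ratings. Relabeling states does not change the likelihood, so the constraint only fixes the meaning of each state. A structure-learning (SEM) loop would wrap this parameter step inside a search over graph structures; that loop is not shown.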
