Journal of East China Normal University (Natural Science) ›› 2019, Vol. 2019 ›› Issue (5): 113-122, 167. DOI: 10.3969/j.issn.1000-5641.2019.05.009

• Computational Intelligence in Emerging Applications •

A Self-Attention Based Method for Compressing Verbose Product Titles

FU Yu, LI You, LIN Yu-ming, ZHOU Ya

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2019-07-28  Online: 2019-09-25  Published: 2019-10-11
  • Corresponding author: LI You, female, associate professor, master's supervisor; research interests: Web data analysis and opinion mining. E-mail: liyou@guet.edu.cn
  • About the author: FU Yu, male, master's student; research interest: massive data management. E-mail: fuzzyu@foxmail.com
  • Supported by:
    National Natural Science Foundation of China (61562014, U1501252, U1811264); Key Program of Guangxi Natural Science Foundation (2018GXNSFDA281049); Excellent Graduate Thesis Cultivation Project of Guilin University of Electronic Technology (17YJPYSS17); Research Project of Guangxi Key Laboratory of Trusted Software (kx201916)

Self-attention based neural networks for product title compression

FU Yu, LI You, LIN Yu-ming, ZHOU Ya   

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2019-07-28  Online: 2019-09-25  Published: 2019-10-11

Abstract: To attract users' attention, most e-commerce websites pack many product attributes into the product title, so titles contain redundant information and inconsistencies arise. To address this problem, we propose a product title compression model based on the self-attention mechanism. Because a self-attention network cannot directly capture the sequential features of a product title, we temporally enhance the self-attention mechanism with the sequence-modeling ability of gated recurrent units, trading a small computational overhead for an improvement in the overall performance of the title compression task. On the basis of the public short-product-title dataset LESD4EC, we construct two product title compression datasets, LESD4EC_L and LESD4EC_S, and validate the model on them. A series of experiments shows that the proposed self-attention based method for compressing verbose product titles substantially outperforms other product title compression methods.
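As an illustrative reading (the notation is ours, not taken from the paper), the temporal enhancement can be viewed as first encoding the title token embeddings X with a GRU and then applying standard scaled dot-product attention to projections of the GRU states H:

$$H = \mathrm{GRU}(X), \qquad Q = HW^{Q},\; K = HW^{K},\; V = HW^{V}, \qquad \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$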

Keywords: self-attention mechanism, product title compression, gated recurrent unit

Abstract: E-commerce product title compression has received significant attention in recent years, since it can provide more concise and specific information for cross-platform knowledge alignment and multi-source data fusion. Product titles usually contain redundant descriptions, which can lead to inconsistencies. In this paper, we propose self-attention based neural networks for this task. Because self-attention networks cannot directly capture the sequential features of product titles, we enhance the mapping networks with a dot-attention structure in which the query and key-value pairs are computed by a gated recurrent unit (GRU) based recurrent neural network. The proposed method improves the analytical capability of the model at a relatively low computational cost. Based on data from LESD4EC, we built two e-commerce datasets of product core phrases, LESD4EC_L and LESD4EC_S, and tested the model on them. A series of experiments shows that the proposed model achieves better performance in product title compression than existing techniques.
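For concreteness, the following is a minimal PyTorch sketch of such a GRU-enhanced self-attention layer; the class name, dimensions, and the final keep/drop scoring step are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUSelfAttention(nn.Module):
    """Scaled dot-product self-attention whose query/key/value projections are
    computed from GRU hidden states, so the attention layer also sees the
    word order of the title tokens (illustrative sketch only)."""

    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        # A bidirectional GRU injects sequential information before attention.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        d = 2 * hidden_dim
        self.w_q = nn.Linear(d, d)
        self.w_k = nn.Linear(d, d)
        self.w_v = nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim) token embeddings of a product title
        h, _ = self.gru(x)                           # (batch, seq_len, 2*hidden_dim)
        q, k, v = self.w_q(h), self.w_k(h), self.w_v(h)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)             # attention over title positions
        return attn @ v                              # context-enriched token features

# Hypothetical usage: per-token keep/drop scores, one plausible way to use the
# features for extractive title compression.
layer = GRUSelfAttention(embed_dim=128, hidden_dim=64)
titles = torch.randn(8, 20, 128)                     # batch of 8 titles, 20 tokens each
keep_logits = nn.Linear(128, 2)(layer(titles))       # (8, 20, 2) keep/drop logits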

Key words: self-attention mechanism, product title compression, gated recurrent units
