Journal of East China Normal University (Natural Science) ›› 2022, Vol. 2022 ›› Issue (5): 126-135. doi: 10.3969/j.issn.1000-5641.2022.05.011

• Supply Chain Knowledge Graph Construction and Analysis

Text matching based on multi-dimensional feature representation

Ming WANG, Te LI, Dingjiang HUANG*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2022-07-20 Accepted: 2022-07-20 Online: 2022-09-25 Published: 2022-09-26
  • Contact: Dingjiang HUANG E-mail: djhuang@dase.ecnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (U1711262, 62072185, U1811264)

Abstract:

Text semantic matching underlies many natural language processing tasks and is required in scenarios such as search and question answering. Practical applications place high demands on matching efficiency. Representation-based semantic matching models are less accurate than interaction-based models, but they are far more efficient. The key to improving representation-based models is to extract sentence vectors that carry high-level semantic features. To address this problem, this paper builds on the ERNIE model and designs a feature-fusion module and a feature-extraction module to obtain sentence vectors with multi-dimensional semantic features. A semantic prediction loss function is also designed to further improve the model's ability to capture semantic information, and thereby the accuracy of text semantic matching. On the Baidu Qianyan text similarity dataset, the model reaches an accuracy of 85.1%, indicating good performance.

Key words: text semantic matching, pre-trained model, sentence embeddings, semantic features
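
The efficiency argument for representation-based matching can be illustrated with a minimal sketch: each sentence is encoded independently into a vector, candidate vectors are computed once in advance, and a query is matched by comparing vectors only. The encoder below is a toy stand-in (hashed character-bigram counts) chosen so the example runs without any model, and is not the ERNIE-based encoder or the feature-fusion and feature-extraction modules of this paper. The names `DIM`, `encode`, `cosine`, and `SentenceIndex` are hypothetical and introduced only for illustration.

```python
import math
import zlib
from typing import List, Sequence, Tuple

DIM = 64  # hypothetical embedding size for the toy encoder


def encode(sentence: str) -> List[float]:
    """Toy sentence encoder: hashed character-bigram counts, L2-normalised.

    Assumption: this stands in for a pre-trained sentence encoder and is
    used only to keep the sketch self-contained.
    """
    text = sentence.lower()
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        # crc32 is deterministic across runs, unlike the built-in hash().
        bucket = zlib.crc32(text[i:i + 2].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


class SentenceIndex:
    """Encodes candidates once; each query then costs one encoding pass."""

    def __init__(self, candidates: Sequence[str]) -> None:
        self.candidates = list(candidates)
        # Candidate vectors are computed ahead of time, independent of any query.
        self.vectors = [encode(c) for c in self.candidates]

    def best_match(self, query: str) -> Tuple[str, float]:
        q = encode(query)
        scores = [cosine(q, v) for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.candidates[best], scores[best]


if __name__ == "__main__":
    index = SentenceIndex([
        "how do I reset my password",
        "what is the weather today",
    ])
    print(index.best_match("how can I reset my password"))
```

An interaction-based model would instead process the query together with every candidate, so nothing could be precomputed. That is the trade-off between accuracy and efficiency described in the abstract.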

CLC Number: