Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (6): 95-107. doi: 10.3969/j.issn.1000-5641.2023.06.009

• Computer Science •

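  • About the author: Juxiang REN, female, master's degree, associate professor; main research interests are management statistics and data mining. E-mail: 63896887@qq.com
  • Funding:
    Fujian Provincial Social Science Foundation (FJ2021B126, FJ2022A018)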

Integrating multi-granularity semantic features into the Chinese sentiment analysis method

Juxiang REN1, Zhongbao LIU2,3,*

    1. School of Information Engineering, Shanxi Vocational University of Engineering Science and Technology, Jinzhong, Shanxi 030619, China
    2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
    3. School of Software, Quanzhou University of Information Engineering, Quanzhou, Fujian 362000, China
  • Received:2022-06-20 Online:2023-11-25 Published:2023-11-23
  • Contact: Zhongbao LIU E-mail:63896887@qq.com;liuzb@nuc.edu.cn

Abstract:

Chinese sentiment analysis is an important research topic in natural language processing that aims to identify the sentiment tendencies expressed in Chinese text. In recent years, research on Chinese text sentiment analysis has made great progress, but few studies have taken into account the characteristics of the language itself and the requirements of the downstream task. Therefore, in view of the particularity of Chinese text and the practical requirements of sentiment analysis, a Chinese text sentiment analysis method is proposed that integrates multi-granularity semantic features: characters, words, radicals, and parts of speech. On top of character and word features, the method introduces radical features and sentiment part-of-speech features, and it uses a bidirectional long short-term memory network (BLSTM), an attention mechanism, and a recurrent convolutional neural network (RCNN). After the features are fused, the softmax function is used to predict the sentiment tendency. Comparative experiments on the NLPECC (natural language processing and Chinese computing) dataset show that the proposed method achieves an F1 score of 84.80%, which improves on existing methods to some extent and accomplishes the Chinese text sentiment analysis task well.

Key words: Chinese text, multi-granularity semantic features, sentiment analysis, big data environment
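
The fusion-then-softmax step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the four feature vectors are combined, so plain concatenation is assumed here. The toy vectors, weights, and the names `fuse` and `predict` are hypothetical, and the BLSTM, attention, and RCNN encoders that would produce the vectors are omitted.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Concatenate per-granularity vectors in a fixed order.

    Concatenation is an assumption; the abstract does not give the fusion rule.
    """
    fused: List[float] = []
    for name in order:
        fused.extend(features[name])
    return fused


def predict(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Linear layer (one weight row per class) followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy 2-dimensional vectors standing in for the encoder outputs
# of each granularity (hypothetical values, not from the paper).
GRANULARITIES = ("char", "word", "radical", "pos")
features = {
    "char": [0.2, -0.1],
    "word": [0.5, 0.3],
    "radical": [0.0, 0.4],
    "pos": [0.7, -0.2],
}

fused = fuse(features, GRANULARITIES)

# Two sentiment classes with hand-picked illustrative weights.
weights = [[0.1] * 8, [-0.1] * 8]
bias = [0.0, 0.0]

probs = predict(fused, weights, bias)
print(probs)
```

In a trained model the weights would be learned and the per-granularity vectors would come from the encoders; the sketch only shows how the fused representation feeds the softmax classifier.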

CLC Number: