华东师范大学学报(自然科学版) ›› 2014, Vol. 2014 ›› Issue (4): 62-68, 87.

• 计算机科学 •

基于词典与语料结合的中文微博主观句抽取方法

朱海欢, 余青松   

  1. 华东师范大学计算中心, 上海 200062
  • 收稿日期:2013-07-01 修回日期:2013-10-01 出版日期:2014-07-25 发布日期:2014-07-25

Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus

ZHU Hai-huan, YU Qing-song   

  1. Computer Center, East China Normal University, Shanghai 200062, China
  • Received:2013-07-01 Revised:2013-10-01 Online:2014-07-25 Published:2014-07-25

摘要: 提出一种基于词典与语料结合的中文微博主观句抽取方法,
通过判断句子中是否包含情感表达文本来判断句子是否为主观句. 首先,
从现有的情感词典中挑选出情感倾向较为固定的情感词构建了一个高可信情感词典,
用于抽取句子中的情感表达文本, 保证情感表达文本抽取的准确率;
然后提出 N-POSW 模型,
并基于 2-POSW 模型通过语料学习的方法较为准确地抽取句子中的剩余情感表达文本,
保证了情感表达文本抽取的召回率. 实验结果表明,
相比于传统的基于大规模情感词典的方法,
本文方法主观句抽取的F值提高了7%.

关键词: 情感词典, 高可信情感词典, N-POSW模型, 主观句

Abstract: This paper proposes a method for extracting subjective
sentences from Chinese microblogs that combines a lexicon with a
corpus. A sentence is classified as subjective or objective according
to whether it contains sentiment expressions. First, a highly
credible sentiment lexicon is built from words in existing sentiment
lexicons whose sentiment orientation is relatively fixed. This
lexicon is used to extract sentiment expressions from a sentence,
which ensures the accuracy of the extraction. Then, an N-POSW model
is proposed, and a corpus-based learning method built on the 2-POSW
model extracts the remaining sentiment expressions in the sentence,
which ensures the recall of the extraction. Experimental results show
that, compared with the traditional method based on a large-scale
sentiment lexicon, the F-value of subjective sentence extraction
improves by 7%.
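
The lexicon stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, the function names, and the use of plain substring matching instead of word segmentation are all assumptions made for illustration. The N-POSW stage is left out because the abstract does not define the model. The F-value is assumed here to be the balanced F-measure, F = 2PR / (P + R).

```python
# Hypothetical high-credibility lexicon; these entries are illustrative
# and are not taken from the paper.
HIGH_CREDIBILITY_LEXICON = {"喜欢", "讨厌", "开心", "失望"}


def find_sentiment_expressions(sentence, lexicon):
    """Return the lexicon entries that occur in the sentence, sorted.

    Assumption: plain substring matching stands in for the word
    segmentation step a real system would probably use.
    """
    return sorted(word for word in lexicon if word in sentence)


def is_subjective(sentence, lexicon):
    """Treat a sentence as subjective if it contains a sentiment expression."""
    return bool(find_sentiment_expressions(sentence, lexicon))


def f_measure(predicted, gold):
    """Balanced F-measure for the subjective class (assumed definition)."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentences = ["我很喜欢这部电影", "今天是星期三", "这次服务让人失望"]
    predictions = [is_subjective(s, HIGH_CREDIBILITY_LEXICON) for s in sentences]
    print(predictions)
```

A small, fixed-orientation lexicon of this kind tends to miss sentiment expressions it does not list, which is the recall gap the corpus-learned 2-POSW stage is meant to close according to the abstract.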

Key words: sentiment lexicon, highly credible lexicon, N-POSW model, subjective sentence
