Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus

ZHU Hai-huan, YU Qing-song   

  1. Computer Center, East China Normal University, Shanghai 200062, China
  Received: 2013-07-01  Revised: 2013-10-01  Online: 2014-07-25  Published: 2014-07-25

Abstract: This paper proposes a new method for extracting subjective
sentences from Chinese microblogs that combines a lexicon with a
corpus. A sentence is classified as subjective or objective according
to whether it contains emotional expressions. First, a highly credible
sentiment lexicon is built from the words in an existing sentiment
dictionary whose emotional orientation is fixed. Second, sentiment
expressions are extracted with this highly credible lexicon, which
ensures the accuracy of the extraction. Finally, an N-POSW model is
proposed for the corpus-based learning method. The 2-POSW model
extracts the remaining sentiment expressions in the sentence, which
maintains the overall recall rate. Experimental results show that the
F-value of the proposed method is 7% higher than that of the
traditional method based on a large-scale sentiment lexicon.

Key words: sentiment lexicon, highly credible lexicon, N-POSW model, subjective sentence
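
The two-stage decision summarized in the abstract can be illustrated with a minimal sketch. The abstract does not define the N-POSW model, so the code assumes it means n-grams over (word, part-of-speech) tokens, which may differ from the authors' definition. The lexicon entries, the training sentences, and the function names (posw_ngrams, learn_patterns, is_subjective) are hypothetical and serve only as illustration; input is assumed to be already segmented and POS-tagged.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]          # (word, POS tag), assumed pre-segmented input
Gram = Tuple[Token, ...]

# Hypothetical high-confidence lexicon: words with a fixed emotional orientation.
LEXICON: Set[str] = {"喜欢"}      # "like"

# Hypothetical labelled corpus: (tokens, is_subjective).
TRAINING = [
    ([("这", "r"), ("电影", "n"), ("太", "d"), ("烂", "a")], True),   # "this film is too bad"
    ([("今天", "t"), ("是", "v"), ("周一", "t")], False),             # "today is Monday"
]


def posw_ngrams(tokens: List[Token], n: int = 2) -> List[Gram]:
    """Assumed reading of N-POSW: contiguous n-grams of (word, POS) tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def learn_patterns(corpus: Iterable[Tuple[List[Token], bool]],
                   n: int = 2, min_count: int = 1) -> Set[Gram]:
    """Keep n-grams seen in subjective sentences and never in objective ones."""
    subjective_counts: Counter = Counter()
    objective_grams: Set[Gram] = set()
    for tokens, is_subj in corpus:
        grams = set(posw_ngrams(tokens, n))
        if is_subj:
            subjective_counts.update(grams)
        else:
            objective_grams.update(grams)
    return {g for g, c in subjective_counts.items()
            if c >= min_count and g not in objective_grams}


def is_subjective(tokens: List[Token], lexicon: Set[str],
                  patterns: Set[Gram], n: int = 2) -> bool:
    """Stage 1: lexicon match. Stage 2: fall back to learned n-gram patterns."""
    if any(word in lexicon for word, _ in tokens):
        return True
    return any(g in patterns for g in posw_ngrams(tokens, n))


PATTERNS = learn_patterns(TRAINING, n=2)
```

In this sketch the lexicon stage is tried first because its matches are assumed to be reliable, and the learned patterns only catch sentences the lexicon misses, mirroring the accuracy-then-recall ordering described in the abstract.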

