Bi-directional long short-term memory and bi-directional gated attention networks for text classification
Received date: 2020-11-09
Online published: 2022-03-28
TONG Genmei, ZHU Min. Bi-directional long short-term memory and bi-directional gated attention networks for text classification [J]. Journal of East China Normal University (Natural Science), 2022, 2022(2): 67-75. DOI: 10.3969/j.issn.1000-5641.2022.02.008
In this paper, we propose the construction of a bi-directional fully connected structure for better extraction of context information. We also propose a bi-directional attention structure that compresses a matrix of rich text features into a single vector. The bi-directional fully connected structure is then combined with a gated structure; experiments confirm that this combination has a positive effect on text classification accuracy. Finally, by combining these three structures with a bi-directional long short-term memory network, we propose a new text classification model. Using this model, we obtained competitive results on seven commonly used text classification datasets (AG, DBP, Yelp.P, Yelp.F, Yah.A, Ama.F, Ama.P) and achieved state-of-the-art results on five of them (AG, DBP, Yelp.P, Ama.F, Ama.P). The experiments show that combining these structures significantly reduces classification errors.
Key words: text classification; attention; long short-term memory
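To make the architecture described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of a BiLSTM encoder with direction-wise fully connected layers, direction-wise attention pooling, and a gated combination. The paper does not specify layer sizes or the exact wiring of these components, so all names, dimensions, and the gating formulation here are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of the described model; all sizes and wiring are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMGatedAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bi-directional recurrent encoder (BiLSTM).
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # "Bi-directional" fully connected structure: one linear map per
        # direction, applied to the forward and backward hidden states.
        self.fc_fwd = nn.Linear(hidden_dim, hidden_dim)
        self.fc_bwd = nn.Linear(hidden_dim, hidden_dim)
        # Attention scorers, one per direction, used to compress the
        # (time steps x features) matrix into a single vector.
        self.attn_fwd = nn.Linear(hidden_dim, 1)
        self.attn_bwd = nn.Linear(hidden_dim, 1)
        # Gate blending attended features with pooled recurrent features.
        self.gate = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)                 # (B, T, E)
        h, _ = self.bilstm(x)                         # (B, T, 2H)
        h_fwd, h_bwd = h.chunk(2, dim=-1)             # (B, T, H) each
        # Direction-wise fully connected features.
        f_fwd = torch.tanh(self.fc_fwd(h_fwd))
        f_bwd = torch.tanh(self.fc_bwd(h_bwd))
        # Direction-wise attention pooling over the time axis.
        a_fwd = F.softmax(self.attn_fwd(f_fwd), dim=1)    # (B, T, 1)
        a_bwd = F.softmax(self.attn_bwd(f_bwd), dim=1)
        v = torch.cat([(a_fwd * f_fwd).sum(dim=1),
                       (a_bwd * f_bwd).sum(dim=1)], dim=-1)  # (B, 2H)
        # Gated combination with mean-pooled recurrent features.
        pooled = h.mean(dim=1)                            # (B, 2H)
        g = torch.sigmoid(self.gate(v))
        return self.classifier(g * v + (1 - g) * pooled)


# Usage example: classify a batch of two length-8 token sequences.
model = BiLSTMGatedAttention(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 8)))
print(logits.shape)  # torch.Size([2, 4])
```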