
Text matching based on multi-dimensional feature representation

  • Ming WANG,
  • Te LI,
  • Dingjiang HUANG
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2022-07-20

Accepted date: 2022-07-20

Online published: 2022-09-26

Abstract

Text semantic matching is the basis of many natural language processing tasks and is required in many scenarios, such as search and question-answering systems. In practical applications, the efficiency of text semantic matching is crucial. Although representation-based semantic-matching models are less accurate than interaction-based models, they are more efficient. The key to improving the performance of representation-based semantic-matching models is to extract sentence vectors that carry high-level semantic features. On this basis, this paper designs a feature-fusion module and a feature-extraction module based on the ERNIE model to obtain sentence vectors with multidimensional semantic features. Further, the model's ability to capture semantic information is improved by designing a semantic-prediction loss function. Finally, the model achieves an accuracy of 0.851 on the Baidu Qianyan dataset, which indicates good performance.
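For illustration, the sketch below shows one way a representation-based (dual-encoder) matcher with multi-layer feature fusion could be set up. It is a minimal sketch, not the authors' released code: the ERNIE checkpoint name (`nghuyong/ernie-1.0-base-zh` via Hugging Face Transformers), the choice of layers to fuse, the mean pooling, the cosine scoring, and all module names are assumptions, and the paper's exact feature-extraction module and semantic-prediction loss are not reproduced here.

```python
# Minimal sketch of a representation-based matcher with multi-layer feature fusion.
# Checkpoint name, fused layers, pooling, and scoring are illustrative assumptions,
# not the paper's exact design.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class FusionEncoder(nn.Module):
    """Encode a sentence and fuse hidden states from several encoder layers."""
    def __init__(self, model_name="nghuyong/ernie-1.0-base-zh", fuse_layers=(-4, -3, -2, -1)):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        self.fuse_layers = fuse_layers
        hidden = self.backbone.config.hidden_size
        # Project the concatenated per-layer features back to a single sentence vector.
        self.fusion = nn.Linear(hidden * len(fuse_layers), hidden)

    def forward(self, **inputs):
        out = self.backbone(**inputs)
        # Mean-pool each selected layer over non-padding tokens, then concatenate.
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        pooled = []
        for i in self.fuse_layers:
            h = out.hidden_states[i]
            pooled.append((h * mask).sum(1) / mask.sum(1))
        return self.fusion(torch.cat(pooled, dim=-1))

class BiEncoderMatcher(nn.Module):
    """Score a sentence pair by the cosine similarity of their fused sentence vectors."""
    def __init__(self):
        super().__init__()
        self.encoder = FusionEncoder()

    def forward(self, a_inputs, b_inputs):
        va = self.encoder(**a_inputs)
        vb = self.encoder(**b_inputs)
        return torch.cosine_similarity(va, vb, dim=-1)

tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-1.0-base-zh")
model = BiEncoderMatcher()
a = tokenizer("如何办理信用卡", return_tensors="pt")
b = tokenizer("信用卡怎么申请", return_tensors="pt")
score = model(a, b)  # higher score -> more likely a semantic match
```

Because each sentence is encoded independently, sentence vectors can be precomputed and cached, which is what gives representation-based models their efficiency advantage over interaction-based models at inference time.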

Cite this article

Ming WANG, Te LI, Dingjiang HUANG. Text matching based on multi-dimensional feature representation[J]. Journal of East China Normal University (Natural Science), 2022, 2022(5): 126-135. DOI: 10.3969/j.issn.1000-5641.2022.05.011
