Most current intelligent chat systems are built in one of two ways: retrieval-based or generation-based. Retrieval-based methods return accurate and meaningful responses, but the content and type of those responses are limited by the chosen corpus. Generative methods can produce responses that do not appear in the corpus and are therefore more flexible, but they are prone to erroneous or meaningless output. To address these problems, this paper proposes a new model, GRS (Generative-Retrieval-Score), which trains a retrieval model and a generative model simultaneously and uses a scoring module to rank the candidates produced by both; the highest-scoring candidate is returned as the output of the overall dialogue system. GRS thereby combines the advantages of the two approaches, yielding responses that are specific and diverse while remaining flexible in form. Experiments on a real-world JingDong (JD) intelligent customer-service dialogue dataset show that the proposed model outperforms existing retrieval-based and generative models on multi-turn dialogue modeling.
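The abstract specifies only the high-level pipeline: pool candidates from the retrieval and generative models, score them all, and emit the top-ranked reply. The following Python sketch illustrates that control flow; the component interfaces (`retrieve_candidates`, `generate_candidate`, `score`) are hypothetical stand-ins for illustration, not the paper's actual API.

```python
# Minimal sketch of the GRS response-selection pipeline described above.
# All three component functions are assumed, hypothetical interfaces.

from typing import Callable, List


def grs_respond(
    context: List[str],
    retrieve_candidates: Callable[[List[str]], List[str]],
    generate_candidate: Callable[[List[str]], str],
    score: Callable[[List[str], str], float],
) -> str:
    """Return the highest-scoring response for a multi-turn context."""
    # 1. Collect candidates from the retrieval model: corpus-grounded
    #    and accurate, but limited to what the corpus contains.
    candidates = retrieve_candidates(context)

    # 2. Add the generative model's candidate: flexible and possibly
    #    novel, but potentially erroneous or meaningless.
    candidates.append(generate_candidate(context))

    # 3. Score every candidate against the dialogue context and return
    #    the best one as the system's reply.
    return max(candidates, key=lambda reply: score(context, reply))
```

Under this late-fusion reading, the scorer arbitrates between corpus-grounded and generated replies on every turn, which is how the model can inherit the accuracy of retrieval and the flexibility of generation at once.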