Automatic text generation aims to enable machines to write as humans do, reducing the workload of language professionals and delivering real-time, concise news coverage to readers. It can be applied to intelligent question answering and dialogue, automatic news writing, breaking-event reporting, and similar applications, and it has long been a problem that both academia and industry seek to solve. In this paper, we model automatic text generation as a keyword set covering problem and propose an unsupervised extractive algorithm for generating text. The algorithm also improves the structure of the generated text, which is no longer a single undivided paragraph. Experiments show that the algorithm performs well on a large-scale corpus: the generated text covers information more comprehensively and is closer in meaning to human-written text.
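To make the keyword-covering formulation concrete, the following is a minimal sketch of extractive selection as a set cover problem. It uses the standard greedy set-cover heuristic and is not the algorithm proposed in the paper, whose details are not given in this abstract. The function name `greedy_keyword_cover`, the sentence identifiers, and the sample keywords are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_keyword_cover(
    sentences: Dict[str, Set[str]], keywords: Set[str]
) -> List[str]:
    """Select sentence ids until every coverable keyword is covered.

    Illustrative greedy set-cover heuristic (an assumption, not the
    paper's method). `sentences` maps a sentence id to the keywords
    that the sentence contains.
    """
    uncovered = set(keywords)
    chosen: List[str] = []
    while uncovered and sentences:
        # Pick the sentence that covers the most still-uncovered keywords.
        best = max(sentences, key=lambda s: len(sentences[s] & uncovered))
        gain = sentences[best] & uncovered
        if not gain:
            break  # the remaining keywords appear in no candidate sentence
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical candidate sentences, each reduced to its keyword set.
candidates = {
    "s1": {"earthquake", "magnitude"},
    "s2": {"earthquake", "rescue", "casualties"},
    "s3": {"magnitude", "aftershock"},
}
target = {"earthquake", "magnitude", "rescue", "casualties", "aftershock"}
selected = greedy_keyword_cover(candidates, target)
print(selected)
```

In this toy input, two of the three sentences are enough to cover all five keywords, which is the sense in which a covering formulation favors broad information coverage with little redundancy.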