An algorithm for natural language generation via text extracting

  • AI Li-si,
  • TANG Wei-hong,
  • FU Yun-bin,
  • DONG Qi-min,
  • ZHENG Jian-bing,
  • GAO Ming
  • 1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China;
    2. Shanghai Agricultural Technology Extension and Service Center, Shanghai 201103, China;
    3. Vocational and Technical Education Center of Linxi County, Linxi, Inner Mongolia 025250, China

Received date: 2017-06-19

Online published: 2018-07-19

Abstract

The aim of natural language generation is to enable machines to generate text automatically. This would reduce the workload of human writers and help deliver real-time, concise news coverage to readers. The technique can be applied in many settings, such as question answering systems, automatic news writing, and incident reporting. It remains an open problem for both academia and industry. In this paper, we model the task as a keyword covering problem and propose an unsupervised approach that extracts text for natural language generation. The experimental results show that the algorithm is effective on a large-scale corpus: the generated text covers the source content more comprehensively and is closer to text written by a human.
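To make the keyword-covering formulation concrete, the sketch below shows a greedy sentence-selection routine in the spirit of set cover: it repeatedly picks the candidate sentence that covers the most still-uncovered keywords. This is only an illustrative approximation under stated assumptions (a naive whitespace tokenizer and hypothetical names such as greedy_keyword_cover); it is not necessarily the exact algorithm proposed in the paper.

from typing import List, Set

def tokenize(sentence: str) -> Set[str]:
    # Naive whitespace tokenizer; a real system would add word segmentation
    # (e.g. for Chinese text) and stop-word filtering.
    return set(sentence.lower().split())

def greedy_keyword_cover(sentences: List[str], keywords: Set[str]) -> List[str]:
    # Greedily select sentences until all keywords are covered or no
    # remaining sentence adds a new keyword.
    uncovered = set(keywords)
    candidates = list(sentences)
    selected: List[str] = []
    while uncovered and candidates:
        # Choose the sentence covering the most still-uncovered keywords.
        best = max(candidates, key=lambda s: len(uncovered & tokenize(s)))
        gain = uncovered & tokenize(best)
        if not gain:
            break
        selected.append(best)
        uncovered -= gain
        candidates.remove(best)
    return selected

if __name__ == "__main__":
    corpus = [
        "The storm caused flooding in the northern district.",
        "Rescue teams evacuated residents overnight.",
        "Flooding disrupted traffic across the city.",
    ]
    print(greedy_keyword_cover(corpus, {"storm", "flooding", "rescue", "residents"}))

Run as a script, this selects the first two sentences, since together they cover all four keywords; the third sentence adds no new coverage and is discarded.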

Cite this article

AI Li-si , TANG Wei-hong , FU Yun-bin , DONG Qi-min , ZHENG Jian-bing , GAO Ming . An algorithm for natural language generation via text extracting[J]. Journal of East China Normal University(Natural Science), 2018 , 2018(4) : 70 -79 . DOI: 10.3969/j.issn.1000-5641.2018.04.007
