[1] HARRIS Z S. Distributional structure[J]. Word, 1954, 10(2/3): 146-162.
[2] FIRTH J R. A synopsis of linguistic theory, 1930-1955[J]. Studies in Linguistic Analysis, 1957: 1-31.
[3] LAI S W. Research on semantic vector representations of words and documents based on neural networks[D]. Beijing: University of Chinese Academy of Sciences, 2016. (in Chinese)
[4] TURIAN J, RATINOV L, BENGIO Y. Word representations: a simple and general method for semi-supervised learning[C]//Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 2010: 384-394.
[5] DEERWESTER S, DUMAIS S T, FURNAS G W, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
[6] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014: 1532-1543.
[7] BROWN P F, DESOUZA P V, MERCER R L, et al. Class-based n-gram models of natural language[J]. Computational Linguistics, 1992, 18(4): 467-479.
[8] GUO J, CHE W, WANG H, et al. Revisiting embedding features for simple semi-supervised learning[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014: 110-120.
[9] CHEN X, XU L, LIU Z, et al. Joint learning of character and word embeddings[C]//Proceedings of the 24th International Joint Conference on Artificial Intelligence. AAAI Press, 2015: 1236-1242.
[10] HINTON G E. Learning distributed representations of concepts[C]//Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986: 12.
[11] MIIKKULAINEN R, DYER M G. Natural language processing with modular neural networks and distributed lexicon[J]. Cognitive Science, 1991, 15(3): 343-399.
[12] XU W, RUDNICKY A. Can artificial neural networks learn language models?[C]//Proceedings of the International Conference on Spoken Language Processing, 2000: 202-205.
[13] BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[14] MNIH A, HINTON G. Three new graphical models for statistical language modelling[C]//Proceedings of the 24th International Conference on Machine Learning, 2007: 641-648.
[15] SUTSKEVER I, HINTON G E. Learning multilevel distributed representations for high-dimensional sequences[J]. Journal of Machine Learning Research, 2007, 2: 548-555.
[16] MNIH A, HINTON G. A scalable hierarchical distributed language model[C]//Advances in Neural Information Processing Systems, Vancouver, Canada, 2008: 1081-1088.
[17] MNIH A, KAVUKCUOGLU K. Learning word embeddings efficiently with noise-contrastive estimation[C]//Advances in Neural Information Processing Systems, 2013: 2265-2273.
[18] MIKOLOV T, KARAFIÁT M, BURGET L, et al. Recurrent neural network based language model[C]//INTERSPEECH 2010, Makuhari, Chiba, Japan, 2010: 1045-1048.
[19] MIKOLOV T, KOMBRINK S, DEORAS A, et al. RNNLM: recurrent neural network language modeling toolkit[C]//Proceedings of the 2011 ASRU Workshop, 2011: 196-201.
[20] BENGIO Y, SIMARD P, FRASCONI P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[21] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[22] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014: 1724-1734.
[23] CHO K, VAN MERRIËNBOER B, BAHDANAU D, et al. On the properties of neural machine translation: encoder-decoder approaches[J]. arXiv preprint arXiv:1409.1259, 2014.
[24] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014.
[25] GREFF K, SRIVASTAVA R K, KOUTNÍK J, et al. LSTM: a search space odyssey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015(99): 1-11.
[26] JOZEFOWICZ R, ZAREMBA W, SUTSKEVER I. An empirical exploration of recurrent network architectures[C]//International Conference on Machine Learning, 2015: 2342-2350.
[27] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
[28] MORIN F, BENGIO Y. Hierarchical probabilistic neural network language model[C]//AISTATS, 2005: 246-252.
[29] GOODMAN J. Classes for fast maximum entropy training[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2001: 561-564.
[30] FELLBAUM C, MILLER G. WordNet: An Electronic Lexical Database[M]. Cambridge, MA: MIT Press, 1998.
[31] MNIH A, HINTON G. A scalable hierarchical distributed language model[C]//Advances in Neural Information Processing Systems. Curran Associates Inc., 2008: 1081-1088.
[32] LE H S, OPARIN I, ALLAUZEN A, et al. Structured output layer neural network language model[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011: 5524-5527.
[33] MIKOLOV T, KOMBRINK S, BURGET L, et al. Extensions of recurrent neural network language model[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011: 5528-5531.
[34] COLLOBERT R, WESTON J. A unified architecture for natural language processing: deep neural networks with multitask learning[C]//Proceedings of the 25th International Conference on Machine Learning, 2008: 160-167.
[35] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12: 2493-2537.
[36] GUTMANN M, HYVÄRINEN A. Noise-contrastive estimation: a new estimation principle for unnormalized statistical models[J]. Journal of Machine Learning Research, 2010, 9: 297-304.
[37] GUTMANN M U, HYVÄRINEN A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics[J]. Journal of Machine Learning Research, 2012, 13(1): 307-361.
[38] MNIH A, TEH Y W. A fast and simple algorithm for training neural probabilistic language models[C]//International Conference on Machine Learning, 2012: 1751-1758.
[39] BENGIO Y, SENÉCAL J S. Quick training of probabilistic neural nets by importance sampling[C]//AISTATS, 2003: 1-9.
[40] ZOPH B, VASWANI A, MAY J, et al. Simple, fast noise-contrastive estimation for large RNN vocabularies[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016: 1217-1222.
[41] DYER C. Notes on noise contrastive estimation and negative sampling[J]. arXiv preprint arXiv:1410.8251, 2014.
[42] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26: 3111-3119.
[43] CHEN W, GRANGIER D, AULI M. Strategies for training large vocabulary neural language models[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016: 1975-1985.
[44] DEVLIN J, ZBIB R, HUANG Z, et al. Fast and robust neural network joint models for statistical machine translation[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014: 1370-1380.
[45] ANDREAS J, KLEIN D. When and why are log-linear models self-normalizing?[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015: 244-249.
[46] MIKOLOV T, KOPECKY J, BURGET L, et al. Neural network based language models for highly inflective languages[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009: 4725-4728.
[47] SANTOS C D, ZADROZNY B. Learning character-level representations for part-of-speech tagging[C]//Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014: 1818-1826.
[48] COTTERELL R, SCHÜTZE H. Morphological word-embeddings[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015: 1287-1292.
[49] BOJANOWSKI P, GRAVE E, JOULIN A, et al. Enriching word vectors with subword information[J]. arXiv preprint arXiv:1607.04606, 2016.
[50] LI Y, LI W, SUN F, et al. Component-enhanced Chinese character embeddings[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015: 829-834.
[51] YU M, DREDZE M. Improving lexical embeddings with semantic knowledge[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014: 545-550.
[52] WANG Z, ZHANG J, FENG J, et al. Knowledge graph and text jointly embedding[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014: 1591-1601.
[53] REISINGER J, MOONEY R J. Multi-prototype vector-space models of word meaning[C]//Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010: 109-117.
[54] HUANG E H, SOCHER R, MANNING C D, et al. Improving word representations via global context and multiple word prototypes[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, 2012: 873-882.
[55] VILNIS L, MCCALLUM A. Word representations via Gaussian embedding[R]. Amherst, MA: University of Massachusetts Amherst, 2014.
[56] HILL F, REICHART R, KORHONEN A. SimLex-999: evaluating semantic models with (genuine) similarity estimation[J]. Computational Linguistics, 2015, 41(4): 665-695.
[57] FINKELSTEIN L, GABRILOVICH E, MATIAS Y, et al. Placing search in context: the concept revisited[J]. ACM Transactions on Information Systems, 2002, 20(1): 116-131.
[58] ZWEIG G, BURGES C J C. The Microsoft Research sentence completion challenge[R]. Redmond, WA: Microsoft Research, Technical Report MSR-TR-2011-129, 2011.
[59] GLADKOVA A, DROZD A, MATSUOKA S. Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't[C]//HLT-NAACL, 2016: 8-15.
[60] MIKOLOV T, YIH W, ZWEIG G. Linguistic regularities in continuous space word representations[C]//HLT-NAACL, 2013: 746-751.
[61] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[C]//ICLR, 2015: 1-15.
[62] GROVER A, LESKOVEC J. node2vec: scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 855-864.