Journal of East China Normal University (Natural Science) ›› 2017, Vol. 2017 ›› Issue (5): 52-65, 79. doi: 10.3969/j.issn.1000-5641.2017.05.006

• Big Data Analysis •

Survey on distributed word embeddings based on neural network language models

YU Ke-ren, FU Yun-bin, DONG Qi-wen   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2017-05-01; Online: 2017-09-25; Published: 2017-09-25

Abstract: Distributed word embedding is one of the most important research topics in the field of Natural Language Processing; its core idea is to represent the words in a text as low-dimensional dense vectors. There are many ways to generate such vectors, among which methods based on neural network language models perform best. The most representative example is Word2vec, an open-source tool released by Google in 2013. Distributed word embeddings can be used to solve many Natural Language Processing tasks, such as text clustering, named entity recognition, and part-of-speech tagging. Their quality depends heavily on the underlying neural network language model and on the specific task being processed. This paper gives an overview of distributed word embeddings based on neural networks from three aspects: the construction of classical neural network language models, optimization methods for the multi-class classification problem in language models, and the use of auxiliary structures to train word embeddings.
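To make the core idea concrete, the sketch below illustrates what a distributed word embedding is: each word is mapped to a dense, low-dimensional vector, and semantic relatedness is measured by cosine similarity between vectors. The vector values here are invented for illustration only; real embeddings are learned by a neural network language model such as Word2vec.

```python
import math

# Toy embedding table: each word maps to a low-dimensional dense vector.
# These values are hypothetical; trained models produce them automatically.
embeddings = {
    "king":  [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.9, 0.7],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: u·v / (|u||v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# With well-trained embeddings, semantically related words ("king", "queen")
# score higher than unrelated ones ("king", "apple").
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
```

In practice the embedding table is a learned weight matrix inside the language model, and the similarity structure emerges from training on large corpora rather than being assigned by hand.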

Key words: word embedding, language model, neural network
