Journal of East China Normal University(Natural Sc ›› 2019, Vol. 2019 ›› Issue (4): 111-119.doi: 10.3969/j.issn.1000-5641.2019.04.011

• Computer Science •

An end-to-end Chinese speech synthesis scheme based on Tacotron 2

WANG Guo-liang1, CHEN Meng-nan2, CHEN Lei2   

    1. Information and Communication Branch, State Grid Anhui Electric Power Co., Ltd., Hefei 230061, China;
    2. Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2018-10-28 Online: 2019-07-25 Published: 2019-07-18

Abstract: Tacotron 2, a disruptive design for end-to-end speech synthesis, is currently available only for English. This paper implements several improvements to Tacotron 2 and presents a Chinese speech synthesis scheme, which includes: a pre-processing module that converts Chinese characters into phonetic characters, addressing the challenges that Chinese characters do not correspond directly to pronunciation, carry multiple tones, and include polyphonic words; a pre-trained decoder that achieves better sound quality with a smaller corpus, given the lack of existing Chinese training corpora; a strategy of weighting the cross-entropy loss and using a multi-layer perceptron, instead of a linear transformation, to predict stop tokens, which solves the problem of sudden pauses in synthesized Chinese speech; and a multi-head attention mechanism that further improves Chinese speech quality. Experimental comparisons of the Mel spectrum and the Mel cepstrum distance (MCD) show that these changes are effective and adapt Tacotron 2 to the requirements of Chinese speech synthesis.
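
The abstract does not state how the MCD is computed. As a rough illustration only, the sketch below uses the commonly cited definition of the metric (often called Mel cepstral distortion), averaged over frames. It assumes the reference and synthesized frames are already time-aligned and that the 0th (energy) coefficient has been removed. The function name mel_cepstral_distortion and the list-of-lists input format are hypothetical and are not taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames of Mel cepstral coefficients.

    Assumptions (not from the paper): both inputs are sequences of frames,
    each frame is a sequence of floats of equal length, the frames are
    already aligned one-to-one, and the 0th coefficient is excluded.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("inputs must be non-empty and have the same number of frames")

    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        squared_diff = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(ref_frames)
```

A lower value means the synthesized cepstra are closer to the reference, and identical inputs give 0.0. In practice the two utterances usually differ in length, so an alignment step such as dynamic time warping would be needed before calling a function like this.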

Key words: text to speech, multi-head attention, Tacotron 2
