Journal of East China Normal University (Natural Science) ›› 2021, Vol. 2021 ›› Issue (1): 152-164. doi: 10.3969/j.issn.1000-5641.202022009

• Physics and Electronics •

Voice singing by function fitting

Yibu WANG, Jianwen LI*

  1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
  • Received: 2020-06-10 Online: 2021-01-25 Published: 2021-01-28
  • Contact: Jianwen LI E-mail: lijw@sust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (60672001)

Abstract:

Intonation is the melody of speech, formed by the arrangement and variation of rising and falling tones, and it is one of the features through which humans convey emotion. By adjusting intonation parameters to change the duration and pitch of individual syllables in an utterance, controllable intonation can produce the effect of singing, which helps address the lack of research on singing in speech synthesis. The cepstrum method is used to extract the pitch frequency, linear predictive coding (LPC) is used to estimate the formants, and a high-order polynomial is fitted to the fundamental frequency of each speech tone; the fitted function is then adjusted in real time to shape the intonation needed for singing. Starting from two basic parameters, pitch frequency and formants, and drawing on the mathematical and physical nature of pronunciation, the paper synthesizes singing with an intuitive mathematical method; the overall recognition rate for the original and synthesized speech reaches 87.6%. The synthesis results show that varying intonation by adjusting speech synthesis parameters makes the singing output more controllable.

Key words: intonation, tone, voice singing, cepstrum, pitch frequency, LPC (linear predictive coding) method, formant, fitting function
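
As a rough illustration of two of the steps named in the abstract, the sketch below estimates pitch from the real cepstrum of one frame and then fits a polynomial to a pitch contour so that the contour can be stretched in time and shifted in pitch. It is not the authors' implementation: the function names (`cepstral_pitch`, `fit_contour`, `retarget`), the 60 to 500 Hz search range, the Hamming window, and the polynomial degree of 5 are all assumptions chosen for the example, and the LPC formant step is not shown.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame from its real cepstrum.

    fmin/fmax bound the search range; both values are assumptions.
    """
    windowed = frame * np.hamming(len(frame))
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # A periodic source shows up as a peak at the quefrency of one period.
    qmin = int(fs / fmax)
    qmax = int(fs / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return fs / peak

def fit_contour(times, f0, degree=5):
    """Least-squares polynomial fit of an F0 contour over normalized time."""
    t = (times - times[0]) / (times[-1] - times[0])  # map to [0, 1]
    return np.poly1d(np.polyfit(t, f0, degree))

def retarget(contour, n_frames, pitch_ratio):
    """Resample the fitted contour to n_frames and scale its pitch."""
    t = np.linspace(0.0, 1.0, n_frames)
    return pitch_ratio * contour(t)

# Synthetic voiced frame: 40 harmonics of 200 Hz sampled at 16 kHz.
fs = 16000
n = np.arange(1024)
frame = sum(np.cos(2 * np.pi * 200.0 * k * n / fs) / k for k in range(1, 41))
pitch_hz = cepstral_pitch(frame, fs)

# Synthetic contour: a rise and fall around 200 Hz over 0.5 s (50 frames).
times = np.linspace(0.0, 0.5, 50)
f0 = 200.0 + 30.0 * np.sin(np.pi * times / 0.5)
contour = fit_contour(times, f0)

# Hold the syllable twice as long and raise it by two semitones.
sung_f0 = retarget(contour, n_frames=100, pitch_ratio=2.0 ** (2.0 / 12.0))
```

Normalizing time to [0, 1] before fitting keeps the least-squares problem well conditioned and makes duration changes a matter of evaluating the same polynomial on a denser or sparser grid.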

CLC number: