Physics and Electronics

Voice singing by function fitting

  • Yibu WANG (王咿卜),
  • Jianwen LI (李建文)
  • School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

Received date: 2020-06-10

Online published: 2021-01-28

Funding

National Natural Science Foundation of China (60672001)

Cite this article

Yibu WANG, Jianwen LI. Voice singing by function fitting[J]. Journal of East China Normal University (Natural Science), 2021, 2021(1): 152-164. DOI: 10.3969/j.issn.1000-5641.202022009

Abstract

Intonation is the melodic pattern of speech, formed by the arrangement and variation of tones; it is one of the ways in which humans convey emotion. By adjusting intonation parameters to change the duration and pitch of individual syllables in an utterance, controllable intonation can produce the effect of singing, which addresses a gap in speech synthesis research on singing. The cepstrum method is used to extract the pitch frequency, linear predictive coding (LPC) is used to estimate the formants, and a high-order polynomial is fitted to the fundamental frequency of each speech tone; the fitted function is then adjusted in real time to form the intonation needed for singing. Starting from two basic speech parameters, pitch frequency and formants, and drawing on the mathematical and physical nature of phonation, the paper synthesizes sung speech with an intuitive mathematical method; the overall recognition rate of the original and synthesized speech reaches 87.6%. The synthesis results show that varying intonation by adjusting speech synthesis parameters makes sung speech more controllable.
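The pitch-related steps named in the abstract (cepstral pitch extraction, polynomial fitting of the pitch contour, and adjustment of the fitted function) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 60 to 500 Hz search band, the default polynomial degree, the synthetic test frame, and the choice of a semitone shift as the "adjustment" are all assumptions made for illustration. LPC formant estimation is not covered.

```python
import numpy as np


def cepstral_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame from its real cepstrum.

    fmin and fmax bound the search band; they are illustrative assumptions.
    """
    windowed = frame * np.hamming(len(frame))
    # Log-magnitude spectrum; the small constant guards against log(0).
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # Only search quefrencies that correspond to plausible pitch periods.
    q_min = int(fs / fmax)
    q_max = int(fs / fmin)
    period = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / period


def fit_pitch_contour(times, f0, degree=5):
    """Least-squares polynomial fit of a pitch contour; returns a callable.

    The degree is an assumed value, not one taken from the paper.
    """
    return np.polynomial.Polynomial.fit(times, f0, degree)


def transpose(contour, semitones):
    """Shift a fitted contour by equal-tempered semitones (assumed adjustment)."""
    factor = 2.0 ** (semitones / 12.0)
    return lambda t: factor * contour(t)


def harmonic_frame(f0, fs, n_samples):
    """Synthetic voiced frame for testing: harmonics of f0 with 1/k amplitudes."""
    t = np.arange(n_samples) / fs
    n_harm = int((fs / 2 - 1) // f0)  # keep all harmonics below Nyquist
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))


if __name__ == "__main__":
    fs = 16000
    frame = harmonic_frame(250.0, fs, 1024)
    print("estimated pitch (Hz):", cepstral_pitch(frame, fs))

    # A made-up rising-falling contour standing in for per-frame pitch estimates.
    times = np.linspace(0.0, 0.4, 41)
    f0 = 200.0 + 150.0 * times - 300.0 * times ** 2
    contour = fit_pitch_contour(times, f0, degree=3)
    raised = transpose(contour, 2)  # two semitones up
    print("fitted start and raised start (Hz):", contour(0.0), raised(0.0))
```

Keeping the fitted contour as a callable means that duration can be changed by resampling the time axis and pitch by scaling the function's output, which is one simple way to read the abstract's description of adjusting the fitted function.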
