2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	SPE-15.1
Paper Title	NOISE LEVEL LIMITED SUB-MODELING FOR DIFFUSION PROBABILISTIC VOCODERS
Authors	Takuma Okamoto, National Institute of Information and Communications Technology, Japan; Tomoki Toda, Nagoya University, Japan; Yoshinori Shiga, Hisashi Kawai, National Institute of Information and Communications Technology, Japan
Session	SPE-15: Speech Synthesis 3: Vocoder
Location	Gather.Town
Session Time	Wednesday, 09 June, 13:00 - 13:45
Presentation Time	Wednesday, 09 June, 13:00 - 13:45
Presentation	Poster
Topic	Speech Processing: [SPE-SYNT] Speech Synthesis and Generation
Abstract	Although the diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time, high-fidelity speech synthesis with a simple training loss function, a single model predicts the noise components over the full range of noise levels in every iteration. This paper proposes a simple but effective noise-level-limited sub-modeling framework for diffusion probabilistic vocoders, yielding Sub-WaveGrad and Sub-DiffWave. The proposed method also provides a DiffWave variant conditioned on a continuous noise level, as in WaveGrad, together with spectral enhancement post-filtering. The proposed Sub-WaveGrad and Sub-DiffWave models are realized using 10 sub-models. These sub-models are trained separately on different limited noise-level ranges, and at inference only the sub-models required by the noise schedule are used. Experiments on a Japanese female speech corpus indicate that both Sub-WaveGrad and Sub-DiffWave outperform vanilla WaveGrad and DiffWave in model accuracy and synthesis quality while maintaining inference speed.
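The sub-model selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that 10 sub-models cover different limited noise levels and that inference uses only the ones the noise schedule needs. The sketch assumes a scalar noise level normalized to [0, 1] and equal-width ranges; both are assumptions, as are the function names and the example schedule, which are hypothetical.

```python
from bisect import bisect_right

NUM_SUB_MODELS = 10  # number of sub-models stated in the abstract


def make_boundaries(num_sub_models=NUM_SUB_MODELS, lo=0.0, hi=1.0):
    """Split [lo, hi] into equal-width noise-level ranges.

    Assumption: the abstract does not say how the ranges are chosen;
    equal widths are used here purely for illustration.
    """
    width = hi - lo
    return [lo + width * i / num_sub_models for i in range(num_sub_models + 1)]


def sub_model_index(noise_level, boundaries):
    """Return the index of the sub-model whose range contains noise_level."""
    if not boundaries[0] <= noise_level <= boundaries[-1]:
        raise ValueError(f"noise level {noise_level} is outside the covered range")
    idx = bisect_right(boundaries, noise_level) - 1
    # The upper edge of the full range belongs to the last sub-model.
    return min(idx, len(boundaries) - 2)


def required_sub_models(noise_schedule, boundaries):
    """Return the sorted indices of sub-models that a noise schedule touches."""
    return sorted({sub_model_index(level, boundaries) for level in noise_schedule})


if __name__ == "__main__":
    boundaries = make_boundaries()
    # Hypothetical 6-step inference schedule, from high to low noise level.
    schedule = [0.98, 0.83, 0.55, 0.27, 0.12, 0.01]
    print(required_sub_models(schedule, boundaries))
```

Under these assumptions, a short inference schedule touches only some of the sub-models, so the rest never need to be loaded or evaluated, which is consistent with the abstract's claim that inference speed is kept.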