| Paper ID | AUD-26.1 |
| Paper Title | SPEECH ENHANCEMENT WITH MIXTURE OF DEEP EXPERTS WITH CLEAN CLUSTERING PRE-TRAINING |
| Authors | Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot, Bar-Ilan University, Israel |
| Session | AUD-26: Signal Enhancement and Restoration 3: Signal Enhancement |
| Location | Gather.Town |
| Session Time | Thursday, 10 June, 16:30 - 17:15 |
| Presentation Time | Thursday, 10 June, 16:30 - 17:15 |
| Presentation | Poster |
| Topic | Audio and Acoustic Signal Processing: [AUD-SEN] Signal Enhancement and Restoration |
| Abstract | In this study, we present a mixture of deep experts (MoDE) neural network architecture for single-microphone speech enhancement. Our architecture comprises a set of deep neural networks (DNNs), each of which is an ‘expert’ in a different speech spectral pattern, such as a phoneme. A gating DNN estimates the latent variables, which are the weights assigned to each expert’s output given a speech segment. Each expert estimates a mask from the noisy input, and the final mask is obtained as a weighted average of the experts’ estimates, with the weights determined by the gating DNN. A soft spectral attenuation, based on the estimated mask, is then applied to enhance the noisy speech signal. As a byproduct, we gain a reduction in complexity at test time. We show that the experts’ specialization allows better robustness to unfamiliar noise types. |
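The mask combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the softmax normalization of the gating outputs, and the plain multiplicative attenuation are all assumptions made for illustration, since the abstract does not specify them. Each expert is assumed to have already produced a per-frequency mask in [0, 1] for one frame.

```python
import math


def softmax(logits):
    """Turn raw gating scores into weights that sum to 1 (assumed normalization)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_expert_masks(expert_masks, gate_logits):
    """Weighted average of the experts' masks for one frame.

    expert_masks: list of K masks, each a list of per-frequency values in [0, 1].
    gate_logits:  list of K raw scores from a (hypothetical) gating network.
    """
    weights = softmax(gate_logits)
    n_bins = len(expert_masks[0])
    return [
        sum(w * mask[f] for w, mask in zip(weights, expert_masks))
        for f in range(n_bins)
    ]


def apply_mask(noisy_magnitudes, mask):
    """Simple multiplicative attenuation, a stand-in for the paper's soft attenuation."""
    return [m * x for m, x in zip(mask, noisy_magnitudes)]


# Two hypothetical experts with equal gating scores: the final mask is their mean.
final_mask = combine_expert_masks([[1.0, 1.0], [0.0, 0.0]], [0.0, 0.0])
enhanced = apply_mask([2.0, 4.0], final_mask)
```

With equal gating scores the two experts contribute equally; as one score grows, the final mask approaches that expert's estimate.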