2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: SPE-52.5
Paper Title: A Time-domain Convolutional Recurrent Network for Packet Loss Concealment
Authors: Ju Lin, Clemson University, United States; Yun Wang, Kaustubh Kalgaonkar, Gil Keren, Didi Zhang, Christian Fuegen, Facebook AI, United States
Session: SPE-52: Speech Enhancement 8: Echo Cancellation and Other Tasks
Location: Gather.Town
Session Time: Friday, 11 June, 13:00 - 13:45
Presentation Time: Friday, 11 June, 13:00 - 13:45
Presentation: Poster
Topic: Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
Abstract: Packet loss may affect a wide range of applications that use voice over IP (VoIP), such as video conferencing. In this paper, we investigate a time-domain convolutional recurrent network (CRN) for online packet loss concealment. The CRN comprises a convolutional encoder-decoder structure and long short-term memory (LSTM) layers, which have been shown to be suitable for real-time speech enhancement applications. Moreover, we propose lookahead and masked training to further improve the performance of the CRN framework. Experimental results show that the proposed system outperforms a baseline system using only LSTM layers on two objective metrics: perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). It also reduces the word error rate (WER) more than the baseline when used as a frontend for speech recognition. The advantage of the proposed system is further verified in a subjective evaluation using the mean opinion score (MOS).
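
To make the problem setting concrete, the following is a minimal sketch of how packet loss on a time-domain signal might be simulated to produce inputs for a concealment system. It is not taken from the paper: the function name `simulate_packet_loss`, the packet size of 320 samples (20 ms at an assumed 16 kHz sampling rate), and the independent random loss model are all illustrative assumptions.

```python
import random


def simulate_packet_loss(samples, packet_size=320, loss_rate=0.1, seed=0):
    """Zero out randomly chosen packets of a waveform.

    Hypothetical helper for illustration only. Assumes fixed-size packets
    and independent losses with probability `loss_rate`.
    Returns (lossy_samples, received_mask), where received_mask holds one
    flag per packet: True means received, False means lost.
    """
    rng = random.Random(seed)  # seeded so the loss pattern is reproducible
    lossy = list(samples)      # copy, so the input is left untouched
    received_mask = []
    for start in range(0, len(lossy), packet_size):
        received = rng.random() >= loss_rate
        received_mask.append(received)
        if not received:
            # A lost packet is represented as silence (all zeros).
            end = min(start + packet_size, len(lossy))
            lossy[start:end] = [0.0] * (end - start)
    return lossy, received_mask


# Usage: a constant dummy signal of 1600 samples, i.e. 5 packets.
signal = [1.0] * 1600
lossy, mask = simulate_packet_loss(signal, loss_rate=0.3, seed=1)
print(mask)
```

A concealment model would then receive the zero-filled signal, possibly together with the per-packet mask, and be asked to reconstruct the missing segments. How the paper itself generates losses or uses such a mask during training is not specified in the abstract.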