2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID AUD-6.1
Paper Title ICASSP 2021 ACOUSTIC ECHO CANCELLATION CHALLENGE: INTEGRATED ADAPTIVE ECHO CANCELLATION WITH TIME ALIGNMENT AND DEEP LEARNING-BASED RESIDUAL ECHO PLUS NOISE SUPPRESSION
Authors Renhua Peng, Linjuan Cheng, Chengshi Zheng, Xiaodong Li, Institute of Acoustics, Chinese Academy of Sciences, China
Session AUD-6: Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation
Location Gather.Town
Session Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Poster
Topic Audio and Acoustic Signal Processing: [AUD-NEFR] Active Noise Control, Echo Reduction and Feedback Reduction
Abstract This paper describes a three-stage acoustic echo cancellation (AEC) and suppression framework for the ICASSP 2021 AEC-Challenge. In the first stage, a partitioned-block frequency-domain adaptive filter cancels the linear echo components without distorting the near-end speech; the time delay between the far-end reference signal and the microphone signal is estimated and compensated beforehand. In the second stage, a deep complex U-Net integrated with a gated recurrent unit is proposed to further suppress the residual echo components. In the last stage, an extremely tiny deep complex U-Net is trained to suppress environmental noise, which also further increases the echo return loss enhancement (ERLE) without dramatically increasing the computational complexity. Experimental results show that the proposed three-stage framework achieves an ERLE of over 50 dB in both single-talk and double-talk scenarios, and that the perceptual evaluation of speech quality (PESQ) score improves by about 0.7 in double-talk scenarios. Subjective results show that the proposed framework outperforms the AEC-Challenge baseline ResRNN by 0.12 points in terms of the MOS.
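As a rough illustration of two quantities the abstract refers to, the sketch below estimates the delay between a far-end reference and a microphone signal, and computes ERLE in dB. It is not the authors' implementation: the abstract does not say how the delay is estimated, so a plain time-domain cross-correlation search is assumed here, and the function names `estimate_delay` and `erle_db`, the `max_lag` parameter, and the synthetic signals are all hypothetical. ERLE is taken in its usual form, the ratio of microphone power to residual power measured while only the far end is active.

```python
import numpy as np


def estimate_delay(far_end, mic, max_lag):
    """Return the lag (in samples, 0..max_lag) at which the microphone
    signal correlates most strongly with the far-end reference.
    Assumption: a brute-force cross-correlation search, chosen for clarity."""
    n = min(len(far_end), len(mic))
    x = far_end[:n] - np.mean(far_end[:n])
    y = mic[:n] - np.mean(mic[:n])
    best_lag, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        # Correlate mic[lag:] with the reference shifted by `lag` samples.
        score = np.dot(y[lag:], x[: n - lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def erle_db(mic, residual, eps=1e-12):
    """Echo return loss enhancement in dB: microphone power over residual
    power. Meaningful when only far-end echo is present in `mic`."""
    mic_power = np.mean(np.square(mic))
    res_power = np.mean(np.square(residual))
    return 10.0 * np.log10((mic_power + eps) / (res_power + eps))


# Synthetic example: the echo is the reference delayed by 37 samples
# and scaled by 0.5 (a hypothetical, single-tap echo path).
rng = np.random.default_rng(0)
far_end = rng.standard_normal(4000)
true_delay = 37
mic = np.zeros_like(far_end)
mic[true_delay:] = 0.5 * far_end[:-true_delay]

delay = estimate_delay(far_end, mic, max_lag=100)

# Pretend a canceller reduced the echo amplitude by a factor of 10.
residual = 0.1 * mic
erle = erle_db(mic, residual)
```

Compensating the estimated delay before adaptive filtering lets the filter spend its taps on the echo path itself rather than on leading zeros; a tenfold amplitude reduction in the residual corresponds to 20 dB of ERLE.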