2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Paper ID	AUD-6.1
Paper Title	ICASSP 2021 ACOUSTIC ECHO CANCELLATION CHALLENGE: INTEGRATED ADAPTIVE ECHO CANCELLATION WITH TIME ALIGNMENT AND DEEP LEARNING-BASED RESIDUAL ECHO PLUS NOISE SUPPRESSION
Authors	Renhua Peng, Linjuan Cheng, Chengshi Zheng, Xiaodong Li, Institute of Acoustics, Chinese Academy of Sciences, China
Session	AUD-6: Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation
Location	Gather.Town
Session Time	Tuesday, 08 June, 16:30 - 17:15
Presentation Time:	Tuesday, 08 June, 16:30 - 17:15
Presentation	Poster
Topic	Audio and Acoustic Signal Processing: [AUD-NEFR] Active Noise Control, Echo Reduction and Feedback Reduction
Abstract	This paper describes a three-stage acoustic echo cancellation (AEC) and suppression framework for the ICASSP 2021 AEC-Challenge. In the first stage, a partitioned-block frequency-domain adaptive filter cancels the linear echo components without distorting the near-end speech; the time delay between the far-end reference signal and the microphone signal is estimated and compensated beforehand. In the second stage, a deep complex U-Net integrated with a gated recurrent unit (GRU) is proposed to further suppress the residual echo components. In the last stage, a very small deep complex U-Net is trained to suppress environmental noise, which also further increases the echo return loss enhancement (ERLE) without dramatically increasing the computational complexity. Experimental results show that the proposed three-stage framework achieves an ERLE of over 50 dB in both single-talk and double-talk scenarios and improves the perceptual evaluation of speech quality (PESQ) score by about 0.7 in double-talk scenarios. Subjective results show that the proposed framework outperforms the AEC-Challenge baseline ResRNN by 0.12 points in terms of the mean opinion score (MOS).
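The linear first stage described in the abstract (delay estimation and compensation followed by a partitioned-block frequency-domain adaptive filter) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how the delay is estimated, so plain FFT cross-correlation with peak picking is assumed here. The block size, number of partitions, step size `mu`, and the NLMS-style per-bin normalization are likewise illustrative assumptions, and the function names `estimate_delay`, `align`, `pbfdaf`, and `erle_db` are hypothetical. ERLE is computed with its usual definition, 10 log10 of the microphone-signal energy divided by the residual energy.

```python
import numpy as np

# Hypothetical helpers; the estimator, block size, partition count, step size
# and normalisation below are illustrative assumptions, not the paper's values.


def estimate_delay(ref, mic, max_delay):
    """Lag (in samples) by which `mic` trails `ref`, searched over 0..max_delay.

    Assumption: plain FFT cross-correlation with peak picking.
    """
    n_fft = 1 << (len(ref) + len(mic) - 1).bit_length()
    # r[l] = sum_t mic[t + l] * ref[t]; zero-padding avoids circular wrap-around
    spec = np.fft.rfft(mic, n_fft) * np.conj(np.fft.rfft(ref, n_fft))
    r = np.fft.irfft(spec, n_fft)
    return int(np.argmax(np.abs(r[: max_delay + 1])))


def align(ref, delay):
    """Delay the far-end reference by `delay` samples to line it up with the mic."""
    return np.concatenate([np.zeros(delay), ref[: len(ref) - delay]])


def pbfdaf(x, d, block=64, partitions=8, mu=0.5, eps=1e-8):
    """Overlap-save partitioned-block frequency-domain adaptive filter.

    x: aligned far-end reference, d: microphone signal.
    Returns e = d - y, the microphone signal with the linear echo estimate removed.
    """
    n_fft = 2 * block
    n_bins = block + 1
    W = np.zeros((partitions, n_bins), dtype=complex)  # weights, one row per partition
    X = np.zeros((partitions, n_bins), dtype=complex)  # spectra of the latest input frames
    x_prev = np.zeros(block)
    n_blocks = min(len(x), len(d)) // block
    e = np.zeros(n_blocks * block)
    for k in range(n_blocks):
        sl = slice(k * block, (k + 1) * block)
        # Shift the frequency-domain delay line; X[p] holds the frame from p blocks ago
        X = np.roll(X, 1, axis=0)
        X[0] = np.fft.rfft(np.concatenate([x_prev, x[sl]]))
        x_prev = x[sl]
        # Echo estimate: sum over partitions, keep the last B samples (overlap-save)
        y = np.fft.irfft(np.sum(W * X, axis=0), n_fft)[block:]
        e[sl] = d[sl] - y
        # NLMS-style per-bin step: normalise by input power summed over partitions
        power = np.sum(np.abs(X) ** 2, axis=0)
        E = np.fft.rfft(np.concatenate([np.zeros(block), e[sl]]))
        G = mu * E / (power + eps)
        for p in range(partitions):
            # Gradient constraint: keep only the first B taps of each partition
            grad = np.fft.irfft(np.conj(X[p]) * G, n_fft)
            grad[block:] = 0.0
            W[p] += np.fft.rfft(grad)
    return e


def erle_db(d, e):
    """Echo return loss enhancement in dB: 10 * log10(sum d^2 / sum e^2)."""
    return 10.0 * np.log10(np.sum(d ** 2) / np.sum(e ** 2))
```

Used in sequence, `estimate_delay` and `align` shift the reference onto the microphone signal before `pbfdaf` runs, so fewer partitions are needed to cover the echo path. Because this estimator picks the strongest correlation peak, a practical system might shift by slightly less than the estimate so that any weaker, earlier echo path stays causal.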