IEEE ICASSP 2021 || Toronto, Ontario, Canada || 6-11 June 2021

Paper Detail

Paper ID

AUD-13.2

Paper Title

AN IMPROVED MEAN TEACHER BASED METHOD FOR LARGE SCALE WEAKLY LABELED SEMI-SUPERVISED SOUND EVENT DETECTION

Authors

Xu Zheng, Yan Song, National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, China; Ian McLoughlin, ICT Cluster, Singapore Institute of Technology, Singapore; Lin Liu, iFLYTEK Research, iFLYTEK CO., LTD., China; Li-Rong Dai, National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, China

Session

AUD-13: Detection and Classification of Acoustic Scenes and Events 2: Weak supervision

Location

Gather.Town

Session Time:

Wednesday, 09 June, 15:30 - 16:15

Presentation Time:

Wednesday, 09 June, 15:30 - 16:15

Presentation

Poster

Topic

Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events

Abstract

This paper presents an improved mean teacher (MT) based method for large-scale weakly labeled semi-supervised sound event detection (SED), focusing on learning a better student model. Two main improvements are proposed over the authors' previous perturbation-based MT method. First, an event-aware module is designed that fuses multiple branches with different kernel sizes via an attention mechanism. Inserting this module after the convolutional layer lets each neuron adaptively adjust its receptive field to suit different sound events. Second, instead of using the teacher model to provide a consistency cost term, we propose using stochastic inference on unlabeled examples to generate high-quality pseudo-targets by averaging multiple predictions from the perturbed student model. MixUp of both labeled and unlabeled data is further exploited to improve the effectiveness of the student model. Finally, the teacher model is obtained as an exponential moving average (EMA) of the student model and generates the final SED predictions during inference. Experiments on the DCASE2018 task4 dataset demonstrate the effectiveness of the proposed method. Specifically, an F1-score of 42.1% is achieved, significantly outperforming the 32.4% achieved by the winning system and the 39.3% achieved by the previous perturbation-based method.
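The three training ingredients named in the abstract (averaged pseudo-targets from a perturbed student, MixUp, and an EMA teacher) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy logistic student, the use of input dropout as the perturbation, and the values of `alpha`, `n_passes`, `beta_param`, and `drop_prob` are all assumptions chosen for illustration, and the event-aware module is not covered.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.999):
    """Move each teacher weight toward the student weight by one EMA step.

    alpha is a hypothetical decay value; the abstract does not state one.
    """
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def averaged_pseudo_targets(predict_fn, x, n_passes=4, rng=None):
    """Average several stochastic predictions of the perturbed student.

    predict_fn(x, rng) must return class probabilities and apply its own
    random perturbation on every call. n_passes is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    preds = [predict_fn(x, rng) for _ in range(n_passes)]
    return np.mean(preds, axis=0)


def mixup(x, y, beta_param=0.2, rng=None):
    """Standard MixUp: blend each example with a randomly chosen partner.

    The same mixing weight is applied to inputs and targets, so y may hold
    weak labels or pseudo-targets. beta_param is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(beta_param, beta_param)
    perm = rng.permutation(len(x))
    return lam * x + (1.0 - lam) * x[perm], lam * y + (1.0 - lam) * y[perm]


def make_toy_student(weights, drop_prob=0.2):
    """Hypothetical stand-in for the student: logistic layer with input dropout."""

    def predict(x, rng):
        # Random dropout mask plays the role of the perturbation (an assumption).
        mask = rng.random(x.shape) >= drop_prob
        logits = (x * mask / (1.0 - drop_prob)) @ weights["w"] + weights["b"]
        return 1.0 / (1.0 + np.exp(-logits))

    return predict
```

In a training loop under these assumptions, unlabeled clips would first pass through `averaged_pseudo_targets`, the resulting targets would be mixed with labeled data via `mixup`, the student would take a gradient step, and `ema_update` would then refresh the teacher that is used at inference time.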

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
