2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-25.6
Paper Title DISENTANGLEMENT FOR AUDIO-VISUAL EMOTION RECOGNITION USING MULTITASK SETUP
Authors Raghuveer Peri, University of Southern California, United States; Srinivas Parthasarathy, Charles Bradshaw, Shiva Sundaram, Amazon Inc, United States
SessionSPE-25: Speech Emotion 3: Emotion Recognition - Representations, Data Augmentation
LocationGather.Town
Session Time:Wednesday, 09 June, 15:30 - 16:15
Presentation Time:Wednesday, 09 June, 15:30 - 16:15
Presentation Poster
Topic Speech Processing: [SPE-ANLS] Speech Analysis
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvements. However, such multitask models entangle information between the tasks, encoding the mutual dependencies present in label distributions in the real world data used for training. This work explores the disentanglement of multimodal signal representations for the primary task of emotion recognition and an secondary person identification task. In particular, we developed a multitask framework to extract low-dimensional embeddings that aim to capture emotion specific information, while containing minimal information related to person identity. We evaluate three different techniques for disentanglement and report results of up to 13% disentanglement while maintaining emotion recognition performance.