| Paper ID | SPE-24.6 |
| Paper Title | DOMAIN-ADVERSARIAL AUTOENCODER WITH ATTENTION BASED FEATURE LEVEL FUSION FOR SPEECH EMOTION RECOGNITION |
| Authors | Yuan Gao, Jiaxing Liu, Longbiao Wang, Tianjin University, China; Jianwu Dang, Japan Advanced Institute of Science and Technology, Japan |
| Session | SPE-24: Speech Emotion 2: Neural Networks for Speech Emotion Recognition |
| Location | Gather.Town |
| Session Time | Wednesday, 09 June, 15:30 - 16:15 |
| Presentation Time | Wednesday, 09 June, 15:30 - 16:15 |
| Presentation | Poster |
| Topic | Speech Processing: [SPE-ANLS] Speech Analysis |
| Abstract | Although speech emotion recognition (SER) has garnered considerable attention over the past two decades, the problem of insufficient training data remains unresolved. A potential solution is to pre-train a model on large amounts of audio data and transfer the learned knowledge. However, the data used for pre-training and testing originate from different domains, so the latent representations contain non-affective information. In this paper, we propose a domain-adversarial autoencoder to extract discriminative representations for SER. Through domain-adversarial learning, we reduce the mismatch between domains while retaining discriminative information for emotion recognition. We also introduce multi-head attention to capture emotion information from different subspaces of the input utterances. Experiments on IEMOCAP show that the proposed model outperforms state-of-the-art systems, improving unweighted accuracy by 4.15% and demonstrating its effectiveness. |
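Two of the ideas in the abstract, a domain-adversarial branch that encourages domain-invariant features and multi-head attention over frame-level features, can be pictured with a short sketch. The PyTorch code below is a minimal illustration assuming a gradient reversal layer and a recurrent encoder; the class names, dimensions, and mean-pooling choice are assumptions for illustration, not the paper's architecture, and the autoencoder reconstruction branch is omitted.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients in backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainAdversarialSER(nn.Module):
    """Sketch (not the paper's exact model): shared encoder -> multi-head
    self-attention over time -> emotion classifier, plus a domain classifier
    trained through gradient reversal so the encoder is pushed toward
    domain-invariant, emotion-relevant representations."""
    def __init__(self, feat_dim=40, hidden=128, n_heads=4,
                 n_emotions=4, n_domains=2):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.MultiheadAttention(embed_dim=2 * hidden,
                                          num_heads=n_heads,
                                          batch_first=True)
        self.emotion_head = nn.Linear(2 * hidden, n_emotions)
        self.domain_head = nn.Linear(2 * hidden, n_domains)

    def forward(self, x, lambd=1.0):
        # x: (batch, time, feat_dim) frame-level acoustic features
        h, _ = self.encoder(x)                     # (batch, time, 2*hidden)
        a, _ = self.attn(h, h, h)                  # attend over time steps
        pooled = a.mean(dim=1)                     # utterance-level vector
        emotion_logits = self.emotion_head(pooled)
        # Gradient reversal: the domain loss drives the encoder toward
        # features the domain classifier cannot separate.
        domain_logits = self.domain_head(GradReverse.apply(pooled, lambd))
        return emotion_logits, domain_logits

# Usage: total loss = emotion cross-entropy on labelled data
#        + domain cross-entropy on data from both domains.
model = DomainAdversarialSER()
feats = torch.randn(8, 300, 40)                    # 8 utterances, 300 frames
emo_logits, dom_logits = model(feats)
```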