2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID SPE-53.2
Paper Title COMPOSITIONAL EMBEDDING MODELS FOR SPEAKER IDENTIFICATION AND DIARIZATION WITH SIMULTANEOUS SPEECH FROM 2+ SPEAKERS
Authors Zeqian Li, Jacob Whitehill, Worcester Polytechnic Institute, United States
Session SPE-53: Speaker Diarization
Location Gather.Town
Session Time: Friday, 11 June, 13:00 - 13:45
Presentation Time: Friday, 11 June, 13:00 - 13:45
Presentation Poster
Topic Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization
Abstract We propose a new method for speaker diarization that can handle overlapping speech from two or more speakers. Our method is based on compositional embeddings [1]. Like standard speaker embedding methods such as x-vector [2], compositional embedding models contain a function f that separates speech from different speakers. In addition, they include a composition function g that computes set-union operations in the embedding space, so that the set of speakers within the input audio can be inferred. In an experiment on multi-person speaker identification using synthesized LibriSpeech data, the proposed method outperforms traditional embedding methods that are trained only to separate single speakers (not speaker sets). In a speaker diarization experiment on the AMI Headset Mix corpus, we achieve state-of-the-art accuracy (DER=22.93%), slightly better than the previous best result (DER=23.8% from [3]).
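To illustrate how a composition function g can be used to infer a speaker set, here is a minimal sketch. It assumes that speaker embeddings produced by f are already available as unit vectors. The normalized-sum g, the names `compose` and `infer_speaker_set`, and the exhaustive search over small subsets are all illustrative assumptions. They are not the authors' trained model, where g is learned.

```python
import itertools

import numpy as np


def normalize(v):
    """Project a vector onto the unit sphere."""
    return v / np.linalg.norm(v)


def compose(a, b):
    """Hypothetical stand-in for g: the normalized sum of two embeddings.
    In the paper g is learned; this fixed rule is only for illustration."""
    return normalize(a + b)


def infer_speaker_set(query, refs, max_size=2):
    """Return the set of speaker names whose composed reference embedding
    has the highest cosine similarity to the query embedding.

    query: embedding f(x) of the (possibly overlapping) input audio.
    refs:  dict mapping speaker name -> unit-norm reference embedding.
    """
    q = normalize(query)
    best_set, best_score = None, -np.inf
    names = list(refs)
    # Enumerate every candidate speaker set of size 1..max_size.
    for k in range(1, max_size + 1):
        for combo in itertools.combinations(names, k):
            emb = refs[combo[0]]
            # Fold g over the set to get the embedding of the union.
            for name in combo[1:]:
                emb = compose(emb, refs[name])
            score = float(np.dot(q, emb))
            if score > best_score:
                best_set, best_score = set(combo), score
    return best_set


# Usage: three speakers with orthogonal toy embeddings; the query mixes A and B.
refs = {"A": np.array([1.0, 0.0, 0.0]),
        "B": np.array([0.0, 1.0, 0.0]),
        "C": np.array([0.0, 0.0, 1.0])}
print(infer_speaker_set(refs["A"] + refs["B"], refs))
```

An exhaustive search over subsets grows combinatorially with the number of enrolled speakers, so `max_size` caps the largest speaker set considered.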