2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-54.3
Paper Title BW-EDA-EEND: STREAMING END-TO-END NEURAL SPEAKER DIARIZATION FOR A VARIABLE NUMBER OF SPEAKERS
Authors Eunjung Han, Chul Lee, Andreas Stocke, Amazon, United States
SessionSPE-54: End-to-End Speaker Diarization and Recognition
LocationGather.Town
Session Time:Friday, 11 June, 13:00 - 13:45
Presentation Time:Friday, 11 June, 13:00 - 13:45
Presentation Poster
Topic Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the EDA architecture of Horiguchi et al., but utilizes the incremental Transformer encoder, attending only to its left contexts and using block-level recurrence in the hidden states to carry information from block to block, making the algorithm complexity linear in time. We propose two variants of it. For unlimited-latency BW-EDA-EEND which processes in-puts linear in time, we show only moderate degradation for up to two speakers using a context size of 10 seconds compared to offline EDA-EEND. With more than two speakers, the accuracy gap between online and offline grows, but it still outperforms a baseline offline clustering diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with context size of 10 seconds. For limited-latency BW-EDA-EEND, which produces diarization outputs block-by-block as audio arrives, we show accuracy comparable to the offline clustering-based system.