2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	DEMO-1.2
Paper Title	Speech Data Explorer: Interactive Analysis Tool for Speech Datasets
Authors	Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg, NVIDIA, United States
Session	DEMO-1: Show and Tell Demonstrations 1
Location	Zoom
Session Time:	Wednesday, 09 June, 08:00 - 09:45
Presentation Time:	Wednesday, 09 June, 08:00 - 09:45
Presentation	Poster
Topic	Show and Tell Demonstration: Demo
Abstract	Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) models require large labeled speech datasets for training. Reference transcripts must accurately match the audio recordings; otherwise, models may learn errors from the training data and reproduce them during inference. We have developed Speech Data Explorer (SDE) to help examine the quality of speech datasets and to perform interactive error analysis of ASR models’ predictions. Its core strengths are:
- an interactive table that contains the dataset’s utterances and supports filtering (thresholding) and sorting;
- interactive visualization of metrics and of the signal in the time and frequency domains (with a built-in audio player);
- easy extensibility (it is straightforward to add new metrics as table columns and retain all interactive features).
To the best of our knowledge, SDE is the first open-source tool for interactive exploration of speech datasets and error analysis of ASR models’ predictions. It is implemented as a web application based on the Plotly Dash framework. SDE is an essential tool for the analysis of speech datasets and ASR models in our own research. It has already helped us quickly identify labeling issues in many public and commercial speech datasets, analyze the accuracy of ASR models, and construct new datasets (for example, Russian LibriSpeech [http://www.openslr.org/96/]). We believe that SDE, with its interactivity and extensibility, could benefit the wider speech processing community. We will demonstrate how SDE can be used for:
- interactive analysis of a speech dataset;
- interactive error analysis of transcripts generated by an ASR model;
- analysis with custom metrics that are useful for different tasks (for example, long utterance segmentation).
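The thresholding and sorting workflow described in the abstract can be illustrated with a minimal, non-interactive sketch. This is not SDE's implementation: the record fields (`audio`, `text`, `pred_text`), the helper names, and the choice of word error rate (WER) as the per-utterance metric are all assumptions made for illustration. The sketch scores each utterance, keeps those at or above a threshold, and sorts them so the most suspicious transcripts come first.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # Convention assumed here: empty reference scores 0 only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def flag_utterances(rows: List[Dict[str, str]], threshold: float) -> List[Dict]:
    """Add a 'wer' column, keep rows with wer >= threshold, sort by wer descending."""
    scored = [dict(row, wer=word_error_rate(row["text"], row["pred_text"]))
              for row in rows]
    flagged = [row for row in scored if row["wer"] >= threshold]
    return sorted(flagged, key=lambda row: row["wer"], reverse=True)


# Hypothetical records: reference transcript plus an ASR model's prediction.
rows = [
    {"audio": "a.wav", "text": "the cat sat on the mat", "pred_text": "the cat sat on the mat"},
    {"audio": "b.wav", "text": "hello world", "pred_text": "hello word"},
    {"audio": "c.wav", "text": "speech data explorer", "pred_text": "peach explorer"},
]
suspicious = flag_utterances(rows, threshold=0.5)
for row in suspicious:
    print(row["audio"], round(row["wer"], 3))
```

A high per-utterance WER can point either to a model error or to a faulty reference transcript, which is why listening to the flagged audio, as the built-in player mentioned in the abstract allows, is the natural next step.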