2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDHLT-12.2
Paper Title REPLACING HUMAN AUDIO WITH SYNTHETIC AUDIO FOR ON-DEVICE UNSPOKEN PUNCTUATION PREDICTION
Authors Daria Soboleva, Ondrej Skopek, Márius Šajgalík‎, Victor Cărbune‎, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar, Balint Miklos, Google, Switzerland
SessionHLT-12: Language Understanding 4: Semantic Understanding
LocationGather.Town
Session Time:Thursday, 10 June, 13:00 - 13:45
Presentation Time:Thursday, 10 June, 13:00 - 13:45
Presentation Poster
Topic Human Language Technology: [HLT-UNDE] Spoken Language Understanding and Computational Semantics
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract We present a novel multi-modal unspoken punctuation prediction system for the English language which combines acoustic and text features. We demonstrate for the first time, that by relying exclusively on synthetic data generated using a prosody-aware text-to-speech system, we can outperform a model trained with expensive human audio recordings on the unspoken punctuation prediction problem. Our model architecture is well suited for on-device use. This is achieved by leveraging hash-based embeddings of automatic speech recognition text output in conjunction with acoustic features as input to a quasi-recurrent neural network, keeping the model size small and latency low.