2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
Login Paper Search My Schedule Paper Index Help

My ICASSP 2021 Schedule

Note: Your custom schedule will not be saved unless you create a new account or login to an existing account.
  1. Create a login based on your email (takes less than one minute)
  2. Perform 'Paper Search'
  3. Select papers that you desire to save in your personalized schedule
  4. Click on 'My Schedule' to see the current list of selected papers
  5. Click on 'Printable Version' to create a separate window suitable for printing (the header and menu will appear, but will not actually print)

Paper Detail

Paper IDSAM-2.4
Paper Title SYNTHETIC DATA FOR DNN-BASED DOA ESTIMATION OF INDOOR SPEECH
Authors Femke B. Gelderblom, Norwegian University of Science and Technology & SINTEF, Norway; Yi Liu, Johannes Kvam, SINTEF, Norway; Tor Andre Myrvoll, Norwegian University of Science and Technology & SINTEF, Norway
SessionSAM-2: Direction of Arrival Estimation 2
LocationGather.Town
Session Time:Tuesday, 08 June, 16:30 - 17:15
Presentation Time:Tuesday, 08 June, 16:30 - 17:15
Presentation Poster
Topic Sensor Array and Multichannel Signal Processing: [SAM-MAPR] Microphone array processing
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Abstract This paper investigates the use of different room impulse response (RIR) simulation methods for synthesizing training data for deep neural network-based direction of arrival (DOA) estimation of speech in reverberant rooms. Different sets of synthetic RIRs are obtained using the image source method (ISM) and more advanced methods including diffuse reflections and/or source directivity. Multi-layer perceptron (MLP) deep neural network (DNN) models are trained on generalized cross correlation (GCC) features extracted for each set. Finally, models are tested on features obtained from measured RIRs. This study shows the importance of training with RIRs from directive sources, as resultant DOA models achieved up to 51% error reduction compared to the steered response power with phase transform (SRP-PHAT) baseline (significant with p<<.01), while models trained with RIRs from omnidirectional sources did worse than the baseline. The performance difference was specifically present when estimating the azimuth of speakers not facing the array directly.