2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information


Paper Detail

Paper ID: SPE-37.2
Paper Title: Time-domain speaker verification using temporal convolutional networks
Authors: Sangwook Han, Jaeuk Byun, Jong Won Shin, Gwangju Institute of Science and Technology, South Korea
Session: SPE-37: Speaker Recognition 5: Neural Embedding
Location: Gather.Town
Session Time: Thursday, 10 June, 14:00 - 14:45
Presentation Time: Thursday, 10 June, 14:00 - 14:45
Presentation: Poster
Topic: Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization
IEEE Xplore: Open Preview available
Abstract: Recently, speaker verification systems using deep neural networks have been widely studied. Many of them utilize hand-crafted features such as mel-filterbank energies, mel-frequency cepstral coefficients, and magnitude spectrograms, which are not designed specifically for the speaker verification task and may not be optimal. Recent releases of large datasets such as VoxCeleb enable us to extract task-specific features in a data-driven way. In this paper, we propose a speaker verification system that takes time-domain raw waveforms as input and adopts a learnable encoder and temporal convolutional networks (TCNs), which have shown impressive performance in speech separation. Moreover, we apply a squeeze-and-excitation block after each TCN block to provide channel-wise attention. Our experiments on the VoxCeleb1 dataset demonstrate that the speaker verification system utilizing the proposed feature extraction model outperforms previously proposed time-domain speaker verification systems.
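
The paper's exact layer configuration is not reproduced on this page, so the following is only a minimal PyTorch sketch of the pipeline the abstract describes: a learnable 1-D convolutional encoder operating on raw waveforms, a stack of dilated TCN blocks each followed by squeeze-and-excitation channel attention, and temporal pooling into a fixed-length speaker embedding. All layer sizes, kernel widths, dilation schedule, and the depthwise-separable block design are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pooling over time, then a
    bottleneck MLP that gates each channel (channel-wise attention)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):               # x: (batch, channels, frames)
        w = self.fc(x.mean(dim=2))      # squeeze over time -> (batch, channels)
        return x * w.unsqueeze(2)       # excite: rescale each channel


class TCNBlock(nn.Module):
    """One dilated 1-D convolutional block with a residual connection;
    the SE block is applied to the block output, as in the abstract."""
    def __init__(self, channels, hidden, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2   # keep the frame count fixed
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, 1),
            nn.PReLU(),
            nn.BatchNorm1d(hidden),
            nn.Conv1d(hidden, hidden, kernel_size,
                      padding=pad, dilation=dilation, groups=hidden),
            nn.PReLU(),
            nn.BatchNorm1d(hidden),
            nn.Conv1d(hidden, channels, 1),
        )
        self.se = SEBlock(channels)

    def forward(self, x):
        return x + self.se(self.net(x))


class TimeDomainSpeakerNet(nn.Module):
    """Raw waveform -> learnable encoder -> TCN+SE stack -> embedding."""
    def __init__(self, enc_dim=256, num_blocks=8, emb_dim=192):
        super().__init__()
        # Learnable encoder: a strided 1-D conv replacing hand-crafted
        # features such as mel-filterbank energies.
        self.encoder = nn.Conv1d(1, enc_dim, kernel_size=16, stride=8)
        self.blocks = nn.Sequential(
            *[TCNBlock(enc_dim, 4 * enc_dim, dilation=2 ** (i % 4))
              for i in range(num_blocks)]
        )
        self.embedding = nn.Linear(enc_dim, emb_dim)

    def forward(self, wav):             # wav: (batch, samples)
        x = torch.relu(self.encoder(wav.unsqueeze(1)))
        x = self.blocks(x)
        x = x.mean(dim=2)               # temporal average pooling
        return self.embedding(x)        # fixed-length speaker embedding


if __name__ == "__main__":
    model = TimeDomainSpeakerNet()
    emb = model(torch.randn(2, 16000))  # two 1-second utterances at 16 kHz
    print(emb.shape)                    # torch.Size([2, 192])
```

At verification time, embeddings from two utterances would typically be compared with cosine similarity against a threshold; the training objective (e.g., a softmax-based or metric-learning loss on VoxCeleb speaker labels) is outside the scope of this sketch.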