| Paper ID | SPE-18.4 | 
    | Paper Title | BLIND AND NEURAL NETWORK-GUIDED CONVOLUTIONAL BEAMFORMER FOR JOINT DENOISING, DEREVERBERATION, AND SOURCE SEPARATION | 
	| Authors | Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki, NTT Corporation, Japan | 
  | Session | SPE-18: Speech Enhancement 4: Multi-channel Processing | 
  | Location | Gather.Town | 
  | Session Time | Wednesday, 09 June, 14:00 - 14:45 | 
  | Presentation Time | Wednesday, 09 June, 14:00 - 14:45 | 
  | Presentation | Poster | 
	 | Topic | Speech Processing: [SPE-ENHA] Speech Enhancement and Separation | 
  
	
  
  
    | Abstract | This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. To make the optimization computationally tractable, we incorporate two techniques into the approach: Source-Wise Factorization (SW-Fact) of a CBF and Independent Vector Extraction (IVE). To further improve performance, we develop a method that integrates neural network (NN) based estimation of the source power spectra into the CBF optimization through an inverse-Gamma prior. Experiments using noisy reverberant mixtures show that the proposed method, in both the blind and NN-guided scenarios, greatly outperforms the conventional state-of-the-art NN-supported mask-based CBF in terms of improvements in automatic speech recognition performance and signal distortion reduction. |
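
The abstract mentions combining NN-based source power estimates with the optimization through an inverse-Gamma prior. The sketch below illustrates only the generic textbook mechanism and is not taken from the paper: it assumes a zero-mean complex Gaussian likelihood for each time-frequency bin and an inverse-Gamma prior on its variance, and computes the resulting MAP variance. The function name `map_variance`, the shape parameter `alpha`, and the choice of setting the prior mode equal to the NN estimate are all assumptions made for illustration.

```python
import numpy as np


def map_variance(y, nn_power, alpha=1.0):
    """MAP variance per bin (hypothetical helper, not the paper's update rule).

    Assumed model:
      likelihood  p(y | lam) = exp(-|y|^2 / lam) / (pi * lam)
      prior       p(lam) proportional to lam^-(alpha + 1) * exp(-beta / lam)
    The posterior is maximized at lam = (|y|^2 + beta) / (alpha + 2).
    """
    y = np.asarray(y)
    nn_power = np.asarray(nn_power, dtype=float)
    # Assumption: choose beta so that the prior mode, beta / (alpha + 1),
    # equals the NN-estimated power of the same bin.
    beta = (alpha + 1.0) * nn_power
    return (np.abs(y) ** 2 + beta) / (alpha + 2.0)


# Toy values: two bins of an enhanced signal and their NN power estimates.
y = np.array([1.0 + 1.0j, 0.0 + 0.0j])
nn_power = np.array([2.0, 4.0])
lam = map_variance(y, nn_power, alpha=1.0)
```

Under these assumptions, a larger `alpha` pulls the estimate toward the NN output, while a smaller `alpha` lets the observed power `|y|^2` dominate. When the observed power equals the NN estimate, the MAP variance equals that same value for any `alpha`.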