| Paper ID | AUD-19.3 | 
  
    | Paper Title | 
     MAXIMUM A POSTERIORI ESTIMATOR FOR CONVOLUTIVE SOUND SOURCE SEPARATION WITH SUB-SOURCE BASED NTF MODEL AND THE LOCALIZATION PROBABILISTIC PRIOR ON THE MIXING MATRIX | 
  
	| Authors | 
    Mieszko Fraś, Konrad Kowalczyk, AGH University of Science and Technology, Poland | 
  | Session | AUD-19: Audio and Speech Source Separation 6: Topics in Source Separation | 
  | Location | Gather.Town | 
  | Session Time | Thursday, 10 June, 13:00 - 13:45 | 
  | Presentation Time | Thursday, 10 June, 13:00 - 13:45 | 
  | Presentation | 
     Poster
     | 
	 | Topic | 
     Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation | 
  
	
  
  
	
  
  
  
    | Abstract | 
     In this paper, we present a method for separating sound source signals recorded with multiple microphones in a reverberant room. In particular, we propose a maximum a posteriori (MAP) estimator based on the multichannel nonnegative tensor factorization (NTF) model with a localization prior distribution on the mixing matrix, in which the latent data consist of so-called sub-sources for improved performance in reverberant environments. For the proposed MAP estimator, we derive a sub-source based expectation-maximization (EM) algorithm with multiplicative update (MU) rules and a localization prior (LP) distribution on the mixing matrix (SSEM-MU-LP). We then perform several experiments with speech and instrumental sound sources recorded using two microphones, in determined and under-determined scenarios, and with different types of initialization of the model parameters. The results clearly indicate that the proposed algorithm with the localization prior achieves a significant improvement over state-of-the-art NTF-based source separation algorithms, reaching up to $50\%$ in the signal-to-distortion ratio. |