Speech Enhancement Using Masking for Binaural Reproduction of Ambisonics Signals
Moti Lugasi, Boaz Rafaely, Ben-Gurion University of the Negev, Israel
Session: AUD-15: Modeling, Analysis and Synthesis of Acoustic Environments 1: Soundfield Acquisition and Reproduction
Session Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Time: Wednesday, 09 June, 16:30 - 17:15
Topic: Audio and Acoustic Signal Processing: [AUD-SARR] Spatial Audio Recording and Reproduction
Abstract: Single-channel speech enhancement has been studied extensively across many applications. However, in emerging applications such as virtual reality, it is important not only to attenuate undesired signals but also to preserve the spatial information of the desired signal captured in a noisy environment. Only a few studies in the literature propose solutions to this challenge, and most of them attenuate the undesired signals while preserving only limited spatial information about the desired signal. Methods that preserve complete spatial information have only recently been suggested and have not been studied comprehensively. In this paper, two such methods based on time-frequency masking are investigated, with the aim of attenuating the undesired signal while preserving the spatial components of the desired signal. The first, referred to as spatial masking, is based on masking in the plane-wave density domain; the second is based on masking in the spherical harmonics (SH) domain. The two methods are compared with a reference method based on beamforming followed by single-channel time-frequency masking. Objective analysis and two listening tests were conducted to evaluate the performance of these methods for speech enhancement.
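To illustrate the general idea behind SH-domain masking, the sketch below applies a single real-valued time-frequency mask identically to every SH channel of a noisy Ambisonics signal. Because all channels are scaled by the same scalar per time-frequency bin, the inter-channel relations that encode spatial information are left untouched while noise-dominated bins are attenuated. This is a minimal, hypothetical sketch, not the authors' algorithm: the function name `apply_sh_mask`, the use of an oracle ideal-ratio mask, and the toy fixed-gain spatial encoding in the demo are all assumptions made for illustration.

```python
import numpy as np

def apply_sh_mask(sh_stft_noisy, stft_desired_ref, stft_noise_ref, floor=1e-12):
    """Hypothetical sketch of SH-domain masking.

    sh_stft_noisy:    complex array (n_sh, n_freq, n_frames) - noisy SH STFTs.
    stft_desired_ref: complex array (n_freq, n_frames) - desired-speech reference.
    stft_noise_ref:   complex array (n_freq, n_frames) - noise reference.
    Returns the masked SH STFTs with the same shape as sh_stft_noisy.
    """
    # Oracle ideal-ratio mask in [0, 1] built from the reference spectrograms
    # (an assumption for illustration; a real system must estimate the mask).
    pow_d = np.abs(stft_desired_ref) ** 2
    pow_n = np.abs(stft_noise_ref) ** 2
    mask = pow_d / (pow_d + pow_n + floor)
    # Broadcast the SAME mask over all SH channels: per-bin scalar scaling
    # cannot change inter-channel ratios, so spatial cues are preserved.
    return sh_stft_noisy * mask[np.newaxis, :, :]

# Toy demo with synthetic STFT-like arrays.
rng = np.random.default_rng(0)
n_sh, n_freq, n_frames = 4, 8, 5
desired = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
noise = 0.1 * (rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames)))
# Toy spatial encoding: each SH channel is a fixed-gain copy of the source plus noise.
gains = np.array([1.0, 0.5, -0.3, 0.8])
sh_noisy = gains[:, None, None] * desired[None, :, :] + noise[None, :, :]
sh_enhanced = apply_sh_mask(sh_noisy, desired, noise)
```

Since the mask is a per-bin scalar shared by all channels, the ratio between any two SH channels is identical before and after masking, which is the sense in which such a method preserves complete spatial information.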