| SPE-36: Speech Enhancement 6: Multi-modal Processing |
| Session Type: Poster |
| Time: Thursday, 10 June, 14:00 - 14:45 |
| Location: Gather.Town |
| Session Chair: Chandan K A Reddy, Microsoft |
| SPE-36.1: AUDIO-VISUAL SPEECH INPAINTING WITH DEEP LEARNING |
| Giovanni Morrone; University of Modena and Reggio Emilia |
| Daniel Michelsanti; Aalborg University |
| Zheng-Hua Tan; Aalborg University |
| Jesper Jensen; Aalborg University |
| SPE-36.2: VSET: A MULTIMODAL TRANSFORMER FOR VISUAL SPEECH ENHANCEMENT |
| Karthik Ramesh; Huawei |
| Chao Xing; Huawei |
| Wupeng Wang; Huawei |
| Dong Wang; Tsinghua University |
| Xiao Chen; Huawei |
| SPE-36.3: SWITCHING VARIATIONAL AUTO-ENCODERS FOR NOISE-AGNOSTIC AUDIO-VISUAL SPEECH ENHANCEMENT |
| Mostafa Sadeghi; Inria, Grenoble Alpes |
| Xavier Alameda-Pineda; Inria, Grenoble Alpes |
| SPE-36.4: AUDIO-VISUAL SPEECH ENHANCEMENT METHOD CONDITIONED ON THE LIP MOTION AND SPEAKER-DISCRIMINATIVE EMBEDDINGS |
| Koichiro Ito; Hitachi, Ltd. |
| Masaaki Yamamoto; Hitachi, Ltd. |
| Kenji Nagamatsu; Hitachi, Ltd. |
| SPE-36.5: AUDIO-VISUAL SPEECH SEPARATION USING CROSS-MODAL CORRESPONDENCE LOSS |
| Naoki Makishima; NTT Media Intelligence Laboratories, NTT Corporation |
| Mana Ihori; NTT Media Intelligence Laboratories, NTT Corporation |
| Akihiko Takashima; NTT Media Intelligence Laboratories, NTT Corporation |
| Tomohiro Tanaka; NTT Media Intelligence Laboratories, NTT Corporation |
| Shota Orihashi; NTT Media Intelligence Laboratories, NTT Corporation |
| Ryo Masumura; NTT Media Intelligence Laboratories, NTT Corporation |
| SPE-36.6: MUSE: MULTI-MODAL TARGET SPEAKER EXTRACTION WITH VISUAL CUES |
| Zexu Pan; National University of Singapore |
| Ruijie Tao; National University of Singapore |
| Chenglin Xu; National University of Singapore |
| Haizhou Li; National University of Singapore |