2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information



Paper Detail

Paper ID: MLSP-16.3
Paper Title: CHANNEL-WISE MIX-FUSION DEEP NEURAL NETWORKS FOR ZERO-SHOT LEARNING
Authors: Guowei Wang, Tianjin University, China; Naiyang Guan, National Innovation Institute of Defense Technology, China; Hanjia Ye, Nanjing University, China; Xiaodong Yi, Hang Cheng, Junjie Zhu, National Innovation Institute of Defense Technology, China
Session: MLSP-16: ML and Graphs
Location: Gather.Town
Session Time: Wednesday, 09 June, 14:00 - 14:45
Presentation Time: Wednesday, 09 June, 14:00 - 14:45
Presentation: Poster
Topic: Machine Learning for Signal Processing: [MLR-TRL] Transfer learning
IEEE Xplore Open Preview: available in IEEE Xplore
Abstract: Zero-shot learning (ZSL), with the assistance of seen-class images and additional semantic knowledge, generalizes its classification ability to unseen classes by aligning embeddings in the visual-semantic space. Few previous methods have investigated whether discriminative visual features help distinguish different classes, and most neglect the rich semantic information in the surrounding background. This paper proposes a channel-wise mix-fusion ZSL model (CMFZ) that contextualizes the ZSL classifier's discriminative information by incorporating much richer visual-semantic information from both objects and their surrounding environments. In particular, a channel-wise connection module (CCM) learns the relationship between the object and its surroundings. A collaborative channel-wise activation module (CAM) learns from a finer-scale image obtained from the cropping module; it highlights the most distinctive channels, which represent the object's discriminative regions, to suppress inadvertently introduced background noise. Furthermore, the representation ability of the learned mapping is enhanced by integrating the visual-semantic features processed by the CCM and CAM. Experimental results show that CMFZ outperforms state-of-the-art ZSL methods and verify the effectiveness of incorporating visual-semantic information.
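The paper's implementation is not shown on this page. As a loose illustration of the general idea of channel-wise activation and mixing that the abstract describes, here is a minimal NumPy sketch in the spirit of squeeze-and-excitation gating; all function names, weight shapes, and the simple convex fusion rule are hypothetical and not taken from the paper:

```python
import numpy as np

def channel_activation(feat, w1, w2):
    """Re-weight channels of a (C, H, W) feature map (hypothetical sketch)."""
    z = feat.mean(axis=(1, 2))               # global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)              # bottleneck projection + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h)))   # per-channel sigmoid gate in (0, 1)
    return feat * gate[:, None, None]        # emphasize discriminative channels

def mix_fusion(global_feat, cropped_feat, alpha=0.5):
    """Channel-wise convex mix of two feature streams (hypothetical rule)."""
    return alpha * global_feat + (1.0 - alpha) * cropped_feat

# Toy example: 8 channels, 4x4 spatial grid, bottleneck ratio 2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))        # stand-in for a cropped-image feature
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = mix_fusion(channel_activation(feat, w1, w2), feat)
print(out.shape)  # (8, 4, 4)
```

The gated stream keeps the same shape as the input, so it can be mixed with the un-gated global stream elementwise; the actual CMFZ modules (CCM, CAM) are learned networks rather than fixed rules like this.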