2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSS-5.1
Paper Title EXPLORING VISUAL-AUDIO COMPOSITION ALIGNMENT NETWORK FOR QUALITY FASHION RETRIEVAL IN VIDEO
Authors Yanhao Zhang, Jianmin Wu, Xiong Xiong, Dangwei Li, Chenwei Xie, Yun Zheng, Pan Pan, Yinghui Xu, Alibaba group, China
SessionSS-5: Domain Adaptation for Multimedia Signal Processing
LocationGather.Town
Session Time:Wednesday, 09 June, 13:00 - 13:45
Presentation Time:Wednesday, 09 June, 13:00 - 13:45
Presentation Poster
Topic Special Sessions: Domain Adaptation for Multimedia Signal Processing
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Fashion retrieval in video suffers from the issues of imperfect visual representation and low quality of search results under the E-commercial circumstance. Previous works generally focus on searching the identical images from visual perspective only, but lack of leveraging multi-modal information for high quality commodities. As a cross-domain problem, instructional or exhibiting audio reveals rich semantic information to facilite the video-to-shop task. In this paper, we present a novel Visual-Audio Composition Alignment Network (VACANet) to deal with quality fashion retrieval in video. Firstly, we introduce the visual-audio composition module in VACANet aiming to distinguish attentive and residual entities by learning semantic embedding from both visual and audio streams. Secondly, a quality alignment training scheme is then designed by quality-aware triplet mining and domain alignment constraint for video-to-image adaptation. Finally, extensive experiments conducted on challenging video datasets demonstrate the scalable effectiveness of our model in alleviating quality fashion retrieval.