| Paper ID | MMSP-8.3 | ||
| Paper Title | HIERARCHICAL SIMILARITY LEARNING FOR LANGUAGE-BASED PRODUCT IMAGE RETRIEVAL | ||
| Authors | Zhe Ma, Fenghao Liu, Zhejiang University, China; Jianfeng Dong, Zhejiang Gongshang University, China; Xiaoye Qu, Huazhong University of Science and Technology, China; Yuan He, Alibaba Group, China; Shouling Ji, Zhejiang University, China | ||
| Session | MMSP-8: Multimedia Retrieval and Signal Detection | ||
| Location | Gather.Town | ||
| Session Time | Friday, 11 June, 13:00 - 13:45 | ||
| Presentation Time | Friday, 11 June, 13:00 - 13:45 | ||
| Presentation | Poster | ||
| Topic | Multimedia Signal Processing: Multimedia Databases and File Systems | ||
| Abstract | This paper addresses the task of language-based product image retrieval. Most previous works have made significant progress by designing network architectures, similarity measures, and loss functions. However, they typically perform vision-text matching at a single, fixed granularity, ignoring the intrinsically multi-granular nature of images. In this paper, we focus on cross-modal similarity measurement and propose a novel Hierarchical Similarity Learning (HSL) network. HSL first learns multi-level representations of the input data with stacked encoders, then computes an object-granularity similarity and an image-granularity similarity at each level. All of these similarities are combined into the final hierarchical cross-modal similarity. Experiments on a large-scale product retrieval dataset demonstrate the effectiveness of the proposed method. Code and data are available at https://github.com/liufh1/hsl. | ||
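The abstract describes the HSL similarity only at a high level: stacked encoders produce several levels of representation, and an object-granularity and an image-granularity similarity are computed at each level and then combined. The pure-Python sketch below illustrates that idea under several loudly stated assumptions that are not taken from the paper or its repository. Each encoder level is a hypothetical linear layer followed by ReLU. Object regions and the whole image share the same image encoder. The object-granularity similarity is the best (max) cosine match over object regions. The per-level scores are combined by a plain sum. All function names (`cosine`, `encode_level`, `stacked_encode`, `hierarchical_similarity`) are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def encode_level(v: Sequence[float], weights: Matrix) -> Vector:
    """Hypothetical encoder level: a linear map followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, v))) for row in weights]


def stacked_encode(v: Sequence[float], layers: Sequence[Matrix]) -> List[Vector]:
    """Apply the levels in sequence and keep every intermediate output."""
    outputs: List[Vector] = []
    h: Sequence[float] = v
    for weights in layers:
        h = encode_level(h, weights)
        outputs.append(h)
    return outputs


def hierarchical_similarity(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    object_feats: Sequence[Sequence[float]],
    text_layers: Sequence[Matrix],
    image_layers: Sequence[Matrix],
) -> float:
    """Sum of image- and object-granularity similarities over all levels (assumed combination)."""
    text_levels = stacked_encode(text_vec, text_layers)
    image_levels = stacked_encode(image_vec, image_layers)
    # Assumption: object regions reuse the image encoder's levels.
    object_levels = [stacked_encode(o, image_layers) for o in object_feats]

    total = 0.0
    for level, t in enumerate(text_levels):
        image_sim = cosine(t, image_levels[level])
        # Assumption: the best-matching object region defines object-level similarity.
        object_sim = max(cosine(t, obj[level]) for obj in object_levels)
        total += image_sim + object_sim
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    score = hierarchical_similarity(
        text_vec=[1.0, 0.0],
        image_vec=[1.0, 0.0],
        object_feats=[[0.0, 1.0], [1.0, 0.0]],
        text_layers=[identity, identity],
        image_layers=[identity, identity],
    )
    print(score)  # -> 4.0
```

With two identity levels, each level contributes an image similarity of 1 and a best-object similarity of 1, so the combined score is 4.0. A learned weighting of levels, or a different pooling over objects, would be equally plausible choices. The sum and max used here are just the simplest instance of "combining" the similarities.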