IEEE ICASSP 2021 || Toronto, Ontario, Canada || 6-11 June 2021

Paper Detail

Paper ID

AUD-21.6

Paper Title

MULTIMODAL METRIC LEARNING FOR TAG-BASED MUSIC RETRIEVAL

Authors

Minz Won, Universitat Pompeu Fabra, Spain; Sergio Oramas, Oriol Nieto, Fabien Gouyon, Pandora, United States; Xavier Serra, Universitat Pompeu Fabra, Spain

Session

AUD-21: Music Information Retrieval and Music Language Processing 4: Structure and Alignment

Location

Gather.Town

Session Time:

Thursday, 10 June, 14:00 - 14:45

Presentation Time:

Thursday, 10 June, 14:00 - 14:45

Presentation

Poster

Topic

Audio and Acoustic Signal Processing: [AUD-MIR] Music Information Retrieval and Music Language Processing

Abstract

Tag-based music retrieval is crucial for browsing large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. Metric learning, by contrast, enables flexible vocabularies by using pretrained word embeddings as side information. It has also proven suitable for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release MSD500, a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
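
As a rough illustration of the metric-learning idea in the abstract, the sketch below computes a triplet loss with a tag embedding as the anchor and two song embeddings as the positive and negative examples, then ranks songs by similarity to the tag. It is not the authors' implementation: the use of cosine similarity, the margin value of 0.4, the 2-D toy vectors, and the names `cosine_similarity`, `triplet_loss`, and `rank_songs` are all assumptions made for this example. In a real system the tag vector would come from a pretrained word embedding and the song vectors from a learned audio (or audio plus cultural) encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor_tag, positive_song, negative_song, margin=0.4):
    """Hinge-style triplet loss on cosine similarity.

    The loss is zero once the positive song is more similar to the tag
    than the negative song by at least `margin` (an assumed value).
    """
    sim_pos = cosine_similarity(anchor_tag, positive_song)
    sim_neg = cosine_similarity(anchor_tag, negative_song)
    return max(0.0, margin - sim_pos + sim_neg)


def rank_songs(tag_vec, song_vecs):
    """Return song ids sorted by similarity to the tag, most similar first."""
    return sorted(
        song_vecs,
        key=lambda song_id: cosine_similarity(tag_vec, song_vecs[song_id]),
        reverse=True,
    )


# Toy 2-D embeddings, made up for illustration only.
tag_jazz = [1.0, 0.0]
songs = {"song_a": [0.9, 0.1], "song_b": [0.0, 1.0]}

# Well-ordered triplet: the positive song is already close to the tag.
loss_easy = triplet_loss(tag_jazz, songs["song_a"], songs["song_b"])
# Swapped triplet: the "positive" is far from the tag, so the loss is large.
loss_hard = triplet_loss(tag_jazz, songs["song_b"], songs["song_a"])

ranking = rank_songs(tag_jazz, songs)
```

Because retrieval only needs a vector for the query, any word covered by the word-embedding model could in principle be passed to `rank_songs`, which is the flexible-vocabulary property the abstract contrasts with fixed-vocabulary classification.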

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
