2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information


Paper Detail

Paper ID: MLSP-1.4
Paper Title: Robustness and Diversity Seeking Data-Free Knowledge Distillation
Authors: Pengchao Han, Northeastern University, China; Jihong Park, Deakin University, Australia; Shiqiang Wang, IBM T. J. Watson Research Center, United States; Yejun Liu, Chongqing University of Posts and Telecommunications, China
Session: MLSP-1: Deep Learning Training Methods 1
Location: Gather.Town
Session Time: Tuesday, 08 June, 13:00 - 13:45
Presentation Time: Tuesday, 08 June, 13:00 - 13:45
Presentation: Poster
Topic: Machine Learning for Signal Processing: [MLR-DEEP] Deep learning techniques
Abstract: Knowledge distillation (KD) has enabled remarkable progress in model compression and knowledge transfer. However, KD requires a large volume of original data, or their representation statistics, that are usually unavailable in practice. Data-free KD has recently been proposed to resolve this problem, wherein the teacher and student models are fed by a synthetic sample generator trained from the teacher. Nonetheless, existing data-free KD methods rely on fine-tuning of weights to balance multiple losses and ignore the diversity of generated samples, resulting in limited accuracy and robustness. To overcome this challenge, we propose robustness and diversity seeking data-free KD (RDSKD). The generator loss function is crafted to produce samples with high authenticity, class diversity, and inter-sample diversity. Without real data, the objectives of seeking high sample authenticity and high class diversity often conflict with each other, causing frequent loss fluctuations; we mitigate this by exponentially penalizing loss increments. On the MNIST, CIFAR-10, and SVHN datasets, our experiments show that RDSKD achieves higher accuracy and greater robustness across different hyperparameter settings than other data-free KD methods such as DAFL, MSKD, ZSKD, and DeepInversion.
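
To make the abstract's idea concrete, below is a minimal PyTorch-style sketch of how a generator loss combining authenticity, class diversity, and inter-sample diversity might be assembled, with an exponential penalty on loss increments. The specific term forms (DAFL-style one-hot and information-entropy terms, a pairwise-distance diversity term) and the increment-penalty formula are illustrative assumptions, not the authors' released code.

# Hypothetical sketch of an RDSKD-style generator loss (PyTorch).
# Term forms and the exponential increment penalty are assumptions
# for illustration; they are not taken from the paper's implementation.
import torch
import torch.nn.functional as F

def generator_loss(teacher_logits, samples, prev_loss=None):
    """teacher_logits: teacher outputs on generated samples (batch x classes).
    samples: the generated batch (batch size > 1 assumed).
    prev_loss: detached scalar loss from the previous iteration, or None."""
    probs = F.softmax(teacher_logits, dim=1)

    # Authenticity: the teacher should be confident on synthetic samples
    # (cross-entropy against the teacher's own hard predictions, as in DAFL).
    pseudo_labels = teacher_logits.argmax(dim=1)
    l_auth = F.cross_entropy(teacher_logits, pseudo_labels)

    # Class diversity: maximize the entropy of the batch-averaged prediction
    # so that generated samples cover many classes (negative entropy below).
    mean_probs = probs.mean(dim=0)
    l_class = (mean_probs * torch.log(mean_probs + 1e-8)).sum()

    # Inter-sample diversity (assumed form): discourage near-duplicate
    # samples by rewarding large pairwise L2 distances within the batch.
    flat = samples.flatten(start_dim=1)
    n = flat.size(0)
    l_div = -torch.cdist(flat, flat).sum() / (n * (n - 1))

    loss = l_auth + l_class + l_div

    # Robustness heuristic (assumed formula): exponentially penalize any
    # increase of the loss over the previous iteration, damping the
    # fluctuations caused by the conflicting authenticity/diversity terms.
    if prev_loss is not None:
        increment = torch.clamp(loss.detach() - prev_loss, min=0.0)
        loss = loss * torch.exp(increment)
    return loss

In a training loop, one would pass the previous iteration's detached loss value as prev_loss, so that only upward jumps of the combined objective are amplified and penalized while steady decreases are left untouched.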