2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	IVMSP-5.3
Paper Title	Real Image Super-Resolution using Token Based Contextual Attention
Authors	Zhihong Pan, Baopu Li; Baidu USA, United States
Session	IVMSP-5: Super-resolution 1
Location	Gather.Town
Session Time	Tuesday, 08 June, 16:30 - 17:15
Presentation Time	Tuesday, 08 June, 16:30 - 17:15
Presentation	Poster
Topic	Image, Video, and Multidimensional Signal Processing: [IVTEC] Image & Video Processing Techniques
Abstract	Current state-of-the-art (SOTA) image super-resolution (SR) methods rely heavily on deep neural networks (DNNs), and many of them use attention mechanisms to regulate feature channels. These models perform well on benchmark datasets where low-resolution (LR) images are constructed from high-resolution (HR) references with a known blur kernel, but real image SR is more challenging because both images in each LR-HR pair are captured by real cameras, with complex blur kernels and noise statistics. In addition, current methods are trained on small image patches, where channel attention is calculated from statistics of the full patch. Such models perform impressively when the test image is small or is chopped into small patches, but poorly when tested on full-size real images. To alleviate these issues, we propose a new token-based attention module with an innovative contextual encoding that makes SR models robust to the image patch size used at test time. The dot-product attention between tokens efficiently describes the affinity between different regions of an image. Combined with the proximity relationships captured by the contextual encoding, it leads to better global SR results on full-size images. Comprehensive experiments demonstrate the superior performance of the proposed scheme.
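
The abstract says that dot-product attention between tokens describes the affinity between image regions. As a rough illustration of that idea only, the sketch below computes generic scaled dot-product self-attention over a few toy token vectors. It is not the authors' module: the abstract does not specify how tokens are formed or how the contextual encoding works, so the token values, the absence of learned projections, the 1/sqrt(d) scaling, and the function name `token_attention` are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_attention(tokens):
    """Generic scaled dot-product self-attention (hypothetical helper).

    tokens: list of equal-length float vectors, one per image region.
    Returns (weights, attended), where weights[i][j] is the normalized
    affinity of token i to token j and attended[i] is the weighted
    average of all tokens under weights[i].
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)  # assumed scaling, as in standard attention
    # Raw affinity: scaled dot product between every pair of tokens.
    affinity = [
        [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        for query in tokens
    ]
    weights = [softmax(row) for row in affinity]
    # Mix the tokens according to the attention weights.
    attended = [
        [sum(w * tok[c] for w, tok in zip(row, tokens)) for c in range(dim)]
        for row in weights
    ]
    return weights, attended


# Toy tokens: the first two regions are similar, the third is different.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, attended = token_attention(tokens)
```

In this toy case the first token assigns more weight to the second token than to the third, which is the sense in which the weights act as an affinity measure between regions. Any notion of spatial proximity, which the abstract attributes to the contextual encoding, is deliberately left out.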