2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDIVMSP-5.3
Paper Title Real Image Super-Resolution using Token Based Contextual Attention
Authors Zhihong Pan, Baopu Li, Baidu USA, United States
SessionIVMSP-5: Super-resolution 1
LocationGather.Town
Session Time:Tuesday, 08 June, 16:30 - 17:15
Presentation Time:Tuesday, 08 June, 16:30 - 17:15
Presentation Poster
Topic Image, Video, and Multidimensional Signal Processing: [IVTEC] Image & Video Processing Techniques
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Current state-of-the-art (SOTA) image super-resolution (SR) methods rely heavily on deep neural network (DNN), and many of them use attentions to regulate feature channels. While these models perform well on benchmark datasets where low-resolution (LR) images are constructed from high-resolution (HR) references with known blur kernel, real image SR is more challenging when the LR-HR pair are both collected from real cameras with complex blur kernel and noise statistics. Besides, current methods are trained in small image patches where channel attentions are calculated based on statistics of the full patch. This leads to impressive performance when the test image is small or chopped to small patches, but performs poorly when tested on full size real images. To alleviate these issues, we propose a new token based attention module with innovative contextual encoding to enable SR models to be robust to image patch sizes at testing. The dot-product attention between different tokens can efficiently describe the affinity relationship for different regions in an image. Together with the proximity relationship considered by contextual encoding, it leads to better global SR effects for full size images. Comprehensive experiments illustrate the superior performance of the proposed scheme.