2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDIVMSP-1.5
Paper Title SALIENCY-DRIVEN VERSATILE VIDEO CODING FOR NEURAL OBJECT DETECTION
Authors Kristian Fischer, Felix Fleckenstein, Christian Herglotz, André Kaup, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
SessionIVMSP-1: Object Detection 1
LocationGather.Town
Session Time:Tuesday, 08 June, 13:00 - 13:45
Presentation Time:Tuesday, 08 June, 13:00 - 13:45
Presentation Poster
Topic Image, Video, and Multidimensional Signal Processing: [IVCOM] Image & Video Communications
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). In order to determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the art object segmentation network Mask R-CNN was applied to the decoded frame. From extensive simulations we find that, compared to the reference VVC with a constant quality, up to 29% of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. Besides, we compare YOLO against other, more traditional saliency detection methods.