2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information


Paper Detail

Paper ID: IVMSP-29.4
Paper Title: AGGREGATION ARCHITECTURE AND ALL-TO-ONE NETWORK FOR REAL-TIME SEMANTIC SEGMENTATION
Authors: Kuntao Cao, Xi Huang, Jie Shao, University of Electronic Science and Technology of China, China
Session: IVMSP-29: Semantic Segmentation
Location: Gather.Town
Session Time: Friday, 11 June, 13:00 - 13:45
Presentation Time: Friday, 11 June, 13:00 - 13:45
Presentation: Poster
Topic: Image, Video, and Multidimensional Signal Processing: [IVARS] Image & Video Analysis, Synthesis, and Retrieval
Abstract: Deep convolutional neural networks have demonstrated outstanding performance in image semantic segmentation. However, the enormous computational complexity of existing high-precision networks limits their use in real-time segmentation tasks, and achieving a good trade-off between accuracy and speed remains a challenge. Existing solutions can be roughly divided into three categories according to network architecture: dilation, encoder-decoder, and multi-pathway, each of which has its advantages. In this paper, we make the following contributions: (i) unlike these three architectures, we propose a new aggregation architecture as the network backbone; (ii) a multi-level auxiliary loss design is used in the training phase, which improves the model's segmentation quality; (iii) based on this aggregation structure, we propose an all-to-one network (ATONet) for real-time semantic segmentation, which achieves a good trade-off between speed and accuracy by assembling the features of all blocks; (iv) the proposed network achieves 74.4% and 70.1% mIoU at inference speeds of 42.7 FPS and 93.5 FPS on the Cityscapes and CamVid datasets, respectively.
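
For readers unfamiliar with the accuracy metric quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from predicted and ground-truth class labels. It is not the authors' evaluation code: the function name `mean_iou`, its parameters, and the toy labels are illustrative assumptions, and benchmark evaluation scripts typically also accumulate counts over a whole test set and skip ignored labels, which is omitted here.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat, equal-length sequences of integer class
    labels (hypothetical inputs; a real pipeline would flatten label maps).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is about 0.5. Averaging per class rather than per pixel keeps small classes from being swamped by large ones.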