2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: SPCOM-7.2
Paper Title: VGAI: END-TO-END LEARNING OF VISION-BASED DECENTRALIZED CONTROLLERS FOR ROBOT SWARMS
Authors: Ting-Kuei Hu, Texas A&M University, United States; Fernando Gama, University of Pennsylvania, United States; Tianlong Chen, Zhangyang Wang, University of Texas at Austin, United States; Alejandro Ribeiro, University of Pennsylvania, United States; Brian M. Sadler, US Army Research Laboratory, United States
Session: SPCOM-7: Communication-enabled Applications
Location: Gather.Town
Session Time: Friday, 11 June, 13:00 - 13:45
Presentation Time: Friday, 11 June, 13:00 - 13:45
Presentation: Poster
Topic: Signal Processing for Communications and Networking: [SPCN-DIST] Distributed, adaptive, and collaborative communication techniques
Abstract: Decentralized coordination of a robot swarm requires addressing the tension between local perceptions and actions and the accomplishment of a global objective. In this work, we propose to learn decentralized controllers based solely on raw visual inputs. For the first time, this integrates the learning of two key components, communication and visual perception, in a single end-to-end framework. More specifically, each robot has access to a visual perception of its immediate surroundings, as well as communication capabilities to transmit and receive messages from neighboring robots. Our proposed learning framework combines a convolutional neural network (CNN) for each robot, which extracts messages from the visual inputs, with a graph neural network (GNN) over the entire swarm, which transmits, receives, and processes these messages in order to decide on actions. The use of a GNN together with locally run CNNs naturally yields a decentralized controller. We jointly train the CNNs and the GNN so that each robot learns to extract messages from its images that are adequate for the team as a whole. Our experiments demonstrate the proposed architecture on the problem of drone flocking and show its promising performance and scalability, e.g., achieving successful decentralized flocking for large swarms.
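
The CNN-plus-GNN pipeline described in the abstract can be sketched compactly. The following is a minimal, hypothetical PyTorch illustration, not the authors' released code: a shared per-robot CNN encodes each camera image into a message vector, and a single graph-convolution hop mixes each robot's message with the mean of its neighbors' messages before a linear readout produces the action. The class names (PerRobotCNN, SwarmGNNController), layer sizes, message dimension, and one-hop mean aggregation are all illustrative assumptions.

```python
# Hypothetical sketch of the VGAI idea: per-robot CNN -> messages,
# one GNN hop over the communication graph -> actions.
import torch
import torch.nn as nn

class PerRobotCNN(nn.Module):
    """Maps a robot's local camera image to a compact message vector."""
    def __init__(self, msg_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (N, 32, 1, 1)
        )
        self.fc = nn.Linear(32, msg_dim)

    def forward(self, img):                   # img: (N, 3, H, W), one per robot
        return self.fc(self.conv(img).flatten(1))   # messages: (N, msg_dim)

class SwarmGNNController(nn.Module):
    """One graph-convolution hop: each robot combines its own message with
    the mean of its neighbors' messages, then decodes an action."""
    def __init__(self, msg_dim=32, action_dim=2):
        super().__init__()
        self.cnn = PerRobotCNN(msg_dim)       # weights shared across robots
        self.w_self = nn.Linear(msg_dim, msg_dim)
        self.w_neigh = nn.Linear(msg_dim, msg_dim)
        self.readout = nn.Linear(msg_dim, action_dim)

    def forward(self, imgs, adj):
        # imgs: (N, 3, H, W); adj: (N, N) 0/1 communication graph
        msgs = self.cnn(imgs)                          # local perception -> messages
        deg = adj.sum(1, keepdim=True).clamp(min=1)    # avoid divide-by-zero
        neigh = (adj @ msgs) / deg                     # mean of neighbors' messages
        h = torch.relu(self.w_self(msgs) + self.w_neigh(neigh))
        return self.readout(h)                         # actions: (N, action_dim)

# Usage: 8 robots with 64x64 RGB views on a ring communication graph.
N = 8
imgs = torch.randn(N, 3, 64, 64)
adj = torch.zeros(N, N)
idx = torch.arange(N)
adj[idx, (idx + 1) % N] = 1
adj[(idx + 1) % N, idx] = 1
actions = SwarmGNNController()(imgs, adj)
print(actions.shape)                                   # torch.Size([8, 2])
```

Because the CNN runs on each robot's own image and the aggregation uses only one-hop neighbors, each robot could in principle compute its action from local data plus received messages; the batched form above is simply convenient for simulation and joint training.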