| Paper ID | SPTM-3.5 |
| Paper Title | Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation |
| Authors | Zhaoxian Wu, Sun Yat-Sen University, China; Han Shen, Tianyi Chen, Rensselaer Polytechnic Institute, United States; Qing Ling, Sun Yat-Sen University, China |
| Session | SPTM-3: Estimation, Detection and Learning over Networks 1 |
| Location | Gather.Town |
| Session Time | Tuesday, 08 June, 14:00 - 14:45 |
| Presentation Time | Tuesday, 08 June, 14:00 - 14:45 |
| Presentation | Poster |
| Topic | Signal Processing Theory and Methods: Signal Processing over Networks |
| Abstract | This paper considers the policy evaluation problem in reinforcement learning with agents operating over a decentralized, directed network. The focus is on decentralized temporal-difference (TD) learning with linear function approximation in the presence of unreliable or even malicious agents, termed Byzantine agents. To evaluate the quality of a fixed policy in a common environment, the agents collaboratively run decentralized TD($\lambda$). However, when some Byzantine agents behave adversarially, decentralized TD($\lambda$) fails to learn an accurate linear approximation of the true value function. We propose a trimmed-mean-based decentralized TD($\lambda$) algorithm to perform policy evaluation in this setting. We establish its finite-time convergence rate, as well as its asymptotic learning error, which depends on the number of Byzantine agents. Numerical experiments corroborate the robustness of the proposed algorithm. |
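
The paper's exact update rule is in the full text; as a rough, self-contained sketch of the two ingredients the abstract names (coordinate-wise trimmed-mean screening of neighbors' parameters and a local TD($\lambda$) step with linear function approximation), consider the NumPy fragment below. All names (`trimmed_mean`, `td_lambda_direction`, `resilient_step`), the trimming parameter `b`, the order of screening versus the TD step, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def trimmed_mean(vectors, b):
    """Coordinate-wise trimmed mean: in every coordinate, discard the b
    smallest and b largest received values, then average what remains."""
    stacked = np.sort(np.stack(vectors), axis=0)  # shape: (num_vectors, dim)
    m = stacked.shape[0]
    if 2 * b >= m:
        raise ValueError("need more than 2*b vectors to trim b from each side")
    return stacked[b:m - b].mean(axis=0)

def td_lambda_direction(theta, trace, phi, phi_next, reward, gamma, lam):
    """One local TD(lambda) step for a linear value model V(s) = phi(s) @ theta.
    Returns the update direction and the refreshed eligibility trace."""
    trace = gamma * lam * trace + phi                       # eligibility trace
    td_error = reward + gamma * phi_next @ theta - phi @ theta
    return td_error * trace, trace

def resilient_step(theta_i, neighbor_thetas, b, direction, alpha):
    """Screen the neighbors' parameters (own copy included) with the trimmed
    mean, then move along the local TD(lambda) direction."""
    screened = trimmed_mean([theta_i] + list(neighbor_thetas), b)
    return screened + alpha * direction

# Toy round for one regular agent: four honest neighbors plus one Byzantine
# neighbor broadcasting an arbitrarily large vector (hypothetical data).
rng = np.random.default_rng(0)
dim, b, alpha = 4, 1, 0.1
theta, trace = np.zeros(dim), np.zeros(dim)
neighbors = [rng.normal(size=dim) for _ in range(4)] + [np.full(dim, 1e6)]
phi, phi_next = rng.normal(size=dim), rng.normal(size=dim)
direction, trace = td_lambda_direction(theta, trace, phi, phi_next,
                                       reward=1.0, gamma=0.95, lam=0.5)
theta = resilient_step(theta, neighbors, b, direction, alpha)
```

Discarding `b` values from each side of every coordinate means a bounded number of arbitrarily corrupted vectors cannot drag the average, which is the intuition behind the Byzantine resilience; the paper quantifies how the asymptotic learning error scales with the number of Byzantine agents.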