| Paper ID | SPTM-3.5 |
| Paper Title | Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation |
| Authors | Zhaoxian Wu, Sun Yat-Sen University, China; Han Shen, Tianyi Chen, Rensselaer Polytechnic Institute, United States; Qing Ling, Sun Yat-Sen University, China |
| Session | SPTM-3: Estimation, Detection and Learning over Networks 1 |
| Location | Gather.Town |
| Session Time | Tuesday, 08 June, 14:00 - 14:45 |
| Presentation Time | Tuesday, 08 June, 14:00 - 14:45 |
| Presentation | Poster |
| Topic | Signal Processing Theory and Methods: Signal Processing over Networks |
| Abstract | This paper considers the policy evaluation problem in reinforcement learning with agents operating over a decentralized, directed network. The focus is on decentralized temporal-difference (TD) learning with linear function approximation in the presence of unreliable or even malicious agents, termed Byzantine agents. To evaluate the quality of a fixed policy in a common environment, the agents collaboratively run decentralized TD($\lambda$). However, when some Byzantine agents behave adversarially, decentralized TD($\lambda$) fails to learn an accurate linear approximation of the true value function. We propose a trimmed-mean-based decentralized TD($\lambda$) algorithm to perform policy evaluation in this setting. We establish its finite-time convergence rate, as well as its asymptotic learning error, which depends on the number of Byzantine agents. Numerical experiments corroborate the robustness of the proposed algorithm. |
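
The paper's exact update rule is in the full text; as a rough, self-contained sketch of the two ingredients the abstract names (coordinate-wise trimmed-mean screening of neighbors' parameters and a local TD($\lambda$) step with linear function approximation), consider the NumPy fragment below. All names (`trimmed_mean`, `td_lambda_direction`, `resilient_step`), the trimming parameter `b`, the order of screening versus the TD step, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def trimmed_mean(vectors, b):
    """Coordinate-wise trimmed mean: in every coordinate, discard the b
    smallest and b largest received values, then average what remains."""
    stacked = np.sort(np.stack(vectors), axis=0)  # shape: (num_vectors, dim)
    m = stacked.shape[0]
    if 2 * b >= m:
        raise ValueError("need more than 2*b vectors to trim b from each side")
    return stacked[b:m - b].mean(axis=0)

def td_lambda_direction(theta, trace, phi, phi_next, reward, gamma, lam):
    """One local TD(lambda) step for a linear value model V(s) = phi(s) @ theta.
    Returns the update direction and the refreshed eligibility trace."""
    trace = gamma * lam * trace + phi                       # eligibility trace
    td_error = reward + gamma * phi_next @ theta - phi @ theta
    return td_error * trace, trace

def resilient_step(theta_i, neighbor_thetas, b, direction, alpha):
    """Screen the neighbors' parameters (own copy included) with the trimmed
    mean, then move along the local TD(lambda) direction."""
    screened = trimmed_mean([theta_i] + list(neighbor_thetas), b)
    return screened + alpha * direction

# Toy round for one regular agent: four honest neighbors plus one Byzantine
# neighbor broadcasting an arbitrarily large vector (hypothetical data).
rng = np.random.default_rng(0)
dim, b, alpha = 4, 1, 0.1
theta, trace = np.zeros(dim), np.zeros(dim)
neighbors = [rng.normal(size=dim) for _ in range(4)] + [np.full(dim, 1e6)]
phi, phi_next = rng.normal(size=dim), rng.normal(size=dim)
direction, trace = td_lambda_direction(theta, trace, phi, phi_next,
                                       reward=1.0, gamma=0.95, lam=0.5)
theta = resilient_step(theta, neighbors, b, direction, alpha)
```

Discarding `b` values from each side of every coordinate means a bounded number of arbitrarily corrupted vectors cannot drag the average, which is the intuition behind the Byzantine resilience; the paper quantifies how the asymptotic learning error scales with the number of Byzantine agents.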