| Paper ID | SPCOM-9.4 |
| Paper Title | ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING |
| Authors | Ezra Tampubolon, Haris Ceribasic, Holger Boche, Technical University of Munich, Germany |
| Session | SPCOM-9: Online and Active Learning for Communications |
| Location | Gather.Town |
| Session Time | Friday, 11 June, 14:00 - 14:45 |
| Presentation Time | Friday, 11 June, 14:00 - 14:45 |
| Presentation | Poster |
| Topic | Signal Processing for Communications and Networking: [SPCN-NETW] Networks and Network Resource allocation |
| Abstract | In this work, we study a system of two interacting, non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that they are also almost welfare-optimal. |
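
To make the setup described in the abstract concrete, here is a minimal sketch, not the authors' algorithm. It assumes two stateless Q-learners (discount factor zero) that repeatedly play a hypothetical 2x2 Prisoner's Dilemma. The uninformed agent keeps one value per own action. The informed agent first observes the other's action and keeps a value table conditioned on it. The payoff matrices, the epsilon-greedy exploration, and the parameters `alpha`, `epsilon`, `steps`, and `seed` are all illustrative assumptions.

```python
import random

# Hypothetical 2x2 Prisoner's Dilemma (action 0 = cooperate, 1 = defect).
# PAYOFF_A[a][b] / PAYOFF_B[a][b]: rewards when the uninformed agent plays a
# and the informed agent plays b. Values are illustrative, not from the paper.
PAYOFF_A = [[3.0, 0.0], [5.0, 1.0]]
PAYOFF_B = [[3.0, 5.0], [0.0, 1.0]]


def epsilon_greedy(values, epsilon, rng):
    """With probability epsilon pick a random index, else a (tie-broken) argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def train(steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Run the repeated game; return (q_a, q_b) value tables (hypothetical setup)."""
    rng = random.Random(seed)
    q_a = [0.0, 0.0]                # uninformed agent: one value per own action
    q_b = [[0.0, 0.0], [0.0, 0.0]]  # informed agent: values conditioned on A's action
    for _ in range(steps):
        a = epsilon_greedy(q_a, epsilon, rng)     # A acts without seeing B
        b = epsilon_greedy(q_b[a], epsilon, rng)  # B observes a, then acts
        # Stateless Q-learning update (discount 0): move each value toward its reward.
        q_a[a] += alpha * (PAYOFF_A[a][b] - q_a[a])
        q_b[a][b] += alpha * (PAYOFF_B[a][b] - q_b[a][b])
    return q_a, q_b


if __name__ == "__main__":
    q_a, q_b = train()
    print("uninformed agent values:", q_a)
    print("informed agent values (row = observed action):", q_b)
```

Calling `train()` returns both value tables, and the greedy policies are their argmax entries. Which outcome the pair settles on depends on the game and the hyperparameters, so this toy does not reproduce the paper's stability or welfare results.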