Distributional analysis of sampling-based RL algorithms
Prakash Panangaden
McGill University and Mila
SWS Distinguished Lecture Series
Prakash Panangaden is a Professor of Computer Science at McGill University.
His research interests are primarily in the theoretical foundations of computer science,
with a focus on stochastic systems, but range from black holes and curved space-time
to reinforcement learning. He has received numerous awards, including the
Test-of-Time Award at LICS. He is a Fellow of the ACM.
Distributional reinforcement learning (RL) is a recent approach to RL that
emphasises the full distribution of returns rather than just the expected
return, as in traditional RL. In this work we take the distributional point
of view and analyse a number of sampling-based algorithms, such as value
iteration, TD(0) and policy iteration. These algorithms have been shown to
converge under various assumptions, but usually with completely different
proofs. We have developed a new viewpoint that allows us to prove convergence
using a uniform approach. The idea is based on couplings and on viewing the
approximation algorithms as Markov processes in their own right. It
originated in work on bisimulation metrics, on which I have been working for
the last quarter century. This is joint work with Philip Amortila
(U. Illinois), Marc Bellemare (Google Brain) and Doina Precup (McGill, Mila
and DeepMind).
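To make the contrast concrete, here is a minimal toy sketch, not taken from the talk: it compares classical TD(0), which tracks only the expected return V(s), with a sample-based distributional update that tracks a set of return "particles" per state and pushes each sampled transition through the distributional Bellman map r + γ·Z(s'). The two-state Markov reward process, its transition probabilities, the discount factor, and the particle representation are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical two-state Markov reward process (illustrative only):
# in state 0, with prob 0.5 we receive reward 1 and stay in 0,
# otherwise reward 0 and move to the absorbing state 1.
GAMMA = 0.9

def step(s):
    """Sample (reward, next_state) from the toy transition kernel."""
    if s == 1:
        return 0.0, 1                       # absorbing, zero reward
    return (1.0, 0) if random.random() < 0.5 else (0.0, 1)

# Classical TD(0): track only the expected return V(s).
V = [0.0, 0.0]
alpha = 0.1
for _ in range(20000):
    r, s2 = step(0)
    V[0] += alpha * (r + GAMMA * V[s2] - V[0])   # TD(0) update

# Distributional view: track return samples ("particles") per state;
# each sampled transition rewrites one particle via r + gamma * Z(s').
particles = {0: [0.0] * 50, 1: [0.0] * 50}
for _ in range(20000):
    r, s2 = step(0)
    i = random.randrange(50)
    z = random.choice(particles[s2])             # sample from Z(s')
    particles[0][i] = r + GAMMA * z              # distributional update

mean_Z = sum(particles[0]) / 50
# The true expected return from state 0 solves V = 0.5*(1 + GAMMA*V),
# i.e. V = 0.5/0.55 ≈ 0.909; both estimates should land near it,
# but only the particle set also carries variance and tail information.
print(V[0], mean_Z)
```

The point of the sketch is that the distributional estimate retains strictly more information (the whole empirical return distribution) while its mean plays the role of the classical value estimate; the convergence proofs discussed in the talk treat such sample-based updates as Markov processes in their own right.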