Prakash Panangaden

Reinforcement learning (RL) is a powerful and widely used technique for learning optimal policies. The setting is that of Markov decision processes (MDPs), probabilistic systems with which an agent can interact by performing actions. An action causes a change of state and also produces a numerical reward. The goal is to learn a policy, that is, a choice of action at each state, that produces optimal long-term average rewards. Many mathematical ideas, including metric spaces and fixed-point theory, lie at the heart of the subject. In this talk I will review the basic ideas of RL and give an overview of some of the relevant mathematics. I will end with a discussion of the role of symmetry in such systems and give a brief account of work on this topic by me and my collaborators, first reported at NeurIPS 2022. The talk should be accessible to all IMA students. My collaborators on this work are Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger and Doina Precup.
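As a toy illustration of the fixed-point ideas mentioned in the abstract, here is a minimal value-iteration sketch for a hypothetical two-state, two-action MDP (the transition probabilities and rewards below are made up for illustration; the discounted-reward formulation shown here, rather than the average-reward one, is where the Banach fixed-point argument is most familiar):

```python
import numpy as np

# Hypothetical two-state, two-action MDP (numbers purely illustrative).
# P[a][s, s'] = probability of moving from s to s' under action a.
# R[a][s]     = expected immediate reward for taking action a in state s.
P = {
    0: np.array([[0.9, 0.1], [0.2, 0.8]]),
    1: np.array([[0.5, 0.5], [0.7, 0.3]]),
}
R = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}
gamma = 0.9  # discount factor

def bellman_optimality(V):
    # (T V)(s) = max_a [ R(a, s) + gamma * sum_{s'} P(a, s, s') V(s') ]
    return np.max([R[a] + gamma * P[a] @ V for a in (0, 1)], axis=0)

# T is a gamma-contraction in the sup norm, so by the Banach fixed-point
# theorem, iterating it from any starting V converges to the unique
# optimal value function V*.
V = np.zeros(2)
for _ in range(500):
    V = bellman_optimality(V)

# A greedy policy extracted from V*: the maximizing action in each state.
policy = np.argmax([R[a] + gamma * P[a] @ V for a in (0, 1)], axis=0)
print(V, policy)
```

The contraction property is exactly where metric-space ideas enter: convergence of value iteration is a statement about a fixed point in the complete metric space of bounded value functions under the sup-norm distance.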

**Bio**

Panangaden has been at McGill University since 1990, where for the past twenty-five years he has been working on various aspects of Markov processes: process equivalence, logical characterization, approximation and metrics. Recently he has worked on using metrics to enhance representation learning. Panangaden first studied physics at the Indian Institute of Technology in Kanpur. For his MSc in physics at the University of Chicago, he studied stimulated emission from black holes, and for his PhD in physics at the University of Wisconsin–Milwaukee, he worked on quantum field theory in curved spacetime. He was formerly an assistant professor of computer science at Cornell University, where he primarily worked on the semantics of concurrent programming languages. A Fellow of the Royal Society of Canada and Fellow of the Association for Computing Machinery (ACM), Panangaden has published papers in physics, quantum information and pure mathematics.