Online RL for infinite state space problems
| Date | 2026-03-27 |
| Time | 10:30 (UTC-03:00) |
| Place | FCEA: EIP Room 1 (entrance via Lauro Müller) |
Vittorio Puricelli (LAAS-CNRS, Francia)
For infinite state space problems, classical reinforcement learning (RL) algorithms can fail to converge due to unstable behavior. In this talk, we present a path toward addressing this problem and obtaining convergence guarantees.
We will start by introducing the fundamental concepts from Markov decision process theory, focusing on the expected discounted cost problem. We then move on to online RL methods, with a particular emphasis on the Q-learning algorithm.
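For concreteness, the expected discounted cost objective and the Q-learning update can be written in a standard form (the notation below is generic and ours, not necessarily the speaker's):

```latex
% Expected discounted cost criterion for a policy \pi,
% with cost function c and discount factor \beta:
\[
  J^{\pi}(x) \;=\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, c(X_t, A_t) \,\middle|\, X_0 = x\right],
  \qquad \beta \in (0,1).
\]
% Asynchronous Q-learning update along an observed transition
% (x, a) -> X_{n+1}, with step sizes \alpha_n; the min reflects
% cost minimization:
\[
  Q_{n+1}(x,a) \;=\; Q_n(x,a) + \alpha_n(x,a)\!\left[c(x,a) + \beta \min_{a'} Q_n(X_{n+1}, a') - Q_n(x,a)\right].
\]
```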
We comment on the convergence of Q-learning for finite state space problems, and then argue that convergence can be extended to infinite state spaces under certain stability assumptions.
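As a point of reference for the finite state space case, here is a minimal tabular Q-learning sketch under the discounted cost criterion; the toy MDP, step-size rule, and epsilon-greedy exploration are our own illustrative choices, not taken from the talk.

```python
# Minimal tabular Q-learning on a random finite MDP, minimizing
# the expected discounted cost (hence argmin/min, not argmax/max).
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
beta = 0.9  # discount factor

# Random transition kernel P[s, a] = distribution over next states,
# and a cost function c[s, a] with values in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
c = rng.uniform(size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))  # visit counts per (state, action)
s = 0
for t in range(200_000):
    # Epsilon-greedy exploration; the greedy action minimizes cost.
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmin())
    s_next = rng.choice(n_states, p=P[s, a])
    counts[s, a] += 1
    alpha = 1.0 / counts[s, a]  # Robbins-Monro step sizes per pair
    # Update toward the one-step Bellman (minimization) target.
    Q[s, a] += alpha * (c[s, a] + beta * Q[s_next].min() - Q[s, a])
    s = s_next

print(np.round(Q, 3))
```

In the finite case every state-action pair is revisited infinitely often under such exploration, which is what the classical convergence argument exploits; the talk concerns what breaks when the state space is infinite.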
Next, we present a recent result showing that Q-learning by itself can fail to promote stability, and that a stabilizing scheme is needed to ensure convergence. We will outline the key ideas of the proof of this result and discuss connections to self-interacting random walk models.
We will end by discussing work in progress in which we aim to develop an online stabilizing scheme that guarantees convergence of Q-learning.
