# Anytime exponential concentration of contractive stochastic approximation: Additive and multiplicative noise

| Date | Time | Place |
| --- | --- | --- |
| 2022-09-02 | 10:30 (UTC−03:00) | Zoom |

#### Martín Zubeldía (ISyE, University of Minnesota, EEUU)

In this talk, we study stochastic approximation (SA) algorithms driven by an operator that is contractive with respect to an arbitrary norm. We consider two settings in which the iterates are potentially unbounded: additive sub-Gaussian noise, and bounded multiplicative noise. We obtain concentration bounds on the convergence errors and show that these errors have sub-Gaussian tails. Moreover, our bounds hold anytime, in the sense that the entire sample path lies within a tube of decaying radius with high probability. To establish these results, we first bound the moment generating function of the generalized Moreau envelope of the error, which serves as a Lyapunov function. We then construct an exponential supermartingale and apply Ville's maximal inequality to obtain anytime exponential concentration bounds. To overcome the challenge posed by multiplicative noise, we develop a bootstrapping argument that iteratively refines an initially loose concentration bound into a much tighter one.
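To make the setting concrete, here is a minimal simulation sketch (not from the talk; the contraction `F(x) = gamma * x`, the step-size schedule, and the noise scale are illustrative assumptions) of contractive SA with additive Gaussian noise, tracking the error along the entire sample path rather than only at the final iterate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stochastic approximation x_{k+1} = x_k + a_k * (F(x_k) - x_k + w_k),
# where F is a gamma-contraction (here F(x) = gamma * x, fixed point 0)
# and w_k is additive sub-Gaussian (here Gaussian) noise.
gamma = 0.5          # contraction factor (illustrative assumption)
sigma = 1.0          # noise scale (illustrative assumption)
n_steps = 5000

x = np.array([5.0, -3.0])        # start far from the fixed point
errors = np.empty(n_steps)
for k in range(n_steps):
    a_k = 1.0 / (k + 10)         # diminishing step size
    w = rng.normal(0.0, sigma, size=x.shape)
    x = x + a_k * (gamma * x - x + w)
    errors[k] = np.linalg.norm(x, ord=np.inf)  # error in the sup-norm

# The whole tail of the sample path stays small, consistent with an
# "anytime" tube of decaying radius (not just a bound at a fixed time).
print(errors[:10].max(), errors[-1000:].max())
```

Comparing the maximum error over the last 1000 iterates against the early iterates illustrates the uniform-over-time flavor of the bounds; the talk's results quantify the radius of such a tube with sub-Gaussian tails.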

Our results enable us to provide anytime high-probability bounds for a large class of reinforcement learning algorithms. Since linear SA with random matrices that are bounded and Hurwitz in expectation, but not necessarily Hurwitz almost surely, is a special case of contractive SA with multiplicative noise, we establish high-probability bounds for various TD-learning algorithms (such as on-policy TD with linear function approximation, and off-policy TD) in one shot. To the best of our knowledge, exponential concentration bounds for off-policy TD-learning had not previously been established in the literature, due to the challenge of handling such multiplicative noise. Moreover, we also provide anytime high-probability bounds for the popular Q-learning algorithm.
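As a concrete instance of the reinforcement learning setting, the following hypothetical sketch runs tabular Q-learning on a made-up two-state, two-action MDP (the transition kernel `P`, rewards `R`, and step sizes are illustrative assumptions, not from the talk). The Bellman optimality operator it approximates is a `gamma`-contraction in the sup-norm, which is exactly the contractive-SA structure the talk analyzes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy MDP (hypothetical): P[s, a] is the next-state distribution,
# R[s, a] is the expected reward for taking action a in state s.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Tabular Q-learning: an SA scheme whose mean update is the
# Bellman optimality operator, a gamma-contraction in the sup-norm.
Q = np.zeros((n_states, n_actions))
s = 0
for k in range(20000):
    a = rng.integers(n_actions)              # uniform exploration
    s_next = rng.choice(n_states, p=P[s, a])
    alpha = 1.0 / (1 + 0.01 * k)             # diminishing step size
    target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])    # SA step toward the noisy target
    s = s_next

# Reference fixed point Q* via value iteration on the same operator.
Q_star = np.zeros_like(Q)
for _ in range(1000):
    Q_star = R + gamma * (P @ Q_star.max(axis=1))
print(np.abs(Q - Q_star).max())
```

The anytime bounds in the talk would guarantee, with high probability, that the whole trajectory of `Q` stays inside a shrinking tube around `Q_star`, not merely that the final iterate is close.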