Wang, S., Si, N., Blanchet, J., & Zhou, Z. (2023). A Finite Sample Complexity Bound for Distributionally Robust Q-learning. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research 206:3370-3398. Available from https://proceedings.mlr.press/v206/wang23b.html.


Abstract

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}\big(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4}\big)$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels, and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
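To make the abstract concrete, below is a minimal Python sketch (an illustration, not the authors' code) of the two ingredients it names: the dual form of a KL-uncertainty robust expectation and a geometric-level multi-level Monte Carlo (MLMC) estimator plugged into a Q-learning update. The function names (`dual_kl_support`, `mlmc_robust_estimate`, `dr_q_update`, `simulate_next_state`), the grid-based dual maximization, and the Geometric(1/2) level distribution are illustrative assumptions; see the paper for the exact estimator and the tuned parameters behind the stated bound.

```python
import numpy as np
from scipy.special import logsumexp

def dual_kl_support(values, delta, alpha_grid=np.logspace(-2, 2, 50)):
    """Dual form of the KL-robust worst-case expectation of sampled values:
    sup_{alpha > 0} { -alpha * log E[exp(-V / alpha)] - alpha * delta },
    maximized over a fixed grid (a crude stand-in for a 1-D line search)."""
    n = len(values)
    best = np.min(values)  # alpha -> 0 limit recovers the sample minimum
    for alpha in alpha_grid:
        val = -alpha * (logsumexp(-values / alpha) - np.log(n)) - alpha * delta
        best = max(best, val)
    return best

def mlmc_robust_estimate(sample_values, delta, rng):
    """MLMC estimator of the robust support function. `sample_values(n)`
    draws n i.i.d. next-state values from the simulator. The level
    N ~ Geometric(1/2) has P(N = n) = 2^{-(n+1)}; the even/odd antithetic
    split debiases the nonlinear plug-in estimator."""
    N = rng.geometric(0.5) - 1            # N in {0, 1, 2, ...}
    v = sample_values(2 ** (N + 1))
    g_all = dual_kl_support(v, delta)
    g_even = dual_kl_support(v[0::2], delta)
    g_odd = dual_kl_support(v[1::2], delta)
    p_N = 0.5 ** (N + 1)                  # probability of drawing level N
    correction = (g_all - 0.5 * (g_even + g_odd)) / p_N
    return dual_kl_support(sample_values(1), delta) + correction

def dr_q_update(Q, s, a, r, gamma, delta, lr, simulate_next_state, rng):
    """One distributionally robust Q-learning step: the robust target uses
    the MLMC estimate of the worst-case expected next-state value."""
    def sample_values(n):
        return np.array([Q[simulate_next_state(s, a)].max() for _ in range(n)])
    target = r + gamma * mlmc_robust_estimate(sample_values, delta, rng)
    Q[s, a] += lr * (target - Q[s, a])
```

As a usage sketch, with `Q = np.zeros((n_states, n_actions))`, `rng = np.random.default_rng(0)`, and a `simulate_next_state(s, a)` hook into the simulator, repeatedly calling `dr_q_update` over sampled state-action pairs drives `Q` toward the robust Q-function; the paper's analysis additionally tunes the level distribution and estimator truncation to obtain the sample complexity bound quoted above.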

Authors
Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
Publication date
2023/4/11
Conference
International Conference on Artificial Intelligence and Statistics
Pages
3370-3398
Publisher
PMLR