Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet (2023) Distributionally Robust Batch Contextual Bandits. Management Science 69(10):5772-5793. https://doi.org/10.1287/mnsc.2023.4678
Abstract
Policy learning using historical observational data is an important problem that has widespread applications. Examples include selecting offers, prices, or advertisements for consumers; choosing bids in contextual first-price auctions; and selecting medication based on patients’ characteristics. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data: an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this …
Authors
Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet
Publication date
2023/10
Journal
Management Science
Volume
69
Issue
10
Pages
5772-5793
Publisher
INFORMS