We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithms to private bandit algorithms. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of , improving upon the existing upper bound in all privacy regimes. In particular, our algorithms allow for sublinear expected regret even when , establishing the first known separation between central and local differential privacy. For bandits with expert advice, we give the first differentially private algorithms, with expected regret , and , where and denote the number of actions and experts respectively. These rates allow us to get sublinear regret for different combinations of small and large , and .
- † University of Michigan
- ** Work done while at Apple