We study an extension of the standard two-party communication model in which Alice and Bob hold probability distributions and over domains and , respectively. Their goal is to estimate
to within additive error for a bounded function , known to both parties. We refer to this as the distributed estimation problem. Special cases of this problem arise in a variety of areas including sketching, databases and learning. Our goal is to understand how the required communication scales with the communication complexity of and the error parameter .
The random sampling approach — estimating the mean by averaging over random samples — requires total communication, where is the randomized communication complexity of . We design a new debiasing protocol which improves the dependence on to be linear instead of quadratic. Additionally we show better upper bounds for several special classes of functions, including the Equality and Greater-than functions. We introduce lower bound techniques based on spectral methods and discrepancy, and show the optimality of many of our protocols: the debiasing protocol is tight for general functions, and that our protocols for the equality and greater-than functions are also optimal. Furthermore, we show that among full-rank Boolean functions, Equality is essentially the easiest.
- † University of California, Los Angeles
- ‡ University of California, Berkeley
- § Institute for Advanced Study (IAS)