On Jacob Bernoulli, the Law of Large Numbers, and the Origins of the Central Limit Theorem | by Sachin Date | Jan, 2024



In spite of the WLLN’s importance to the CLT, the path from the WLLN to the CLT is choked with thorny brambles that took Bernoulli’s successors several decades to hack through. Look once again at the equation at the heart of Bernoulli’s theorem:

Bernoulli’s Theorem (Image by Author): in modern notation, for any ε > 0, P(|X_bar_n/n - p| ≤ ε) → 1 as n → ∞

Bernoulli chose to frame his investigation within a Binomial setting. The ticket-filled urn is the sample space for what is clearly a binomial experiment, and the count X_bar_n of black tickets in the sample is Binomial(n, p). If the real fraction p of black tickets in the urn is known, then E(X_bar_n) is the expected value of a Binomial(n, p) random variable, which is np. With n and p known, the probability distribution P(X_bar_n|p,n) is fully specified. Then it’s theoretically possible to crank out probabilities such as P(np - δ ≤ X_bar_n ≤ np + δ) as follows:

P(np-δ ≤ X_bar_n ≤ np+δ) where X_bar_n ~ Binomial(n,p) (Image by Author)
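As a concrete sketch (the values of n, p and δ below are my own hypothetical choices, not Bernoulli’s), this probability can be computed by summing the Binomial(n, p) probability mass function term by term:

```python
from math import comb, ceil, floor

def prob_within(n, p, delta):
    """P(np - delta <= X_bar_n <= np + delta) for X_bar_n ~ Binomial(n, p),
    computed by summing the binomial pmf over the integer band."""
    lo = max(0, ceil(n * p - delta))
    hi = min(n, floor(n * p + delta))
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

# Hypothetical urn: n = 1000 draws, true fraction p = 0.3, tolerance delta = 20
print(round(prob_within(1000, 0.3, 20), 4))
```

Even this small example makes Bernoulli’s computational predicament vivid: each term carries factorials of up to n, which he had to evaluate by hand.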

I suppose P(np - δ ≤ X_bar_n ≤ np + δ) is a useful probability to calculate. But you can only calculate it if you know the true ratio p. And who will ever know the true p? Bernoulli, with his Calvinist leanings, and Abraham De Moivre, whom we’ll meet in my next article and who was to continue Bernoulli’s research, seemed to believe that a divine being might know the true ratio. In their writings, both made clear references to Fatalism and ORIGINAL DESIGN. Bernoulli brought up Fatalism in the final paragraph of Ars Conjectandi. De Moivre mentioned ORIGINAL DESIGN (in capitals!) in his book on probability, The Doctrine of Chances. Neither man made a secret of his suspicion that a Creator’s intention was the reason we have a law such as the Law of Large Numbers.

But none of this theology helps you or me. Almost never will you know the true value of pretty much any property of any non-trivial system in any part of the universe. And if by an unusually freaky stroke of good fortune you were to stumble upon the true value of some parameter then case closed, right? Why waste your time drawing random samples to estimate what you already know when you have God’s eye view of the data? To paraphrase another famous scientist, God has no use for statistical inference.

On the other hand, down here on Earth, all you have is a random sample, its mean or sum X_bar_n, and its sample variance S². Using them, you’ll want to draw inferences about the population. For example, you’ll want to build a (1 - α)100% confidence interval around the unknown population mean μ. Thus, it turns out you don’t have as much use for the probability:

P(np - δ ≤ X_bar_n ≤ np + δ)

as you do for the confidence interval for the unknown mean, namely:

P(X_bar_n - δ ≤ np ≤ X_bar_n + δ).

Notice how subtle but crucial is the difference between the two probabilities.
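To make the contrast concrete, here is a minimal sketch of building such an interval from a sample alone, with the true p hidden from the analyst. The simulated urn, the 95% level, and the normal approximation to the binomial are all my own hypothetical choices for illustration; nothing here is Bernoulli’s method:

```python
from math import sqrt
import random

random.seed(7)

# Hypothetical urn with a true fraction of black tickets that the
# analyst never gets to see; only the random sample is observed.
p_true = 0.30
n = 1000
sample = [random.random() < p_true for _ in range(n)]

x_bar = sum(sample) / n                # observed fraction of black tickets
se = sqrt(x_bar * (1 - x_bar) / n)     # standard error of that fraction
z = 1.96                               # normal quantile for a 95% interval

lo, hi = x_bar - z * se, x_bar + z * se
print(f"95% confidence interval for p: [{lo:.3f}, {hi:.3f}]")
```

Every quantity on the right-hand side is computed from the sample; the unknown p appears only as the target the interval tries to trap.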

The probability P(X_bar_n - δ ≤ np ≤ X_bar_n + δ) can be expressed as a difference of two cumulative probabilities:

P(X_bar_n - δ ≤ np ≤ X_bar_n + δ) = P(np ≤ X_bar_n + δ) - P(np ≤ X_bar_n - δ) (Image by Author)

To estimate the two cumulative probabilities, you’ll need a way to estimate the probability P(p|X_bar_n,n) which is the exact inverse of the binomial probability P(X_bar_n|n,p) that Bernoulli worked with. And by the way, since the ratio p is a real number, P(p|X_bar_n,n) is the Probability Density Function (PDF) of p conditioned upon the observed sample mean X_bar_n. Here you are asking the question:

Given the observed ratio X_bar_n/n, what is the probability density function of the unknown true ratio p?

P(p|n,X_bar_n) is called inverse probability (density). Incidentally, the path to the Central Limit Theorem’s discovery runs straight through a mechanism to compute this inverse probability, a mechanism that an English Presbyterian minister named Thomas Bayes (of Bayes’ Theorem fame) and the ‘Isaac Newton of France’, Pierre-Simon Laplace, were to independently discover between the late 1700s and the early 1800s using two strikingly different approaches.

Returning to Jacob Bernoulli’s thought experiment, the way to understand inverse probability is to look at the true fraction of black tickets p as the cause that is ‘causing’ the effect of observing X_bar_n/n fraction of black tickets in a random sample of size n. For each observed value of X_bar_n, there are an infinite number of possible values for p. With each value of p is associated a probability density that can be read off from the inverse probability distribution function P(p|X_bar_n,n). If you know this inverse PDF, you can calculate the probability that p will lie within some specified interval [p_low, p_high], i.e. P(p_low ≤ p ≤ p_high) given the observed X_bar_n.
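A minimal sketch of this calculation, assuming (as Bayes and Laplace would later justify) a uniform prior on p: under that assumption the inverse PDF P(p|X_bar_n, n) works out to the Beta(X_bar_n + 1, n - X_bar_n + 1) density, and P(p_low ≤ p ≤ p_high) can be obtained by numerically integrating it. The sample counts below are hypothetical:

```python
from math import comb

def inverse_prob(k, n, p_low, p_high, steps=100_000):
    """P(p_low <= p <= p_high | k black tickets observed in n draws),
    assuming a uniform prior on p, so the inverse PDF is the
    Beta(k+1, n-k+1) density. Integrates it with the trapezoid rule."""
    norm = (n + 1) * comb(n, k)  # 1 / B(k+1, n-k+1)
    density = lambda p: norm * p ** k * (1 - p) ** (n - k)
    h = (p_high - p_low) / steps
    total = 0.5 * (density(p_low) + density(p_high))
    total += sum(density(p_low + i * h) for i in range(1, steps))
    return total * h

# Hypothetical sample: 30 black tickets out of 100 drawn
print(round(inverse_prob(30, 100, 0.2, 0.4), 4))
```

Note the reversal of roles relative to Bernoulli’s theorem: the observed count k is held fixed, and the probability mass is computed over the unknown ratio p.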

Unfortunately, Jacob Bernoulli’s theorem isn’t expressed in terms of the inverse PDF P(p|n,X_bar_n). Instead, it’s expressed in terms of the ‘forward’ probability P(X_bar_n|n,p), which requires you to know the true ratio p.

Having come as far as stating and proving the WLLN in terms of the ‘forward’ probability P(X_bar_n|n,p), you’d think Jacob Bernoulli would take the natural next step to invert the statement of his theorem and show how to calculate the inverse PDF P(p|n,X_bar_n).

But Bernoulli did no such thing, choosing instead to mysteriously bring the whole of Ars Conjectandi to a sudden, unexpected close with a rueful-sounding paragraph on Fatalism.

“…if eventually the observations of all should be continued through all eternity (from probability turning to perfect certainty), everything in the world would be determined to happen by certain reasons and by the law of changes. And so even in the most casual and fortuitous things we are obliged to acknowledge a certain necessity, and if I may say so, fate,…”

The final page of Pars Quarta (Part IV) of Ars Conjectandi (Public domain)

PARS QUARTA of Ars Conjectandi was to disappoint (but also inspire) future generations of scientists in yet another way.

Look at the summations on the R.H.S. of the following equation:

P(np-δ ≤ X_bar_n ≤ np+δ) = Σ_{k = np-δ to np+δ} [n! / (k!(n-k)!)] p^k (1-p)^(n-k), where X_bar_n ~ Binomial(n,p) (Image by Author)

They contain big, bulky factorials that are all but impossible to crank out for large n. Unfortunately, everything about Bernoulli’s theorem is about large n. And the calculation must have been especially tedious in the year 1689, working under the unsteady, dancing glow of grease lamps with nothing more than pen and paper. In Part 4, Bernoulli did carry out a few of these calculations, particularly to find the minimum sample sizes required to achieve different degrees of accuracy. But he left the matter there.

The final two pages of Ars Conjectandi illustrating Jacob Bernoulli’s estimation of minimum sample sizes (25550, 31258, 36966 etc.) needed to achieve specified degrees of accuracy (1/1000, 1/10000, 1/100000) around the sample mean, assuming a known population mean (Public domain)

Neither did Bernoulli show how to approximate the factorial (a technique that was to be discovered four decades later by Abraham De Moivre and James Stirling, in that order), nor did he make the crucial conceptual leap of showing how to attack the problem of inverse probability.
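The approximation De Moivre and Stirling would eventually arrive at, n! ≈ √(2πn)(n/e)^n, tames exactly these bulky factorials. A quick sketch of how closely it tracks the exact factorial (the sample values of n are arbitrary):

```python
from math import factorial, sqrt, pi, e

def stirling(n):
    """De Moivre-Stirling approximation: n! ~ sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / e) ** n

# The ratio approaches 1 from below as n grows
for n in (5, 10, 50):
    print(n, stirling(n) / factorial(n))
```

For the sample sizes Bernoulli contemplated (tens of thousands), this replaces an impossible hand computation with a few multiplications, which is precisely why it became the key that unlocked the Central Limit Theorem.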
