Samuel Pepys, the famed English diarist, proposed the following problem to Isaac Newton, in order to gauge his chances in a wager:
Which experiment is more probable?
A. You roll 6 fair die and get at least 1 dice that shows up as “6″;
B. You roll 12 fair die and get at least 2 die that show up as “6″;
C. You roll 18 fair die and get at least 3 die that show up as “6″.
The correct answer turns out to be experiment A. Based upon the correspondence between Newton (who provided the correct solution) and Pepys, the wagerer was not immediately convinced of the answer. When challenged, Newton attempted to give an intuitive explanation of the answer, though it is technically incorrect/incomplete (Newton probably knew this). Here it is, paraphrased from the Ye Olde Englishe version:
Imagine that B and C toss their dice in groups of six. A is most favorable because it requires a 6 in only one group toss, while B and C require a 6 in each of their group tosses.
Astute readers will recognize that this intuitive explanation neglects situations where C might get two 6′s in the first or third set, for example.
This problem has been discussed often enough that it has been titled the “Newton-Pepys Problem“, and one is likely to find it given as an elementary probability exercise. I myself came across a more general version of Pepys problem: I win if, in 6n independent rolls of dice I acquire at least n 6′s. You win if, in 6(n + 1) independent rolls of dice you acquire at least n + 1 6′s. Who is more likely to win?
When n = 1 or 2, we know from Newton that I am more likely to win (yay!), but what about for all n? The answer is of course that I still am more likely to win, but how goes the proof? It is unclear whether Newton had a general solution, as his letters to Pepys indicate a purely numerical approach (he calculated the probabilities by hand for A, B and C). My guess is that he had an elegant solution that unfortunately could not fit in the margin of his letter.
And surprisingly, the solution to this general Pepys’s problem cannot fit within the confines of a Wikipedia entry or Wolfram Mathworld page either! The latter aludes to a general solution, which involves this horrifying “regularised hypergeometric function” – what the hell is that?! No, no, there has to be a better way.
Let be the random variable that counts the occurrences of 6′s in
independent dice rolls (fair die). Then we are comparing probabilities
vs.
. Thanks to independence, we can write
.
We can forget that the dice have six sides, and instead pretend they’re coins, where you get heads (6) with probability 1/6, and get tails (1,…, 5) with probability 5/6. Then the has a binomial distribution:
with . But treating
as a convolution of
and
, we can instead write:
By implementing the change of variables , we have
where . That is the first trick.
Note that , so in order to compare
vs.
we can compare
against
.
Assuming that (we’ll leave cases
to the reader),
for
. Here comes the second trick. Note that
, which follows from this identity. Thus, we know that in the
sum, the
are being weighted by numbers that sum up to 1, and so the average must be smaller than the largest probability, which is
. Thus,
, and so
. I win!