Controversial probability calculation regarding Thai lotto incident


On 1 September 2020, the number "999997" was drawn as the first prize in Thailand's government lottery. The run of consecutive 9s sparked widespread debate about whether the lotto machine was working properly; some even claimed that the draw proved the government was cheating.

A note on the drawing method: the first-prize number is a six-digit number picked at random from the set 000000, ..., 999999. Six staff members each draw one digit from 0 to 9 from their own machine.

To simplify the problem, I will consider the 1st prize number "999999" instead of "999997" in this question.

Most people know that every number has the same probability, $1/1000000$. Let me state this mathematically.

Statement 1: If a number $n$ is drawn at random from the set of six-digit numbers $000000, ..., 999999$, the probability that $n$ equals any specific number in the set is $1/1000000$.
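As a short worked check, Statement 1 follows from the drawing method if we assume the six machines are fair and independent of each other (this is my assumption, and $D_i$ is my own notation for the digit drawn by machine $i$). For any target digits $d_1, \dots, d_6$:

$$P(n = d_1 d_2 \dots d_6) = \prod_{i=1}^{6} P(D_i = d_i) = \left(\frac{1}{10}\right)^{6} = \frac{1}{1000000}.$$

The right-hand side does not depend on which digits are chosen, so "999999" and "326648" get the same value.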

Now, the problem arises when someone proposes the following statement.

Statement 2: Let $A$ be the set {000000, 111111, 222222, ..., 999999}. The probability that $n$ is a member of $A$ is $10/1000000$.

On one side, people use Statement 1 to argue that drawing "999999" is no more unusual than drawing any ordinary-looking number such as "326648", "863439", ...

On the other side, people use Statement 2 to claim that drawing "999999" is "unusual", since a number of this kind is drawn with probability only $10/1000000$.

I have a feeling that the latter claim, based on Statement 2, is flawed: if I let $A$ be any set of 10 numbers of my choosing, such as {123456, 443253, 857342, ...}, I could claim that any number is unusual. But I cannot explain this clearly enough to convince the people who believe the claim.
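To make this concrete, here is a minimal Python sketch, assuming a uniform draw over the 1000000 numbers. The names `prob_of_set`, `repdigits`, and `arbitrary` are my own illustrative choices, and the arbitrary set is sampled with a fixed seed only so that it has no visible pattern; it is not a set taken from the actual discussion.

```python
import random
from fractions import Fraction

TOTAL = 10**6  # number of equally likely outcomes, 000000..999999


def prob_of_set(numbers):
    """Probability that one uniform draw lands in the given set of six-digit strings."""
    distinct = set(numbers)  # duplicates must not be counted twice
    return Fraction(len(distinct), TOTAL)


# The "special-looking" set from Statement 2: 000000, 111111, ..., 999999.
repdigits = [str(d) * 6 for d in range(10)]

# A hypothetical set of 10 numbers with no visible pattern (seed chosen arbitrarily).
rng = random.Random(2020)
arbitrary = [f"{n:06d}" for n in rng.sample(range(TOTAL), 10)]

print(prob_of_set(repdigits))  # 1/100000
print(prob_of_set(arbitrary))  # 1/100000
```

Both sets have exactly the same probability, so the size of the set alone cannot be what makes "999999" feel unusual.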

Please help me find a mathematical explanation behind this conflict, one that shows why the claim based on Statement 2 is invalid and why people find it hard to see this on their own.


There are 2 best solutions below

0
On BEST ANSWER

I have been thinking hard about this problem, and I finally found an explanation that I think makes a lot of sense.

But first, let me restate my problem.

Problem restatement: The claims based on Statement 1 and Statement 2 both seem correct, yet they conflict with each other, so one of them must actually be incorrect. I would like a mathematical explanation of which one is wrong, how it is wrong, and why many people find it hard to see that it is wrong.

To solve this problem, I will define a generalized form of the lottery game and call it a "lotto-like game".

Let $A$ be the set of six-digit numbers from 000000 to 999999. A lotto-like game is played as follows:

  1. Define pairwise disjoint subsets of $A$, for example $A_0 := \{000000,...,099999\}$, $A_1 := \{100000,...,199999\}$, and in general $A_n := \{n00000,...,n99999\}$. Note that the subsets can have different numbers of elements depending on how we define them.
  2. Let each player choose one of these subsets.
  3. The dealer draws a number from the set $A$.
  4. The players who chose the subset containing the drawn number win the game.
  5. Note that the probability that the number is a member of a subset $A_n$ is $\frac{|A_n|}{|A|}$, which makes clear that the definition of the subsets affects the probability, and therefore the sense of "usuality".
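The rules above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the draw is uniform, and each game is described by a hypothetical labelling function `label` that assigns every number to exactly one subset (the helper name `subset_probabilities` is also mine).

```python
from collections import Counter
from fractions import Fraction

TOTAL = 10**6  # |A|, the six-digit numbers 000000..999999


def subset_probabilities(label):
    """Return {subset name: |A_n| / |A|} for the subsets induced by `label`."""
    # Count how many of the 10**6 numbers fall into each subset.
    counts = Counter(label(f"{n:06d}") for n in range(TOTAL))
    return {name: Fraction(size, TOTAL) for name, size in counts.items()}


# Game 1: the subsets A_0, ..., A_9 from step 1, grouped by leading digit.
by_first_digit = subset_probabilities(lambda s: s[0])

# Game 2: "all six digits equal" versus everything else.
by_pattern = subset_probabilities(
    lambda s: "repeated" if len(set(s)) == 1 else "other"
)

print(by_first_digit["9"])     # 1/10
print(by_pattern["repeated"])  # 1/100000
print(by_pattern["other"])     # 99999/100000
```

The draw is the same in both games; only the way the outcomes are grouped changes, and with it the probability attached to "the subset that won".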

With this definition, all the subsets must be defined explicitly before the game proceeds, and the players choose a "subset" rather than a "number".

Note that the normal lotto game is a special case of this lotto-like game in which every subset has exactly one element: $A_n := \{n\}$ for each $n \in A$.

You may think it is overkill to spell out such lengthy rules in set notation for a simple game everyone knows. I do it to point out that people take the simplicity for granted and are unaware that they "always" perform step 1 before playing the game and calculating probabilities. They define the subsets in their own ways without realizing it, and they mistakenly believe they are talking about "the same game" even though their subset definitions differ!

Consider the following real-world discussion between me and my friend John to see what I mean when I say that people "always" define the subsets in different ways without realizing it.

I: Hey John, why did you say that the first-prize number "999997" is unusual? Every number has the same probability.

John: No, it doesn't. Imagine the digit "9" is drawn for the first digit. Then the second digit is "9" with probability only $\frac{1}{10}$, compared with $\frac{9}{10}$ for all the other digits combined, and so on for the remaining digits. Therefore, it is very unusual to get a repeating number such as "999997".

In this example, John did not realize that he was defining the subsets $A_1 := \{999990,999991,...,999999\}$ and $A_2 := A \setminus A_1$ (the rest), and that he was playing a completely different "lotto-like game" from mine.
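Plugging John's two subsets into the formula from step 5 gives the following numbers (my own arithmetic, assuming a uniform draw):

$$P(n \in A_1) = \frac{|A_1|}{|A|} = \frac{10}{1000000}, \qquad P(n \in A_2) = \frac{|A_2|}{|A|} = \frac{999990}{1000000}, \qquad P(n = 999997) = \frac{1}{1000000}.$$

In John's game, landing in $A_1$ is indeed rare, but that is a different event from "the draw equals 999997", which is exactly as likely as any other single number.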

To summarize, here are my answers to the questions:

  1. Which claim is wrong?
    Answer: The claim based on Statement 2, namely that the first-prize number "999997" is unusual, is wrong.

  2. How is it wrong?
    Answer: The claimant defines the subsets of the game differently from the government lotto game, so the two are different games with different probability calculations.

  3. Why do many people find it hard to see that it is wrong?
    Answer: Because people do not realize that they always define the subsets of the game in their own way. Or they know they are defining subsets but are not aware that a different definition of the subsets makes it a completely different game and changes the sense of "usuality".

0
On

As said in the comments, "unusual" is hard to define, since no particular number is inherently unusual. Your logic is completely right. If the number had been $345678$, for example, there would be a similar story.

We think that a number like $999999$ would come up very rarely, but it comes up just as often as any other number, as you said. As for your Statement 2, the same holds for any other set $A$ of ten numbers, for example $000001, 111112, \dots, 999990$. But your statement is completely correct.