On 1 September 2020, the number "999997" was drawn for the first prize in Thailand's government lottery. The run of consecutive "9"s sparked extensive controversy over whether the lotto machine was working properly; some even claimed that the incident proved the government was cheating.
A note on the drawing method: a six-digit number is picked at random from the set 000000, ..., 999999 for the first prize. Six staff members each draw one digit, 0 to 9, from their own machine.
To simplify the problem, I will consider the 1st prize number "999999" instead of "999997" in this question.
Most people know that every number has an equal probability of $1/1000000$. Let me state this mathematically.
Statement 1: When a number $n$ is drawn at random from the set of six-digit numbers $000000, ..., 999999$, the probability that $n$ equals any specific number in the set is $1/1000000$.
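As a quick check of Statement 1, here is a minimal sketch. It assumes the six machines are independent and that each one yields a uniform digit from 0 to 9, which is an idealization rather than a documented fact about the real machines. The helper name `prob_of_number` is mine, introduced only for illustration.

```python
from fractions import Fraction

DIGIT_PROB = Fraction(1, 10)  # assumption: each machine is fair and independent


def prob_of_number(number: str) -> Fraction:
    """Exact probability of drawing `number` digit by digit (hypothetical helper)."""
    if len(number) != 6 or not number.isdigit():
        raise ValueError("expected a six-digit string such as '999999'")
    prob = Fraction(1)
    for _ in number:  # every digit, whatever its value, contributes a factor 1/10
        prob *= DIGIT_PROB
    return prob


for n in ("999999", "999997", "326648"):
    print(n, prob_of_number(n))  # each line ends with 1/1000000
```

Under these assumptions the value of each digit never enters the calculation, so a string of repeated digits gets exactly the same probability as any other string.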
Now, the problem arises when someone proposes the following statement.
Statement 2: Let $A$ be the set $\{000000, 111111, 222222, ..., 999999\}$. The probability that $n$ is a member of $A$ is $10/1000000$.
On one side, people use Statement 1 to argue that drawing "999999" is as ordinary as drawing any other number, such as "326648", "863439", ...
On the other side, people use Statement 2 to claim that drawing "999999" is "unusual", since there is only a $10/1000000$ probability of drawing a number of this kind.
I have a feeling that the latter claim, the one using Statement 2, is flawed: if I let $A$ be any set of 10 numbers I like, such as {123456, 443253, 857342, ...}, I could claim that any number is unusual. But I cannot explain this clearly enough to convince the people who believe the claim.
Please help me see whether there is a mathematical explanation behind this conflict, one that shows why the claim using Statement 2 is invalid and why people find it hard to see this on their own.
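To make this suspicion concrete, here is a small sketch comparing the repeated-digit set from Statement 2 with an arbitrary 10-number set. The arbitrary set is sampled with a fixed seed purely for illustration (it is not the set listed above), and `prob_in_set` is a hypothetical helper that assumes a uniform draw.

```python
import random
from fractions import Fraction

TOTAL = 10**6  # number of six-digit strings 000000..999999


def prob_in_set(subset: set) -> Fraction:
    """P(n in subset), assuming all numbers are equally likely."""
    return Fraction(len(subset), TOTAL)


# The set from Statement 2: 000000, 111111, ..., 999999.
repeated = {str(d) * 6 for d in range(10)}

# An arbitrary 10-number set, chosen by a seeded sample for illustration only.
rng = random.Random(0)
arbitrary = {f"{k:06d}" for k in rng.sample(range(TOTAL), 10)}

print(prob_in_set(repeated))   # 1/100000
print(prob_in_set(arbitrary))  # 1/100000
```

Both sets have the same probability, so the size of the set alone cannot be what makes an outcome "unusual".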
I have been thinking hard about this problem, and I finally found an explanation that I think makes a lot of sense.
But first, let me restate my problem.
To solve this problem, I will define a generalized form of this lottery game and call it a "lotto-like game".
With this definition, all the subsets must be defined explicitly before the game is played, and the players choose a "subset" instead of a "number".
Note that the normal lotto game is a special case of this lotto-like game, in which every subset has exactly one element: $A_n := \{n\}$ for each $n \in A$.
You may think it is overkill to define such lengthy rules in set notation for a simple game everyone knows. But I do this to point out that people take the simplicity for granted and are unaware that they "always" perform step 1 before playing the game and calculating probabilities. They define the subsets in their own ways without realizing it, and they mistakenly believe they are talking about "the same game" even though their subset definitions are different!
Consider the following real-world discussion between me and my friend John, which shows what I mean when I say that people "always" define the subsets in different ways without realizing it.
Me: Hey John, why did you say that the first prize number "999997" is unusual? Every number has the same probability.
John: No, it doesn't. Imagine that "9" is drawn for the first digit. Then the second digit has a probability of only $\frac{1}{10}$ of being "9", compared with $\frac{9}{10}$ for the other digits, and so on for the remaining digits. Therefore, it is very unusual to get a repeating number such as "999997".
In this example, John did not realize that he was defining the subsets $A_1 := \{999990, 999991, ..., 999999\}$ and $A_2$ as the rest, and that he was playing a completely different "lotto-like game" from mine.
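The difference between the two games can be sketched numerically. The snippet below assumes a uniform draw and compares the normal game (singleton subsets $A_n = \{n\}$) with John's two-subset partition as described above. The names `subset_prob`, `singleton_game` and `johns_game` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

TOTAL = 10**6  # six-digit strings 000000..999999, assumed equally likely


def subset_prob(size: int) -> Fraction:
    """Probability that the drawn number falls in a subset of the given size."""
    return Fraction(size, TOTAL)


# Normal lotto: every subset is a singleton {n}, so each has size 1.
singleton_game = {"{999997}": 1, "{326648}": 1}

# John's game: A_1 = {999990, ..., 999999}, A_2 = everything else.
a1 = {f"{k:06d}" for k in range(999990, 1000000)}
johns_game = {"A_1": len(a1), "A_2": TOTAL - len(a1)}

for name, size in {**singleton_game, **johns_game}.items():
    print(name, subset_prob(size))
```

In the normal game the subset containing "999997" has probability 1/1000000, the same as every other subset. In John's game the same outcome sits in $A_1$ with probability 1/100000 and is compared against $A_2$ with probability 99999/100000, which is where the impression of "unusual" comes from.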
To summarize, here are my answers to the questions.
Which claim is wrong?
Answer: The claim using Statement 2, which says that the first prize number "999997" is unusual, is wrong.
How is it wrong?
Answer: The claimant defines the subsets of the game differently from the government lotto game, so the two are different games with different probability calculations.
Why do many people find it difficult to see that it is wrong?
Answer: Because people do not realize that they always define the subsets of the game in their own way. Or they know they are defining the subsets but are not aware that a different definition of the subsets makes it a completely different game and changes the sense of what counts as "usual".