Counterintuitive examples in probability


I want to teach a short course in probability and I am looking for counter-intuitive examples. I am mainly interested in problems whose results seem obviously false even though they are true.

I have already found a few things, including two videos on the topic.

In addition, I have found some surprising facts about random walks, for example this amazing theorem:

For a simple random walk, the expected number of visits to the point $b$ before returning to the origin equals $1$, for every $b \neq 0$.

I have also found some advanced examples such as Do longer games favor the stronger player?

Could you please do me a favor and share some other examples of such problems? I am excited to read your answers.


24
On BEST ANSWER

The most famous counter-intuitive example in probability theory is the Monty Hall problem:

  • In a game show, there are three doors behind which there are a car and two goats. However, which door conceals which is unknown to you, the player.
  • Your aim is to select the door behind which the car is. So, you go and stand in front of a door of your choice.
  • At this point, regardless of which door you selected, the game show host chooses and opens one of the remaining two doors. If you chose the door with the car, the host selects one of the two remaining doors at random (with equal probability) and opens that door. If you chose a door with a goat, the host selects and opens the other door with a goat.
  • You are given the option of standing where you are and switching to the other closed door.

Does switching to the other door increase your chances of winning? Or does it not matter?

The answer is that it does matter: switching wins the car with probability $\frac23$, while staying wins with probability only $\frac13$. This is initially counter-intuitive for someone seeing the problem for the first time.
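
A minimal simulation sketch of the two strategies, assuming the host behaviour described above:

```python
import random

def monty(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car, pick = random.randrange(3), random.randrange(3)
        # The host opens a goat door that is neither your pick nor the car.
        if pick == car:
            opened = random.choice([d for d in (0, 1, 2) if d != pick])
        else:
            opened = next(d for d in (0, 1, 2) if d != pick and d != car)
        if switch:
            pick = next(d for d in (0, 1, 2) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty(switch=False), monty(switch=True))  # ~0.333 vs ~0.667
```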


  • If a family has two children, at least one of which is a daughter, what is the probability that both of them are daughters?
  • If a family has two children, the elder of which is a daughter, what is the probability that both of them are daughters?

A beginner in probability would expect the answers to both questions to be the same, but they are not: the answers are $\frac13$ and $\frac12$ respectively.
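
The difference is easy to see by enumerating the four equally likely (elder, younger) pairs; a sketch:

```python
from itertools import product

pairs = list(product("GB", repeat=2))                # (elder, younger)

def p_both_girls(condition):
    matching = [p for p in pairs if condition(p)]
    return sum(p == ("G", "G") for p in matching) / len(matching)

print(p_both_girls(lambda p: "G" in p))      # 1/3: at least one daughter
print(p_both_girls(lambda p: p[0] == "G"))   # 1/2: the elder is a daughter
```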

Math with Bad Drawings explains this paradox with a great story as part of a seven-post series on probability theory.


Nontransitive Dice

Let persons P, Q, R have three distinct dice.

If P is more likely to win against Q, and Q is more likely to win against R, must P be more likely to win against R?

The answer, strangely, is no. One such dice configuration is $(\left \{2,2,4,4,9,9 \right\},\left \{ 1,1,6,6,8,8\right \},\left \{ 3,3,5,5,7,7 \right \})$
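
A pairwise check by enumeration for the configuration above, as a sketch:

```python
from itertools import product

def beats(d1, d2):
    """Fraction of face pairs in which d1 shows the higher number."""
    wins = sum(a > b for a, b in product(d1, d2))
    return wins / (len(d1) * len(d2))

P = [2, 2, 4, 4, 9, 9]
Q = [1, 1, 6, 6, 8, 8]
R = [3, 3, 5, 5, 7, 7]
print(beats(P, Q), beats(Q, R), beats(R, P))   # each 5/9 > 1/2
```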


Sleeping Beauty Paradox

(This is related to philosophy/epistemology and is more related to subjective probability/beliefs than objective interpretations of it.)

Today is Sunday. Sleeping Beauty drinks a powerful sleeping potion and falls asleep.

Her attendant tosses a fair coin and records the result.

  • The coin lands heads: Beauty is awakened only on Monday and interviewed. Her memory is erased and she is put back to sleep.
  • The coin lands tails: Beauty is awakened and interviewed on Monday. Her memory is erased and she is put back to sleep. On Tuesday, she is once again awakened, interviewed, and finally put back to sleep.

In essence, the awakenings on Mondays and Tuesdays are indistinguishable to her.

The most important question she's asked in the interviews is

What is your credence (degree of belief) that the coin landed heads?

Given that Sleeping Beauty is epistemologically rational and is aware of all the rules of the experiment on Sunday, what should be her answer?

This problem seems simple on the surface, but there are arguments for both answers, $\frac{1}{2}$ and $\frac{1}{3}$, and there is no consensus among modern epistemologists on this one.


Ellsberg Paradox

Consider the following situation:

An urn contains 90 balls of 3 colors: red, blue, and yellow. Exactly 30 of the balls are known to be red; each of the remaining 60 is either blue or yellow, in unknown proportion.

There are two lotteries:

  • Lottery A: A random ball is chosen. You win a prize if the ball is red.
  • Lottery B: A random ball is chosen. You win a prize if the ball is blue.

Question: In which lottery would you want to participate?

Now consider two further lotteries:

  • Lottery X: A random ball is chosen. You win a prize if the ball is either red or yellow.
  • Lottery Y: A random ball is chosen. You win a prize if the ball is either blue or yellow.

Question: In which lottery would you want to participate?

If you are an average person, you'd choose Lottery A over Lottery B and Lottery Y over Lottery X.

However, it can be shown that no assignment of probabilities to the ball colors makes this pair of preferences rational. One way to deal with this is to extend the concept of probability to that of imprecise probabilities.
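
The inconsistency can be shown in one line. Write $p_R=\frac{30}{90}$, $p_B$, $p_Y$ for the (unknown) probabilities of drawing each color. Preferring A over B and Y over X under any single assignment would require

$$p_R > p_B \qquad\text{and}\qquad p_B + p_Y > p_R + p_Y,$$

and the second inequality simplifies to $p_B > p_R$, contradicting the first.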

7
On

A famous example is the St. Petersburg paradox:

Consider a game in which a fair coin is tossed repeatedly until it first lands heads, and you win $2^n\:\$ $ if the first head occurs on toss $n$. The expected payoff is $\sum_{n=1}^{\infty} 2^{-n}\cdot 2^n = \infty$, so the "fair" entrance fee of this game is infinite.

17
On

A while back, the xkcd blog posted this problem, which I found fascinating. Usually when I re-tell it, I do so slightly differently from the original author:

I have selected two numbers from $\mathbb{R}$, following some unknown and not necessarily independent distribution. I have written each number in a separate envelope. By fair coin toss, I select one of these two envelopes to open, revealing that number. I then ask the question "Is the number in the other envelope larger than this one?". You win if you guess correctly.

Can you win this game with probability $>\frac{1}{2}$? Note, that is a strict inequality. Winning with probability $=\frac{1}{2}$ is obviously easy.

Now, the solution to this starts out with a double-integral, so depending on the level of the class you're teaching it may not be appropriate.
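
The key idea, as a sketch of the standard threshold strategy (not necessarily the blog's original write-up): draw a random threshold $T$ from any distribution with positive density on all of $\mathbb{R}$, and guess "larger" exactly when the revealed number is below $T$. Whenever $T$ falls strictly between the two numbers you are guaranteed to be right; otherwise you win with probability $\frac12$.

```python
import random

def play_once(x, y):
    """One round with distinct numbers x, y; returns True iff the guess is correct."""
    revealed, hidden = (x, y) if random.random() < 0.5 else (y, x)
    t = random.gauss(0, 10)              # threshold with full support on R
    guess_hidden_larger = revealed < t
    return guess_hidden_larger == (hidden > revealed)

trials = 200_000
wins = sum(play_once(3.0, 3.5) for _ in range(trials))
print(wins / trials)   # strictly above 0.5, by half the chance T lands in (3.0, 3.5)
```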

8
On

Strongly related to the OP's example is this consequence of the arcsine law for last visits. Assume we are playing with a fair coin.

Theorem (false): In a long coin-tossing game, each player will be on the winning side for about half the time, and the lead will pass not infrequently from one player to the other.

The following text is from the classic An Introduction to Probability Theory and Its Applications, volume 1, by William Feller.

  • According to widespread belief, a so-called law of averages should ensure the theorem above. But in fact this theorem is wrong, and contrary to the usual belief the following holds:

    With probability $\frac{1}{2}$, no equalization occurs in the second half of the game, regardless of the length of the game. Furthermore, the probabilities are greatest near the endpoints.
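
A simulation sketch of the last-equalization time: in a walk of length $100$, the last visit to $0$ piles up near the two ends of the game.

```python
import random

def last_tie(steps=100):
    """Last time a simple +/-1 random walk of the given length is at 0."""
    pos, last = 0, 0
    for t in range(1, steps + 1):
        pos += random.choice((-1, 1))
        if pos == 0:
            last = t
    return last

trials = 20_000
buckets = [0] * 10                      # deciles of the game
for _ in range(trials):
    buckets[min(last_tie() // 10, 9)] += 1
print(buckets)                          # U-shaped: first and last deciles dominate
```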

5
On

Birthday Problem

For me this was the first example of how counter-intuitive real-world probability problems can be, because of the inherent under- and over-estimation built into our mental shortcuts for permutations and combinations (at bottom an inverse multiplication problem), which form the basis of probability calculations. The question is:

How many people must be in a room so that the probability of at least two people sharing the same birthday is at least as high as the probability of getting heads on a toss of an unbiased coin (i.e., $0.5$)?

This is a good problem for students to hone their skills in estimating permutations and combinations, the basis for computing a priori probabilities.

I still find the number of people in the answer surreal and hard to believe! (The answer is $23$.)
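
A quick check of that number, as a sketch:

```python
from math import prod

def p_shared(k, days=365):
    """Probability that at least two of k people share a birthday."""
    return 1 - prod((days - i) / days for i in range(k))

print(next(k for k in range(1, 100) if p_shared(k) >= 0.5))  # 23
print(round(p_shared(23), 4))                                # ~0.5073
```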

Pupils should at this juncture be told about quick-and-dirty mental shortcuts for calculating permutations and combinations, and encouraged to build a habit of mental computation; this develops intuition about probability and serves them well in higher-level problems such as the Monty Hall problem, or conditional-probability problems like the following:

$0.5\%$ of a population of $10$ million is affected by a strange disease. A test has been developed for the disease, and it is correct $99\%$ of the time. A person selected at random from the population tests positive. What is the probability that this person actually has the disease? The answer is approximately $33\%$.
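
A one-line check with Bayes' theorem (interpreting the $99\%$ truth ratio as a $1\%$ error rate for both sick and healthy people):

$$P(\text{disease}\mid +)=\frac{0.99\times 0.005}{0.99\times 0.005+0.01\times 0.995}=\frac{0.00495}{0.01490}\approx 0.33.$$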

Here the strange disease can be replaced by any real-world setting (HIV testing, a "successful" trading or betting strategy, the number of terrorists in a country). The example gives students a feel for why such screenings are bound to produce many false positives (few real-world tests are anywhere near $99\%$ accurate) and why popular opinion about them is so often wrong.

This is a natural starting point for introducing some of the work of Daniel Kahneman and Amos Tversky, as no modern probability course is complete without giving pupils a sense of how fragile one's intuitions and estimates are when judging probabilities and uncertainties, and how to deal with that. A substantial part of the course (say $20\%$) could be devoted to this aspect, and it can be one of the final real-world projects for students.

7
On

The boy or girl paradox already mentioned by Agnishom has an interesting variation:

''Suppose we were told not only that Mr. Smith has two children, and one of them is a boy, but also that the boy was born on a Tuesday: does this change the previous analyses?'' (for the question ''what is the probability that both children are boys'')?

Using some elementary computations with Bayes' formula, the seemingly useless information that the boy was born on a Tuesday changes the result: the probability of two boys rises from $\frac13$ to $\frac{13}{27}$.

To understand the intuition, consider an extreme case: suppose you knew that one boy was born on December $30$. It is then very unlikely that the other child was born on that date too, so the date information effectively "specifies" one child. This reduces the question to "is the other child a boy?" and moves the probability from $\frac13$ to approximately $\frac12$.
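
A brute-force enumeration over (sex, weekday) pairs confirms the $\frac{13}{27}$; a sketch:

```python
from itertools import product

children = list(product("BG", range(7)))       # (sex, day of week); day 2 = Tuesday
families = list(product(children, children))   # 196 equally likely families

has_tuesday_boy = [f for f in families if any(s == "B" and d == 2 for s, d in f)]
both_boys = [f for f in has_tuesday_boy if all(s == "B" for s, _ in f)]
print(len(both_boys), "/", len(has_tuesday_boy))   # 13 / 27
```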

However, I do not recommend using this example for teaching: there are many interpretations of this paradox (partly depending on language nuances in its formulation) and it can add more confusion than clarity.

25
On

I particularly like the triple-or-nothing game:

You start with $1$ sweet $^{[1]}$ in the pot. At each step, you can either choose to leave the game with all the sweets in the pot, or you can continue the game. If you continue, a fair coin is flipped, and if it comes up heads then the sweets in the pot are tripled, but if it comes up tails then the pot is emptied.

If you can play this game only once, how many sweets would you be willing to pay to play? And how should you play? (Assume that you want to get the most sweets possible.)

$^{[1]}$ Let's not be money-minded here...

The naive (and incorrect) analysis is to note that if there are $x$ sweets in the pot and you continue the game, the expected number of sweets in the pot becomes $1.5x$; thus you should never stop. But that is absurd: if you never stop, you never collect any sweets! So when should you stop?

Worse still, a correct analysis shows that no matter how many sweets you pay, you can play in such a way that the expected number of sweets you leave with exceeds what you paid! The (silly) conclusion is that you should be willing to pay any number of sweets to play!
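
A sketch contrasting mean and median payoff for the strategy "continue for exactly $k$ flips, then leave" (my framing of the analysis above):

```python
# Payoff of "play exactly k steps": 3**k with probability 2**-k, else 0.
for k in range(6):
    mean = 1.5 ** k                         # (3**k) * (2**-k)
    lower_median = 3 ** k if k == 0 else 0  # for k >= 1, P(payoff = 0) >= 1/2
    print(k, mean, lower_median)            # mean explodes, median collapses to 0
```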

If you think really carefully about it, you will realize that expectation is a very poor indicator of the rationality of a choice. Instead, everyone has some risk aversion, more generally a mapping from probability distributions to favourability. One possibility is to call a probability distribution unfavourable iff its median is not positive (representing no net gain). Clearly this game is never favourable to anyone with that kind of risk aversion, except if you commit to playing for exactly one step. In real life, people evaluate distributions in far more complicated ways than just checking the median.

That said, a reasonable rule of thumb is that a decision is not worth making unless its estimated benefit has both positive mean and positive median. Positive mean is necessary for rules of thumb, otherwise you will not benefit in the long run. Positive median prevents other foolish decisions, such as playing the triple-or-nothing game for more than one step or paying more than $1.5$ sweets to play it. More risk-averse people will play for zero steps and just take the initial sweet and leave!

This rule shows (reasonably) not only that it is not worth paying even $2$ sweets to play the triple-or-nothing game once, but also that it is not worth offering the game for others to play! Any application of probability to real-life decisions should be able to handle such situations.


[Further remarks...]

My claim that the rule of thumb is reasonable is only that it should work quite well in real life; whether it agrees with various mathematical models of human rationality is irrelevant.

Secondly, my rule of thumb is merely for determining whether a single option is worth taking or not. To compare multiple choices of which you must pick one, you would have to extend the rule of thumb. One possible way is to define the value of each choice to be the minimum of the mean and median benefit, and then pick the choice with the maximum value.

Thirdly, different people will of course evaluate a choice differently based on its benefit's probability distribution (assuming it can even be translated to some real number). A very risk-averse person might take the minimum of the 1st percentile (roughly speaking, the minimum benefit you believe you will gain in 99% of cases) and the mean (average benefit). Someone else may combine the percentiles and mean in a different fashion, such as taking $-\infty$ as the value if the 5th percentile is below some threshold (representing serious hurt, say), but taking the mean otherwise.

17
On

I find that almost anything about probability is counter-intuitive to my college students on first encounter. Possibly this may depend on your audience. Here are a few examples:

$1.$ Question: "If a certain event has a $40\%$ chance of success, and we run $50$ experiments, then how many would you expect to succeed?" The most common responses I usually get are "all of them" and "none of them". This is after an hour-long lecture on the subject.

$2.$ Question: "Interpret this probability statement: There is a $30\%$ chance of rain today in the New York area." I usually only get about a $65\%$ successful response rate on this on a multiple-choice quiz, even after the hour-long lecture on the subject. Once I had a student so bamboozled by it that she called up the national meteorology service for a consultation.

$3.$ Question: "We have a hand of four cards $\{A, 2, 3, 4\}$, and pick out two at random; what is the probability we get the $A$ or the $2$?" Common responses are $25\%$, $50\%$, and $75\%$. I've never had anyone in a class intuit the correct answer ($1 - \binom{2}{2}/\binom{4}{2} = 5/6$) on first presentation.

$4.$ Question: "If you drive to school on a given day, you either get in an accident or you don't. Are these equally likely outcomes?" At least half of any class answers "yes" on the first presentation. This can be repeated with the same result with similar follow-up questions.

5
On

Bertrand's Paradox

Given two concentric circles ($S_1$, $S_2$) with radii $R_1=r$ and $R_2=\frac{r}2$, what is the probability, upon choosing a chord $c$ of the circle $S_1$ at random, that $c\:\cap\: S_2 \neq \emptyset$ ?

Simply speaking, your task is to

choose a chord of the larger circle at random and find the probability that it will intersect the smaller circle.

Surprisingly, Bertrand's Paradox offers three distinct yet valid solutions.

The same problem can also be stated as:

Given an equilateral triangle inscribed in a circle, find the probability of randomly choosing a chord of the circle greater than the length of a side of the triangle.

The counter-intuition steps in when you understand that the answer to the stated problem can be $\frac12$, $\frac13$, or even $\frac14$, with all three answers perfectly valid.

The crucial reason there are three solutions is that "choosing a chord at random" can be formalized in different ways, and each selection method induces a different probability distribution over chords.

Here's the Wikipedia page for details on how each value is obtained and through what steps.
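
A simulation sketch of the three classical chord-selection methods; a chord meets the inner circle iff its midpoint lies within distance $r/2$ of the centre:

```python
import random, math

N = 100_000
r = 1.0

def endpoints():   # method 1: chord through two uniform points on the circle
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    mx = (math.cos(a) + math.cos(b)) / 2
    my = (math.sin(a) + math.sin(b)) / 2
    return math.hypot(mx, my)              # distance of chord midpoint from centre

def radial():      # method 2: midpoint uniform along a random radius
    return random.uniform(0, r)

def midpoint():    # method 3: midpoint uniform in the disk
    while True:
        x, y = random.uniform(-r, r), random.uniform(-r, r)
        if x * x + y * y <= r * r:
            return math.hypot(x, y)

for name, sample in [("endpoints", endpoints), ("radius", radial), ("midpoint", midpoint)]:
    hits = sum(sample() < r / 2 for _ in range(N))
    print(name, hits / N)   # ~1/3, ~1/2, ~1/4
```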

I remember that a professor had begun my high-school level probability class using Bertrand's Paradox as an introductory example.

2
On

Consider the $d$-dimensional unit sphere; as $d$ goes to infinity, almost all of its (uniform) measure concentrates near the equator $x_1=0$.
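
A sampling sketch (uniform points on the sphere obtained by normalizing Gaussian vectors):

```python
import random, math

def uniform_sphere(d):
    """Uniform point on the (d-1)-sphere: normalize a standard Gaussian vector."""
    v = [random.gauss(0, 1) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

for d in (3, 30, 300, 3000):
    samples = [abs(uniform_sphere(d)[0]) for _ in range(2000)]
    print(d, sum(samples) / len(samples))   # mean |x1| shrinks like 1/sqrt(d)
```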

5
On

Airplane Seating

$100$ people are boarding a plane in a line, and each of them is assigned one of the $100$ seats on the plane. However, the first person in line forgot his boarding pass and, as a result, sits down in a seat chosen uniformly at random. The second person then does the following:

  1. Sit in her own seat if it is still available.
  2. If her seat is not available, choose a random seat among the seats remaining and sit there.

Each following person sits according to the same rules as the second person. What is the probability the $100^{th}$ person will be able to sit in her assigned seat?

Most people think there is only a tiny chance that the 100th person's seat is still free after all the shuffling, but the actual probability ends up being $\frac{1}{2}$.
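
A simulation sketch:

```python
import random

def last_gets_own_seat(n=100):
    seats = set(range(n))
    seats.remove(random.randrange(n))                 # person 0 picks at random
    for p in range(1, n - 1):                         # persons 1 .. n-2
        if p in seats:
            seats.remove(p)                           # own seat still free
        else:
            seats.remove(random.choice(list(seats)))  # pick a random free seat
    return (n - 1) in seats                           # last seat = last person's?

trials = 20_000
print(sum(last_gets_own_seat() for _ in range(trials)) / trials)  # ~0.5
```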

6
On

I think the most stunning example is the non-transitive dice.

Take three cubic dice with the following numbers on their sides:

  • Die $A:$ $3 \: 3 \: 3 \: 3 \: 3 \: 6$
  • Die $B:$ $2 \: 2 \: 2 \: 5 \: 5 \: 5$
  • Die $C:$ $1 \: 4 \:4 \: 4 \: 4 \:4$

Now I offer you the following game: you choose a die as you like, then I choose another one, and then we both roll; the highest number wins.

No matter which die you choose, I can choose another one that wins more often than loses against your choice.
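
A quick check of the cycle by direct computation:

$$P(A>B)=\frac{5}{6}\cdot\frac{1}{2}+\frac{1}{6}=\frac{7}{12},\qquad P(B>C)=\frac{1}{2}+\frac{1}{2}\cdot\frac{1}{6}=\frac{7}{12},\qquad P(C>A)=\frac{5}{6}\cdot\frac{5}{6}=\frac{25}{36},$$

so in the cycle $A \to B \to C \to A$ each die beats the next with probability greater than $\frac12$.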

3
On

Perhaps Parrondo's Paradox would be interesting. One can combine losing propositions into a winning proposition.

Simpson's Paradox is also interesting. (And actually occurred in a court case.)

5
On

I flip two coins. Given that one is heads, what's the probability the other one is heads?

Surprisingly, it's not $\frac12$: of the three equally likely outcomes with at least one head (HH, HT, TH), only one has the other coin heads, so the answer is $\frac13$.

4
On

The secretary's problem (which goes by other names). The secretary has $n$ letters ($0<n<\infty$) and $n$ pre-addressed envelopes, but puts the letters into the envelopes at random, one letter per envelope. What is the chance $C(n)$ that NO letter gets into the right envelope?

The answer is $C(n)=\sum_{j=0}^n(-1)^j/j!,$ which converges to $1/e$ as $n\to \infty$. I think the method of solution is instructive.

One counter-intuitive result is that $C(n)$ is not monotonic in $n.$

Also many people would be inclined to guess that $C(n)>1/2$ for large $n.$

Another version of this is to take two shuffled decks, each with $n$ playing cards, and ask for the chance that no card occupies the same position in both decks.
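
A quick numerical look at $C(n)$, as a sketch; the values alternate around $1/e$, which shows the non-monotonicity:

```python
from math import factorial, e

def no_match_prob(n):
    """Probability that a random permutation of n items has no fixed point."""
    return sum((-1) ** j / factorial(j) for j in range(n + 1))

for n in range(1, 9):
    print(n, round(no_match_prob(n), 6))
print("1/e =", round(1 / e, 6))
```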

I first saw this in "101 Great Problems In Elementary Mathematics" by H. Dorrie.

10
On

It's not counter-intuitive, but it's amazing for teaching in class.

Pick $a,b \in \{1,\dots,n\}$ independently and uniformly at random. Then $\mathbb{P}[\gcd(a,b)=1]$ tends to $\frac{6}{\pi^2}$ as $n$ goes to infinity.
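
A simulation sketch:

```python
import random
from math import gcd, pi

n, trials = 10**6, 200_000
hits = sum(gcd(random.randint(1, n), random.randint(1, n)) == 1 for _ in range(trials))
print(hits / trials, 6 / pi**2)   # both ~0.6079
```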

There are also other interesting problems whose answers involve $\pi$, $e$, ...

2
On

Base rate fallacy

If presented with related base rate information (or generic information) and specific information (information only pertaining to a certain case), the mind tends to ignore the former and focus on the latter.

Example:
A group of police officers have breathalyzers displaying false drunkenness in 5% of the cases in which the driver is sober. However, the breathalyzers never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers then stop a driver at random, and force the driver to take a breathalyzer test. It indicates that the driver is drunk. We assume you don't know anything else about him or her. How high is the probability he or she really is drunk?

An intuitive first answer might be as high as 0.95, but the correct probability is about 0.02.

Solution: use Bayes' theorem.

The goal is to find the probability that the driver is drunk given that the breathalyzer indicated he/she is drunk, which can be represented as $${\displaystyle p(\mathrm {drunk} |D)}$$
where "D" means that the breathalyzer indicates that the driver is drunk.

Bayes's theorem tells us that

$$ {\displaystyle p(\mathrm {drunk} |D) = {\frac {p(D|\mathrm {drunk} )\, p(\mathrm {drunk} )}{p(D)}}} $$

We were told the following in the first paragraph:

$${\displaystyle p(\mathrm {drunk} )=0.001} $$ $${\displaystyle p(\mathrm {sober} )=0.999} $$ $${\displaystyle p(D|\mathrm {drunk} )=1.00} $$ $${\displaystyle p(D|\mathrm {sober} )=0.05} $$

As you can see from the formula, one needs p(D) for Bayes' theorem, which one can compute from the preceding values using
$${\displaystyle p(D)=p(D|\mathrm {drunk} )\,p(\mathrm {drunk} )+p(D|\mathrm {sober} )\,p(\mathrm {sober} )} $$

which gives $$ {\displaystyle p(D)=(1.00\times 0.001)+(0.05\times 0.999)=0.05095} $$

Plugging these numbers into Bayes' theorem, one finds that $$ {\displaystyle p(\mathrm {drunk} |D)={\frac {1.00\times 0.001}{0.05095}}=0.019627 \approx 0.02 } $$


A more intuitive explanation: on average, for every $1{,}000$ drivers tested, $1$ driver is drunk, and for that driver the test is certain to be positive, giving $1$ true positive result. The other $999$ drivers are sober, and among them the test gives $5\%$ false positives, i.e. $49.95$ of them. Therefore, the probability that a driver with a positive test result really is drunk is
$$ {\displaystyle p(\mathrm {drunk} |D)={\frac {1}{1+49.95}}={\frac {1}{50.95}}\approx 0.019627.} $$

2
On

In contract bridge, there is the principle of restricted choice. It's always seemed counterintuitive to me.

https://en.m.wikipedia.org/wiki/Principle_of_restricted_choice

2
On

One of the most puzzling results in probability is that if you pick a real number uniformly at random from $[0,1]$, the probability of getting a rational number is zero. This is nicely explained here.

The set of rational numbers in the interval $\Omega=[0,1]$ is a countable union of disjoint singletons, and each of these singletons has probability zero. Here is the proof:


A singleton, $\{b\}$, is a Borel measurable set with a Lebesgue measure of zero. The proof is as follows:

$$\Pr\left(\{b\}\right)=\Pr\left(\bigcap_{n=1}^\infty\left(b-\frac{1}{n},b + \frac{1}{n}\right]\cap \Omega\right)$$

is the probability of nested decreasing sets, allowing the use of the theorem of continuity of probability measures $(*)$ to re-write it as:

$$\Pr\left(\{b\}\right)=\lim_{n\rightarrow \infty}\,\Pr\left(\left(b-\frac{1}{n},b + \frac{1}{n}\right]\cap \Omega\right)$$

Each of these sets satisfies $$\Pr\left(\left(b-\frac{1}{n},\,b+\frac{1}{n}\right]\cap\Omega\right)\leq \frac{2}{n}\xrightarrow[n\to\infty]{}0,$$ so $\Pr\left(\{b\}\right)=0$.


Therefore, by countable additivity of measures $(**)$, the probability for the whole set of $\mathbb Q$ is zero:

$$\Pr(\mathbb Q\;\cap \Omega) = 0$$

The apparent paradox is that despite there being infinitely many rational numbers in the $[0,1]$ interval, the probability of randomly choosing a rational is strictly zero.

The source is this great explanation here.


$(*)$ If $B_j, j = 1, 2,\cdots,$ is a sequence of events decreasing to $B$, then $\displaystyle\lim_{n\rightarrow \infty} \Pr \{B_n\} = \Pr \{B\} .$

$(**)$ For all countable collections $\{E_i\}_{i=1}^\infty$ of pairwise disjoint sets in a sigma algebra: $$\mu\left( \bigcup_{k=1}^\infty \, E_k \right)=\sum_{k=1}^\infty \mu(E_k).$$

3
On

Someone mentioned non-transitive dice, and that reminded me of this one:

Suppose there are two unweighted six-sided dice that you cannot examine, but which you can direct a machine to roll and inform you of the sum. You can do this as often as you like, and the distribution of the sum is exactly what you would expect of a pair of ordinary six-sided dice.

Are they, in fact, a pair of ordinary six-sided dice?

Not necessarily.
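
One well-known witness (my assumption of what the answer alludes to) is the pair of Sicherman dice, with faces $\{1,2,2,3,3,4\}$ and $\{1,3,4,5,6,8\}$; a quick check that their sum distribution matches a standard pair:

```python
from collections import Counter
from itertools import product

standard = Counter(a + b for a, b in product(range(1, 7), repeat=2))
sicherman = Counter(a + b for a, b in product([1, 2, 2, 3, 3, 4], [1, 3, 4, 5, 6, 8]))
print(standard == sicherman)   # True: the sums are distributed identically
```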

Then someone mentioned a secretary problem, which turned out to be about derangements. I had in mind a different secretary's problem, which is also called the sultan's dowry:

You have $100$ candidates, upon which there exists a total order. The candidates have appointments with you, one at a time, in a random order. From each interview, you can tell exactly how that candidate ranks amongst those you have already examined. At that point, you may either accept or reject the candidate. Any acceptance or rejection is permanent; no candidate, once rejected, may be reconsidered. Your objective is solely to accept the best candidate. What is the strategy for maximizing your probability of doing so, and what is that probability?

As often happens in probability puzzles, the answer is $1/e$*, which many people find surprisingly high.


*Approximately, with the approximation getting better and better as the number of candidates increases without bound.
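
A simulation sketch of the classical cutoff strategy (reject roughly the first $n/e$ candidates, then accept the first candidate better than everyone seen so far):

```python
import random
from math import e

def best_chosen(n=100):
    ranks = random.sample(range(n), n)      # random interview order; 0 is the best
    cutoff = round(n / e)                   # observe-only phase of length ~ n/e
    seen_best = min(ranks[:cutoff])
    for r in ranks[cutoff:]:
        if r < seen_best:
            return r == 0                   # accepted candidate: is it the best?
    return ranks[-1] == 0                   # forced to take the last one

trials = 50_000
print(sum(best_chosen() for _ in range(trials)) / trials, 1 / e)  # both ~0.37
```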

2
On

Lake Wobegon Dice

Find a set of $n$ dice (each with $s$ sides, numbered appropriately) in which each die is more likely to roll above the set average on that roll than below it. Given $n$, find the Lake Wobegon Optimal set, for which that probability is maximal.

"Lake Wobegon Dice," by Jorge Moraleda and David G. Stork, College Mathematics Journal, 43(2):152--159 (2012)

Abstract:

  • We present sets of $n$ non-standard dice—Lake Wobegon dice—having the following paradoxical property: On every (random) roll of a set, each die is more likely to roll greater than the set average than less than the set average; in a specific statistical sense, then, each die is “better than the set average.”

    We define the Lake Wobegon Dominance of a die in a set as the probability the die rolls greater than the set average minus the probability the die rolls less than the set average. We further define the Lake Wobegon Dominance of the set to be the dominance of the set’s least dominant die and prove that such paradoxical dominance is bounded above by $(n-2)/n$ regardless of the number of sides $s$ on each die and the maximum number of pips $p$ on each side. A set achieving this bound is called Lake Wobegon Optimal. We give a constructive proof that Lake Wobegon Optimal sets exist for all $n \ge 3$ if one is free to choose $s$ and $p$. We also show how to construct minimal optimal sets, that is, that set that requires the smallest range in the number of pips on the faces.

    We determine the frequency of such Lake Wobegon sets in the $n = 3$ case through exhaustive computer search and find the unique optimal $n = 3$ set having minimal $s$ and $p$. We investigate symmetry properties of such sets, and present equivalence classes having identical paradoxical dominance. We construct inverse sets, in which on any roll each die is more likely to roll less than the set average than greater than the set average, and thus each die is “worse than the set average.” We show the unique extreme “worst” case, the Lake Wobegon pessimal set.


4
On

I have also had a lecture on this excellent topic.

Unfortunately, my lecture notes are in Czech, but I can translate some paradoxes from there:

Monty Hall

Monty Hall is, in my opinion, the most famous probabilistic paradox. It is well described here and on the Wiki, so I will just give you a way to make the correct answer feel intuitive, to persuade other people that the probability is computed correctly. It is quite impressive, almost like a magician's trick :-)

Take a pack of cards and let someone draw a random card. Tell him that the goal is to draw the ace of hearts, and that he should not look at his chosen card. Then show the audience all the remaining cards except one, none of which are the ace of hearts. There are now two hidden cards: one in your hand and one in his. Finally, he may change his first guess. Most people do, and they are most likely correct :-)

Tennis-like game

There are two players, Alice and Bob, playing a tennis-like game. On each point, one player serves the ball, and the probability of winning the point depends on who is serving. The player who first reaches, say, 11 points wins the match. Alice serves the first ball. There are three possible serving schemes:

  1. The winner of the previous ball serves the next.
  2. The loser of the previous ball serves the next.
  3. Service is regularly alternating.

One would expect scheme 1 to help stronger players and scheme 2 to help weaker players. The paradox is that the winning probabilities for the match do not depend on the chosen scheme.

Proof sketch: pre-generate 11 cards with the winners of Alice's serves (pack A) and 10 cards with the winners of Bob's serves (pack B). Each serve by Alice (or Bob) can then be modeled by drawing the next card from pack A (or B). One can show that these 21 cards suffice to finish the match under any of the three schemes, and the winner is determined by the cards alone: exactly one player is written on at least 11 cards.
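
A simulation sketch (the serve-win probabilities 0.7 and 0.6 are illustrative choices of mine):

```python
import random

def alice_wins(scheme, p_alice_serve=0.7, p_bob_serve=0.6, target=11):
    """One match; p_* is the probability that the SERVER wins the point."""
    score = {"A": 0, "B": 0}
    server = "A"
    while max(score.values()) < target:
        p = p_alice_serve if server == "A" else p_bob_serve
        winner = server if random.random() < p else ("B" if server == "A" else "A")
        score[winner] += 1
        if scheme == 1:   server = winner                          # winner serves
        elif scheme == 2: server = "B" if winner == "A" else "A"   # loser serves
        else:             server = "B" if server == "A" else "A"   # alternate
    return score["A"] >= target

for s in (1, 2, 3):
    wins = sum(alice_wins(s) for _ in range(30_000))
    print("scheme", s, wins / 30_000)   # all three agree, up to noise
```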

Candies

I have a bag of candies: 123 caramel and 321 mint. Every morning I draw candies from the bag at random and eat them as long as they are all of the same kind; when I draw a candy of a different kind, I return it and stop for the day. What is the probability that the last candy I eat is a caramel one?

Answer: $\frac12$ (one would expect less than $\frac12$, since there are fewer caramel candies).

Proof: It suffices to show that each morning, the probability that all remaining caramel candies are eaten equals the probability that all remaining mint candies are eaten. Imagine that each morning the candies are randomly ordered and I draw them from left to right. I eat all the caramel candies exactly when the order is "first all caramels, then all mints"; I eat all the mint candies exactly when the order is the opposite. These two orderings are equally likely.
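
A simulation sketch (scaled down to 12 caramels and 32 mints for speed; the $\frac12$ holds for any counts):

```python
import random

def last_eaten_is_caramel(caramel=12, mint=32):
    candies = ["C"] * caramel + ["M"] * mint
    while candies:
        random.shuffle(candies)              # a fresh random order each morning
        kind = candies[0]
        while candies and candies[0] == kind:
            candies.pop(0)                   # eat while the kind stays the same
        # a different kind was drawn: "return" it by simply stopping for the day
        if not candies:
            return kind == "C"

trials = 20_000
print(sum(last_eaten_is_caramel() for _ in range(trials)) / trials)  # ~0.5
```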

Wolf on a circle

There is a wolf at one vertex of a regular n-gon and a sheep at each of the remaining vertices. At each step, the wolf moves to a randomly chosen adjacent vertex, eating the sheep there if there is one. The process ends when the wolf has eaten n-2 sheep, so exactly one sheep remains.

Intuitively, the sheep at the vertex opposite the wolf is in the best position. The paradox is that every sheep has the same probability of survival.

Proof: Take a sheep S. At some point, the wolf reaches a vertex adjacent to S for the first time; at that moment the sheep at the other vertex adjacent to S has not been eaten yet (any path to it passes through S or through the current vertex). So for S to survive, the wolf must walk all the way around the circle, from one neighbour of S to the other, without visiting S, and the probability of this does not depend on the position of S.

Simpson's Paradox

There is research on two medical cures, A and B.

200 people tried cure A; it helped 110 of them (50 men, 60 women) and did not help 90 (60 men, 30 women).

210 people tried cure B; it helped 120 of them (30 men, 90 women) and did not help 90 (40 men, 50 women).

So overall, cure B looks better, since 120:90 > 110:90.

But if you are a man, you can consider just the men's statistics: 50:60 > 30:40, so cure A is more appropriate.

And if you are a woman, you can consider just the women's statistics: 60:30 > 90:50, so cure A is again more appropriate.

Shocking, isn't it? :-)

0
On

There's the two envelopes paradox. Wikipedia states it as follows:

You are given two indistinguishable envelopes, each containing money, one contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?

The issue is this: there is an amount $X$ of money in the envelope you hold. If yours is the lesser amount, switching gets you $2X$; if it is the larger, switching gets you $X/2$. Since both cases seem equally likely, the expected value of switching appears to be $\frac12\cdot 2X+\frac12\cdot\frac{X}{2}=\frac{5X}{4}>X$, so it seems you should always switch, even though that is nonsensical, since your chance of holding the larger amount can only ever be 50%.

The Wikipedia article goes into great depth explaining various resolutions of this fallacy. It boils down to the fact that the total amount in the two envelopes is the same in both cases, which means the two $X$s in the computation above do not denote the same quantity.

A related problem is the necktie paradox.

2
On

I see the Monty Hall has been mentioned a couple of times, but I want to mention it again because I think the reason that it's interesting is missed in the other answers. In particular, it demonstrates not only a counter-intuitive result for a given formulation, but it also demonstrates how sensitive the correct answer is to the formulation of the problem. I especially like this NYT article as an illustration:

http://www.nytimes.com/1991/07/21/us/behind-monty-hall-s-doors-puzzle-debate-and-answer.html?pagewanted=all

From a teaching point of view, this is a fun exercise because Monty Hall (the real person) is part of the article, and he plays a role both in validating the math on the "academic" version of the problem and in showing that the math is meaningless in the real game, because he had controls that are not in the academic version. Moreover, after years of doing the show, he was good at reading individual contestants, so probability was not really at play in a significant way.

0
On

This is one of my favorites:

100 prisoners problem

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner's number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards. If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers.

If every prisoner selects $50$ drawers at random, the probability that a single prisoner finds his number is $50\%$. Therefore, the probability that all prisoners find their numbers is the product of the single probabilities, which is $1/2^{100} \approx 0.0000000000000000000000000000008$, a vanishingly small number. The situation appears hopeless.

Up to which value can prisoners improve the probability of being pardoned using a good strategy?
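
(Spoiler) A sketch of the well-known cycle-following strategy: each prisoner first opens the drawer with his own number, then the drawer given by the slip he finds, and so on. It succeeds iff the random permutation has no cycle longer than $50$, which happens with probability $1-\ln 2\approx 0.31$:

```python
import random

def all_survive(n=100, allowed=50):
    perm = list(range(n))
    random.shuffle(perm)
    # The strategy succeeds iff no cycle of perm is longer than `allowed`.
    seen = [False] * n
    for start in range(n):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            if length > allowed:
                return False
    return True

trials = 20_000
print(sum(all_survive() for _ in range(trials)) / trials)  # ~0.31
```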

1
On

I remember being confused the first time this riddle was posed to me:

What is the probability of getting a poker hand with at least two aces, assuming that it contains at least one ace?

What if we know that it is specifically the ace of spades? Does this information change the probability?
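
An exact computation for 5-card hands, as a sketch; knowing the specific ace nearly doubles the conditional probability:

```python
from math import comb

total = comb(52, 5)
no_ace = comb(48, 5)
one_ace = 4 * comb(48, 4)

# P(at least two aces | at least one ace)
p_given_ace = (total - no_ace - one_ace) / (total - no_ace)

# P(at least two aces | the hand contains the ace of spades)
with_as = comb(51, 4)        # hands containing the ace of spades
with_as_only = comb(48, 4)   # ... containing no other ace
p_given_as = (with_as - with_as_only) / with_as

print(round(p_given_ace, 4), round(p_given_as, 4))   # 0.1222 vs 0.2214
```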

0
On

The Shooting Room Paradox

A single person enters a room and two dice are rolled. If the result is double sixes, he is shot. Otherwise he leaves the room and nine new players enter. Again the dice are rolled, and if the result is double sixes, all nine are shot. If not, they leave and 90 new players enter, and so on (the number of players increasing tenfold with each round). The game continues until double sixes are rolled and a group is executed, which is certain to happen eventually (the room is infinitely large, and there's an infinite supply of players).

If you're selected to enter the room, how worried should you be? Not particularly: Your chance of dying is only 1 in 36. Later your mother learns that you entered the room. How worried should she be? Extremely: About 90 percent of the people who played this game were shot. What does your mother know that you don't? Or vice versa?

5
On

One that I found surprising as a beginner was that three events (or random variables) can be independent pairwise, but not jointly independent. Or to put it somewhat more strikingly, we can have $C$ independent of $A$ and independent of $B$, yet not independent of $A,B$. Wording it this way shows that it does take care to state independence assumptions carefully, and it also illustrates some non-obvious subtleties in the definition of independence (one of them being that independence of two events does not mean that they don't interact and can't be influenced by a third factor).

One example of pairwise but non-mutual independence is given on the Wikipedia page.

The example that I typically use is to take $T$ to be a uniformly random angle in $[0,2\pi)$ and then consider the events $A = \{\sin T > 0\}$, $B = \{ \cos T > 0\}$ and $C = \{ \tan T > 0 \}$ (effectively, this is just two independent $\pm 1$ variables and their product, but the trig formulation helps to visualize the events in terms of quadrants).

It's easy to see that $P(A)=P(B)=P(C) = \tfrac12$ and that $P(A\cap B) = P(A\cap C) = P(B\cap C) = \tfrac14$, but clearly $P(A\cap B\cap \overline C) = 0$.

2
On

One I saw on Twitter recently, which is perhaps a clearer version of sex-of-children-type problems:

Three casino chips have a dot on each side:

  • on one chip the dots are both blue,
  • on the second there is a blue dot on one side and red on the other, and
  • on the third the dots are both red.

The chips are placed in a bag and, without looking, you choose one and place it on the table. When you look at the chip, you see it has a blue dot on top. What is the probability that the dot on the other side is blue?

Many people will say $1/2$ (I did, before thinking it through properly), but...

you are blindly choosing both the chip and which side faces up, so each of the six dots is equally likely to be the one showing. Three dots are blue, and two of those three sit on the all-blue chip, making the chance $2/3$.

4
On

Optimizer's Curse: suppose you have a number of options to choose from, each with some objective true value. You don't have access to the objective true value, but instead to a noisy estimate of it, say a value sampled from a normal distribution with mean the true value and variance $1$.

You, naturally, pick the choice whose estimate of true value is the highest. When you do so, you discover what the true value really was. Call the difference between the estimate and the true value your post-decision surprise.

Now, the error of your estimate was normally distributed with mean $0$, so you might also guess that your post-decision surprise will have mean $0$ – sometimes you will be pleasantly surprised, sometimes disappointed. But in fact the post-decision surprise is usually negative, that is to say, you are usually disappointed by your choice!

In retrospect, this is perhaps not so surprising: certainly if all the true values were the same, you'd simply pick the estimate with the highest inflation due to noise, and more generally, conditional on an estimate being the leader of a pack, it's more likely to be an overestimate than an underestimate.

More interestingly, if the variances aren't the same for all your estimates, it's no longer correct to just pick the highest estimate to maximise your expected true value. If the highest estimate is also high-variance, it may lead to a lower true value in expectation than a lower estimate with better precision, and so what looks superficially like some kind of risk-aversion (discounting the value of apparently high-variance options) is actually justifiable purely on expected-value grounds.
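
A simulation sketch (standard-normal true values, unit-variance noise, pick the highest estimate):

```python
import random

def trial(k=10, noise=1.0):
    true_vals = [random.gauss(0, 1) for _ in range(k)]
    estimates = [v + random.gauss(0, noise) for v in true_vals]
    i = max(range(k), key=lambda j: estimates[j])   # pick the best-looking option
    return true_vals[i] - estimates[i]              # post-decision surprise

trials = 50_000
print(sum(trial() for _ in range(trials)) / trials)   # distinctly negative
```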

(Notice also that "there are a bunch of options with some true values, but you only get a noisy estimate of each and have to decide which to choose" is a REALLY common situation to be in, so this problem is pretty pervasive in real-life optimization scenarios.)