Probability question about conditional probability


Suppose that we have a sample of 2 distinct random numbers $I=\{z_1,z_2\}$, drawn independently from the uniform distribution on $[0,1]$. Let $d = \max I$ be the maximum of the sample.

I want to compute the probability that $z_1\leq r$ for some fixed $0\leq r\leq1$ ($r$ is not a random variable), given that $z_1 \neq d$, i.e.

$$ p(z_1 \leq r | z_1 \neq d) $$

To compute this probability easily, notice that if $z_1\neq d$, then $z_1$ is the minimum. By symmetry, conditioning on this event gives $z_1$ the distribution of $\min I$, and therefore

$$ p(z_1 \leq r \mid z_1 \neq d) = p(\min I\leq r) = 1-p(z_1> r)\,p(z_2> r) = 1 - (1-r)^2 $$

I have confirmed this result numerically in Mathematica; you can check it with the following code:

```mathematica
(* Estimate p(z1 <= r | z1 != max) by simulation, for n = 2 distinct numbers *)
checkDistribution[r_] :=
 Module[{win = 0, loss = 0, list, max, n = 2},
  Do[
   (* n distinct numbers drawn from a fine grid on (0, 1] *)
   list = N[RandomSample[Range[100 n], n]/(100 n)];
   max = Max[list];
   (* keep only the trials in which z1 is not the maximum *)
   If[list[[1]] != max,
    If[list[[1]] <= r, win++, loss++]],
   {10000}];
  {r, N[win/(win + loss)]}]
points = Table[checkDistribution[r], {r, 0, 1, 0.01}];
Show[ListPlot[points], Plot[2 r - r^2, {r, 0, 1}, PlotStyle -> Red]]
```

which returns the following plot:

[Plot: simulated values of $p(z_1\le r\mid z_1\neq d)$ against the curve $2r-r^2$ in red]
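For readers without Mathematica, here is a minimal Python sketch of the same check. It is a port under stated assumptions, not the original code: it draws continuous uniforms with the standard-library `random` module (instead of distinct points from a grid), uses a fixed seed, and the helper name `estimate_conditional_cdf` is hypothetical.

```python
import random

def estimate_conditional_cdf(r, trials=100_000, seed=0):
    """Monte Carlo estimate of p(z1 <= r | z1 != max{z1, z2}) for z1, z2 iid U(0, 1)."""
    rng = random.Random(seed)
    kept = 0  # trials in which z1 is not the maximum
    hits = 0  # ... and additionally z1 <= r
    for _ in range(trials):
        z1, z2 = rng.random(), rng.random()
        if z1 < z2:  # z1 is not the maximum (ties are essentially impossible)
            kept += 1
            if z1 <= r:
                hits += 1
    return hits / kept

for r in (0.1, 0.25, 0.5, 0.75, 0.9):
    exact = 1 - (1 - r) ** 2
    print(f"r={r:.2f}  simulated={estimate_conditional_cdf(r):.4f}  exact={exact:.4f}")
```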

However, I want to compute this probability without using the fact that $z_1$ is the minimum, using only the knowledge that $d$ is the maximum. We should then consider the distribution of the maximum of two random variables. Since $d$ is the maximum, we have

$$\begin{aligned} p(d\leq r) &= r^2,\\ p(d>r) &= 1-r^2. \end{aligned}$$
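Both formulas follow from the independence of $z_1$ and $z_2$:

$$p(d\le r)=p(z_1\le r,\ z_2\le r)=p(z_1\le r)\,p(z_2\le r)=r^2,\qquad p(d>r)=1-p(d\le r)=1-r^2.$$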

Now, we have two cases

  1. $d>r$, in which case $p(z_1 \leq r | z_1 \neq d) = p(d>r) p(z_1<r) = (1-r)\times r$
  2. $d\leq r$, in which case $p(z_1 \leq r | z_1 \neq d) = p(d<r) p(z_1<d) = r^2 \times 1$

but the sum of these two terms does not give $1-(1-r)^2$. Where is the mistake?
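One way to locate the mistake numerically is to estimate the pieces of case 1 separately. The Python sketch below (hypothetical helper `case_estimates`, standard-library `random` only, fixed seed) estimates $p(d>r\mid z_1\ne d)$ and $p(z_1\le r\mid z_1\ne d,\ d>r)$, and prints the latter next to the unconditional value $p(z_1\le r)=r$ used in case 1.

```python
import random

def case_estimates(r, trials=200_000, seed=1):
    """Estimate p(d > r | z1 != d) and p(z1 <= r | z1 != d, d > r) by simulation."""
    rng = random.Random(seed)
    n_b = n_bc = n_abc = 0
    for _ in range(trials):
        z1, z2 = rng.random(), rng.random()
        d = max(z1, z2)
        if z1 == d:  # condition on z1 != d
            continue
        n_b += 1
        if d > r:  # case 1: d > r
            n_bc += 1
            if z1 <= r:
                n_abc += 1
    return n_bc / n_b, n_abc / n_bc

r = 0.5
p_c_given_b, p_a_given_bc = case_estimates(r)
print(f"p(d>r | z1!=d)        ~ {p_c_given_b:.4f}  (1 - r^2 = {1 - r**2:.4f})")
print(f"p(z1<=r | z1!=d, d>r) ~ {p_a_given_bc:.4f}  (compare r = {r:.4f})")
```

With $r=0.5$ the first estimate matches $1-r^2$, while the second comes out well above $r$, so replacing the conditional probability in case 1 by $p(z_1\le r)$ is not justified.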

I want to solve the problem this other way because I want to generalize it to a set of 3 numbers $I=\{z_1,z_2,z_3\}$ and compute

$$ p(z_1 \leq r \mid z_1 \neq d)\,. $$

In this generalization, $z_1\neq d$ no longer implies that $z_1$ is the minimum of the sample.
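The simulation generalizes directly to $n$ numbers, which may help when checking an analytic answer for $n=3$. A minimal Python sketch, assuming i.i.d. $U(0,1)$ draws and a hypothetical helper `conditional_cdf_n`:

```python
import random

def conditional_cdf_n(r, n, trials=100_000, seed=2):
    """Estimate p(z1 <= r | z1 != max(z1, ..., zn)) for n >= 2 iid U(0, 1) numbers."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(trials):
        zs = [rng.random() for _ in range(n)]
        if zs[0] < max(zs[1:]):  # z1 is not the maximum
            kept += 1
            if zs[0] <= r:
                hits += 1
    return hits / kept

for r in (0.25, 0.5, 0.75):
    print(f"r={r}: n=2 -> {conditional_cdf_n(r, 2):.4f}, n=3 -> {conditional_cdf_n(r, 3):.4f}")
```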


Best answer

Let's name the events: $A \equiv z_1 \le r$, $B \equiv z_1 \ne d$, $C \equiv d > r$, and its complement $\bar C \equiv d \le r$.

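Then, by the law of total probability, and since $C$ is independent of $B$ (the value of the maximum does not depend on which of $z_1, z_2$ attains it),

$$P(A|B)= P(C) P(A | B C) + P(\bar C) P(A | B\bar C) \tag 1$$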

Now, you've already computed $P(\bar C)=r^2$, and also $P(A | B \bar C) = 1$ (if $z_1 < d \le r$, then certainly $z_1 \le r$).

Hence it's true that

$$P(A|B)= (1-r^2) P(A | B C) + r^2 \tag 2$$

Now, you are implicitly assuming $P(A | B C) = P(A)=r$. However, that is wrong.

What is true is that $P(B | A C)=1$ (if $z_1\le r<d$, then $z_1\ne d$), which implies

$$P(A | B C) = \frac{P(A B C)}{P(B C)}=\frac{P(A C)}{P(B C)}=\frac{P(C|A)P(A)}{P(B|C)P(C)}=\frac{2r}{r+1} \tag 3$$

because $P(C|A)=1-r$, $P(A)=r$, $P(C)=1-r^2$, and $P(B|C)=P(B)=\frac12$.

Hence finally $P(A|B) = 1 - (1-r)^2$ as expected.
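The algebra in (2) and (3) can be checked with exact rational arithmetic. This Python sketch (hypothetical helper `best_answer_terms`, standard-library `fractions` only) plugs the probabilities listed above into (3) and then (2) for a few rational values of $r$ and compares the results with $2r/(r+1)$ and $1-(1-r)^2$.

```python
from fractions import Fraction

def best_answer_terms(r):
    """Evaluate (3) and then (2) from the quoted probabilities; needs 0 < r < 1 so P(C) > 0."""
    p_c_given_a = 1 - r           # P(C|A) = P(z2 > r)
    p_a = r                       # P(A)
    p_c = 1 - r**2                # P(C)
    p_b_given_c = Fraction(1, 2)  # P(B|C) = P(B) by symmetry
    p_a_given_bc = (p_c_given_a * p_a) / (p_b_given_c * p_c)  # equation (3)
    p_a_given_b = p_c * p_a_given_bc + r**2                   # equation (2)
    return p_a_given_bc, p_a_given_b

for r in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    p_a_given_bc, p_a_given_b = best_answer_terms(r)
    assert p_a_given_bc == 2 * r / (r + 1)
    assert p_a_given_b == 1 - (1 - r) ** 2
    print(r, p_a_given_bc, p_a_given_b)
```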

Answer

The applicable relations are: $$\begin{align}&P(z_1\le r\mid z_1\ne d)\\ &= P(d>r)P(z_1\le r\mid z_1\ne d\text{ and }d>r)+P(d\le r)P(z_1\le r\mid z_1\ne d\text{ and }d\le r)\\ &= P(d>r)P(z_1\le r\mid z_1\ne d\text{ and }d>r)+P(d\le r)P(z_1\lt d\mid z_1\ne d\text{ and }d\le r)\\ &\ne P(d>r)P(z_1\le r)\phantom{xxxxxxxxxxxxxx}+P(d\le r)P(z_1\le d)\\ \end{align}$$

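Your calculation treats the last line as an equality. It is not one: conditioning on $z_1\ne d$ together with the range of $d$ changes the distribution of $z_1$, so the conditional probabilities cannot be replaced by the unconditional ones.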

Answer

Both 1. and 2. are wrong. For 1., you get

$p(z_1\leq r, d>r \mid z_1 \neq d)= \frac{p(z_1 \leq r,\ d>r,\ d\neq z_1 )}{p(z_1 \neq d)}=\frac{p(z_1 \leq r,\ z_2 > r )}{1/2}= 2r(1-r)$

For 2., it is a bit trickier, but

$p(z_1\leq r, d\le r \mid z_1 \neq d) = \frac{p(z_1 \leq r,\ d\le r,\ d\neq z_1 )}{p(z_1 \neq d)} = \frac{p(z_1 \leq z_2 \leq r )}{1/2} = 2\,p(z_1 \leq z_2 \leq r)$

with $p(z_1 \leq z_2 \leq r )= \int^r_0 \left(\int^{z_2}_0 1\, dz_1\right) dz_2 = \frac{1}{2}r^2$.

Thus $p(z_1\leq r, d>r \mid z_1 \neq d) + p(z_1\leq r, d\le r \mid z_1 \neq d) = 2r(1-r) + r^2 = 2r-r^2 = 1-(1-r)^2$,

which is your initial result. It might help to think about the problem geometrically (for two points, as areas in the unit square) when looking at several points.
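The two pieces can also be checked numerically. This Python sketch (hypothetical helper `grid_prob`) approximates $p(z_1\le z_2\le r)$ by the share of grid-cell centres in the unit square that satisfy it, then adds the two cases, each divided by $p(z_1\ne d)=\frac12$ (that is, doubled), and compares the total with $1-(1-r)^2$.

```python
def grid_prob(r, n=400):
    """Approximate p(z1 <= z2 <= r) by the share of n*n grid-cell centres satisfying it."""
    centres = [(k + 0.5) / n for k in range(n)]
    count = sum(1 for z2 in centres if z2 <= r for z1 in centres if z1 <= z2)
    return count / n**2

r = 0.5
case1 = 2 * r * (1 - r)   # p(z1 <= r, z2 > r) / p(z1 != d), with p(z1 != d) = 1/2
case2 = 2 * grid_prob(r)  # p(z1 <= z2 <= r) / p(z1 != d)
print(f"p(z1<=z2<=r) ~ {grid_prob(r):.4f} (r^2/2 = {r**2 / 2:.4f})")
print(f"case1 + case2 ~ {case1 + case2:.4f} (1-(1-r)^2 = {1 - (1 - r)**2:.4f})")
```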

Answer

If your intuitive argument is unconvincing, you can check it by focusing not on the distribution of $d$, but on the already known distributions of $z_1$ and $z_2$.

$$\def\P{\operatorname{\sf P}}\begin{align}\P(z_1\leqslant r\mid z_1\neq d)~&=~\P(z_1\leqslant r\mid z_1<z_2)\\[1ex]&=~2\P(z_1\leqslant r, z_2>z_1)\\[1ex]&=~\textstyle 2\int\limits_0^{\min\{r,1\}}\int\limits_u^1\,\mathrm dv\,\mathrm d u\\[1ex]&=~\textstyle 2\int\limits_0^{\min\{r,1\}} (1-u)\,\mathrm du\\[1ex]&=~(2r-r^2)\,\mathbf 1_{0\leqslant r\leqslant 1}+\mathbf 1_{1<r}\\[1ex]&=~\bigl(1-(1-r)^2\bigr)\,\mathbf 1_{0\leqslant r\leqslant 1}+\mathbf 1_{1<r}\end{align}$$

This confirms the result of your intuition.
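The final expression can be written as a small Python function. This sketch clamps $r$ to $[0,1]$ before evaluating $2m-m^2$; that clamp is an assumption which also covers $r<0$ (probability $0$), a case the displayed formula leaves implicit. The name `conditional_cdf` is hypothetical.

```python
def conditional_cdf(r):
    """P(z1 <= r | z1 < z2) for z1, z2 iid U(0, 1), for any real r."""
    m = min(max(r, 0.0), 1.0)  # clamp the upper limit of the outer integral to [0, 1]
    return 2 * m - m * m       # equals 2 * integral_0^m (1 - u) du

for r in (-0.5, 0.0, 0.3, 1.0, 2.0):
    print(r, conditional_cdf(r))
```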