The probability of observing the mean draw from a binomial distribution


I'm reading an article that considers a random variable $p$ uniformly distributed on the interval $[0,1]$. It attempts to express the probability that some observed value of $p$ is the mean draw from $M$ trials. The argument goes:

Consider $M$ draws of the random variable $p$. $p$ is the mean (= median, given the uniform distribution) when $k$ draws are less than $p$ and $M-k$ are greater than $p$.* Since $p \in [0,1]$, the probability of observing $p$ is just $p$. Hence, the probability of $k$ draws being less than $p$ (equivalently, of $M-k$ draws being greater than $p$) is:

$$\int_0^1\binom{M}{k}p^k(1-p)^{M-k}\,dp=(M+1)^{-1}$$

I think $p$ is assumed to equal $0.5$ for this expression (I doubt this holds in general).

Is my interpretation of the above expression (as giving the probability of the mean draw) correct? Secondly, how does $\int_0^1\binom{M}{k}p^k(1-p)^{M-k}\,dp$ equal $(M+1)^{-1}$?


*I guess this assumes that $M=2k$.



BEST ANSWER

I don't think your description matches the argument presented in the paper. A few points on your description:

  • You describe it as calculating the probability of an observation of $p$ being the mean, but the article does not talk about the mean.
  • $p$ is not a distribution, yet you refer to $M$ draws of $p$.
  • $p$ is random, so it is not assumed to equal $0.5$. Also, under that reading, what would the integral be over? It should be read as an integral in the variable $p$.

I will now try to expand on the description given in the paper.

Suppose we have $M+1$ random variables which are independent and uniformly distributed, $X_0, X_1,\ldots, X_M \sim \text{Unif}[0,1]$. The paper is answering the following question:

What is the probability that exactly $k$ of the variables $X_1,\ldots, X_M$ fall below $X_0$ (equivalently, that $X_0$ is the $(k+1)$-th smallest of the $M+1$ values)?

Let me denote this event $A^{(0)}_k$.

The paper now approaches this question from two angles: conditioning and symmetry. I will consider both arguments separately.

Conditioning Argument

In this argument we condition on the specific value taken by $X_0$; call this value $p$. By the law of total probability we have

$$\mathbf P[ A^{(0)}_k] = \int \mathbf P \left[A^{(0)}_k \, | \, X_0 = p \right] f_{X_0}(p) dp,$$

where $f_{X_0}$ is the probability density function of $X_0$; since $X_0$ is uniformly distributed, $f_{X_0} = 1$ on the interval $[0,1]$ and is $0$ elsewhere. Hence the integral above becomes

$$\mathbf P[A^{(0)}_k] = \int_0^1 \mathbf P[A^{(0)}_k \, | \, X_0 = p]dp.$$

Now we must consider the conditional probability term. Note that for a single uniform random variable $X$, and a fixed value $a \in [0,1]$, we have

$$\mathbf P[ X < a] = a, \qquad \qquad \mathbf P[X > a] = 1- a.$$ From the perspective of our conditional distribution this means

$$\mathbf P[X_1 < X_0 \, | \, X_0 = p] = p.$$

Note that this is equivalent to saying that, conditioned on $X_0 = p$, the indicator of the event $\{X_1 < X_0\}$ is Bernoulli distributed, $\text{Ber}(p)$. Using the independence of $X_1,\ldots, X_M$, this extends to the statement that, conditioned on $X_0 = p$, the number of the variables $X_1,\ldots, X_M$ falling below $X_0$ is binomially distributed, $\text{Bin}(M,p)$. In particular

$$\mathbf P[A^{(0)}_k\, | \, X_0 = p] = \binom{M}{k}p^k(1-p)^{M-k}.$$
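This conditional claim is easy to sanity-check numerically. A minimal sketch in Python, with arbitrary illustrative values of $M$, $k$ and $p$ (all names below are my own, not from the paper):

```python
import random
from math import comb

# Arbitrary illustrative values (my choice, not from the paper).
random.seed(0)
M, k, p = 8, 3, 0.35
trials = 200_000

# Empirical probability that exactly k of M uniform draws fall below p.
hits = sum(
    1
    for _ in range(trials)
    if sum(random.random() < p for _ in range(M)) == k
)
empirical = hits / trials

# The binomial probability asserted above.
exact = comb(M, k) * p**k * (1 - p)**(M - k)
print(empirical, exact)
```

The two printed values should agree to roughly two decimal places for this trial count.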

Putting this together, we have

$$\mathbf P[A^{(0)}_k] = \int_{0}^1 \binom{M}{k}p^k(1-p)^{M-k} dp.$$
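The value of this integral can be checked numerically before seeing the symmetry argument. A minimal sketch in Python using midpoint quadrature (the helper name `beta_binomial_integral` is mine, not from the paper):

```python
from math import comb

def beta_binomial_integral(M, k, steps=100_000):
    """Midpoint-rule approximation of the integral of C(M,k) p^k (1-p)^(M-k) over [0,1]."""
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        total += comb(M, k) * p**k * (1 - p)**(M - k)
    return total / steps

# The result matches 1/(M+1) for every choice of k.
for M, k in [(5, 2), (10, 0), (10, 7)]:
    print(M, k, beta_binomial_integral(M, k), 1 / (M + 1))
```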

Symmetry Argument

This argument is much simpler. We note that since $X_0, X_1,\ldots, X_M$ all have the same distribution, each of them is equally likely to occupy any particular rank among the $M+1$ values.

That is, if $A^{(m)}_k$ denotes the corresponding event for $X_m$, then we have

$$\mathbf P[A^{(0)}_k] = \mathbf P[A^{(1)}_k] = \cdots = \mathbf P[A^{(M)}_k]$$

Moreover, since exactly one of the $M+1$ variables occupies the rank in question, these events are disjoint and exhaustive, so

$$\mathbf P[A^{(0)}_k] + \mathbf P[A^{(1)}_k] + \cdots + \mathbf P[A^{(M)}_k] = 1,$$

and together these imply

$$\mathbf P[A^{(0)}_k] = \frac{1}{M+1}.$$
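The symmetry claim is also easy to verify by simulation. A minimal sketch in Python, assuming an arbitrary $M$, seed, and trial count of my choosing:

```python
import random
from collections import Counter

random.seed(1)
M, trials = 4, 100_000

# For each trial, record the rank of X_0 among the M+1 draws,
# measured as the number of other draws strictly below it (0..M).
ranks = Counter()
for _ in range(trials):
    xs = [random.random() for _ in range(M + 1)]
    ranks[sum(x < xs[0] for x in xs[1:])] += 1

# Each rank should occur with frequency close to 1/(M+1) = 0.2.
for r in range(M + 1):
    print(r, ranks[r] / trials)
```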

And so combining the two arguments we have

$$\mathbf P[A^{(0)}_k] = \int_{0}^1 \binom{M}{k}p^k(1-p)^{M-k} dp = \frac{1}{M+1}$$


I don't think this interpretation is quite right: we are not drawing from $p$, and we are integrating over all possible values of $p$ (not assuming $p=0.5$).

This is an instance of Beta-Binomial conjugacy. Suppose you know that your data is binomially distributed, but you do not know $p$. So you suppose that $p \sim \text{Unif}(0,1)$ (the same as $p \sim \text{Beta}(1,1)$); this is the prior distribution for $p$. Then, if you observe $a$ successes and $b$ failures, Beta-Binomial conjugacy says that your posterior distribution for $p$ is $\text{Beta}(1+a, 1+b)$, i.e. the posterior is still a Beta distribution.
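As a small numerical illustration of this conjugacy (a sketch, with arbitrary counts $a$, $b$ and grid size of my choosing): under a flat prior, the grid-normalized posterior coincides with the $\text{Beta}(1+a,1+b)$ density.

```python
from math import gamma

# Arbitrary observed counts (my choice): a successes, b failures.
a, b = 3, 5
steps = 10_000
grid = [(i + 0.5) / steps for i in range(steps)]

# Flat prior: posterior is proportional to the likelihood p^a (1-p)^b.
unnorm = [p**a * (1 - p)**b for p in grid]
norm = sum(unnorm) / steps
posterior = [u / norm for u in unnorm]

# Beta(1+a, 1+b) density, computed from the gamma function.
const = gamma(a + b + 2) / (gamma(a + 1) * gamma(b + 1))
beta_pdf = [const * p**a * (1 - p)**b for p in grid]

# The two curves agree up to quadrature error.
max_err = max(abs(x - y) for x, y in zip(posterior, beta_pdf))
print(max_err)
```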

This integral shows up in the following way. Suppose $X \sim \text{Bin}(n,p)$ but $p$ is unknown, so we model $p \sim \text{Unif}(0,1)$. We want the unconditional probability $P(X=k)$. By the law of total probability, $$P(X=k)=\int_{0}^{1}P(X=k \mid p)f(p)\,dp.$$ But given $p$, $X$ is a known binomial, so $$P(X=k)=\int_{0}^{1}\binom{n}{k}p^k(1-p)^{n-k}\,dp.$$
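The integral can also be evaluated exactly: expanding $(1-p)^{n-k}$ with the binomial theorem and using $\int_0^1 p^{k+j}\,dp = 1/(k+j+1)$ reduces it to a finite sum. A sketch in exact rational arithmetic (the helper name is mine):

```python
from fractions import Fraction
from math import comb

def exact_integral(n, k):
    """Exact integral of C(n,k) p^k (1-p)^(n-k) over [0,1], term by term."""
    total = Fraction(0)
    for j in range(n - k + 1):
        # coefficient of p^(k+j) after expanding (1-p)^(n-k)
        coeff = comb(n, k) * comb(n - k, j) * (-1) ** j
        total += Fraction(coeff, k + j + 1)
    return total

# Always Fraction(1, n + 1), whatever k is.
for n, k in [(4, 1), (9, 9), (12, 5)]:
    print(n, k, exact_integral(n, k))
```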

Surprisingly, this integral equals $1/(n+1)$ regardless of what $k$ is. I think this is called the Bayes' Billiards argument. Here is the intuition I use.

$\textit{For the left-hand side:}$ suppose you randomly throw $n$ white darts and $1$ orange dart at the unit interval $[0,1]$. Call the position of the orange dart the random variable $p$. It's clear that $p \sim \text{Unif}(0,1)$. Let $X$ be the number of white darts to the left of the orange dart. If we knew $p$, then $X \sim \text{Bin}(n,p)$. Using the law of total probability, $$P(X=k)=\int_{0}^{1}P(X=k \mid p)f(p)\,dp=\int_{0}^{1}\binom{n}{k}p^k(1-p)^{n-k}\,dp.$$

Now I will show that the $\textit{right-hand side}$ is equivalent to this. Suppose instead you have $n+1$ unpainted darts that you throw at the unit interval and $\textit{then}$ paint $n$ white and $1$ orange. The orange dart is equally likely to occupy any of the $n+1$ positions, so $P(X=k)=1/(n+1)$. So we have $$\int_{0}^{1}\binom{n}{k}p^k(1-p)^{n-k}\,dp=\frac{1}{n+1}.$$
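The dart intuition translates directly into a simulation (a sketch; $n$, the seed, and the trial count are arbitrary choices of mine):

```python
import random
from collections import Counter

random.seed(2)
n, trials = 6, 100_000

counts = Counter()
for _ in range(trials):
    orange = random.random()                      # position of the orange dart
    whites = [random.random() for _ in range(n)]  # the n white darts
    counts[sum(w < orange for w in whites)] += 1  # X: white darts left of orange

# Each value of X should occur with frequency close to 1/(n+1).
for k in range(n + 1):
    print(k, counts[k] / trials)
```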

This is the way it was taught in Stat 110. All the lecture videos and problem sets for the class are available online at stat110.net.