How does the binomial distribution work?


I have the following question; we have just been looking at random variables and their distributions. I know that a random variable (RV) is a function $X:\Omega\rightarrow M$ where $M$ is an arbitrary set. Now if we define a probability measure $P$ on $\Omega$, then we can use the RV to project this measure onto the set $M$ and get the image measure $P_X(A)=P(X^{-1}(A))$, where $A$ is a measurable subset of $M$; then $P_X$ is called the distribution of our RV. Is this correct up to here?

Now we have looked at the binomial distribution, for example. There we take $p\in [0,1]$ and $X:\Omega \rightarrow \{0,...,n\}$, and we said that $P(X=k)= \binom{n}{k}p^k (1-p)^{n-k}$. But now the lecture says that the binomial distribution itself is $$\sum _{k=0}^n \binom{n}{k}p^k (1-p)^{n-k} \delta_k.$$ Somehow I'm a bit confused now: firstly, I don't see why we defined it differently, and secondly, I don't see why we have to take such a sum here. Could someone explain this to me? I would be very grateful.



BEST ANSWER

Your first paragraph about random variables, distributions and image measures is correct.

It is important to distinguish between a binomial random variable, i.e. a random variable $X$ whose distribution $\mathbb{P}_X$ is the binomial distribution, and the binomial distribution itself, which is just a measure.

Suppose that $X:\Omega \rightarrow \mathbb{R}$ is a random variable with distribution $$\mathbb{P}_X = \sum_{k=0}^n {n \choose k } p^k(1-p)^{n-k}\delta_k.$$ What can we then say about $\mathbb{P}(X=j)$ for $j\in \{0,\dots,n\}$? We compute using the formula \begin{align*} \mathbb{P}(X=j) &= \mathbb{P}_X(\{j\}) \\ &= \sum_{k=0}^n {n \choose k } p^k(1-p)^{n-k}\delta_k(\{j\}) \\ &= {n \choose j } p^j(1-p)^{n-j}, \end{align*} where we have used that $$\delta_k(\{j\}) = \begin{cases} 1 & k=j \\ 0 & k\neq j.\end{cases}$$ And we see that the two definitions agree with each other. The advantage of the second definition is that $\sum_{k=0}^n {n \choose k } p^k(1-p)^{n-k}\delta_k(A)$ is well defined for all measurable sets $A \subseteq \mathbb{R}$, not just singletons. Do note that some random variables (normally distributed r.v.'s, for instance) have $\mathbb{P}(X=j) = 0$ for all $j$.
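As a quick numerical sanity check (not part of the answer above; $n = 10$ and $p = 0.3$ are arbitrary choices), here is a short Python sketch that implements $\mathbb{P}_X$ literally as a weighted sum of Dirac measures and confirms it collapses to the familiar pmf formula on singletons:

```python
from math import comb

n, p = 10, 0.3

def delta(k, A):
    # Dirac measure at k: delta_k(A) = 1 if k is in A, else 0
    return 1 if k in A else 0

def P_X(A):
    # the distribution as a weighted sum of Dirac measures
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * delta(k, A)
               for k in range(n + 1))

# on a singleton {j}, only the k = j term survives
j = 4
assert abs(P_X({j}) - comb(n, j) * p**j * (1 - p)**(n - j)) < 1e-12

# the same measure also handles non-singleton sets
print(P_X({0, 1, 2}))  # P(X <= 2), about 0.383
```

Note that `P_X` accepts any finite set, which is exactly the advantage the answer points out: the delta-sum definition is not restricted to singletons.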


The first one is the mass function at $k$. The second is the measure associated to that mass function.

The mass at $k$ is a single number: the probability that $X = k$. The mass function is a function from the support of $X$ to $[0, 1]$ defined by $f(k) = \binom{n}{k}p^k(1 - p)^{n - k}$. Or we might say that $f$ is a function from $\mathbf{R}$ to $[0,1]$ that is supported on $\{0,\dots,n\}$.

A probability measure is not a function from $\{0,\dots,n\}$ or from $\mathbf{R}$ to $[0,1]$. A measure is a function from measurable subsets to $[0,1]$, meaning subsets of either $\{0,\dots,n\}$ or $\mathbf{R}$, depending on what the domain of $\delta_k$ is.

Specifically, that measure is

$$\mu_X(A) = \sum_{k = 0}^n \binom{n}{k}p^k(1 - p)^{n - k}\delta_k(A) = \sum_{k \in A} \binom{n}{k}p^k(1 - p)^{n - k}$$

since $\delta_k(A) = 1$ if $k \in A$ and is $0$ if $k \notin A$.
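To make the point-vs-set distinction concrete, here is a small Python sketch (not from the answer itself; $n = 6$ and $p = 0.5$ are arbitrary choices): the mass function `f` eats points, the measure `mu` eats sets.

```python
from math import comb

n, p = 6, 0.5

def f(k):
    # mass function: assigns a number to each point k
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def mu(A):
    # measure: assigns a number to each (finite) set A,
    # by summing the masses of the points it contains
    return sum(f(k) for k in A)

print(f(3))                   # mass at the point 3: 0.3125
print(mu({0, 1, 2, 3, 4, 5, 6}))  # total mass of the support: 1.0
```

The two objects carry the same information, but they have different domains, which is exactly the distinction this answer is drawing.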

You will notice that the domain is usually a bit ambiguous in probability theory.


Edit: here are some more definitions in measure-theoretic language.

Let me make the notation a little more explicit here. So we have a probability space $(\Omega, \Sigma, \mathbf{P})$ and $X : \Omega \to \mathbf{R}^n$ is some measurable function. Let $S$ be the range of $X$. Then $X$ induces a measure $\mu_X$ on the measurable subsets of $S$. This construction is known as a pushforward measure and is defined by $\mu_X(A) = \mathbf{P}(X \in A) = \mathbf{P}(X^{-1}(A))$. We call $\mu_X$ the probability distribution or law of $X$.

Now suppose $\mu_X$ is absolutely continuous with respect to the Lebesgue measure on $\mathbf{R}^n$. Then there is a function $f_X = \frac{d\mu_X}{d\lambda} : \mathbf{R}^n \to \mathbf{R}$ called the density function of $X$ (or Radon-Nikodym derivative in more general measure theory). The connection between $f_X$ and $\mu_X$ is that for every measurable set $A \subseteq S$,

$$\mu_X(A) = \int_{\mathbf{R}^n} \pmb 1_A f_X d\lambda = \int_{\mathbf{R}^n} \pmb 1_A \frac{d\mu_X}{d\lambda} d\lambda = \int_{\mathbf{R}^n} \pmb 1_A d\mu_X.$$
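For the absolutely continuous case, here is a numerical sketch of that identity (my own illustration, not from the answer: a standard normal density on $\mathbf{R}$, with the integral $\int \pmb 1_A f_X \, d\lambda$ approximated by a midpoint rule), checked against the closed-form CDF via the error function:

```python
from math import erf, exp, pi, sqrt

def f(x):
    # standard normal density: f_X = d mu_X / d lambda
    return exp(-x * x / 2) / sqrt(2 * pi)

def mu(a, b, steps=100_000):
    # mu_X([a, b]) = integral of 1_[a,b] * f_X d(lambda), midpoint rule
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def Phi(x):
    # closed-form standard normal CDF, via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

a, b = -1.0, 1.0
print(mu(a, b))         # ~0.6827
print(Phi(b) - Phi(a))  # ~0.6827, the same probability
```

Note that, as the best answer remarks, here every singleton has $\mu_X(\{x\}) = 0$, so the density carries all the information that the mass function carries in the discrete case.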

Likewise, if $\mu_X$ is absolutely continuous with respect to the counting measure on $\mathbf{Z}^n$, the Radon-Nikodym derivative is now called the mass function of $\mu_X$, and again we have the integral identity above, except that now the integral is over a countable set. So we can instead write it as

$$\mu_X(A) = \int_{\mathbf{R}^n} \pmb 1_A d\mu_X = \sum_{x \in A} \mu_X(\{x\}) = \sum_{x \in A} f_X(x).$$

The counting measure on a discrete set $S$, like $\mathbf{Z}$ or $\mathbf{N}$, is $\lambda = \sum_{k \in S} \delta_k$, where

$$ \delta_k(A) = \begin{cases} 1 & \text{if } k \in A, \\ 0 & \text{if not}.\end{cases}$$

With this decomposition, we see that for any subset $A$ of $\mathbf{R}^n$, we have $\lambda(A) = \sum_{k \in S} \delta_k(A) = \#(A \cap S)$. More generally, for the measure $\mu_X$, we have

$$\mu_X(A) = \sum_{k \in S} f_X(k) \delta_k(A).$$

Or simply "$\mu_X = \sum f_X(k) \delta_k$."
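That final decomposition can be packaged generically: the sketch below (my own illustration; the helper names `delta` and `discrete_measure` are made up, and $n = 8$, $p = 0.25$ are arbitrary) builds the measure $\mu_X = \sum f_X(k)\,\delta_k$ from any mass function and support.

```python
from math import comb

def delta(k):
    # Dirac measure at k, represented as a set function
    return lambda A: 1.0 if k in A else 0.0

def discrete_measure(f, support):
    # mu = sum over the support of f(k) * delta_k
    terms = [(f(k), delta(k)) for k in support]
    return lambda A: sum(w * d(A) for w, d in terms)

n, p = 8, 0.25
f_X = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
mu_X = discrete_measure(f_X, range(n + 1))

print(mu_X(set(range(n + 1))))  # total mass: 1 (up to rounding)
print(mu_X({0, 1}))             # P(X <= 1), about 0.367
```

Nothing here is specific to the binomial: swapping in any other mass function (Poisson, geometric, ...) gives the corresponding discrete distribution, which is why the $\sum f_X(k)\,\delta_k$ form is the convenient one.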