Show that $E(|S_n-np|) = 2vq b(v; n, p) $.

105 Views Asked by At

(Feller Vol.1, P.241, Q.35) Let $S_n$ be the number of successes in $n$ Bernoulli trials. Prove $$E(|S_n-np|) = 2vq b(v; n, p) $$ where $v$ is the integer such that $np < v \le np+1$ and $b(v; n,p)$ is a binomial distribution with $v$ successes of $n$ trials. Hint: The left side $= \sum_{k=0}^{v-1} (np - k) \frac{n}{k} p^k q^{n-k}$.

My attempt: I found that $P(|S_n - np| = j)= b(np +j ; n, p)$ if $S_n \ge np$, $b(np-j; n,p)$ if $S_n < np$. Therefore, $$E(|S_n-np|)= \sum_{k=0}^{v-1} (np-k)b(k;n,p) + \sum_{k=v}^{n}(k-np)b(k; n,p).$$

I am stuck here, and don't know how to proceed. I would appreciate if you give some help.

2

There are 2 best solutions below

3
On BEST ANSWER

I'd be tempted to take this hint and see how I can simplify it first. And by simplify I mean get rid of the summation. So we have \begin{align*} &\sum_{k=0}^{v-1}(np-k)\left(\begin{array}{cc} n \\ k\end{array}\right)p^kq^{n-k} \\ =&np\sum_{k=0}^{v-1}\left(\begin{array}{cc} n \\ k\end{array}\right)p^kq^{n-k}-n\sum_{k=1}^{v-1}\left(\begin{array}{cc} n-1 \\ k-1\end{array}\right)p^kq^{n-k} \end{align*} using the identity $\left(\begin{array}{cc} n \\ k\end{array}\right)=\frac{n}{k}\left(\begin{array}{cc} n-1 \\ k-1\end{array}\right)$ which is easy to prove, and which should often be in the back of your head when you have these combination expressions multiplied by $n$ or $k$. I replace $k=0$ with $k=1$ in the second sum also as the substitution is only valid when $k\geq 1$ and the $k=0$ term contributed nothing to the sum anyway.

Now we don't know how to sum any of these really, unless they refer to an expectation or probability. If I'd kept the $k$ in the second sum it would look more like an expectation but as the sum is partial it will never be one, which is what motivated me to replace it with $n$ which can be removed from the sum. It's easy to see that the first sum is $\mathbb{P}[S_n\leq v-1]$, it would be nice if we could make the second sum look similar. It would have to be a probability related to $S_{n-1}$; see that \begin{align*} \sum_{k=1}^{v-1}\left(\begin{array}{cc} n-1 \\ k-1\end{array}\right)p^kq^{n-k}&=\sum_{l=0}^{v-2}\left(\begin{array}{cc} n-1 \\ l\end{array}\right)p^{l+1}q^{n-(l+1)} \\ &=p\sum_{l=0}^{v-2}\left(\begin{array}{cc} n-1 \\ l\end{array}\right)p^{l}q^{n-1-l} \\ &=p\mathbb{P}[S_{n-1}\leq v-2]. \end{align*} Then we have \begin{align*} np\sum_{k=0}^{v-1}\left(\begin{array}{cc} n \\ k\end{array}\right)p^kq^{n-k}-n\sum_{k=1}^{v-1}\left(\begin{array}{cc} n-1 \\ k-1\end{array}\right)p^kq^{n-k} &=np\left(\mathbb{P}[S_n\leq v-1]-\mathbb{P}[S_{n-1}\leq v-2]\right). \end{align*} It's annoying that these probabilities are for different variables, but consider the following. If, instead of being independent variables, $S_{n-1}$ was the number of successes in $n-1$ trials and $S_n$ was the number of successes in $n$ trials, and these trials were from the $\textbf{same sequence}$, then by sketching a Venn diagram if necessary we can see that \begin{align*} \mathbb{P}[S_n\leq v-1]-\mathbb{P}[S_{n-1}\leq v-2]=&\mathbb{P}[S_{n-1}=v-1,\;n\text{-th trial is a failure}] \\ =&\left(\begin{array}{cc} n-1 \\ v-1\end{array}\right)p^{v-1}q^{n-1-(k-1)}q \end{align*} so that ultimately we get \begin{align*} \mathbb{E}[\vert S_n-np\vert]=np\left(\begin{array}{cc} n-1 \\ v-1\end{array}\right)p^{v-1}q^{n-1-(v-1)}q=vq\left(\begin{array}{cc} n \\ v\end{array}\right)p^vq^{n-v}=vqb(v;n,p), \end{align*} as required.

0
On

The hint given is not quite correct as mentioned by @PepeSilvia.

For any random variable $X$ with finite mean $\mu$, we have

\begin{align} \mathbb E[|X-\mu|]&=\mathbb E[|X-\mu|I_{X>\mu}]+\mathbb E[|X-\mu|I_{X\le\mu}] \\&=\mathbb E[(X-\mu )I_{X>\mu}]+\mathbb E[(\mu-X )I_{X\le\mu}] \tag{1} \end{align}

But $\mathbb E(X-\mu )=0$, so

$$0=\mathbb E[(X-\mu)I_{X>\mu}]+\mathbb E[(X-\mu)I_{X\le\mu}]$$

That is, $$\mathbb E[(X-\mu)I_{X>\mu}]=\mathbb E[(\mu-X)I_{X\le\mu}] \tag{2}$$

From $(1)$ and $(2)$ we see that

$$\mathbb E[|X-\mu|]=2\,\mathbb E[(X-\mu)I_{X>\mu}]=2\,\mathbb E[(\mu-X)I_{X\le\mu}]$$

So a direct calculation here would involve evaluating $$\mathbb E[|S_n-np|]=2\sum_{j=v}^n (j-np) f(j)=2\sum_{j=0}^{v-1} (np-j) f(j)\,,\tag{3}$$

where $f(j)=\binom{n}{j}p^jq^{n-j}$ and $v=\lfloor np \rfloor +1$.

But this can be done here without much work.

We have the relation

$$\frac{f(j+1)}{f(j)}=\left(\frac{n-j}{j+1}\right)\frac{p}{q} \quad,\, j=0,1,\ldots,n$$

That is, $$q(j+1)f(j+1)=npf(j)-pjf(j) \quad,\, j=0,1,\ldots,n$$

Therefore, $$q\sum_{j=v}^n (j+1)f(j+1)=np\sum_{j=v}^n f(j)-p\sum_{j=v}^n jf(j)$$

Noting that $f(n+1)=0$, this is same as $$q\sum_{j=v}^n jf(j)-q vf(v)=np\sum_{j=v}^n f(j)-p\sum_{j=v}^n jf(j)$$

That is, $$\sum_{j=v}^n jf(j)-np\sum_{j=v}^n f(j)=q vf(v) \tag{4}$$

So from $(3)$ and $(4)$ we end up with

$$\mathbb E[|S_n-np|]=2\left[\sum_{j=v}^n jf(j)-np\sum_{j=v}^n f(j)\right]=2vqf(v)$$

(Working with the other sum $\sum_{j=0}^{v-1} (np-j) f(j)$ also results in the same answer).

This can be seen as a general method for finding mean absolute deviation about mean (if it exists) of a non-negative discrete distribution whenever the probability mass function $f$ is such that $$\frac{f(j+1)}{f(j)}=\frac{\alpha+\beta j}{j+1}\quad,\,j=0,1,\ldots$$ for some constant $\alpha,\beta$.