$X\sim\mathcal {Bin}(n,p)$
I want to evaluate $\sum\limits_{x=0}^n {^n\mathrm C_x} p^x(1-p)^{n-x}x\log(x)$. Is there any way to avoid the sum because my $n$ can be very large (around $10^6$)?
$X\sim\mathcal {Bin}(n,p)$
I want to evaluate $\sum\limits_{x=0}^n {^n\mathrm C_x} p^x(1-p)^{n-x}x\log(x)$. Is there any way to avoid the sum because my $n$ can be very large (around $10^6$)?
On
Under the assumption* that if $X = 0$, $X \log X = 0$, the usual approach is to use the delta method with the choice $g(x) = x \log x$, so that $$\operatorname{E}[g(X)] \approx g(\operatorname{E}[X]) + \frac{1}{2}g''(\operatorname{E}[X]) \operatorname{Var}[X].$$ This gives in your case $$\operatorname{E}[X \log X] \approx np \log np + \frac{np(1-p)}{2np} = np \log np + \frac{1-p}{2}.$$ For very large $np$, the term $(1-p)/2$ is negligible. But if $p$ is as small as $n$ is large, the second term should not be ignored: for example, $n = 1000$ and $p = 0.002$ gives the exact expectation $1.955009\ldots$; whereas $np \log np = 2 \log 2 \approx 1.38629$, and $np \log np + (1-p)/2 \approx 1.885294$. Higher order approximations are of course possible.
*It is important to state this caveat: otherwise, strictly speaking the expectation is not defined. However, because the limiting behavior of $g(x) = x \log x$ as $x \to 0^+$ is well-behaved, it is reasonable to in fact define the function $g$ as $$g(x) = \begin{cases} x \log x, & x > 0 \\ 0, & x = 0. \end{cases}$$
The distribution of a binomial is strongly concentrated around $np$, so under reasonable assumptions, the answer would be roughly $np \log (np)$. For example, when $n=1000$ and $p=1/2$, the true answer is 3107.55, while this approximation gives 3107.30.