Find E(X) and Var(X)

Javier writes an article containing 52 460 words. He plans to upload the article to his website, but he knows that this process sometimes introduces errors.

He assumes that for each word in the uploaded version of his article, the probability that it contains an error is 0.000 08. The number of words containing an error is denoted by X.

What are $E(X)$ and $Var(X)$? This is my working so far; any help is much appreciated. Thank you.

3 Answers

BEST ANSWER

You do not say what distribution you are using in your Question.

@Stacker (in a now-deleted post) modeled the number of words with errors as $X \sim \mathsf{Binom}(n = 52\,460,\ p = 0.000\,08),$ with $E(X) = np = 52460(0.00008) = 4.1968$ and $V(X) = np(1-p) = 4.1968(1-0.00008) \approx 4.196464.$

In such problems with large $n$ and small $p,$ one often uses the Poisson distribution as a good approximation. If $X \sim \mathsf{Pois}(\lambda = 4.1968),$ then $E(X) = 4.1968$ and $Var(X) = 4.1968,$ which is not much different.
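The comparison above is easy to reproduce; as a quick sketch (plain Python, using only the numbers from the problem statement):

```python
# Quick check (a sketch, not part of the original answer): compare the
# exact binomial mean/variance with the Poisson approximation lambda = np.
n, p = 52_460, 0.000_08

binom_mean = n * p              # E(X) under Binom(n, p)
binom_var = n * p * (1 - p)     # Var(X) under Binom(n, p)

lam = n * p                     # Poisson rate lambda = np
pois_mean, pois_var = lam, lam  # mean and variance coincide for a Poisson

print(binom_mean, binom_var, pois_mean, pois_var)
```

The binomial variance differs from the Poisson one only by the factor $(1-p)$, which is why the two models agree to three decimal places here.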

I am not clear about @silver's assumptions (my guess is binomial), but if I read the formula and intentions of that Answer correctly, I get $Var(X) = 4.196464,$ using R as a calculator; again, not much different.

(n^2 - n)*p^2 +n*p - (n*p)^2
[1] 4.196464

To three decimal places we have $E(X) = 4.197$ and $Var(X) = 4.197$ for a Poisson model, and $Var(X) = 4.196$ for a binomial model. Maybe you can look at the original context of your problem and figure out whether you are supposed to use a Poisson or a binomial model.

By either model you are not likely to see more than 10 words with errors; the probability $P(X \le 10) \approx 0.9960$ either way.

pbinom(10, 52460, p = 0.00008)
[1] 0.9959554
ppois(10, 4.1968)
[1] 0.9959537

Let $n = 52460$ and $p = 0.00008$. If $X_i$ is the indicator RV that equals $1$ when there is an error in word $i$, then $X = \sum_{i = 1}^n X_i$. Thus, $X^2 = \sum_{i,j \le n} X_iX_j = \sum_{\substack{i,j \le n \\ i \neq j}} X_i X_j + \sum_{\substack{i,j \le n \\ i = j}} X_i X_j$. It follows that

$$ \mathbb E(X^2) = \sum_{\substack{i,j \le n \\ i \neq j}} \mathbb E(X_i X_j) + \sum_{i \le n} \mathbb{E}(X_i^2) = (n^2 - n) p^2 + np,$$

since $\mathbb E(X_i X_j) = p^2$ for $i \neq j$ by independence, and $X_i^2 = X_i$ gives $\mathbb E(X_i^2) = p$.

Then, use the formula $\mathbb V(X) = \mathbb E (X^2) - \mathbb E(X)^2$ as in your calculation.
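Plugging the problem's numbers into this route (a sketch in plain Python; the formula is the one displayed above) confirms that it reduces to $np(1-p)$:

```python
# Numeric check (a sketch, numbers from the problem statement) of
# Var(X) = E(X^2) - E(X)^2 with E(X^2) = (n^2 - n) p^2 + n p.
n, p = 52_460, 0.000_08

ex2 = (n**2 - n) * p**2 + n * p   # E(X^2) from the formula above
var = ex2 - (n * p) ** 2          # Var(X) = E(X^2) - E(X)^2

print(var)                        # algebraically equal to n p (1 - p)
```

Expanding the algebra, $(n^2-n)p^2 + np - n^2p^2 = np - np^2 = np(1-p)$, so the two routes necessarily agree.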


You are not evaluating the expectation and variance for a multiple of a single random variable (e.g. $nX_1$), but rather for a sum of independent and identically distributed random variables (e.g. $\sum_{i=1}^n X_i$).


Let $X=\sum_{i=1}^n X_i$ where $X_i=1$ if the $i^{th}$ word contains an error, or $0$ if it does not. $X$ is thus the count for words containing an error.

We assume the sequence $(X_i)_{i=1}^n$ is mutually independent and identically distributed, and define $p:=\mathsf P(X_1\,{=}\,1)$.

Since $X_1$ is Bernoulli distributed, $\mathsf E(X_1)=p$ and $\mathsf{Var}(X_1) = p(1-p)$.

Thus:

$\qquad\begin{align}\mathsf E(X) &= \mathop{\sf E}\left(\sum_{i=1}^n X_i\right)\\&=\sum_{i=1}^n \mathsf E(X_i) &&\text{Linearity of Expectation} \\ & = n\,\mathsf E(X_1) &&\text{Identically Distributed} \\& = np\\[3ex]\mathsf {Var}(X) &= \mathop{\sf Var}\left(\sum_{i=1}^n X_i\right)\\&=\mathop{\sf Cov}\left(\sum_{i=1}^n X_i,\sum_{j=1}^n X_j\right) \\ &= \sum_{i=1}^n\sum_{j=1}^n \mathsf {Cov}(X_i,X_j)&&\text{Bilinearity of Covariance}\\&=\sum_{i=1}^n\mathsf {Cov}(X_i,X_i) ~+~0&&\text{Independence}\\&=\sum_{i=1}^n\mathsf {Var}(X_i)\\ &= n\,\mathsf{Var}(X_1) &&\text{Identical Distributions}\\& = np(1-p)\end{align}$
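A small simulation agrees with these closed forms (a sketch; `numpy` and the seed are my additions, not part of the answer):

```python
import numpy as np

# Monte Carlo check (assumption: numpy, seed chosen arbitrarily) that the
# sample mean and variance of Binom(n, p) draws match np and np(1-p).
n, p = 52_460, 0.000_08
rng = np.random.default_rng(0)

samples = rng.binomial(n, p, size=200_000)
print(samples.mean())   # close to n*p       = 4.1968
print(samples.var())    # close to n*p*(1-p) = 4.1965 (approx)
```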