Want to compute $E(\sum_{i = 1}^n X_i^2 | \sum_{i = 1}^n X_i = t)$ for a random sample


I have $X_1, X_2, \dots, X_n$ an iid sample. My idea was to use that $\sum_{i = 1}^n X_i^2 = (\sum_{i = 1}^n X_i)^2 - \sum_{i\neq j} X_iX_j$.
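As a sanity check on the identity itself (it is purely algebraic and does not involve the conditioning), here is a minimal sketch on an arbitrary fixed vector; NumPy and the standard-normal sample are just illustrative choices.

```python
# Sanity check of the purely algebraic identity
#   sum_i x_i^2 = (sum_i x_i)^2 - sum_{i != j} x_i x_j
# on an arbitrary fixed vector.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

lhs = np.sum(x ** 2)
# sum over all ordered pairs (i, j) with i != j
cross = np.sum(np.outer(x, x)) - np.sum(x ** 2)
rhs = np.sum(x) ** 2 - cross

print(np.isclose(lhs, rhs))  # True
```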

Thus $$E(\sum_{i = 1}^n X_i^2 | \sum_{i = 1}^n X_i = t) = t^2 - E( \sum_{i\neq j} X_iX_j| \sum_{i = 1}^n X_i = t) $$ Now since the $X_i$ are iid \begin{align*}E(\sum_{i = 1}^n X_i^2 | \sum_{i = 1}^n X_i = t) &= t^2 - \sum_{i\neq j}(E( X_1| \sum_{i = 1}^n X_i = t))^2\\ &= t^2 - (n-1)(E( \sum_{i = 1}^n X_i| \sum_{i = 1}^n X_i = t))^2\\ &= t^2 - (n-1)t^2\\ &= (2-n)t^2 \end{align*}

Is this correct? I feel like I am going wrong somewhere.

Best Answer

You are incorrect. A simple counterexample: let $X_1,X_2$ be two fair coin tosses, i.e. $X_1,X_2$ are iid $\text{Bern}(0.5)$-distributed. The conditional expectation of $X_1^2+X_2^2$ given the event $X_1+X_2=1$ is \begin{align*}\textbf{E}[X_1^2+X_2^2|X_1+X_2=1]&=\textbf{E}[(X_1+X_2)^2-2X_1X_2|X_1+X_2=1]\\&=\textbf{E}[(X_1+X_2)^2|X_1+X_2=1]+\underbrace{\textbf{E}[-2X_1X_2|X_1+X_2=1]}_{=0}=1.\end{align*} In your formula, $n=2$ and $t=1$, so the result would be $(2-n)t^2=0$.
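A brute-force enumeration confirms this; here is a minimal Python sketch (the enumeration is just one way to check it).

```python
# Brute-force check of the two-coin example: enumerate the four equally likely
# outcomes of (X1, X2) and keep only those with X1 + X2 = 1.
from itertools import product

outcomes = list(product([0, 1], repeat=2))                       # each has prob 1/4
conditioned = [(x1, x2) for x1, x2 in outcomes if x1 + x2 == 1]  # {(0,1), (1,0)}

cond_mean = sum(x1 ** 2 + x2 ** 2 for x1, x2 in conditioned) / len(conditioned)
print(cond_mean)  # 1.0, not the 0 predicted by the formula (2 - n) * t**2
```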

Why are you incorrect? As has been pointed out, $\textbf{E}[X_1X_2|X_1+X_2=1]=\textbf{E}[X_1|X_1+X_2=1]\textbf{E}[X_2|X_1+X_2=1]$ does not hold in the example, even though $X_1$ and $X_2$ are independent.
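Concretely, with the same enumeration as above, the two sides differ:

```python
# Conditioning on X1 + X2 = 1 breaks the product rule for expectations:
# E[X1*X2 | A] != E[X1 | A] * E[X2 | A].
from itertools import product

conditioned = [(x1, x2) for x1, x2 in product([0, 1], repeat=2) if x1 + x2 == 1]

e_prod = sum(x1 * x2 for x1, x2 in conditioned) / len(conditioned)  # 0.0
e_x1 = sum(x1 for x1, _ in conditioned) / len(conditioned)          # 0.5
e_x2 = sum(x2 for _, x2 in conditioned) / len(conditioned)          # 0.5

print(e_prod, e_x1 * e_x2)  # 0.0 vs 0.25
```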

I think what you have done is you have confused $\textbf{independence}$ with $\textbf{conditional independence}$ given an event $A$. Even though $X_1\text{ and }X_2$ are independent, they are not conditionally independent given $X_1+X_2=t\in\mathbb{R}$. They can't be (unless one of them is a constant): if you know one of the variables, you immediately know the other (since the sum is given). The same holds true in the general case: if you know $n-1$ of the variables, you immediately know the one that is left.
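A minimal sketch making the dependence explicit (three fair coins and the conditioning value $2$ are arbitrary choices for the illustration): once two of the coins are known, the third is forced.

```python
# Three fair coins conditioned on X1 + X2 + X3 = 2: the conditional law of X3
# changes once X1 and X2 are also known, so there is no conditional independence.
from itertools import product

conditioned = [xs for xs in product([0, 1], repeat=3) if sum(xs) == 2]
# conditioned == [(0, 1, 1), (1, 0, 1), (1, 1, 0)], all equally likely

p_x3_given_sum = sum(x3 for (_, _, x3) in conditioned) / len(conditioned)      # 2/3
x3_given_sum_and_x1x2 = [x3 for (x1, x2, x3) in conditioned if x1 == x2 == 1]  # [0]

print(p_x3_given_sum, x3_given_sum_and_x1x2)  # 0.666... vs [0]: X3 is forced to 0
```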

Is your calculation ever correct (in the non-trivial case)? Possibly. The calculation might work out by accident, if the values just happen to coincide. Your justification, however, only really makes sense if $X_1,X_2,\dots,X_n$ are uncorrelated given the event $A:=\{\sum X_i=t\}$. This is a very different (although $\textbf{not}$ stronger) condition than independence, and it rarely holds. If someone is interested in more on (conditional) independence, I recommend the following lecture:

https://www.youtube.com/watch?v=JzDvVgNDxo8&list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo&index=5