I am struggling with a problem from an introductory Machine Learning course at my university that I never fully wrapped my head around. Here's the problem as it was written:
For any two random variables $X$, $Y$ the covariance is defined as $\text{Cov}(X, Y) = E\left[\left(X - E[X]\right)\left(Y - E[Y]\right)\right]$. You may assume $X$ and $Y$ take on discrete values if you find that easier to work with.
a. [$1$ point] If $E[Y | X = x] = x$ show that $$ \text{Cov}(X,Y) = E[(X-E[X])^2] $$ b. [1 point] If $X, Y$ are independent show that $$ \text{Cov}(X, Y ) = 0 $$
I am mainly struggling with part a. I think I can see how to get from the proof of part a to the proof of part b.
Any help would be much appreciated. Below I show what I've tried and where I get stuck; at times it has seemed like I might at least have proved that $E[X] = E[Y]$, but these attempts are dubious at best.
Attempt 1:
$E[Y|X=x]=x$ is essentially the same (I think) as $E[Y|X]=X$, since for every value $x$ we are saying that the conditional expectation of $Y$ given $X=x$ equals $x$.
So: $$ E[Y|X]=X $$ Taking the expected value of both sides: $$ E[E[Y|X]]=E[X] $$ Here I want to say that $E[E[Y|X]]$ is the same as $E[Y]$ because of the way the expectation sums over all the values of $Y$, but I'm pretty sure that's not a rigorous justification.
Attempt 2:
Let $Z$ be a random variable defined by: $$ Z=Y-E[Y|X] $$ taking the expected value: $$ E[Z]=E[Y]-E[E[Y|X]] $$ Again, if $E[E[Y|X]] = E[Y]$ then $E[Z] = 0$, which would let me get around to proving that $E[Y] = E[X]$; this is essentially a roundabout way of doing the above.
I'm pretty stuck for ideas at this point and any help is appreciated!
I hope this works (I'll assume that $X$ and $Y$ take on discrete values):
a. $\text{If } \mathbb{E}[Y|X=x]=x, \text{ then } \text{cov}(X,Y)= \mathbb{E}[(X-\mathbb{E}(X))^2]$
Solution:
First, let's expand the covariance:
$$\text{cov}(X,Y) = \mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] = \mathbb{E}[XY - X\mathbb{E}[Y] -\mathbb{E}[X]Y + \mathbb{E}[X]\mathbb{E}[Y]] = \mathbb{E}[XY] - \mathbb{E}[X\mathbb{E}[Y]] - \mathbb{E}[Y\mathbb{E}[X]] + \mathbb{E}[\mathbb{E}[X]\mathbb{E}[Y]] = \mathbb{E}[XY] -2\mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$$
Don't forget that $$\mathbb{E}[Y|X=x_k] = x_k, \quad \text{for all } x_k \in R_X$$
Second (I'll use this hypothesis at the second equality):
$$ \mathbb{E}[X] = \sum_{x_i \in R_X} x_i\,\mathbb{P}[X=x_i] = \sum_{x_i \in R_X} \mathbb{E}[Y|X=x_i]\,\mathbb{P}[X=x_i] = \mathbb{E}[\mathbb{E}[Y|X]] = \mathbb{E}[Y] $$
The last equality is the law of total expectation (a standard theorem in probability theory).
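For completeness, in the discrete case the law of total expectation follows by expanding the conditional expectation and swapping the order of summation:
$$ \mathbb{E}[\mathbb{E}[Y|X]] = \sum_{x_i \in R_X} \Big( \sum_{y_j \in R_Y} y_j\,\mathbb{P}[Y=y_j|X=x_i] \Big)\mathbb{P}[X=x_i] = \sum_{y_j \in R_Y} y_j \sum_{x_i \in R_X} \mathbb{P}[X=x_i, Y=y_j] = \sum_{y_j \in R_Y} y_j\,\mathbb{P}[Y=y_j] = \mathbb{E}[Y] $$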
Moving on:
$$ \mathbb{E}[XY] = \sum_{x_i \in R_X} \sum_{y_j \in R_Y} x_i y_j\,\mathbb{P}[X=x_i, Y=y_j] = \sum_{x_i \in R_X} x_i \Big( \sum_{y_j \in R_Y} y_j\,\mathbb{P}[X=x_i, Y=y_j] \Big) = \sum_{x_i \in R_X} x_i \Big( \sum_{y_j \in R_Y} y_j\,\mathbb{P}[Y=y_j|X=x_i]\,\mathbb{P}[X=x_i] \Big) = \sum_{x_i \in R_X} x_i\,\mathbb{P}[X=x_i] \sum_{y_j \in R_Y} y_j\,\mathbb{P}[Y=y_j|X=x_i] = \sum_{x_i \in R_X} x_i\,\mathbb{P}[X=x_i]\,\mathbb{E}[Y|X=x_i] = \sum_{x_i \in R_X} x_i\,\mathbb{P}[X=x_i]\,x_i = \sum_{x_i \in R_X} x_i^2\,\mathbb{P}[X=x_i] = \mathbb{E}[X^2]$$
Finally, substituting into the covariance:
$$\text{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \mathbb{E}[X^2] - \mathbb{E}[X]\mathbb{E}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \text{Var}[X] = \mathbb{E}[(X-\mathbb{E}[X])^2]$$
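As a quick sanity check (not part of the proof), here is a small made-up discrete example: take $X$ uniform on $\{1,2,3\}$ and $Y = X + \varepsilon$ with $\varepsilon = \pm 1$ equally likely and independent of $X$, so $\mathbb{E}[Y|X=x]=x$ holds. Computing the expectations exactly over the joint pmf confirms $\text{cov}(X,Y)=\text{Var}[X]$:

```python
from itertools import product

# Hypothetical example: X uniform on {1, 2, 3},
# Y = X + eps with eps = ±1 each with probability 1/2, independent of X,
# so E[Y | X = x] = x as in part (a).
px = {1: 1/3, 2: 1/3, 3: 1/3}
peps = {-1: 1/2, 1: 1/2}

# Joint pmf of (X, Y): P[X = x, Y = x + e] = P[X = x] * P[eps = e]
joint = {(x, x + e): px[x] * peps[e] for x, e in product(px, peps)}

EX = sum(x * p for (x, _), p in joint.items())
EY = sum(y * p for (_, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
EX2 = sum(x * x * p for (x, _), p in joint.items())

cov = EXY - EX * EY      # cov(X, Y) = E[XY] - E[X]E[Y]
var_x = EX2 - EX ** 2    # Var[X] = E[X^2] - E[X]^2

print(cov, var_x)  # the two values agree (both equal Var[X] = 2/3)
```

Note also that `EX` and `EY` come out equal, matching the intermediate result $\mathbb{E}[X] = \mathbb{E}[Y]$ derived above.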
b. If $X, Y$ are independent, show that: $$\text{cov}(X,Y)=0$$
Solution:
If $X, Y$ are independent, then $\mathbb{E}[XY]=\mathbb{E}[X]\mathbb{E}[Y]$.
So,
$$\text{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \mathbb{E}[X]\mathbb{E}[Y] - \mathbb{E}[X]\mathbb{E}[Y] = 0$$
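The same kind of exact check works here, with a made-up independent pair: under independence the joint pmf factors, and the covariance computed from it vanishes.

```python
from itertools import product

# Hypothetical independent pair: X uniform on {0, 1}, Y uniform on {1, 2, 3}.
px = {0: 1/2, 1: 1/2}
py = {1: 1/3, 2: 1/3, 3: 1/3}

# Independence means the joint pmf factors: P[X = x, Y = y] = P[X = x] * P[Y = y]
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

EX = sum(x * p for (x, _), p in joint.items())
EY = sum(y * p for (_, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())

print(EXY - EX * EY)  # cov(X, Y) = 0 up to floating-point rounding
```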