Statistical error in the approximation-estimation tradeoff

Show that $$E(g_\tau ^G(X)-g^* (X))^2 = E(X^T \hat{\beta}-X^T\beta^G)^2+E(X^T\beta^G-g^*(X))^2$$

Here $g_\tau ^G(X) = X^T \hat{\beta}$ and $g^G(X) = X^T \beta^G$, where $G$ is the class of linear functions and $\hat{\beta}$, $\beta^G$ are parameter vectors.

What I've done so far:
$$\begin{aligned}
E(g_\tau ^G(X)-g^* (X))^2 &= E(g_\tau ^G(X) - g^G(X) + g^G(X) - g^* (X))^2 \\
&= E(X^T \hat{\beta} - X^T\beta^G + X^T\beta^G - g^* (X))^2 \\
&= E(X^T \hat{\beta}-X^T\beta^G)^2+E(X^T\beta^G-g^*(X))^2+2E[(X^T\hat{\beta} - X^T\beta^G)(X^T\beta ^G - g^*(X))]
\end{aligned}$$

What's left is to show that the cross term vanishes, which would solve the problem: $$2E[(X^T\hat{\beta} - X^T\beta^G)(X^T\beta ^G - g^*(X))]=0$$

My attempt, where I got stuck: $$2E[(X^T\hat{\beta} - X^T\beta^G)(X^T\beta ^G - g^*(X))]=2E[X^T\hat{\beta}X^T\beta ^G-X^T\hat{\beta}g^*(X) - X^T\beta^GX^T \beta^G+X^T\beta^Gg^*(X)]$$

From here I'm not sure how to go on. This might be the wrong way to solve the question.

Please comment if something is unclear. I'm just trying to learn!

There are 1 best solutions below

I’ve been wondering about this one, too (I assume we’re reading the same book). After doing some searching online, I think the way to think about this is to consider the expectation as over two variables: a training set $\tau$ and a test point $(X,Y)$. So, $\mathbb{E}_{(X,Y),\tau}[ \cdots ] = \mathbb{E}_{(X,Y)}[\mathbb{E}_{\tau}[ \cdots ]]$. You can do this because the test point is independent of the training set, and so you can swap the inner and outer integrals. Then, you want to be able to say something like $\mathbb{E}_{\tau}[\hat{\beta}] = \beta^{\mathcal{G}}$, which would kill the inner integral, since everything in it but the $\hat{\beta}$ is fixed.
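
To spell out what that would look like (a sketch only, assuming the unbiasedness $\mathbb{E}_{\tau}[\hat{\beta}] = \beta^{\mathcal{G}}$, which has not been established at this point): for a fixed test point, the inner expectation of the cross term factors as
$$\mathbb{E}_{\tau}\big[(X^T\hat{\beta} - X^T\beta^{\mathcal{G}})(X^T\beta^{\mathcal{G}} - g^{*}(X))\big] = (\mathbb{E}_{\tau}[\hat{\beta}] - \beta^{\mathcal{G}})^T X\,(X^T\beta^{\mathcal{G}} - g^{*}(X)) = 0.$$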

Now I just need to justify that to myself, which, you’d think, would have to do with the fact that we’re dealing with linear functions, since we haven’t really used that anywhere else.

Update:

Ok, I think I know what to do, based on more online reading. Note that $\hat{\beta}$ and $\beta^{\mathcal{G}}$ do not depend on the test point $(X,Y)$. So, using that $g^{*}(X) = \mathbb{E}[Y \mid X]$, you can basically write $$\mathbb{E}[(X^T \hat{\beta} - X^T \beta^{\mathcal{G}})(X^T \beta^{\mathcal{G}} - g^{*}(X))] = \mathbb{E}[(X^T \hat{\beta} - X^T \beta^{\mathcal{G}})(X^T \beta^{\mathcal{G}} - Y)].$$

You can do this (I think) by noticing that the only term that changes has the form $\mathbb{E}[c \cdot r(X)\cdot \mathbb{E}[Y|X]]$, where $c$ represents quantities that do not depend on the test point (they are fixed once you condition on the training set $\tau$) and $r(X)$ represents stuff only depending on $X$. By the tower property, this is $\mathbb{E}[\mathbb{E}[c \cdot r(X) \cdot Y| X]] = \mathbb{E}[c\cdot r(X)\cdot Y]$.

Anyway, since $X^T \hat{\beta} - X^T \beta^{\mathcal{G}} = (\hat{\beta}-\beta^{\mathcal{G}})^T X$, we then have $\mathbb{E}[(\hat{\beta}-\beta^{\mathcal{G}})^T X (X^T \beta^{\mathcal{G}}-Y)]$. The normal equations that define $\beta^{\mathcal{G}}$ give that $\mathbb{E}_{(X,Y)}[X(X^T\beta^{\mathcal{G}}-Y)] = 0$. So now use independence to break $$\mathbb{E}_{\tau,(X,Y)}[(\hat{\beta}-\beta^{\mathcal{G}})^T X (X^T \beta^{\mathcal{G}}-Y)] = \mathbb{E}_\tau\big[(\hat{\beta}-\beta^{\mathcal{G}})^T\,\mathbb{E}_{(X,Y)}[X(X^T\beta^{\mathcal{G}}-Y)]\big],$$ and the inner expectation is 0.
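
If it helps to see the orthogonality numerically, here is a minimal NumPy sketch. Everything in it is a made-up illustration rather than anything from the book: the nonlinear `g_star`, the noise level, the feature map `(1, x)`, and the function name `cross_term_demo` are all assumptions. A large test sample stands in for the distribution of $X$, and $\beta^{\mathcal{G}}$ is computed by least squares on that same sample, so the normal equations hold on it up to floating-point error.

```python
import numpy as np

def cross_term_demo(n_train=50, n_test=100_000, n_reps=100, seed=0):
    """Check the decomposition on a hypothetical 1-D example.

    Returns (total, statistical, approximation, cross), each averaged
    over n_reps independently drawn training sets.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical nonlinear target, so the linear class is misspecified.
    def g_star(x):
        return np.sin(2 * x) + 0.5 * x**2

    # Linear class with intercept: feature vector (1, x).
    def features(x):
        return np.column_stack([np.ones_like(x), x])

    # Large sample standing in for the distribution of the test point X.
    x_test = rng.normal(size=n_test)
    X_test = features(x_test)
    g_test = g_star(x_test)

    # beta^G: best linear approximation of g* on the test sample.
    beta_G, *_ = np.linalg.lstsq(X_test, g_test, rcond=None)
    approx_resid = X_test @ beta_G - g_test

    total = stat = cross = 0.0
    for _ in range(n_reps):
        # Fresh training set tau, independent of the test sample.
        x_tr = rng.normal(size=n_train)
        y_tr = g_star(x_tr) + rng.normal(scale=0.5, size=n_train)
        beta_hat, *_ = np.linalg.lstsq(features(x_tr), y_tr, rcond=None)

        stat_diff = X_test @ (beta_hat - beta_G)
        total += np.mean((X_test @ beta_hat - g_test) ** 2)
        stat += np.mean(stat_diff**2)
        cross += np.mean(stat_diff * approx_resid)

    approx = np.mean(approx_resid**2)
    return total / n_reps, stat / n_reps, approx, cross / n_reps

total, stat, approx, cross = cross_term_demo()
print(f"total={total:.6f}  stat+approx={stat + approx:.6f}  cross={cross:.2e}")
```

The cross term comes out at floating-point noise level for every training set, not just on average, because $(\hat{\beta}-\beta^{\mathcal{G}})^T$ multiplies a vector that the normal equations force to zero. That is the same mechanism as in the argument above; this is only a sanity check on one toy setup, not a proof.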