Theoretical Machine Learning: How to calculate the expected risk of a model $\hat{h}$ when the distribution is unknown?

Suppose we have fixed, deterministic feature vectors $x_1, x_2, \dots, x_n \in \mathbb{R}^d$, an unknown model parameter $\theta^*$, and noise terms $z_i \sim N(0,\sigma^2)$. The feature vectors $x_i$ are also stated to follow $N(0,\sigma^2)$. Then $y_i$ is generated as follows:

$$y_i = \langle x_i\,,\theta^*\rangle + z_i$$

How do we get from the second-to-last step to the last step in the following chain of equalities?

$$R(\hat h) = \mathbb{E}_{x,y}\left[(\hat h(x) - y)^2\right] = \mathbb{E}_{x,y}\left[(\langle x,\hat \theta \rangle - \langle x,\theta^* \rangle + z)^2\right] = \|\hat \theta - \theta^*\|^2_2 + \sigma^2$$
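
A quick numerical sanity check of the identity above can be written as a minimal Python sketch with NumPy. It assumes, beyond what is stated here, that a fresh $x$ is drawn from $N(0, I_d)$ independently of $z$ and that $\hat\theta$ is held fixed. The function name `risk_check` and the example vectors are made up for illustration.

```python
import numpy as np

def risk_check(theta_hat, theta_star, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(<x, theta_hat> - y)^2] next to the closed form."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    # Assumption (not from the question): x ~ N(0, I_d), independent of z.
    x = rng.standard_normal((n_samples, d))
    z = rng.normal(0.0, sigma, size=n_samples)
    # Data model from the question: y = <x, theta*> + z.
    y = x @ theta_star + z
    # Empirical squared-error risk of the fixed predictor <x, theta_hat>.
    empirical = np.mean((x @ theta_hat - y) ** 2)
    # Claimed closed form: ||theta_hat - theta*||_2^2 + sigma^2.
    closed_form = np.sum((theta_hat - theta_star) ** 2) + sigma ** 2
    return empirical, closed_form

# Hypothetical example values, chosen only for illustration.
theta_star = np.array([1.0, -2.0, 0.5])
theta_hat = np.array([0.8, -1.5, 0.0])
empirical, closed_form = risk_check(theta_hat, theta_star, sigma=0.3)
print(empirical, closed_form)
```

Under these assumptions the two printed numbers should agree up to Monte Carlo error. If $x$ is drawn with a covariance other than the identity, the first term of the closed form changes accordingly.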

My approach was like this:

$$\begin{array}{l}
R(\hat{h})=\mathbb{E}_{x, y}\left[(\hat{h}(x)-y)^{2}\right]=\mathbb{E}_{x, y}\left[\left(\langle x, \hat{\theta}\rangle-\langle x, \theta^{*}\rangle+z\right)^{2}\right]\\
=\mathbb{E}_{x,y}\left[\left(\|x \cdot \hat{\theta}\|_{2}-\|x \cdot \theta^{*}\|_{2}+z\right)^{2}\right]\\
\text{with } \operatorname{Var}(\hat{h})=\mathbb{E}\left[\hat{h}^{2}\right]-\mathbb{E}[\hat{h}]^{2}\\
\Leftrightarrow \sigma^{2}=\mathbb{E}\left[\left(\|x \cdot \hat{\theta}\|_{2}-\|x \cdot \theta^{*}\|_{2}+z\right)^{2}\right]-\mathbb{E}\left[\|x \cdot \hat{\theta}\|_{2}-\|x \cdot \theta^{*}\|_{2}+z\right]^{2}\\
\Leftrightarrow R(\hat{h})=\mathbb{E}\left[\left(\|x \cdot \hat{\theta}\|_{2}-\|x \cdot \theta^{*}\|_{2}+z\right)^{2}\right]=\mathbb{E}\left[\|x \cdot \hat{\theta}\|_{2}-\|x \cdot \theta^{*}\|_{2}+z\right]^{2}+\sigma^{2}\\
=\left(\mathbb{E}\left[\|x \cdot \hat{\theta}\|_{2}\right]-\mathbb{E}\left[\|x \cdot \theta^{*}\|_{2}\right]+\underbrace{\mathbb{E}[z]}_{=0}\right)^{2}+\sigma^{2}\\
=\mathbb{E}\left[\|x \cdot \hat{\theta}\|_{2}\right]^{2}+\mathbb{E}\left[\|x \cdot \theta^{*}\|_{2}\right]^{2}-2\,\mathbb{E}\left[\|x \cdot \hat{\theta}\|_{2}\right]\mathbb{E}\left[\|x \cdot \theta^{*}\|_{2}\right]+\sigma^{2}\\
=\left(\mathbb{E}\left[\|x \cdot \hat{\theta}\|_{2}-\|x \cdot \theta^{*}\|_{2}\right]\right)^{2}+\sigma^{2}
\end{array}$$
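
For comparison, here is one way the last step might go if the inner products are kept as they are instead of being turned into norms. This is only a sketch under assumptions the question does not state explicitly: a fresh test point $x$ is random with $\mathbb{E}[x x^\top] = I_d$ (for example $x \sim N(0, I_d)$), it is independent of the noise $z$, and $\hat\theta$ is treated as fixed. The shorthand $\delta = \hat\theta - \theta^*$ is introduced here only for brevity.

$$\begin{array}{l}
\mathbb{E}_{x,y}\left[\left(\langle x,\hat\theta\rangle-\langle x,\theta^{*}\rangle+z\right)^{2}\right]=\mathbb{E}\left[\left(\langle x,\delta\rangle+z\right)^{2}\right]\\
=\mathbb{E}\left[\langle x,\delta\rangle^{2}\right]+2\,\mathbb{E}\left[\langle x,\delta\rangle\right]\underbrace{\mathbb{E}[z]}_{=0}+\underbrace{\mathbb{E}\left[z^{2}\right]}_{=\sigma^{2}}\\
=\delta^{\top}\,\mathbb{E}\left[x x^{\top}\right]\delta+\sigma^{2}\\
=\|\hat\theta-\theta^{*}\|_{2}^{2}+\sigma^{2}
\end{array}$$

The cross term factorizes because $x$ and $z$ are assumed independent, and the quadratic term uses $\langle x,\delta\rangle^{2}=\delta^{\top}x x^{\top}\delta$. If instead $\mathbb{E}[x x^\top]=\sigma^{2} I_d$, the same steps would give $\sigma^{2}\|\hat\theta-\theta^{*}\|_{2}^{2}+\sigma^{2}$.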