Closed-form expression for the training error in ridge regression


I'm reading the paper "A Random Matrix Approach to Neural Networks", but I'm stuck at page 5.

They start from

  • $\Sigma \in \mathbb{R}^{n\times T}$ where $T$ is the number of data points and $n$ is the dimension of each data point.
  • $Y \in \mathbb{R}^{1\times T}$ are the values to be predicted.

NOTE: more commonly the first dimension indexes the data points and the second is the dimension in which they live, but this paper uses the opposite convention.

Then, they define

  • $Q = \Bigl(\frac{1}{T}\Sigma^T\Sigma+\gamma I_T\Bigr)^{-1}$

So the vector of parameters that solves the ridge regression problem $\min_\beta \frac{1}{T}\bigl\|Y^T-\Sigma^T\beta\bigr\|_2^2 + \gamma\|\beta\|_2^2$ is

  • $\beta = \frac{1}{T}\Sigma \Bigl(\frac{1}{T}\Sigma^T\Sigma+\gamma I_T\Bigr)^{-1}Y^T = \frac{1}{T}\Sigma QY^T$
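As a sanity check (my own, not from the paper), this "dual" form of $\beta$ should agree with the more familiar primal ridge solution $\beta = \bigl(\frac{1}{T}\Sigma\Sigma^T + \gamma I_n\bigr)^{-1}\frac{1}{T}\Sigma Y^T$, by the push-through identity $\Sigma(\Sigma^T\Sigma/T + \gamma I_T) = (\Sigma\Sigma^T/T + \gamma I_n)\Sigma$. A quick numerical sketch with random data:

```python
# Numerical check that the dual form beta = (1/T) Sigma Q Y^T coincides
# with the primal ridge solution ((1/T) Sigma Sigma^T + gamma I_n)^{-1} (1/T) Sigma Y^T.
# All names and dimensions here are my own choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, T, gamma = 4, 7, 0.3
Sigma = rng.standard_normal((n, T))   # columns are the T data points
Y = rng.standard_normal((1, T))       # targets, one per data point

# Q = ((1/T) Sigma^T Sigma + gamma I_T)^{-1}, as defined in the paper
Q = np.linalg.inv(Sigma.T @ Sigma / T + gamma * np.eye(T))
beta_dual = Sigma @ Q @ Y.T / T

# Primal ridge solution, solving an n x n system instead of a T x T one
beta_primal = np.linalg.solve(Sigma @ Sigma.T / T + gamma * np.eye(n),
                              Sigma @ Y.T / T)

print(np.allclose(beta_dual, beta_primal))  # True
```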

Finally, just before equation (1) on page 5, they say "notably, the mean-square error $E_{train}$ on the training dataset is given by"

  • $E_{train} = \frac{1}{T}\bigl\|Y^T-\Sigma^T\beta\bigr\|_{2}^{2} = \frac{\gamma^2}{T}\operatorname{tr}\bigl(Y^TYQ^2\bigr)$

but I don't understand how to derive this result. From their wording it seems to be a well-known fact, yet I can't find anything about it online, so I was wondering if any of you could explain how to prove this equivalence, or point me to a reference where a proof is given.
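To be sure the identity is not a typo before asking, I checked it numerically on random data (a sketch of my own, with arbitrary dimensions; the claimed equality does hold to machine precision):

```python
# Numerical check of the claimed identity:
# (1/T) ||Y^T - Sigma^T beta||^2  ==  (gamma^2 / T) tr(Y^T Y Q^2),
# with beta = (1/T) Sigma Q Y^T as in the paper.
import numpy as np

rng = np.random.default_rng(1)
n, T, gamma = 5, 8, 0.2
Sigma = rng.standard_normal((n, T))
Y = rng.standard_normal((1, T))

Q = np.linalg.inv(Sigma.T @ Sigma / T + gamma * np.eye(T))
beta = Sigma @ Q @ Y.T / T

lhs = np.linalg.norm(Y.T - Sigma.T @ beta) ** 2 / T
rhs = gamma**2 / T * np.trace(Y.T @ Y @ Q @ Q)

print(np.isclose(lhs, rhs))  # True
```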