I am trying to compute the Hessian of the linear MSE (mean squared error) loss using index notation. I would be glad if you could check my result and tell me whether my use of index notation is correct.
The linear MSE: $$L(w) = \frac{1}{2N} e^T e$$ where $e = y - Xw$,
$y \in \mathbb{R}^{N \times 1}$ (vector)
$X \in \mathbb{R}^{N \times D}$ (matrix)
$w \in \mathbb{R}^{D \times 1}$ (vector)
Now the aim is to calculate the Hessian: $\frac{\partial^2 L(w)}{\partial w\,\partial w^T}$
I proceed as follows:
$\frac{\partial^2 L(w)}{\partial w_i \partial w_j}=\frac{\partial^2}{\partial w_i \partial w_j} [\frac{1}{2N}(y_i-x_{ij} w_j)^2]$
$=\frac{\partial}{\partial w_i}\frac{\partial}{\partial w_j} [\frac{1}{2N}(y_i-x_{ij} w_j)^2]$
$=\frac{\partial}{\partial w_i}[\frac{1}{2N}\frac{\partial}{\partial w_j} (y_i-x_{ij} w_j)^2]$
$=\frac{\partial}{\partial w_i}[\frac{1}{N}(y_i-x_{ij} w_j)\frac{\partial}{\partial w_j} (y_i-x_{ij} w_j)]$
$=\frac{\partial}{\partial w_i}[\frac{1}{N}(y_i-x_{ij} w_j)\frac{\partial(-x_{ij} w_j)}{\partial w_j}]$
$=\frac{\partial}{\partial w_i}[\frac{1}{N}(y_i-x_{ij} w_j) (-x_{ij})]$
$=\frac{1}{N}\frac{\partial}{\partial w_i}[(y_i-x_{ij} w_j) (-x_{ij})]$
$=\frac{1}{N}\frac{\partial(-x_{ij} w_j)}{\partial w_i}(-x_{ij})$
$=\frac{1}{N}(-x_{ij}\delta_{ji})(-x_{ij})$
$=\frac{1}{N}(-x_{ji})(-x_{ij})$
If I now convert it back to matrix notation the result would be:
$$\frac{\partial^2 L(w)}{\partial w\,\partial w^T} = \frac{1}{N} X^T X $$
Is my use of index notation correct?
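For ease of typing, I'll represent the differential operator $\frac{\partial}{\partial w_k}$ by $d_k$.
The known relationships are $$\eqalign{ e_i &= X_{ij}w_j - y_i \cr d_ke_i &= X_{ij}\,d_kw_j = X_{ij}\,\delta_{jk} = X_{ik} \cr }$$ (This $e$ has the opposite sign to the $e = y - Xw$ of the question; since $L$ depends on $e$ only through $e_ie_i$, none of the derivatives of $L$ change.) Use this to find the derivatives of the objective function $$\eqalign{ L &= \frac{1}{2N} e_ie_i \cr d_kL &= \frac{1}{N} e_i\,d_ke_i = \frac{1}{N} e_iX_{ik} \cr d_md_kL &= \frac{1}{N} X_{ik}\,d_me_i = \frac{1}{N} X_{ik}X_{im} \cr }$$ Here $i$ and $j$ are summed dummy indices, while $k$ and $m$ are the free differentiation indices, so no index appears more than twice in any term. Since $X_{ik}X_{im} = (X^TX)_{km}$, the Hessian in matrix notation is $\frac{1}{N}X^TX$, which agrees with your final result.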
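If it helps, the index expression $\frac{1}{N}X_{ik}X_{im}$ can also be checked numerically. The following is only a minimal sketch in plain Python, not part of the derivation itself: the function names (`loss`, `hessian_index`, `hessian_finite_diff`), the random test data and the step size `h` are all illustrative assumptions. It sums over the repeated index $i$ explicitly and compares the result with a central finite-difference estimate of the second derivatives of $L$.

```python
import random

def loss(w, X, y):
    """L(w) = 1/(2N) * e_i e_i with e_i = X_ij w_j - y_i (sum over j)."""
    N = len(y)
    e = [sum(X[i][j] * w[j] for j in range(len(w))) - y[i] for i in range(N)]
    return sum(e_i * e_i for e_i in e) / (2 * N)

def hessian_index(X):
    """H_km = (1/N) * X_ik X_im, summing over the repeated index i."""
    N, D = len(X), len(X[0])
    return [[sum(X[i][k] * X[i][m] for i in range(N)) / N
             for m in range(D)] for k in range(D)]

def hessian_finite_diff(w, X, y, h=1e-3):
    """Central-difference estimate of d_m d_k L (step size h is an assumption)."""
    D = len(w)

    def shifted(k, sk, m, sm):
        # Evaluate L at w + sk*h*unit_k + sm*h*unit_m.
        v = list(w)
        v[k] += sk * h
        v[m] += sm * h
        return loss(v, X, y)

    return [[(shifted(k, 1, m, 1) - shifted(k, 1, m, -1)
              - shifted(k, -1, m, 1) + shifted(k, -1, m, -1)) / (4 * h * h)
             for m in range(D)] for k in range(D)]

# Illustrative random data: N = 20 samples, D = 3 parameters.
random.seed(0)
N, D = 20, 3
X = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
y = [random.gauss(0, 1) for _ in range(N)]
w = [random.gauss(0, 1) for _ in range(D)]

H_index = hessian_index(X)
H_fd = hessian_finite_diff(w, X, y)

# Largest entrywise disagreement between the two Hessians.
max_abs_diff = max(abs(H_index[k][m] - H_fd[k][m])
                   for k in range(D) for m in range(D))
print(max_abs_diff < 1e-6)
```

Because $L$ is quadratic in $w$, the central difference is exact up to rounding error, so the two matrices should agree closely for any choice of `w`.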