Finding the Hessian of linear MSE using Index Notation


I am trying to compute the Hessian of the linear MSE (mean squared error) function using index notation. I would be glad if you could check my result and tell me whether the way I use index notation is correct.

The linear MSE: $$L(w) = \frac{1}{2N} e^T e$$where $e=(y-Xw)$,

$y \in \mathbb{R}^{N\times 1}$ (vector)

$X \in \mathbb{R}^{N\times D}$ (matrix)

$w \in \mathbb{R}^{D\times 1}$ (vector)

Now the aim is to calculate the Hessian $\frac{\partial^2 L(w)}{\partial w^2}$.

I proceed as follows:

$\frac{\partial^2 L(w)}{\partial w_i\,\partial w_j}=\frac{\partial^2}{\partial w_i\,\partial w_j} \left[\frac{1}{2N}(y_i-x_{ij} w_j)^2\right]$

$=\frac{\partial}{\partial w_i}\frac{\partial}{\partial w_j} \left[\frac{1}{2N}(y_i-x_{ij} w_j)^2\right]$

$=\frac{\partial}{\partial w_i}\left[\frac{1}{2N}\frac{\partial}{\partial w_j} (y_i-x_{ij} w_j)^2\right]$

$=\frac{\partial}{\partial w_i}\left[\frac{1}{N}(y_i-x_{ij} w_j)\frac{\partial}{\partial w_j} (y_i-x_{ij} w_j)\right]$

$=\frac{\partial}{\partial w_i}\left[\frac{1}{N}(y_i-x_{ij} w_j)\frac{\partial(-x_{ij} w_j)}{\partial w_j}\right]$

$=\frac{\partial}{\partial w_i}\left[\frac{1}{N}(y_i-x_{ij} w_j) (-x_{ij})\right]$

$=\frac{1}{N}\frac{\partial}{\partial w_i}\left[(y_i-x_{ij} w_j) (-x_{ij})\right]$

$=\frac{1}{N}\frac{\partial(-x_{ij} w_j)}{\partial w_i}(-x_{ij})$

$=\frac{1}{N}(-x_{ij}\delta_{ji})(-x_{ij})$

$=\frac{1}{N}(-x_{ji})(-x_{ij})$

If I now convert it back to matrix notation, the result would be:

$$\frac{\partial^2 L(w)}{\partial w^2} = \frac{1}{N} X^T X $$

Is the way I used index notation correct?
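As a numerical sanity check of the final result (my own sketch, not part of the question; it assumes numpy and approximates the Hessian by central finite differences), one can compare $\frac{1}{N}X^TX$ against a finite-difference Hessian of $L(w)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 4
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)

def L(w):
    # L(w) = (1/2N) * e^T e with e = y - Xw
    e = y - X @ w
    return e @ e / (2 * N)

# Central finite-difference approximation: H[j, k] ≈ ∂²L/∂w_j∂w_k
w0 = rng.standard_normal(D)
h = 1e-5
H_fd = np.empty((D, D))
for j in range(D):
    for k in range(D):
        ej = h * np.eye(D)[j]
        ek = h * np.eye(D)[k]
        H_fd[j, k] = (L(w0 + ej + ek) - L(w0 + ej - ek)
                      - L(w0 - ej + ek) + L(w0 - ej - ek)) / (4 * h**2)

# Claimed closed form
H_exact = X.T @ X / N
print(np.allclose(H_fd, H_exact, atol=1e-4))
```

Since $L$ is exactly quadratic in $w$, the central difference agrees with the closed form up to floating-point roundoff.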


BEST ANSWER

For ease of typing, I'll represent the differential operator $\frac{\partial}{\partial w_k}$ by $d_k$.

The known relationships are $$\eqalign{ e_i &= X_{ij}w_j - y_i \cr d_ke_i &= X_{ij}\,d_kw_j = X_{ij}\,\delta_{jk} = X_{ik} \cr }$$ Use these to find the derivatives of the objective function $$\eqalign{ L &= \frac{1}{2N} e_ie_i \cr d_kL &= \frac{1}{N} e_i\,d_ke_i = \frac{1}{N} e_iX_{ik} \cr d_md_kL &= \frac{1}{N} X_{ik}\,d_me_i = \frac{1}{N} X_{ik}X_{im} \cr }$$
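The intermediate gradient formula $d_kL = \frac{1}{N}e_iX_{ik}$, i.e. $\nabla L = \frac{1}{N}X^Te$, can also be checked numerically (a sketch of my own, assuming numpy; note this answer's sign convention $e = Xw - y$):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 40, 3
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
w = rng.standard_normal(D)

# Answer's convention: e_i = X_{ij} w_j - y_i
e = X @ w - y

# Index formula d_k L = (1/N) e_i X_{ik}, i.e. grad L = X^T e / N
grad_exact = X.T @ e / N

# Central finite-difference check of each gradient component
h = 1e-6
grad_fd = np.empty(D)
for k in range(D):
    dw = h * np.eye(D)[k]
    Lp = ((X @ (w + dw) - y) ** 2).sum() / (2 * N)
    Lm = ((X @ (w - dw) - y) ** 2).sum() / (2 * N)
    grad_fd[k] = (Lp - Lm) / (2 * h)

print(np.allclose(grad_fd, grad_exact, atol=1e-6))
```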


Matrix notation: $$ \frac{\partial}{\partial w} (Y - Xw)'(Y-Xw) = -2X'(Y-Xw). $$ Using indices, you are taking the derivative of the sum of squares w.r.t. each of the $w_j$ (note the inner summation index must differ from the free index $j$), i.e., $$ \frac{\partial}{\partial w_j} \sum_{i=1}^N\Big(y_i - \sum_{l=1}^D x_{il} w_l\Big)^2 = -2 \sum_{i=1}^N\Big(y_i - \sum_{l=1}^D x_{il} w_l\Big)x_{ij}. $$

Back to matrix notation for the second derivative (the Hessian matrix), $$ \frac{\partial^2}{\partial w\,\partial w'} (Y - Xw)'(Y-Xw) = \frac{\partial}{\partial w'} \big(-2X'(Y-Xw)\big) = 2X'X. $$ Using index notation, you are taking the derivative w.r.t. each $w_j$, $j=1,\ldots,D$, of each of the aforementioned $D$ equations, i.e., $$ \frac{\partial^2}{\partial w_j^2} \sum_{i=1}^N\Big(y_i - \sum_{l=1}^D x_{il} w_l\Big)^2 = \frac{\partial}{\partial w_j}\Big(-2 \sum_{i=1}^N\Big(y_i - \sum_{l=1}^D x_{il} w_l\Big)x_{ij}\Big) = 2\sum_{i=1}^N x_{ij}^2, $$ and for the cross terms, $$ \frac{\partial^2}{\partial w_j\,\partial w_k} \sum_{i=1}^N\Big(y_i - \sum_{l=1}^D x_{il} w_l\Big)^2 = \frac{\partial}{\partial w_k}\Big(-2 \sum_{i=1}^N\Big(y_i - \sum_{l=1}^D x_{il} w_l\Big)x_{ij}\Big) = 2\sum_{i=1}^N x_{ij}x_{ik}. $$ The last expression is the $jk$-th (and the $kj$-th) entry of $2X'X$ for $j\neq k$, and the equation before it gives the entries on the main diagonal of $2X'X$.
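The entrywise claim, that $2\sum_{i=1}^N x_{ij}x_{ik}$ is the $jk$-th entry of $2X'X$, can be verified directly (my own numpy sketch, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 30, 4
X = rng.standard_normal((N, D))

# Build the Hessian of (Y - Xw)'(Y - Xw) entry by entry:
# H[j, k] = 2 * sum_i x_{ij} x_{ik}
H_index = np.empty((D, D))
for j in range(D):
    for k in range(D):
        H_index[j, k] = 2 * (X[:, j] * X[:, k]).sum()

# Compare with the matrix-notation result 2 X'X
H_matrix = 2 * X.T @ X
print(np.allclose(H_index, H_matrix))
```

The diagonal case $2\sum_i x_{ij}^2$ is just the $j = k$ instance of the same loop.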