Hessian Calculation in Higher Dimensions


In Section 3 of this paper, https://arxiv.org/pdf/2003.00307.pdf, the Hessian is written in a compact form that is not clear to me.

The paper considers an over-parametrized system $\mathcal{F}(\mathbf{w}):\mathbb{R}^m\rightarrow\mathbb{R}^n$, where $m>n$, with the square loss function $\mathcal{L}(\mathcal{F}(\mathbf{w}),\mathbf{y})=\frac{1}{2}\|\mathcal{F}(\mathbf{w})-\mathbf{y}\|^2$. The Hessian matrix of the loss function then takes the form $$H_{\mathcal{L}}(\mathbf{w})=D\mathcal{F}(\mathbf{w})^T\frac{\partial^2\mathcal{L}}{\partial\mathcal{F}^2}D\mathcal{F}(\mathbf{w}) +\sum_{i=1}^n(\mathcal{F}(\mathbf{w})-\mathbf{y})_iH_{\mathcal{F}_i}(\mathbf{w}).$$

My approach: I tried to apply the chain rule as follows: $$\frac{\partial\mathcal{L}(\mathcal{F}(\mathbf{w}),\mathbf{y})}{\partial\mathbf{w}}=\frac{\partial\mathcal{L}(\mathcal{F}(\mathbf{w}),\mathbf{y})}{\partial\mathcal{F}(\mathbf{w})}\times\frac{\partial\mathcal{F}(\mathbf{w})}{\partial\mathbf{w}}$$ $$=(\mathcal{F}(\mathbf{w})-\mathbf{y})D\mathcal{F}(\mathbf{w})$$ Differentiating again with respect to $\mathbf{w}$ gives: $$\frac{\partial^2\mathcal{L}(\mathcal{F}(\mathbf{w}),\mathbf{y})}{\partial\mathbf{w}^2}=\frac{\partial(\mathcal{F}(\mathbf{w})-\mathbf{y})}{\partial\mathbf{w}}D\mathcal{F}(\mathbf{w})+(\mathcal{F}(\mathbf{w})-\mathbf{y})D^2\mathcal{F}(\mathbf{w})$$

Please explain in detail if possible.

There is 1 best solution below

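Use the total derivative (chain) rule componentwise. To avoid a clash with the dimension $m$ of $w$, let $k,l\in\{1,\dots,m\}$ index the components of $w$ and $i,j\in\{1,\dots,n\}$ index the components of $\mathcal F$. The first derivative is $$\frac{\partial\mathcal L(\mathcal F)}{\partial w_l}=\sum_{i=1}^n\frac{\partial\mathcal L}{\partial\mathcal F_i}\frac{\partial\mathcal F_i}{\partial w_l}$$

Differentiating again with respect to $w_k$ and applying the product rule to each summand gives $$\frac{\partial^2\mathcal L(\mathcal F)}{\partial w_k\partial w_l}=\sum_{i=1}^n\frac{\partial}{\partial w_k}\bigg(\frac{\partial\mathcal L}{\partial\mathcal F_i}\bigg)\frac{\partial\mathcal F_i}{\partial w_l}+\sum_{i=1}^n\frac{\partial\mathcal L}{\partial\mathcal F_i}\frac{\partial^2\mathcal F_i}{\partial w_k\partial w_l}$$ Since $\partial\mathcal L/\partial\mathcal F_i$ depends on $w$ only through $\mathcal F$, the chain rule applies once more to the first sum: $$\frac{\partial^2\mathcal L(\mathcal F)}{\partial w_k\partial w_l}=\sum_{i=1}^n\sum_{j=1}^n\frac{\partial\mathcal F_j}{\partial w_k}\frac{\partial^2\mathcal L}{\partial\mathcal F_j\partial\mathcal F_i}\frac{\partial\mathcal F_i}{\partial w_l}+\sum_{i=1}^n\frac{\partial\mathcal L}{\partial\mathcal F_i}\frac{\partial^2\mathcal F_i}{\partial w_k\partial w_l}$$

Passing from index to matrix notation, with $D\mathcal F(w)$ the $n\times m$ Jacobian whose $(i,k)$ entry is $\partial\mathcal F_i/\partial w_k$: $$\bigg(\frac{\partial^2\mathcal L(\mathcal F)}{\partial w_k\partial w_l}\bigg)_{k,l}=H_{\mathcal L}(w)$$ $$\bigg(\frac{\partial^2\mathcal F_i}{\partial w_k\partial w_l}\bigg)_{k,l}=H_{\mathcal F_i}(w)$$ $$\bigg(\sum_{i=1}^n\sum_{j=1}^n\frac{\partial\mathcal F_j}{\partial w_k}\frac{\partial^2\mathcal L}{\partial\mathcal F_j\partial\mathcal F_i}\frac{\partial\mathcal F_i}{\partial w_l}\bigg)_{k,l}=D\mathcal F(w)^T\frac{\partial^2\mathcal L}{\partial\mathcal F^2}D\mathcal F(w)$$

For the square loss $\mathcal L=\frac12\|\mathcal F(w)-y\|^2$ we have $\frac{\partial\mathcal L}{\partial\mathcal F_i}=(\mathcal F(w)-y)_i$ (and $\frac{\partial^2\mathcal L}{\partial\mathcal F^2}$ is the $n\times n$ identity), so the second sum becomes $\sum_{i=1}^n(\mathcal F(w)-y)_iH_{\mathcal F_i}(w)$. We get exactly the formula in the question.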
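As a sanity check, the formula can be compared against a finite-difference Hessian of the loss. The sketch below is only an illustration: the map `F` (with $m=3$, $n=2$), its hand-computed Jacobian and component Hessians, the test point `w`, the target `y`, and the step size `h` are all assumptions made up for this example and do not come from the paper.

```python
import math

# Hypothetical toy map F: R^3 -> R^2 (m = 3 > n = 2), chosen only for illustration.
def F(w):
    return [w[0] * w[1] + w[2] ** 2, math.sin(w[0]) + w[1] * w[2]]

def jacobian_F(w):
    # DF(w): n x m matrix with entries dF_i/dw_k, worked out by hand for the toy F.
    return [
        [w[1], w[0], 2.0 * w[2]],
        [math.cos(w[0]), w[2], w[1]],
    ]

def hessians_F(w):
    # H_{F_i}(w): one m x m matrix per component F_i, worked out by hand.
    h0 = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
    h1 = [[-math.sin(w[0]), 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    return [h0, h1]

def loss(w, y):
    # Square loss L = 1/2 * ||F(w) - y||^2.
    return 0.5 * sum((f - t) ** 2 for f, t in zip(F(w), y))

def hessian_formula(w, y):
    # H_L = DF^T DF + sum_i (F - y)_i H_{F_i}.
    # The middle factor d^2L/dF^2 is the identity for the square loss, so it is omitted.
    J, Hs = jacobian_F(w), hessians_F(w)
    r = [f - t for f, t in zip(F(w), y)]
    m, n = len(w), len(r)
    return [
        [
            sum(J[i][k] * J[i][l] for i in range(n))
            + sum(r[i] * Hs[i][k][l] for i in range(n))
            for l in range(m)
        ]
        for k in range(m)
    ]

def hessian_finite_diff(w, y, h=1e-3):
    # Central second differences of the loss, used as an independent reference.
    m = len(w)

    def shifted(k, dk, l, dl):
        v = list(w)
        v[k] += dk
        v[l] += dl
        return loss(v, y)

    return [
        [
            (shifted(k, h, l, h) - shifted(k, h, l, -h)
             - shifted(k, -h, l, h) + shifted(k, -h, l, -h)) / (4.0 * h * h)
            for l in range(m)
        ]
        for k in range(m)
    ]

def max_abs_diff(A, B):
    # Largest entrywise discrepancy between two matrices of the same shape.
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

w = [0.3, -1.2, 0.7]   # arbitrary test point
y = [1.0, 2.0]         # arbitrary target
gap = max_abs_diff(hessian_formula(w, y), hessian_finite_diff(w, y))
print("max entrywise gap:", gap)
```

The printed gap should be small, limited by the finite-difference truncation and rounding error, which suggests that both terms of the formula are needed: dropping the second sum would leave only the Gauss-Newton part $D\mathcal F^TD\mathcal F$.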