Is there a reliable method for finding the derivative of squared Euclidean norms?


Assuming both $y$ and $\beta$ are $p \times 1$ vectors, and $W$ is a $(p-2) \times p$ matrix (so that the product $W\beta$ is defined), how would one take the first derivative of this: $L(\beta) = \| y - \beta\|^2_2 + \| W\beta\|^2_2$?

I'm aware that this essentially means $\frac{\partial L}{\partial \beta} = \frac{\partial}{\partial \beta} \sum_{i = 1}^p (y_i - \beta_i)^2 + \frac{\partial}{\partial \beta} \| W\beta\|^2_2$.

After looking online, I found the identity $\frac{\partial}{\partial \mathbf{x}}\|A\mathbf{x}\|^2_2 = 2A^TA\mathbf{x}$. While this suffices for my proofs, I fail to understand why it holds. I do know that it applies to $\|W\beta\|^2_2$.
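That identity is easy to sanity-check numerically. Below is a minimal sketch (with arbitrary test values for $A$ and $\mathbf{x}$, not tied to the original problem) that compares $2A^TA\mathbf{x}$ against central finite differences of $\|A\mathbf{x}\|_2^2$:

```python
import numpy as np

# Numerically verify that the gradient of f(x) = ||Ax||_2^2 is 2 A^T A x.
# A and x are arbitrary test values, not from the original problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

def f(v):
    return np.sum((A @ v) ** 2)   # ||Av||_2^2

analytic = 2 * A.T @ A @ x

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Since $f$ is quadratic, the central difference is exact up to floating-point rounding, so the two vectors agree to high precision.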

So my question is how to compute $\frac{\partial}{\partial \beta} \sum_{i = 1}^p (y_i - \beta_i)^2$ in a very straightforward and logical way, with fundamentals that I can apply to other squared norms at any time without relying on memorized identities.

Many thanks, this has been driving me utterly mad.

There are 3 answers below.

BEST ANSWER

The symbol $\frac{\partial L}{\partial \beta}$ will be a vector with entries $\frac{\partial L}{\partial \beta_k}$, in this case for $k=1,\ldots, p$. We can calculate $$\frac{\partial }{\partial \beta_k}\sum_{i=1}^p (y_i-\beta_i)^2 = \frac{\partial}{\partial \beta_k} (y_k-\beta_k)^2 = -2(y_k-\beta_k).$$ This is because $\frac{\partial}{\partial \beta_k}$ means to take the derivative with respect to the variable $\beta_k$ while treating all other $\beta_i$s as constants. And each $y_i$ is truly a constant. So only the $i=k$ term varies with $\beta_k$.
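Collecting those entries into a vector, $\frac{\partial}{\partial \beta}\|y-\beta\|_2^2 = -2(y-\beta)$. A minimal numerical sketch (with arbitrary test values for $y$ and $\beta$) confirms this componentwise result:

```python
import numpy as np

# Check that d/d(beta_k) of sum_i (y_i - beta_i)^2 equals -2 (y_k - beta_k)
# for every k, using central finite differences. y and beta are test values.
rng = np.random.default_rng(1)
p = 5
y = rng.standard_normal(p)
beta = rng.standard_normal(p)

def L1(b):
    return np.sum((y - b) ** 2)

analytic = -2 * (y - beta)

eps = 1e-6
numeric = np.array([
    (L1(beta + eps * e) - L1(beta - eps * e)) / (2 * eps)
    for e in np.eye(p)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```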

For $\|W\beta\|_2^2$, we need to first note that if $v$ is a vector with entries $v_i$, $i=1,\ldots, n$, then $\|v\|_2^2=\sum_{i=1}^n v_i^2$. We then note that the $i^{th}$ entry of $W\beta$ is $\sum_{j=1}^p w_{i,j}\beta_j$, so $$\|W\beta\|_2^2=\sum_{i=1}^n \bigl(\sum_{j=1}^p w_{i,j}\beta_j\bigr)^2.$$ Here, $w_{i,j}$ is the row $i$, column $j$ entry of $W$. We need to take $\frac{\partial}{\partial \beta_k}$ of this, which is $$\sum_{i=1}^n 2\bigl(\sum_{j=1}^p w_{i,j}\beta_j\bigr) \cdot \frac{\partial}{\partial \beta_k}\Bigl[\sum_{j=1}^p w_{i,j}\beta_j\Bigr]=\sum_{i=1}^n 2\bigl(\sum_{j=1}^p w_{i,j}\beta_j\bigr) \cdot w_{i,k}.$$ Here we just use the power rule (so the exponent of $2$ comes down as a factor) and the chain rule (we multiply by the derivative of the function inside the square). Again, to calculate $\frac{\partial}{\partial \beta_k}\sum_{j=1}^p w_{i,j}\beta_j$, we note that only the $j=k$ term survives.

One can check directly by playing with lots of indices that $$\frac{\partial}{\partial \beta_k}\|W\beta\|_2^2= 2\sum_{i=1}^n \sum_{j=1}^p w_{i,k}w_{i,j}\beta_j$$ is the row $k$ entry of $2W^TW\beta$, which is the identity you wrote.
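Rather than playing with indices by hand, one can let a computer do it. A minimal sketch (with arbitrary test values for $W$ and $\beta$) comparing the explicit double sum against row $k$ of $2W^TW\beta$:

```python
import numpy as np

# Compare the explicit double sum 2 * sum_i sum_j w_{i,k} w_{i,j} beta_j
# (for each k) against the matrix expression 2 W^T W beta.
# W and beta are arbitrary test values.
rng = np.random.default_rng(2)
n, p = 4, 3
W = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

matrix_form = 2 * W.T @ W @ beta
index_form = np.array([
    2 * sum(W[i, k] * W[i, j] * beta[j]
            for i in range(n) for j in range(p))
    for k in range(p)
])

print(np.allclose(matrix_form, index_form))  # True
```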

ANSWER

Essentially coming back to the definition of the differential: $$\|A(y_0+h)\|^2=\|Ay_0\|^2+2\langle Ay_0,Ah\rangle+\|Ah\|^2$$$$= \|Ay_0\|^2+2\langle A^TAy_0,h\rangle +o(\|h\|).$$ Thus the differential of $y\mapsto \|Ay\|^2$ calculated at the point $y_0$ is the linear form $h\mapsto 2\langle A^TAy_0,h\rangle$ or, equivalently, the gradient of $y\mapsto \|Ay\|^2$ calculated at the point $y_0$ is $2A^TAy_0.$
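Since the function is quadratic, this expansion holds exactly, with $\|Ah\|^2$ as the full remainder. A minimal sketch (arbitrary test values for $A$, $y_0$, $h$) verifying the expansion term by term:

```python
import numpy as np

# Verify ||A(y0 + h)||^2 = ||A y0||^2 + 2 <A^T A y0, h> + ||Ah||^2.
# The function is quadratic, so the identity is exact (up to rounding).
# A, y0, and h are arbitrary test values.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
y0 = rng.standard_normal(3)
h = rng.standard_normal(3)

lhs = np.sum((A @ (y0 + h)) ** 2)
rhs = np.sum((A @ y0) ** 2) + 2 * (A.T @ A @ y0) @ h + np.sum((A @ h) ** 2)

print(np.isclose(lhs, rhs))  # True
```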

ANSWER

The Frobenius $(:)$ product is extremely useful in Matrix Calculus and has these properties $$\eqalign{ \def\b{\beta} \def\p{\partial} \def\LR#1{\left(#1\right)} A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \operatorname{Trace}\LR{A^TB} \\ B:B &= \|B\|_F^2 \\ A:B &= B:A \;=\; B^T:A^T \\ }$$ Calculating the differential of the Frobenius norm is easy using this product $$\eqalign{ d\,\|B\|_F^2 &= d\LR{B:B} \\&= B:dB + dB:B \\&= 2B:dB \\ }$$ Use this result to calculate the differential of the $L$ function and recover its gradient $$\eqalign{ L &= \LR{\b-y}:\LR{\b-y} \;+\; \LR{W\b}:\LR{W\b} \\ dL &= 2\LR{\b-y}:{d\b} \;+\; 2\LR{W\b}:\LR{W\,d\b} \\ &= 2\LR{\b-y}:{d\b} \;+\; 2\LR{W^TW\b}:d\b \\ \frac{\p L}{\p\b} &= 2\b-2y \;+\; 2W^TW\b \\ }$$
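The final gradient $2(\beta - y) + 2W^TW\beta$ can be checked against finite differences of the full objective. A minimal sketch, with arbitrary test dimensions and values (here $p=4$ and $W$ of shape $(p-2)\times p$, matching the question's setup):

```python
import numpy as np

# Check the gradient 2(beta - y) + 2 W^T W beta against central finite
# differences of L(beta) = ||y - beta||_2^2 + ||W beta||_2^2.
# Dimensions and values are arbitrary test data.
rng = np.random.default_rng(4)
p = 4
y = rng.standard_normal(p)
W = rng.standard_normal((p - 2, p))
beta = rng.standard_normal(p)

def L(b):
    return np.sum((y - b) ** 2) + np.sum((W @ b) ** 2)

analytic = 2 * (beta - y) + 2 * W.T @ W @ beta

eps = 1e-6
numeric = np.array([
    (L(beta + eps * e) - L(beta - eps * e)) / (2 * eps)
    for e in np.eye(p)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```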