Assuming both $y$ and $\beta$ are $p \times 1$ vectors, and $W$ is a $(p-2) \times p$ matrix, how would one take the first derivative of $L(\beta) = || y - \beta||^2_2 + || W\beta||^2_2$?
I'm aware that this essentially means $\frac{\partial L}{\partial \beta} = \frac{\partial}{\partial \beta} \sum_{i = 1}^p (y_i - \beta_i)^2 + \frac{\partial}{\partial \beta} || W\beta||^2_2$.
After looking online, I found the identity $\frac{\partial}{\partial \mathbf{x}}||A\mathbf{x}||^2_2 = 2A^TA\mathbf{x}$. While this suffices for my proofs, I fail to understand why it holds. I do know that it applies to $|| W\beta||^2_2$.
So my question is how to compute $\frac{\partial}{\partial \beta} \sum_{i = 1}^p (y_i - \beta_i)^2$ in a straightforward and logical way, using fundamentals that I can apply to other squared norms at any time without relying on memorized identities.
Many thanks, this has been driving me utterly mad.
The symbol $\frac{\partial L}{\partial \beta}$ will be a vector with entries $\frac{\partial L}{\partial \beta_k}$, in this case for $k=1,\ldots, p$. We can calculate $$\frac{\partial }{\partial \beta_k}\sum_{i=1}^p (y_i-\beta_i)^2 = \frac{\partial}{\partial \beta_k} (y_k-\beta_k)^2 = -2(y_k-\beta_k).$$ This is because $\frac{\partial}{\partial \beta_k}$ means to take the derivative with respect to the variable $\beta_k$ while treating all other $\beta_i$s as constants. And each $y_i$ is truly a constant. So only the $i=k$ term varies with $\beta_k$.
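As a quick sanity check (not part of the derivation itself), the formula $\frac{\partial L}{\partial \beta_k} = -2(y_k - \beta_k)$ for the first term can be verified numerically against a central finite difference, here using NumPy with arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
y = rng.normal(size=p)
beta = rng.normal(size=p)

# Analytic gradient of sum_i (y_i - beta_i)^2: entry k is -2*(y_k - beta_k)
grad = -2 * (y - beta)

# Finite-difference check: perturb one coordinate of beta at a time
eps = 1e-6
fd = np.empty(p)
for k in range(p):
    e = np.zeros(p)
    e[k] = eps
    f_plus = np.sum((y - (beta + e)) ** 2)
    f_minus = np.sum((y - (beta - e)) ** 2)
    fd[k] = (f_plus - f_minus) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-5))
```

Only the $k$-th coordinate of the perturbation is nonzero, mirroring the fact that only the $i=k$ term of the sum depends on $\beta_k$.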
For $\|W\beta\|_2^2$, we need to first note that if $v$ is a vector with entries $v_i$, $i=1,\ldots, n$, then $\|v\|_2^2=\sum_{i=1}^n v_i^2$. We then note that the $i^{th}$ entry of $W\beta$ is $\sum_{j=1}^p w_{i,j}\beta_j$, so $$\|W\beta\|_2^2=\sum_{i=1}^n \bigl(\sum_{j=1}^p w_{i,j}\beta_j\bigr)^2.$$ Here, $w_{i,j}$ is the row $i$, column $j$ entry of $W$. We need to take $\frac{\partial}{\partial \beta_k}$ of this, which is $$\sum_{i=1}^n 2\bigl(\sum_{j=1}^p w_{i,j}\beta_j\bigr) \cdot \frac{\partial}{\partial \beta_k}\Bigl[\sum_{j=1}^p w_{i,j}\beta_j\Bigr]=\sum_{i=1}^n 2\bigl(\sum_{j=1}^p w_{i,j}\beta_j\bigr) \cdot w_{i,k}.$$ Here we just use the power rule (so the exponent of $2$ comes down as a factor) and the chain rule (we multiply by the derivative of the function inside the square). Again, to calculate $\frac{\partial}{\partial \beta_k}\sum_{j=1}^p w_{i,j}\beta_j$, we note that only the $j=k$ term survives.
One can check directly by playing with lots of indices that $$\frac{\partial}{\partial \beta_k}\|W\beta\|_2^2= 2\sum_{i=1}^n \sum_{j=1}^p w_{i,k}w_{i,j}\beta_j$$ is the row $k$ entry of $2W^TW\beta$, which is the identity you wrote.
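The index-matching can also be checked numerically. The NumPy sketch below (with arbitrary dimensions $n=4$, $p=6$) evaluates the double sum entry by entry and compares it with the matrix expression $2W^TW\beta$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 6
W = rng.normal(size=(n, p))
beta = rng.normal(size=p)

# Entry k of the gradient via the double sum: 2 * sum_i sum_j w_{i,k} w_{i,j} beta_j
grad_sum = np.array([
    2 * sum(W[i, k] * W[i, j] * beta[j]
            for i in range(n) for j in range(p))
    for k in range(p)
])

# The same gradient in matrix form: 2 W^T W beta
grad_mat = 2 * W.T @ W @ beta

print(np.allclose(grad_sum, grad_mat))
```

Swapping the order of summation in the double sum is exactly what turns the entrywise formula into the matrix product $W^TW\beta$.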