Given the least-squares loss function $$\boldsymbol{L}=||\boldsymbol{y}-\boldsymbol{\Phi\theta}||^2$$
where $\boldsymbol{y}\in R^{N\times 1}$, $\boldsymbol{\theta}\in R^{M\times 1}$ and $\boldsymbol{\Phi}\in R^{N\times M}$.
Find $\displaystyle{\frac{\partial \boldsymbol{L}}{\partial \boldsymbol{\theta}}}$.
My attempt:
$$ \boldsymbol{L}=(\boldsymbol{y^T}-\boldsymbol{\theta^T\Phi^T})(\boldsymbol{y}-\boldsymbol{\Phi \theta}) $$ By the product rule: $$ \frac{\partial \boldsymbol{L}}{\partial \boldsymbol{\theta}} =\left(-\frac{\partial }{\partial \boldsymbol{\theta}}(\boldsymbol{\theta^T \Phi^T})\right)(\boldsymbol{y}-\boldsymbol{\Phi \theta})+(\boldsymbol{y^T}-\boldsymbol{\theta^T\Phi^T}) (-\boldsymbol{\Phi}) $$ How do I find $\frac{\partial }{\partial \boldsymbol{\theta}}(\boldsymbol{\theta^T \Phi^T})$?
By intuition, $$ \frac{\partial }{\partial \boldsymbol{\theta}}(\boldsymbol{\theta^T \Phi^T})=\boldsymbol{A} $$ where $\boldsymbol{A}\in R^{1\times N}$ with entries $\Phi_{i}$ corresponding to $\theta_i$.
How do I proceed from here?
I found the answer in other threads. $$ \boldsymbol{L}=(\boldsymbol{y^T}-\boldsymbol{\theta^T\Phi^T})(\boldsymbol{y}-\boldsymbol{\Phi \theta}) $$ Applying the product rule correctly: $$ \frac{\partial \boldsymbol{L}}{\partial \boldsymbol{\theta}}=(\boldsymbol{y^T}-\boldsymbol{\theta^T\Phi^T}) (-\boldsymbol{\Phi})+ (\boldsymbol{y}-\boldsymbol{\Phi \theta})^T(-\boldsymbol{\Phi})=-2(\boldsymbol{y}-\boldsymbol{\Phi \theta})^T\boldsymbol{\Phi} $$ where I used $$ \frac{\partial }{\partial \boldsymbol{\theta}}(\boldsymbol{\theta^T \Phi^T})=(\boldsymbol{\Phi}^T)^T $$ by noting that the derivative of $v^T u$ with respect to $v_i$ is $u_i$, which generalizes to $u^T$ for the gradient.
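As a sanity check, the result can be compared against a finite-difference approximation. The following is only a minimal sketch, assuming $\boldsymbol{L}$ is the squared norm and using NumPy; the function names (`loss`, `grad_row`, `numerical_grad`), the sizes `N`, `M`, and the random test data are all made up for illustration.

```python
import numpy as np

def loss(theta, Phi, y):
    # Squared Euclidean norm of the residual y - Phi @ theta (assumed form of L).
    r = y - Phi @ theta
    return float(r @ r)

def grad_row(theta, Phi, y):
    # Candidate closed form: -2 (y - Phi theta)^T Phi, returned as a length-M array.
    return -2.0 * (y - Phi @ theta) @ Phi

def numerical_grad(theta, Phi, y, eps=1e-6):
    # Central finite differences, one coordinate of theta at a time.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e, Phi, y) - loss(theta - e, Phi, y)) / (2 * eps)
    return g

# Hypothetical test data: N = 5 observations, M = 3 parameters.
rng = np.random.default_rng(0)
N, M = 5, 3
Phi = rng.normal(size=(N, M))
y = rng.normal(size=N)
theta = rng.normal(size=M)

analytic = grad_row(theta, Phi, y)
numeric = numerical_grad(theta, Phi, y)
print(np.allclose(analytic, numeric, atol=1e-5))
```

If the two arrays agree to within the tolerance, that supports the closed form, although it is of course not a proof.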
Or (I'm not sure) by another application of the product rule for the gradient of the product: $$ \rightarrow \boldsymbol{0}+(\boldsymbol{\Phi}^T)^T(\frac{\partial }{\partial \boldsymbol{\theta}}((\boldsymbol{\theta^T})^T)) $$
The product rule for the gradient of the product specified above holds in particular for the case $v\cdot v$, where $v\in R^{1\times N}$.
I do not have a generalization of that method; however, this is sufficient for most of my needs.
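For what it's worth, a component-wise check that avoids any matrix product rule can be sketched as follows. It assumes $\boldsymbol{L}$ is the squared norm and writes $r_n = y_n-\sum_{m}\Phi_{nm}\theta_m$ for the residuals (the symbol $r_n$ is introduced here only for illustration):

$$ \boldsymbol{L}=\sum_{n=1}^{N} r_n^2,\qquad \frac{\partial r_n}{\partial \theta_k}=-\Phi_{nk} $$

$$ \frac{\partial \boldsymbol{L}}{\partial \theta_k}=\sum_{n=1}^{N}2\,r_n\frac{\partial r_n}{\partial \theta_k}=-2\sum_{n=1}^{N}\Big(y_n-\sum_{m=1}^{M}\Phi_{nm}\theta_m\Big)\Phi_{nk} $$

Collecting the $M$ components into a row vector gives $-2(\boldsymbol{y}-\boldsymbol{\Phi \theta})^T\boldsymbol{\Phi}$, which matches the sum of the two terms found above.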