I know $\frac{d(x^T*A*x)}{dx}=A^T*x+A*x$ for a vector $x$. How can I use this to express $\frac{dR(w)}{dw}$, where $R(w)$ is $(X*w-Y)^T*A*(X*w-Y)$?
X is n by d.
Y is n by 1.
w is d by 1.
A is a diagonal n by n matrix.
I can assume that $X^T*A*X$ is full rank. (What does this allow us to do here?!)
My guess is that I can maybe distribute the $^T$ and simplify to transform $R(w)$ into a form where the formula can be applied, but I don't see exactly how.
Below is my work; I'm not sure how to continue:
$\begin{array}{l@{}l} R(w) &{}= \left(\left(w^T*X^T\right)^T-Y\right)^T*A*\left(\left(w^T*X^T\right)^T-Y\right)\\ &{}= \left(X*w-Y\right)^T*A*\left(X*w-Y\right)\\ \end{array}$
I rewrote $R(w)$ hoping to apply the formula $\frac{d\left(x^T*A*x\right)}{dx}=A^T*x+A*x$ and take the derivative of $R(w)$ with respect to $w$.
We take $\frac{dR(w)}{dw}$ and set it to $0$.
$\begin{array}{l@{}l} R(w) &{}= \left(\left(X*w\right)^T-Y^T\right)*A*\left(X*w-Y\right)\\ &{}= \left(\left(X*w\right)^T*A-Y^T*A\right)*\left(X*w-Y\right)\\ &{}= \left(X*w\right)^T*A*\left(X*w\right)-\left(X*w\right)^T*A*Y-Y^T*A*X*w+Y^T*A*Y\\ \end{array}$
Not sure how to continue / take the derivative from here.
The reason I need this is that I want to find the vector $w$ that minimizes $R(w)$. Do I take $\frac{dR(w)}{dw}$ (with respect to $w$, as described above) and set it to zero?
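For convenience, define the vector
$$\eqalign{ z &= Xw-y \cr dz &= X\,dw \cr }$$
Now write the function in terms of the Frobenius (:) inner product, $a:b = a^Tb$ for vectors, and find its differential
$$\eqalign{ R &= z:Az \cr dR &= dz:Az + z:A\,dz \cr &= (A + A^T)\,z:dz \cr &= (A + A^T)\,(Xw-y):X\,dw \cr &= X^T(A + A^T)\,(Xw-y):dw \cr }$$
Since $dR=\Big(\frac{\partial R}{\partial w}:dw\Big),\,$ the gradient must be
$$\eqalign{ \frac{\partial R}{\partial w} &= X^T(A + A^T)\,(Xw-y) \cr }$$
Setting the gradient to zero and solving for $w$ yields
$$\eqalign{ X^T(A + A^T)\,Xw &= X^T(A + A^T)\,y \cr w &= \Big(X^T(A + A^T)X\Big)^{-1}X^T(A + A^T)\,y \cr }$$
In your case $A$ is diagonal, hence symmetric, so $A+A^T=2A$ and the factor of 2 cancels:
$$\eqalign{ w &= \Big(X^TAX\Big)^{-1}X^TA\,y \cr }$$
The full-rank assumption on the $d\times d$ matrix $X^TAX$ is what makes this inverse exist. If the diagonal entries of $A$ are also nonnegative, the Hessian $2X^TAX$ is positive definite and this stationary point is the unique minimizer.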
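The gradient and the stationary point can be sanity-checked numerically. The following is a minimal NumPy sketch, not part of the derivation itself: the sizes `n` and `d`, the random data, the positive diagonal weights in `A`, and the helper names `R`, `grad_R`, and `numerical_grad` are all illustrative assumptions.

```python
import numpy as np

# Illustrative random problem (sizes and data are assumptions).
rng = np.random.default_rng(0)
n, d = 8, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
A = np.diag(rng.uniform(0.5, 2.0, size=n))  # diagonal with positive entries


def R(w):
    """Objective R(w) = (Xw - y)^T A (Xw - y)."""
    z = X @ w - y
    return z @ A @ z


def grad_R(w):
    """Closed-form gradient X^T (A + A^T) (Xw - y)."""
    return X.T @ (A + A.T) @ (X @ w - y)


def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return g


# Compare the closed-form gradient with finite differences at a random point.
w0 = rng.normal(size=d)
grad_error = np.max(np.abs(grad_R(w0) - numerical_grad(R, w0)))

# Solve X^T (A + A^T) X w = X^T (A + A^T) y without forming an explicit inverse.
S = A + A.T
w_star = np.linalg.solve(X.T @ S @ X, X.T @ S @ y)
grad_at_w_star = grad_R(w_star)

print("max gradient discrepancy:", grad_error)
print("gradient norm at w_star:", np.linalg.norm(grad_at_w_star))
```

Using `np.linalg.solve` on the normal equations rather than computing the inverse is a design choice for numerical stability; it relies on the same full-rank condition on $X^T(A+A^T)X$.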