I want to differentiate:
$f(w) = w^TF^TFw - w^TF^Tt- t^TFw$
with respect to $w$. $F$ is an $n \times d$ matrix, $w$ is a $d \times 1$ vector, and $t$ is an $n \times 1$ vector.
I read sometimes that $(w^TF^Tt)'$ = $(F^Tt)^T$, and sometimes that it is $(F^Tt)$ - why is that?
Furthermore, I know that generally $(w^TAw)' = w^T(A+A^T)$. Should it not follow that $(w^TF^TFw)' = w^T(F^TF+F^TF)$?
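Let's use a colon to denote the inner/Frobenius product, i.e. $$A:B={\rm tr}(A^TB)$$

Write the function in terms of the Frobenius product. Each term of $f$ is a scalar, so $w^TF^Tt = t^TFw = t:Fw$, and finding the differential and gradient is easy: $$\eqalign{ f &= Fw:Fw - Fw:t - t:Fw \cr &= Fw:Fw - 2t:Fw \cr \cr df &= 2Fw:F\,dw - 2t:F\,dw \cr &= 2F^TFw:dw - 2F^Tt:dw \cr &= 2(F^TFw - F^Tt):dw \cr \cr \frac{\partial f}{\partial w} &= 2F^T(Fw - t) \cr }$$

This gradient is a $d \times 1$ column vector. Its transpose, $2(Fw-t)^TF = 2w^TF^TF - 2t^TF$, is the same derivative written as a row vector, which is what the rule $(w^TAw)' = w^T(A+A^T)$ gives for $A = F^TF$. The two forms $(F^Tt)^T$ and $F^Tt$ differ only in this layout convention.

The cyclic properties of the trace translate into rules for rearranging the terms in a Frobenius product. For example $$\eqalign{ AB:C &= A:CB^T \cr AB:C &= B:A^TC \cr A:B &= B:A \cr }$$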
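As a sanity check, the result $2F^T(Fw - t)$ can be compared against a finite-difference approximation. The following is only a minimal sketch in plain Python: the helper names (`matvec`, `dot`, `grad_f`, `numeric_grad`), the sizes $n=5$, $d=3$, and the random test data are all assumptions made for illustration.

```python
import random

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def f(F, w, t):
    """f(w) = w^T F^T F w - w^T F^T t - t^T F w = Fw.Fw - 2 t.Fw"""
    Fw = matvec(F, w)
    return dot(Fw, Fw) - 2.0 * dot(t, Fw)

def grad_f(F, w, t):
    """Closed-form gradient 2 F^T (F w - t), as a column vector (list)."""
    residual = [a - b for a, b in zip(matvec(F, w), t)]
    F_transposed = [list(col) for col in zip(*F)]
    return [2.0 * g for g in matvec(F_transposed, residual)]

def numeric_grad(F, w, t, h=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((f(F, w_plus, t) - f(F, w_minus, t)) / (2.0 * h))
    return grad

# Illustrative random data (sizes and seed are arbitrary choices).
rng = random.Random(0)
n, d = 5, 3
F = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
w = [rng.uniform(-1.0, 1.0) for _ in range(d)]
t = [rng.uniform(-1.0, 1.0) for _ in range(n)]

analytic = grad_f(F, w, t)
numeric = numeric_grad(F, w, t)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print("largest difference between analytic and numeric gradient:", max_err)
```

Because $f$ is quadratic in $w$, the central difference has no truncation error, so any remaining discrepancy should be floating-point rounding only.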