I have been working through Goodfellow's textbook (which is free online), as it is required for a course I'm taking. I have tried to work through some of the calculations that the book omits, and have got stuck on this one, from Section 4.5, "Example: Linear Least Squares", equation (4.21). I am trying to differentiate the function f(x) below, but am having trouble. By making some assumptions I have managed to get the correct answer, but I am not sure the working is right. Thanks!
"Suppose we want to find the value of x that minimizes:"
f(x) = $\frac{1}{2}$||A x - b||$_2^2$
rewriting
= $\frac{1}{2}$(A x - b)$^T$(A x - b)
expanding the brackets
= $\frac{1}{2}$((A x)$^T$A x - b$^T$A x - (A x)$^T$b + b$^T$b)
= $\frac{1}{2}$((A x)$^T$A x - 2b$^T$A x + b$^T$b)
let: u(x) = (A x)$^T$ ; v(x) = A x
then: u'(x) = A$^T$ --- not sure if this is right ?
v'(x) = A
therefore, employing the product rule:
f '(x) = $\frac{1}{2}$ ( A$^T$A x + (A x)$^T$A - 2b$^T$A )
and (A x)$^T$A = A$^T$A x
f '(x) = $\frac{1}{2}$ ( 2 A$^T$A x - 2b$^T$A )
= A$^T$A x - b$^T$A
= A$^T$A x - A$^T$b
The term you are questioning is the scalar product $$s=(Ax)^T(Ax)$$ Define a new variable $\,w=Ax,\,$ then $s=w^Tw\,$, whose gradient with respect to $x$ is easily found: $$\eqalign{ s &= w^Tw \cr ds &= 2w^Tdw = 2w^TA\,dx \cr s' &= 2A^Tw = 2A^TAx \cr }$$ Actually, if you use the variable $y=(Ax-b),\,$ then the entire problem can be handled as $$\eqalign{ f &= \tfrac{1}{2}y^Ty \cr df &= y^T\,dy = y^TA\,dx \cr f' &= A^Ty = A^T(Ax-b) \cr }$$
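If you want to convince yourself numerically, one option is to compare the closed form $A^T(Ax-b)$ against a central finite-difference approximation of $f$. The following is only a rough sketch in plain Python: the helper names (`matvec`, `transpose`, `grad_analytic`, `grad_numeric`) and the small test values for `A`, `b`, and `x` are made up for illustration and do not come from the book.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M (a list of rows)."""
    return [list(col) for col in zip(*M)]


def residual(A, b, x):
    """Compute y = A x - b."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def f(A, b, x):
    """f(x) = 1/2 * ||A x - b||_2^2."""
    return 0.5 * sum(r * r for r in residual(A, b, x))


def grad_analytic(A, b, x):
    """Closed-form gradient A^T (A x - b)."""
    return matvec(transpose(A), residual(A, b, x))


def grad_numeric(A, b, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(A, b, x_plus) - f(A, b, x_minus)) / (2 * h))
    return grad


# Arbitrary illustrative values (assumptions, not from the source).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

ga = grad_analytic(A, b, x)
gn = grad_numeric(A, b, x)
max_err = max(abs(a - n) for a, n in zip(ga, gn))
print("analytic:", ga)
print("numeric: ", gn)
print("max abs difference:", max_err)
```

For these values $Ax-b=(-1,\ 0.5,\ 2)^T$, so the closed form gives $(10.5,\ 12)^T$, and the finite-difference estimate should agree up to rounding error. Note that the result is a column vector $A^TAx - A^Tb$; the row vector $b^TA$ in the question is its transpose counterpart.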