Linear Least Squares Differentiated - from Deep Learning (Goodfellow)


I have been working through Goodfellow's *Deep Learning* textbook (freely available online), which is required for a course I'm taking. I have been trying to work through some of the calculations not shown in the book, and I am stuck on this one, from Section 4.5, "Example: Linear Least Squares", equation (4.21). I am trying to differentiate the function f(x) below. By making some assumptions I manage to get the correct answer, but I am wondering whether the working is right. Thanks!

"Suppose we want to find the value of x that minimizes:"

f(x) = $\frac{1}{2}$||A x - b||$_2^2$

rewriting

= $\frac{1}{2}$(A x - b)$^T$(A x - b)

expanding the brackets

= $\frac{1}{2}$((A x)$^T$A x - b$^T$A x - (A x)$^T$b + b$^T$b)

= $\frac{1}{2}$((A x)$^T$A x - 2b$^T$A x + b$^T$b) --- since (A x)$^T$b is a scalar, (A x)$^T$b = ((A x)$^T$b)$^T$ = b$^T$A x
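As a sanity check, the expanded quadratic can be compared numerically against the original norm form. This is just my own illustration with arbitrary test values, not anything from the book:

```python
# Check that (1/2)(x^T A^T A x - 2 b^T A x + b^T b) equals
# (1/2)||Ax - b||^2 for arbitrary random test values.
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

lhs = 0.5 * np.linalg.norm(A @ x - b) ** 2
rhs = 0.5 * ((A @ x) @ (A @ x) - 2 * b @ (A @ x) + b @ b)

print(abs(lhs - rhs) < 1e-9)  # the two forms agree to rounding error
```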

let: u(x) = (A x)$^T$ ; v(x) = A x

then: u'(x) = A$^T$ --- not sure if this is right ?

v'(x) = A

therefore, employing the product rule:

f '(x) = $\frac{1}{2}$ ( A$^T$A x + (A x)$^T$A - 2b$^T$A )

and (A x)$^T$A = (A$^T$A x)$^T$, which I treat as A$^T$A x (transposing to keep everything as column vectors)

f '(x) = $\frac{1}{2}$ ( 2 A$^T$A x - 2b$^T$A )

= A$^T$A x - b$^T$A

= A$^T$A x - A$^T$b --- transposing the last term, since b$^T$A is a row vector and the gradient should be a column vector
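Whatever doubts I have about the intermediate steps, the final result can at least be checked numerically against a finite-difference approximation of the gradient. Again, the matrices here are arbitrary test values of my own:

```python
# Compare the closed-form gradient f'(x) = A^T A x - A^T b against
# central finite differences of f(x) = 0.5*||Ax - b||^2.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

def f(x):
    r = A @ x - b
    return 0.5 * r @ r

closed_form = A.T @ A @ x - A.T @ b

# central differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
eps = 1e-6
numerical = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(closed_form, numerical, atol=1e-5))
```

Since f is quadratic, the central difference is exact up to floating-point rounding, so the two gradients should agree very closely.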

Best Answer:

The term you are questioning is the scalar product
$$s=(Ax)^T(Ax)$$
Define a new variable $w=Ax$; then $s=w^Tw$, whose gradient is easily found:
$$\eqalign{ s &= w^Tw \cr ds &= 2w^T\,dw \cr s' &= 2w = 2Ax \cr }$$
In fact, if you use the variable $y=Ax-b$, the entire problem can be handled as
$$\eqalign{ f &= \tfrac{1}{2}y^Ty \cr df &= y^T\,dy = y^TA\,dx \cr f' &= A^Ty = A^T(Ax-b) \cr }$$
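Both identities in the answer can be verified numerically. With $w = Ax$, the differential $ds = 2w^T dw$ together with $dw = A\,dx$ gives $\nabla_x s = 2A^TAx$, and the second derivation gives $f' = A^T(Ax-b)$. A quick check with arbitrary test values of my own:

```python
# Verify the answer's two results against finite differences:
#   grad of s = (Ax)^T(Ax)       is 2 A^T A x
#   grad of f = 0.5*||Ax - b||^2 is A^T (Ax - b)
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

s = lambda x: (A @ x) @ (A @ x)
f = lambda x: 0.5 * (A @ x - b) @ (A @ x - b)

grad_s = 2 * A.T @ A @ x        # from ds = 2 w^T dw with dw = A dx
grad_f = A.T @ (A @ x - b)      # the answer's final result

# central-difference gradient of a scalar function g at x
eps = 1e-6
num = lambda g: np.array([
    (g(x + eps * e) - g(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_s, num(s), atol=1e-5),
      np.allclose(grad_f, num(f), atol=1e-5))
```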