How is multiplying A by A transpose related to the gradient in the least squares problem?


I have a function:

$f(\textbf{x}) = \frac{1}{2} || \textbf{Ax - b} ||_2^2$

I am trying to minimize this function over $\textbf{x}$ using gradient-based optimization. The textbook I am following states that the gradient can be computed as:

$\nabla_\textbf{x}f(\textbf{x}) = \textbf{A}^T(\textbf{Ax} - \textbf{b})$.

I am not understanding what has happened here. I am familiar with the notion of gradient, and how the function itself works, but I am not understanding why this computation represents the gradient.


In $$f(\textbf{x}) = \frac{1}{2} || \textbf{Ax - b} ||_2^2$$ you have a vector $\textbf v=\textbf{Ax} - \textbf{b}$. The squared norm is the dot product of $\textbf v$ with itself, which in matrix form is $\textbf v^T\textbf v$, so $f(\textbf x)=\frac{1}{2}\textbf v^T\textbf v$.
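As a quick numerical sanity check of the identity $\|\textbf v\|_2^2 = \textbf v^T\textbf v$, here is a minimal sketch using NumPy (the sample vector is arbitrary):

```python
import numpy as np

# Sample vector; any values work here.
v = np.array([3.0, -4.0, 12.0])

norm_sq = np.linalg.norm(v) ** 2  # squared 2-norm ||v||_2^2
dot = v @ v                       # v^T v as a dot product

print(norm_sq, dot)  # both equal 169.0
```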

Now the gradient is obtained by differentiating with respect to $\textbf x$. Since $\frac{d\textbf v}{d\textbf x}=\textbf A$, the product rule gives

$$\nabla_\textbf{x}f(\textbf{x}) = \frac{1}{2}\left(\frac{d\textbf v}{d\textbf x}\right)^T\textbf v+ \frac{1}{2}\left(\frac{d\textbf v}{d\textbf x}\right)^T\textbf v =\textbf A^T\textbf v=\textbf{A}^T(\textbf{Ax} - \textbf{b}).$$

The two product-rule terms are equal because $\textbf v^T\textbf A\,d\textbf x$ is a scalar and therefore equal to its own transpose; this uses the property of the transpose of a product, $(\textbf{Ax})^T=\textbf x^T\textbf A^T$, and it is exactly how $\textbf A^T$ appears in the gradient.
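One way to convince yourself that $\nabla_\textbf{x}f(\textbf x)=\textbf A^T(\textbf{Ax}-\textbf b)$ is correct is to compare it against a central finite-difference approximation of the gradient. A minimal sketch, with random $\textbf A$, $\textbf b$, $\textbf x$ of arbitrary size:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

def f(x):
    """f(x) = 1/2 * ||Ax - b||_2^2"""
    r = A @ x - b
    return 0.5 * r @ r

# Analytic gradient from the derivation above.
grad_analytic = A.T @ (A @ x - b)

# Central finite-difference approximation, one coordinate at a time.
eps = 1e-6
grad_fd = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    grad_fd[i] = (f(x + e) - f(x - e)) / (2 * eps)

print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```

Because $f$ is quadratic, the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.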