Gradient of $||Ax - y||^2$ with respect to $A$

How do I find $\nabla_A||Ax - y||^2$, where $A \in \mathbb{R}^{n\times n}$, $x,y \in \mathbb{R}^n$, and the norm is the Euclidean norm?

Attempt so far

$$||Ax - y||^2 = (Ax-y)^T(Ax-y) = x^TA^TAx - 2y^TAx + y^Ty $$

$$ \nabla_A(y^TAx) = yx^T$$

Where I am stuck

I don't know how to tackle the $x^TA^TAx$ term: if I try to apply the chain rule, I have to differentiate a matrix with respect to a matrix.

There are 2 best solutions below

Before we start deriving the gradient, some facts and notations for brevity:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of Trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= {\text{etc.}} \end{align}
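These identities are easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `frob` and the matrix shapes are illustrative choices, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes are chosen so that B @ C has the same shape as A (assumption for the demo).
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))


def frob(X, Y):
    """Frobenius product X : Y, i.e. the sum of entrywise products."""
    return float(np.sum(X * Y))


lhs = frob(A, B @ C)                        # A : BC
trace_form = float(np.trace(A.T @ B @ C))   # tr(A^T B C)
swapped = frob(B @ C, A)                    # BC : A
moved_right = frob(A @ C.T, B)              # A C^T : B

print(lhs, trace_form, swapped, moved_right)
```

All four printed numbers should agree up to floating-point rounding.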

Let $f := \left\|Ax-y \right\|^2 = (Ax-y):(Ax-y)$.

Now, we can obtain the differential first, and then the gradient. \begin{align} df &= d\left( (Ax-y):(Ax-y) \right) \\ &= \left(dA \ x\right) : \left(Ax-y\right) + \left(Ax-y\right) : \left(dA \ x\right) \\ &= 2 \left(Ax - y\right) : \left(dA \ x\right) \\ &= 2\left( Ax-y\right)x^T : dA \end{align}

Thus, the gradient is \begin{align} \frac{\partial}{\partial A} \left( \left\|Ax-y \right\|^2 \right)= 2\left( Ax-y\right)x^T. \end{align}
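A finite-difference check is a quick way to confirm the closed form $2(Ax-y)x^T$. This is a minimal sketch, assuming NumPy; the function names `loss`, `grad_A` and `numerical_grad`, the step size `eps`, and the random test data are all illustrative assumptions.

```python
import numpy as np


def loss(A, x, y):
    """Squared Euclidean norm ||Ax - y||^2."""
    r = A @ x - y
    return float(r @ r)


def grad_A(A, x, y):
    """Closed-form gradient with respect to A: 2 (Ax - y) x^T."""
    return 2.0 * np.outer(A @ x - y, x)


def numerical_grad(A, x, y, eps=1e-6):
    """Central differences, perturbing one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (loss(A + E, x, y) - loss(A - E, x, y)) / (2 * eps)
    return G


rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

max_err = float(np.max(np.abs(grad_A(A, x, y) - numerical_grad(A, x, y))))
print("largest entrywise discrepancy:", max_err)
```

Because the loss is quadratic in $A$, central differences carry no truncation error here, so the discrepancy should be at the level of floating-point rounding.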

For matrices, the easiest way is often to go back to the definition of differentiability, i.e.:

$$||(A+H)x - y||^2-||Ax - y||^2=L_A(H) + o(\|H\|)$$

where $L_A$ is a linear map.

We begin with: $$||(A+H)x - y||^2=\langle Ax+Hx-y , Ax+Hx-y \rangle.$$

Then we have:

$$||(A+H)x - y||^2= \langle Ax-y,Ax-y\rangle + 2\langle Hx,Ax -y\rangle + \langle Hx,Hx\rangle.$$

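Since $\langle Hx,Hx\rangle = \|Hx\|^2 \le \|H\|^2\|x\|^2 = o(\|H\|)$, we conclude: $$||(A+H)x - y||^2-||Ax - y||^2=2\langle Hx,Ax -y\rangle + o (\|H\|)$$

Thus, reading off the linear part $L_A(H) = 2\langle Hx,Ax -y\rangle$ entry by entry: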

$$\left(\nabla_A||Ax - y||^2\right)_{i,j}=2\langle E_{i,j}x,Ax -y\rangle$$

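With $E_{i,j}$ the matrix with a $1$ at row $i$ and column $j$ and $0$ elsewhere. Since $E_{i,j}x = x_j e_i$, where $e_i$ is the $i$-th standard basis vector, this entry equals $2x_j(Ax-y)_i$, i.e. $\nabla_A||Ax - y||^2 = 2(Ax-y)x^T$.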
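Both steps of this argument can be checked numerically. The sketch below, assuming NumPy, builds the gradient entry by entry from $2\langle E_{i,j}x, Ax-y\rangle$ and compares it with the outer product $2(Ax-y)x^T$, then verifies that what is left after subtracting the linear part shrinks faster than $\|H\|$. The names `loss`, `H0`, `ratios` and the chosen scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
H0 = rng.standard_normal((n, n))  # fixed perturbation direction (assumption)
x = rng.standard_normal(n)
y = rng.standard_normal(n)
r = A @ x - y


def loss(M):
    """Squared Euclidean norm ||Mx - y||^2."""
    res = M @ x - y
    return float(res @ res)


# Entry (i, j) of the gradient is the linear map applied to E_ij:
# 2 <E_ij x, Ax - y>.
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        G[i, j] = 2.0 * float((E @ x) @ r)

matches_outer = bool(np.allclose(G, 2.0 * np.outer(r, x)))

# The remainder after removing the linear part is ||Hx||^2, so
# remainder / ||H|| should shrink as H is scaled down.
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    H = t * H0
    linear = 2.0 * float((H @ x) @ r)
    remainder = loss(A + H) - loss(A) - linear
    ratios.append(abs(remainder) / float(np.linalg.norm(H)))

print(matches_outer, ratios)
```

Here `np.linalg.norm(H)` is the Frobenius norm; any other matrix norm would show the same decay, since all norms on $\mathbb{R}^{n\times n}$ are equivalent.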