Differentiating $\langle Ax,x\rangle$


If $f\colon\Bbb R^n\rightarrow\Bbb R^m$ and $g\colon\Bbb R^n\rightarrow\Bbb R^m$ are differentiable at a point $x_0\in\Bbb R^n$, and $F(x)=\langle f(x),g(x)\rangle$, then $((DF)(x_0))(h)=\langle((Df)(x_0))(h),g(x_0)\rangle + \langle f(x_0),((Dg)(x_0))(h)\rangle\ \forall h\in\Bbb R^n$.
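The product rule above can be sanity-checked numerically. The sketch below is only an illustration: the maps `f` and `g` are hypothetical examples chosen so that their derivatives are easy to write by hand. It compares $\langle Df(x_0)h, g(x_0)\rangle + \langle f(x_0), Dg(x_0)h\rangle$ with a central difference quotient of $F$ in the direction $h$.

```python
import math

# Hypothetical example maps f, g: R^2 -> R^2, chosen only for illustration.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def g(x):
    return [x[1], x[0] ** 2]

# Derivatives (Jacobians applied to h), computed by hand for these f and g.
def Df(x, h):
    return [x[1] * h[0] + x[0] * h[1], math.cos(x[0]) * h[0]]

def Dg(x, h):
    return [h[1], 2 * x[0] * h[0]]

def dot(u, v):
    """Standard inner product on R^m."""
    return sum(a * b for a, b in zip(u, v))

def F(x):
    return dot(f(x), g(x))

def DF_product_rule(x, h):
    # <Df(x)h, g(x)> + <f(x), Dg(x)h>
    return dot(Df(x, h), g(x)) + dot(f(x), Dg(x, h))

def DF_finite_difference(x, h, eps=1e-6):
    # Central difference approximation of the directional derivative of F.
    xp = [xi + eps * hi for xi, hi in zip(x, h)]
    xm = [xi - eps * hi for xi, hi in zip(x, h)]
    return (F(xp) - F(xm)) / (2 * eps)

x0 = [0.7, -1.3]
h = [0.4, 2.0]
print(DF_product_rule(x0, h), DF_finite_difference(x0, h))
```

The two printed numbers should agree to roughly the accuracy of the central difference.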

Now let $A$ be a real symmetric $n\times n$ matrix and define $G\colon\Bbb R^n\rightarrow\Bbb R$ by $G(x)=\langle Ax,x\rangle $. I have seen the result that $((DG)(x))(h)=2\langle Ax,h\rangle$ for all $x,h\in\Bbb R^n$ (here $(DG)(x)$ is the linear map, evaluated at $h\in\Bbb R^n$), but how do I arrive there using the above result? I know the derivative of $x\mapsto Ax$ is the linear map $A$ itself and the derivative of $x\mapsto x$ is $I_n$ (the identity matrix), but I can't see why $((DG)(x))(h)=\langle Ah,x\rangle + \langle Ax, h\rangle=2\langle Ax,h\rangle$.

I must be missing something obvious. I haven't been able to get far with the polarisation identity either. Perhaps I have confused the linear map that is the derivative with the function value of the derivative somewhere?

$\bf Edit:$ dimensional mess-ups have been pointed out above!
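As a quick numerical check: $(DG)(x)(h)$ is a real number, so it can be compared with a central difference quotient of $G$. The sketch below (the matrices `S`, `N` and the vectors `x`, `h` are arbitrary illustrative choices) shows that $\langle Ah,x\rangle+\langle Ax,h\rangle$ always matches, while $2\langle Ax,h\rangle$ matches only when $A$ is symmetric.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product Ax for A given as a list of rows."""
    return [dot(row, x) for row in A]

def G(A, x):
    return dot(matvec(A, x), x)

def directional_derivative(A, x, h, eps=1e-6):
    # Central difference approximation of (DG)(x)(h); exact up to rounding
    # here because G is quadratic.
    xp = [xi + eps * hi for xi, hi in zip(x, h)]
    xm = [xi - eps * hi for xi, hi in zip(x, h)]
    return (G(A, xp) - G(A, xm)) / (2 * eps)

x = [1.0, -2.0, 0.5]
h = [0.3, 0.1, -1.0]

S = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]  # symmetric
N = [[2.0, 5.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, 2.0, 4.0]]  # not symmetric

for name, A in [("symmetric", S), ("non-symmetric", N)]:
    sum_form = dot(matvec(A, h), x) + dot(matvec(A, x), h)  # <Ah,x> + <Ax,h>
    twice_form = 2 * dot(matvec(A, x), h)                    # 2<Ax,h>
    print(name, directional_derivative(A, x, h), sum_form, twice_form)
```

The symmetry of $A$ is exactly what turns $\langle Ah,x\rangle$ into $\langle Ax,h\rangle$.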


There are 2 best solutions below


Using the bilinearity of the inner product, we get the following:

$G(x+\epsilon h) = \langle A(x+\epsilon h),x+\epsilon h\rangle = \langle Ax+\epsilon Ah,x+\epsilon h\rangle$

$= \langle Ax,x \rangle + \langle Ax , \epsilon h \rangle + \langle \epsilon Ah , x \rangle + \langle \epsilon Ah,\epsilon h \rangle$

$= \langle Ax,x \rangle + \epsilon \langle Ax , h \rangle + \epsilon\langle Ah , x \rangle + \epsilon^2\langle Ah, h \rangle$

$= G(x) + \epsilon\left(\langle Ax , h \rangle + \langle Ah , x \rangle\right) + O(\epsilon^2)$

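Now apply the definition of the derivative. The remainder satisfies $|\epsilon^2\langle Ah,h\rangle|\le\|A\|\,\|\epsilon h\|^2=o(\|\epsilon h\|)$, so $((DG)(x))(h)=\langle Ax,h\rangle+\langle Ah,x\rangle$. Since $A$ is symmetric, $\langle Ah,x\rangle=\langle h,Ax\rangle=\langle Ax,h\rangle$, which gives $((DG)(x))(h)=2\langle Ax,h\rangle$.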
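The expansion above says the difference quotient $\frac{G(x+\epsilon h)-G(x)}{\epsilon}$ differs from the linear term by exactly $\epsilon\langle Ah,h\rangle$, which vanishes as $\epsilon\to0$. A minimal sketch using exact rational arithmetic (`fractions.Fraction`, with an arbitrary illustrative symmetric `A` and vectors `x`, `h`) makes that gap visible:

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product Ax for A given as a list of rows."""
    return [dot(row, x) for row in A]

def G(A, x):
    return dot(matvec(A, x), x)

# Arbitrary illustrative symmetric matrix and vectors, stored exactly.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(-3)]]
x = [Fraction(1), Fraction(2)]
h = [Fraction(-1), Fraction(3)]

# <Ax,h> + <Ah,x>, the coefficient of epsilon in the expansion.
linear_term = dot(matvec(A, x), h) + dot(matvec(A, h), x)

for k in range(1, 6):
    eps = Fraction(1, 10 ** k)
    x_eps = [xi + eps * hi for xi, hi in zip(x, h)]
    quotient = (G(A, x_eps) - G(A, x)) / eps
    # The gap is exactly eps * <Ah,h>, so it shrinks linearly with eps.
    gap = quotient - linear_term
    print(eps, gap, eps * dot(matvec(A, h), h))
```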


If $F(x) = f^T(x) g(x)$, the product rule gives $DF(x)(h) = (Df(x)(h))^T g(x) + f^T(x)\, Dg(x)(h)$.

With $f(x) = Ax$, you have $Df(x)(h) = Ah$ and with $g(x) = x$, you have $Dg(x)(h) = h$.

Substituting gives $DF(x)(h) = h^T A^T x + x^T A^T h = 2 x^T A h$, using $A=A^T$ and the fact that the scalar $h^T A^T x$ equals its own transpose $x^T A h$.

This is sometimes written as $DF(x) = 2 x^T A$, a $1\times n$ row vector; its transpose is the gradient $\nabla F(x) = 2Ax$.
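The row vector $2x^TA$ and the gradient $2Ax$ can be checked coordinate by coordinate against finite differences. In this sketch the helper names and the symmetric matrix `A` are illustrative assumptions, not anything from the answer itself:

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product Ax for A given as a list of rows."""
    return [dot(row, x) for row in A]

def F(A, x):
    return dot(matvec(A, x), x)

def row_vector(A, x):
    # Entries of the row vector 2 x^T A: column j is 2 * sum_i x_i A_ij.
    n = len(x)
    return [2 * sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]

def gradient_formula(A, x):
    # 2 A x, the transpose of 2 x^T A when A is symmetric.
    return [2 * v for v in matvec(A, x)]

def gradient_fd(A, x, eps=1e-6):
    # Central differences along each coordinate direction e_i.
    n = len(x)
    grad = []
    for i in range(n):
        e = [eps if j == i else 0.0 for j in range(n)]
        xp = [a + b for a, b in zip(x, e)]
        xm = [a - b for a, b in zip(x, e)]
        grad.append((F(A, xp) - F(A, xm)) / (2 * eps))
    return grad

A = [[4.0, -1.0], [-1.0, 2.0]]  # illustrative symmetric matrix
x = [0.5, 1.5]
print(row_vector(A, x), gradient_formula(A, x), gradient_fd(A, x))
```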

Aside: This is easy to verify directly by computing $F(x+h)-F(x)$ and identifying the term that is linear in $h$. In particular, $F(x+h)-F(x)= 2 x^T Ah + h^T Ah$.
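The identity $F(x+h)-F(x)= 2 x^T Ah + h^T Ah$ can also be confirmed exactly with integer arithmetic. This is only a sketch: it draws random symmetric integer matrices and integer vectors (the helper names and the seed are arbitrary choices) and checks the identity with no rounding involved.

```python
import random

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product Ax for A given as a list of rows."""
    return [dot(row, x) for row in A]

def F(A, x):
    return dot(matvec(A, x), x)

def random_symmetric(n, rng):
    # Fill the upper triangle and mirror it, so A == A^T.
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            A[i][j] = A[j][i] = rng.randint(-5, 5)
    return A

def check_expansion(n, rng):
    A = random_symmetric(n, rng)
    x = [rng.randint(-5, 5) for _ in range(n)]
    h = [rng.randint(-5, 5) for _ in range(n)]
    x_plus_h = [a + b for a, b in zip(x, h)]
    lhs = F(A, x_plus_h) - F(A, x)
    linear = 2 * dot(x, matvec(A, h))  # 2 x^T A h
    quadratic = dot(h, matvec(A, h))   # h^T A h
    return lhs == linear + quadratic

rng = random.Random(0)
print(all(check_expansion(n, rng) for n in range(1, 6) for _ in range(20)))  # prints True
```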