Understanding a specific process of finding the derivative of $x^TAx$


I am referring to @copper.hat's answer to Derivative of Quadratic Form; I do not have the reputation to comment directly. My goal is to find a way to better differentiate and understand these functions for the purpose of learning the gradient. I will try to write everything here so that this question is self-contained.

Let $Q(x) = x^TAx$, where $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$.

To form $Q(x+h)-Q(x)$, expand $Q(x+h) = (x+h)^TA(x+h) = x^TAx+x^TAh+h^TAx+h^TAh$. Subtracting $Q(x) = x^TAx$ then gives $$Q(x+h)-Q(x) = x^TAh+h^TAx+h^TAh.$$
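As a sanity check, the expansion above can be verified numerically. This is a quick NumPy sketch; the dimension, seed, and random $A$, $x$, $h$ are arbitrary choices, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = rng.standard_normal(n)

# Q(v) = v^T A v
Q = lambda v: v @ A @ v

lhs = Q(x + h) - Q(x)
rhs = x @ A @ h + h @ A @ x + h @ A @ h
print(np.isclose(lhs, rhs))  # True: the expansion matches
```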

How is it that, in the reply on the linked post, this changes to $x^TAh+h^TAx$, with the $h^TAh$ term dropped?

I see the reference to $|h^TAh|\leq\|A\|\|h\|^2$; however, Googling "Cauchy–Schwarz" gives very broad results and I'm having trouble connecting it to this bound.
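The point of the bound is that $|h^TAh|/\|h\| \le \|A\|\,\|h\| \to 0$ as $h \to 0$, which is why $h^TAh$ can be dropped from the linear part. A small numerical illustration (NumPy; the matrix and the shrinking scales are arbitrary choices for demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
op_norm = np.linalg.norm(A, 2)  # spectral (operator) norm ||A||

# |h^T A h| <= ||A|| ||h||^2, so |h^T A h| / ||h|| <= ||A|| ||h|| -> 0
for scale in [1.0, 1e-2, 1e-4, 1e-6]:
    h = scale * rng.standard_normal(n)
    ratio = abs(h @ A @ h) / np.linalg.norm(h)
    assert ratio <= op_norm * np.linalg.norm(h) + 1e-12
    print(f"||h|| = {np.linalg.norm(h):.1e},  |h^T A h|/||h|| = {ratio:.1e}")
```

The printed ratios shrink linearly with $\|h\|$, which is exactly the $o(\|h\|)$ behaviour of the quadratic remainder.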

I've looked up many ways to find the gradient of matrix functions (e.g. $b^TX^TXb$, $\frac{1}{2}x^TAx+b^Tx$), and this process seems the most intuitive to me.


2 Answers

BEST ANSWER

By definition, the derivative of $f$ at a point $x$ is a linear map $h \mapsto Dh$ such that $$f(x+h)=f(x)+Dh+g(h),\qquad g(h)\in o(\|h\|),$$ where $g(h)\in o(\|h\|)$ means $\lim_{h\to 0}\frac{|g(h)|}{\|h\|}=0$.

In the case of the quadratic form $f(x)=x^TAx$ with $A$ symmetric (so that $h^TAx = x^TAh$), we have $$f(x+h)=\dots=x^TAx+2x^TAh+h^TAh,$$ and this is exactly in the form of the first formula with $$f(x)=x^TAx,\quad Dh=2x^TAh,\quad g(h)=h^TAh\in o(\|h\|).$$ (For general $A$, the linear term is $x^T(A+A^T)h$.) The last statement follows from the Cauchy–Schwarz inequality $|a^Tb|\leq\|a\|\,\|b\|$ for vectors $a,b$ together with the matrix norm inequality $\|Ax\|\leq\|A\|\,\|x\|$: $$\lim_{h\to 0}\frac{|h^TAh|}{\|h\|}\leq\lim_{h\to 0}\frac{\|h\|\,\|Ah\|}{\|h\|}\leq \lim_{h\to 0}\frac{\|h\|\,\|A\|\,\|h\|}{\|h\|}=\lim_{h\to 0}\|h\|\,\|A\|=0.$$
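The resulting gradient $2Ax$ (for symmetric $A$) can be checked against central finite differences. This is a NumPy sketch; symmetrizing a random matrix and the step size $10^{-6}$ are my own choices for the check:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M + M.T                      # symmetric, so the gradient is 2Ax
x = rng.standard_normal(n)

f = lambda v: v @ A @ v
grad = 2 * A @ x

# central finite differences along each coordinate direction
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(n)])
print(np.allclose(num, grad, atol=1e-5))  # True
```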

ANSWER 2

The derivative along a vector $v$ is the Gateaux derivative in the direction $v$: introduce an extra scalar variable $t$, write out the function evaluated at $x_0 + tv$ near a point $x_0$, and see what happens. Your $$x^TAx = (x_0 + tv)^TA(x_0 + tv) = x_0^TAx_0 + t\,(x_0^TAv + v^TAx_0) + t^2\,v^TAv.$$ Since a $1\times 1$ matrix equals its own transpose, $v^TAx_0 = x_0^TA^Tv$, which for symmetric $A$ equals $x_0^TAv$, so $$x^TAx = x_0^TAx_0 + 2t\,x_0^TAv + t^2\,v^TAv,$$ and its derivative at $t=0$ is the scalar $$2x_0^TAv.$$ This is the dot product of $v$ with $2Ax_0$, so the gradient, written as a column vector, is $2Ax_0$.
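The Gateaux construction above can be tested directly: differentiate $t \mapsto f(x_0 + tv)$ numerically at $t=0$ and compare with $2x_0^TAv = v\cdot(2Ax_0)$. A NumPy sketch, again with an arbitrary symmetric $A$ and random $x_0$, $v$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M + M.T                      # symmetric A
x0 = rng.standard_normal(n)
v = rng.standard_normal(n)

f = lambda x: x @ A @ x

# numerical d/dt f(x0 + t v) at t = 0 via a central difference
eps = 1e-6
dirderiv = (f(x0 + eps * v) - f(x0 - eps * v)) / (2 * eps)

print(np.isclose(dirderiv, 2 * x0 @ A @ v, atol=1e-5))   # True
print(np.isclose(dirderiv, v @ (2 * A @ x0), atol=1e-5))  # True
```

The second check confirms the final step of the answer: the directional derivative is the dot product of $v$ with the gradient $2Ax_0$.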