I am referring to @copper.hat's answer to "Derivative of Quadratic Form". I do not have the reputation to reply there directly. My goal is to better understand how to differentiate these functions so that I can learn to compute the gradient. I will try to write everything here so that this question is self-contained.
Let $Q(x) = x^TAx$, where $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$.
To compute $Q(x+h)-Q(x)$, note that $Q(x+h) = (x+h)^TA(x+h)$, so $Q(x+h)-Q(x) = x^TAx+x^TAh+h^TAx+h^TAh-x^TAx$, which simplifies to $x^TAh+h^TAx+h^TAh$.
How is it that this changes to $x^TAh+h^TAx$ in the reply on the linked post, dropping the $h^TAh$?
I see the references to $|h^TAh| \le \|A\|\|h\|^2$; however, Googling "Cauchy Schwarz" gives very broad results, and I'm having trouble understanding.
I've looked up many ways to find the gradient of matrix functions (e.g. $b^TX^TXb$, $\frac{1}{2}x^TAx+b^Tx$), and this process seems to be the most intuitive for me.
By definition, the derivative of $f$ at a point $x$ is a linear function $h \mapsto Dh$ such that $$f(x+h)=f(x)+Dh+g(h),\ g(h)\in o(\|h\|),$$ where $g(h)\in o(\|h\|)$ means that $g$ is a function satisfying $\lim_{h\to 0}\frac{|g(h)|}{\|h\|}=0$.
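As a quick sanity check of this definition, consider the one-dimensional case $f(x)=x^2$ (an illustrative example, not taken from the linked post): $$f(x+h)=x^2+2xh+h^2,\quad Dh=2xh,\quad g(h)=h^2,\quad \frac{|g(h)|}{|h|}=|h|\to 0 \text{ as } h\to 0.$$ The term $h^2$ is not zero, but it vanishes faster than $h$, so it is not part of the derivative. The $h^TAh$ term plays the same role for the quadratic form.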
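For the quadratic form $f(x)=x^TAx$ we have $$f(x+h)=x^TAx+x^TAh+h^TAx+h^TAh=x^TAx+x^T(A+A^T)h+h^TAh,$$ where the second equality holds because $h^TAx$ is a scalar and therefore equals its transpose $x^TA^Th$. This is exactly in the form of the first formula with $$f(x)=x^TAx,\ Dh=x^T(A+A^T)h,\ g(h)=h^TAh\in o(\|h\|).$$ When $A$ is symmetric this reduces to $Dh=2x^TAh$. The last statement, $h^TAh\in o(\|h\|)$, follows from the Cauchy–Schwarz inequality $|a^Tb|\leq\|a\|\|b\|$ for vectors $a,b$, together with the matrix norm inequality $\|Ax\|\leq\|A\|\|x\|$: $$\lim_{h\to 0}\frac{|h^TAh|}{\|h\|}\leq\lim_{h\to 0}\frac{\|h\|\|Ah\|}{\|h\|}\leq \lim_{h\to 0}\frac{\|h\|\|A\|\|h\|}{\|h\|}=\lim_{h\to 0}\|h\|\|A\|=0.$$ Since the quotient is nonnegative, its limit is $0$. So $h^TAh$ is not dropped because it is zero; it is dropped because it is the remainder $g(h)$, which is negligible compared with $\|h\|$.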
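The argument above can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original answer; the helper names (`quad_form`, `bilinear`, `norm`), the random matrix, and the step sizes are all assumptions chosen for illustration. It shrinks $h$ along a fixed direction and prints $|Q(x+h)-Q(x)-(x^TAh+h^TAx)|/\|h\|$, which should shrink along with $\|h\|$. It then compares a central finite-difference gradient with $(A+A^T)x$, which is $2Ax$ when $A$ is symmetric.

```python
import math
import random

def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def bilinear(A, u, v):
    """Return u^T A v."""
    n = len(u)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

random.seed(0)
n = 4
# Illustrative data: a random, generally non-symmetric A, a point x, a direction d.
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
d = [random.uniform(-1, 1) for _ in range(n)]

# Part 1: the remainder Q(x+h) - Q(x) - (x^T A h + h^T A x) equals h^T A h,
# and dividing it by ||h|| still gives something that goes to 0 with h.
ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    h = [t * di for di in d]
    xh = [xi + hi for xi, hi in zip(x, h)]
    linear = bilinear(A, x, h) + bilinear(A, h, x)
    remainder = quad_form(A, xh) - quad_form(A, x) - linear
    ratios.append(abs(remainder) / norm(h))
    print(f"||h|| = {norm(h):.1e}   |remainder| / ||h|| = {ratios[-1]:.3e}")

# Part 2: the gradient read off from Dh = x^T (A + A^T) h is (A + A^T) x.
grad_exact = [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

eps = 1e-4
grad_fd = []
for i in range(n):
    xp = list(x)
    xm = list(x)
    xp[i] += eps
    xm[i] -= eps
    # Central difference approximation of the i-th partial derivative.
    grad_fd.append((quad_form(A, xp) - quad_form(A, xm)) / (2 * eps))

print("max gradient mismatch:",
      max(abs(a - b) for a, b in zip(grad_fd, grad_exact)))
```

Each time $\|h\|$ is divided by 10, the printed ratio should drop by roughly the same factor, which is the numerical counterpart of $h^TAh\in o(\|h\|)$.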