Consider $f:x\in \mathbb{R}^d \mapsto f(x):= \frac{1}{2}x^\top Ax +b^\top x,$ with $d\geq 1$, $b\in \mathbb{R}^d$ and $A\in \mathbb{R}^{d\times d}$.
Check that the gradient fulfills the following fundamental property:
$$ \forall x \in \mathbb{R}^d:\quad f(x+h) = f(x) + \langle\nabla f(x), h\rangle + o(\|h\|) \qquad (h\in \mathbb{R}^d,\ h \to 0). $$

Let $x\in \mathbb{R}^d$, $h\in \mathbb{R}^d$, $h\ne 0_d$. Then

\begin{align*}
f(x+h) &= \frac{1}{2}(x+h)^\top A(x+h) + b^\top(x+h) \\
&=\underbrace{\frac{1}{2}x^\top Ax + b^\top x}_{f(x)} + \underbrace{\frac{x^\top Ah}{2}+\frac{h^\top Ax}{2}}_{\frac{(x^\top Ah)^\top+h^\top Ax}{2}}+\frac{1}{2}h^\top Ah+\underbrace{b^\top h}_{h^\top b}.
\end{align*}

Since $x^\top Ah$ is a scalar, it equals its own transpose: $(x^\top Ah)^\top = h^\top A^\top x$. We can therefore write

$$ \frac{x^\top Ah}{2}+\frac{h^\top Ax}{2} + h^\top b = \underbrace{h^\top\big[\tfrac{1}{2}(A^\top + A)x + b\big]}_{\langle h, \nabla f(x) \rangle}. $$

We need to prove that $\frac{1}{2}h^\top Ah = o(\|h\|)$, i.e. that there exists a function $\varepsilon$ with $\varepsilon(h) \xrightarrow[h \rightarrow 0]{}0$ and $\|h\|\varepsilon (h) = \frac{h^\top Ah}{2}$. We have

$$ \frac{1}{2}h^\top Ah = \frac{\|h\|^2}{2}\big(\frac{h}{\|h\|}\big)^\top A\big(\frac{h}{\|h\|}\big) = \|h\|\underbrace{\big[\frac{\|h\|}{2}\big(\frac{h}{\|h\|}\big)^\top A\big(\frac{h}{\|h\|}\big)\big]}_{\varepsilon(h)} $$

and $\big(\frac{h}{\|h\|}\big)^\top A\big(\frac{h}{\|h\|}\big)$ is bounded for all $h \ne 0_d$ by $|||A||| = \max_{x\in \mathbb{R}^d, \|x\|=1} x^\top A x$, so

$$ |\varepsilon (h)| = \|h\|\cdot\big|\frac{1}{2}\big(\frac{h}{\|h\|}\big)^\top A \big(\frac{h}{\|h\|}\big)\big| \leq \|h\| \cdot |||A|||. $$

Therefore $\varepsilon(h) \xrightarrow[h \rightarrow 0]{} 0$, and so $\frac{1}{2}h^\top Ah = \|h\|\cdot \varepsilon(h) = o(\|h\|)$.
Hence, for all $x \in \mathbb{R}^d$, $f(x+h) = f(x) + \langle\nabla f(x), h\rangle + o(\|h\|)$.
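As a sanity check on the expansion, here is a minimal numerical sketch in plain Python. It assumes the gradient is $\frac{1}{2}(A + A^\top)x + b$ (no symmetry of $A$ is assumed), and the matrix, vectors and helper names (`quad`, `grad`, `remainder_ratio`) are made up for illustration. It evaluates $|f(x+h) - f(x) - \langle \nabla f(x), h\rangle| / \|h\|$ for shrinking $h$, which should tend to $0$ if the property holds.

```python
import math

def quad(A, b, x):
    """f(x) = 1/2 x^T A x + b^T x, with A a list of rows and b, x lists."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(x[i] * Ax[i] for i in range(n)) + sum(b[i] * x[i] for i in range(n))

def grad(A, b, x):
    """Candidate gradient 1/2 (A + A^T) x + b; A need not be symmetric."""
    n = len(x)
    return [0.5 * sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i]
            for i in range(n)]

def remainder_ratio(A, b, x, h):
    """|f(x+h) - f(x) - <grad f(x), h>| / ||h||."""
    xh = [xi + hi for xi, hi in zip(x, h)]
    linear = sum(g * hi for g, hi in zip(grad(A, b, x), h))
    norm_h = math.sqrt(sum(hi * hi for hi in h))
    return abs(quad(A, b, xh) - quad(A, b, x) - linear) / norm_h

# Illustrative data (hypothetical values); A is deliberately non-symmetric.
A = [[2.0, 1.0], [-3.0, 4.0]]
b = [1.0, -1.0]
x = [0.5, 2.0]
direction = [1.0, -2.0]

ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    h = [t * d for d in direction]
    ratios.append(remainder_ratio(A, b, x, h))
    print(f"scale {t:.0e}: remainder / ||h|| = {ratios[-1]:.3e}")
```

The remainder is exactly $\frac{1}{2}h^\top Ah$, so the printed ratio should shrink roughly in proportion to $\|h\|$, which is what $o(\|h\|)$ requires.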
Here are my 2 questions:
1) Where does the definition $\varepsilon(h) = \big[\frac{\|h\|}{2}\big(\frac{h}{\|h\|}\big)^\top A\big(\frac{h}{\|h\|}\big)\big]$ come from?
2) Is the following equivalent to the definition of a matrix norm: $|||A||| = \max_{x\in \mathbb{R}^d, \|x\|=1} x^\top A x$? If yes, to which of the following norms is it equivalent?
One norm, for $\textbf{A} = (a_{i,j}) \in \mathbb{R}^{m\times n}$: $$\|\textbf{A}\|_{1} = \max\limits_{1 \leq j \leq n}{\Big(\sum_{i=1}^{m}|a_{i,j}|\Big)} = \quad \text{maximal column sum}$$
Infinity norm: $$\|\textbf{A}\|_{\infty} = \max\limits_{1 \leq i \leq m}{\Big(\sum_{j=1}^{n}|a_{i,j}|\Big)} = \quad \text{maximal row sum} $$
Two norm: the most important norm, $\|\textbf{A}\| = \|\textbf{A}\|_{2}$, is not easily computed. $$\text{If $\textbf{A}$ is symmetric }\:\to \:\|\textbf{A}\|_{2} = \max\limits_{i}|\lambda_i|,$$ where the $\lambda_i$ are the eigenvalues of $\textbf{A}$.
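To make the comparison in question 2 concrete, here is a small exploratory sketch in plain Python. It is restricted to $2\times 2$ matrices so that the eigenvalues of a symmetric matrix have a closed form. The matrices `S` and `K` and all function names are hypothetical and chosen only for illustration. It computes the maximal column sum, the maximal row sum, the two norm of a symmetric matrix via $\max_i|\lambda_i|$, and a brute-force estimate of $\max_{\|x\|=1} x^\top A x$.

```python
import math

def one_norm(A):
    """Maximal absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def inf_norm(A):
    """Maximal absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def sym_eigs_2x2(S):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix [[p, q], [q, r]]."""
    p, q, r = S[0][0], S[0][1], S[1][1]
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    return mean - radius, mean + radius

def max_quadratic_form_2x2(A, samples=100000):
    """Brute-force max of x^T A x over unit vectors x = (cos t, sin t)."""
    best = -math.inf
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        c, s = math.cos(t), math.sin(t)
        value = A[0][0] * c * c + (A[0][1] + A[1][0]) * c * s + A[1][1] * s * s
        best = max(best, value)
    return best

S = [[2.0, 1.0], [1.0, -3.0]]   # symmetric and indefinite (illustrative)
K = [[0.0, 1.0], [-1.0, 0.0]]   # skew-symmetric and nonzero (illustrative)

lam_min, lam_max = sym_eigs_2x2(S)
two_norm_S = max(abs(lam_min), abs(lam_max))   # valid because S is symmetric
qf_S = max_quadratic_form_2x2(S)
qf_K = max_quadratic_form_2x2(K)

print("one norm of S:", one_norm(S))
print("infinity norm of S:", inf_norm(S))
print("two norm of S:", two_norm_S)
print("max of x^T S x on the unit sphere:", qf_S)
print("max of x^T K x on the unit sphere:", qf_K)
```

In this example the maximum of $x^\top S x$ matches the largest eigenvalue of $S$ rather than $\max_i|\lambda_i|$, and it is zero for the nonzero matrix $K$. This suggests that, without an absolute value and a symmetry assumption, the expression need not coincide with any of the norms listed above. This is an observation on two examples, not a proof.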