Let be $$f(x) = \frac{1}{2}x^TAx +b^Tx$$
where $A$ is a symmetric positive definite matrix and $b \in \mathbb{R}^d$. We know that such a function is systematically twice (and even infinitely) differentiable with
$$ \begin{cases} \nabla f(x) = Ax + b \\ \nabla^2 f(x) = A \\ \end{cases} $$
Let us note $g_k:=\nabla f(x^{(k)})$, so that here $g_k = Ax^{(k)}+b$. In this case, the optimal step size in direction $-g_k$ can be obtained analytically by solving
$$ \begin{align} 0 &= \frac{d}{d\lambda}f(x^{(k)}-\lambda g_k)\tag{1}\label{eq1} \\ &= \langle A(x^{(k)}-\lambda g_k) + b, - g_k\rangle\tag{2}\label{eq2} \\ &= - \|g_k\|^2 + \lambda \langle Ag_k, g_k\rangle\tag{3}\label{eq3}\ \end{align} $$
Here are my questions:
- how do you get from (\ref{eq1}) to (\ref{eq2}):
if $\nabla f(x) = Ax + b$ then $\frac{d}{d\lambda}f(x^{(k)}-\lambda g_k) = A(x^{(k)}-\lambda g_k) + b$. However, how do you get the dot product notation $\langle\cdot, \cdot \rangle$ notation as well as the additional $-g_k$
- how do you get from (\ref{eq2}) to (\ref{eq3})
Q1
(1) to (2) is followed from the below property of a multivariate function$${d\over dt}f(u+t v)=\nabla f(u+t v)^T\cdot v$$which has a simple and elementary proof using only the fact that $$f(u+t v)=f\Big(u_1+tv_1,u_2+tv_2,\cdots,u_n+tv_n\Big)$$
proof
Note that using calculus:$${d\over dt }f(u+t v){=\sum_{k=1}^n {\partial f\over \partial x_k}\Bigg|_{x=u+tv}\cdot {\partial u_k+tv_k\over \partial t}\\=\sum_{k=1}^n {\partial f\over \partial x_k}\Bigg|_{x=u+tv}\cdot v_k\\=\nabla f(u+t v)^T\cdot v}$$
Q2
(2) to (3) is followed so easily$$\langle A(x^{(k)}-\lambda g_k) + b, - g_k\rangle{=\langle Ax^{(k)}-\lambda A g_k + b, - g_k\rangle\\=\langle g_k-\lambda A g_k , - g_k\rangle\\=-||g_k||^2+\lambda \langle Ag_k,g_k\rangle}$$