I was looking at the gradient descent algorithm and got confused by the second line below. The instructor mentioned something related to the Taylor series expansion, but I am not sure how the first line leads to the second one, in particular where the gradient operator comes from. Any ideas?
$\Delta f = f(w(1))-f(w(0)),$ where $w(1)=w(0)+\eta \vec{v}$
$=\eta \nabla f(w(0))^T \vec{v} +O(\eta^2)$
$\geq-\eta\|\nabla f(w(0))\|$
Use the multivariate Taylor series. Assuming $f$ is twice continuously differentiable and expanding about $w_0$, we get: $$ f(x)=f(w_0)+(x-w_0)^T\nabla f(w_0) + O(||x-w_0||_2^2) $$ So evaluating the function at $w_0+\eta\hat{v}$, for a unit vector $\hat{v}$, gives: \begin{align} f(w_0+\eta\hat{v}) &=f(w_0)+(\eta\hat{v})^T\nabla f(w_0)+O(||\eta\hat{v}||_2^2)\\ &=f(w_0)+\eta\nabla f(w_0)^T\hat{v}+O(\eta^2) \end{align} Thus, the difference is: $$ \Delta f = f(w_0+\eta\hat{v})-f(w_0)= \eta\nabla f(w_0)^T\hat{v}+O(\eta^2) $$ as required.
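If you want to convince yourself numerically, here is a small sketch. It assumes a hypothetical test function $f(w)=\sum_i \sin(w_i) + \tfrac12\|w\|_2^2$ with a hand-coded gradient (neither comes from the question). The remainder $\Delta f - \eta\nabla f(w_0)^T\hat{v}$ should shrink like $\eta^2$, so dividing it by $\eta^2$ should give roughly the same number for every small $\eta$.

```python
import math
import random

def f(w):
    # Hypothetical smooth test function: sum of sines plus half the squared norm.
    return sum(math.sin(x) for x in w) + 0.5 * sum(x * x for x in w)

def grad_f(w):
    # Analytic gradient of the test function: df/dw_i = cos(w_i) + w_i.
    return [math.cos(x) + x for x in w]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Scale v to have Euclidean norm 1.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def remainder_ratios(w0, v_hat, etas):
    """Return (Delta f - eta * grad^T v_hat) / eta^2 for each step size eta."""
    g = grad_f(w0)
    ratios = []
    for eta in etas:
        w1 = [a + eta * b for a, b in zip(w0, v_hat)]  # w_1 = w_0 + eta * v_hat
        delta_f = f(w1) - f(w0)
        first_order = eta * dot(g, v_hat)
        ratios.append((delta_f - first_order) / eta ** 2)
    return ratios

random.seed(0)
w0 = [random.uniform(-1, 1) for _ in range(3)]
v_hat = unit([random.gauss(0, 1) for _ in range(3)])
etas = [1e-1, 1e-2, 1e-3]
ratios = remainder_ratios(w0, v_hat, etas)
for eta, r in zip(etas, ratios):
    print(f"eta={eta:g}  remainder/eta^2={r:.6f}")
```

The ratio settles near $\tfrac12\hat{v}^T H\hat{v}$, where $H$ is the Hessian at $w_0$. If the first-order term were wrong (say, a sign error), the ratio would instead blow up like $1/\eta$.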
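As for the third line, drop the $O(\eta^2)$ term (it is negligible next to the first-order term for small $\eta$) and write the inner product using the angle $\theta$ between $\hat{v}$ and $\nabla f(w_0)$. Since $\hat{v}$ is a unit vector and $\cos\theta \geq -1$: $$ \eta\nabla f(w_0)^T\hat{v} = \eta\,||\nabla f(w_0)||_2 \,\underbrace{||\hat{v}||_2}_1\underbrace{\cos\theta}_{\geq -1} \geq -\eta ||\nabla f(w_0)||_2 $$ so $\Delta f \geq -\eta ||\nabla f(w_0)||_2 + O(\eta^2)$. Equality in the first-order term holds when $\hat{v} = -\nabla f(w_0)/||\nabla f(w_0)||_2$, which is why gradient descent steps in the direction of the negative gradient.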
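The bound is Cauchy-Schwarz in disguise. As a quick sketch, assuming an arbitrary example gradient `g` (hypothetical values, not from the question), we can sample many random unit directions and check that none of them makes the first-order change $\eta\,g^T\hat{v}$ smaller than $-\eta\|g\|_2$, while the normalized negative gradient reaches that bound.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def random_unit(dim, rng):
    # Normalizing a Gaussian sample gives a direction uniformly distributed on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = norm(v)
    return [x / n for x in v]

def first_order_change(g, v_hat, eta):
    # First-order term eta * grad^T v_hat from the Taylor expansion.
    return eta * dot(g, v_hat)

g = [3.0, -4.0, 12.0]   # hypothetical gradient at w_0, with ||g|| = 13
eta = 0.1
lower_bound = -eta * norm(g)

rng = random.Random(42)
samples = [first_order_change(g, random_unit(len(g), rng), eta) for _ in range(10_000)]
steepest = [-x / norm(g) for x in g]   # v_hat = -grad / ||grad||

print(f"bound          = {lower_bound:.6f}")
print(f"best sampled   = {min(samples):.6f}")
print(f"v_hat = -g/|g| = {first_order_change(g, steepest, eta):.6f}")
```

The best random direction only approaches the bound, whereas the negative gradient direction hits it (up to floating-point rounding).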