I'm currently reading through a convergence proof of gradient descent, which states:

> Our assumption that $\nabla f$ is Lipschitz continuous with constant $L$ implies that $\nabla^2 f(x) \preceq LI$, or equivalently that $\nabla^2 f(x) - LI$ is a negative semidefinite matrix. Using this fact, we can perform a quadratic expansion of $f$ around $f(x)$ and obtain the following inequality: \begin{align*} f(y) &\leq f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} \nabla^2 f(x) \lVert y - x \rVert^2 \\ &\leq f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} L \lVert y - x \rVert^2 \end{align*} Now let's plug in the gradient descent update by letting $y = x^+ = x - t \nabla f(x)$. We then get:
>
> \begin{align*} f(x^+) &\leq f(x) + \nabla f(x)^T (x^+ - x) + \frac{1}{2} L \lVert x^+ - x \rVert^2 \\ &= f(x) + \nabla f(x)^T (x - t \nabla f(x) - x) + \frac{1}{2} L \lVert x - t \nabla f(x) - x \rVert^2 \\ &= f(x) - \nabla f(x)^T t \nabla f(x) + \frac{1}{2} L \lVert t\nabla f(x) \rVert^2 \\ &= f(x) - t \lVert \nabla f(x) \rVert^2 + \frac{1}{2} L t^2 \lVert \nabla f(x) \rVert^2 \\ &= f(x) - \left( 1 - \frac{1}{2} Lt \right) t \lVert \nabla f(x) \rVert^2 \end{align*}
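Whatever the issue with the intermediate expansion step, the final bound $f(x^+) \leq f(x) - \left(1 - \frac{1}{2}Lt\right) t \lVert \nabla f(x) \rVert^2$ does check out numerically. Here is a quick sanity check I ran on a simple quadratic (the test problem and step size are my own choices, not from the proof):

```python
import numpy as np

# Hypothetical test problem: f(x) = (1/2) x^T A x with A symmetric PSD,
# so grad f(x) = A x, and the gradient is Lipschitz with constant
# L = largest eigenvalue of A.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M                      # symmetric positive semidefinite
L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = rng.standard_normal(5)
t = 1.0 / L                      # step size t <= 1/L
x_plus = x - t * grad(x)         # gradient descent update

lhs = f(x_plus)
rhs = f(x) - (1 - 0.5 * L * t) * t * np.linalg.norm(grad(x)) ** 2
print(lhs <= rhs + 1e-12)        # the claimed descent bound holds
```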
However, isn't the correct way to write a quadratic (second-order Taylor) expansion:
$$ f(x) \approx \underbrace{f(x_0)}_{\text{Constant}} + \underbrace{\nabla f(x_0)^T (x - x_0)}_{\text{Linear term}} + \underbrace{\frac{1}{2} (x - x_0)^T H_f(x_0) (x - x_0)}_{\text{Quadratic term}} $$
Or is it the case that
$$\frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) \leq \frac{1}{2} \nabla^2 f(x) \lVert y - x \rVert^2 $$
or is the proof taking advantage of the fact that the spectral norm satisfies $\|\nabla^2 f(x)\|_2 \leq L$?
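For what it's worth, the bound on the quadratic term that this would require, namely $\frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) \leq \frac{L}{2} \lVert y - x \rVert^2$ whenever $\|\nabla^2 f(x)\|_2 \leq L$, seems to hold numerically (the random symmetric matrix below is just a stand-in for the Hessian):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
H = (M + M.T) / 2            # symmetric stand-in for the Hessian
L = np.linalg.norm(H, 2)     # spectral norm = max |eigenvalue| for symmetric H

for _ in range(1000):
    d = rng.standard_normal(4)    # d plays the role of y - x
    quad = 0.5 * d @ H @ d        # (1/2) d^T H d
    bound = 0.5 * L * d @ d       # (L/2) ||d||^2
    assert quad <= bound + 1e-12  # holds since d^T H d <= lambda_max ||d||^2

print("bound holds on all samples")
```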