I'm working through Numerical Optimization by Nocedal and Wright, and I'm having trouble with some of its proofs that seem too handwavy to me. Take the first theorem for example:
Theorem: If $x^∗$ is a local minimizer and $f$ is continuously differentiable in an open neighborhood of $x^∗$, then $∇ f (x^∗) = 0$.
Proof (excerpt): Suppose for contradiction that $∇ f (x^∗) \neq 0$. Define the vector $p = −∇ f (x^∗)$ and note that $$p^T ∇ f (x^∗) = − \| ∇ f (x^∗)\|^2 < 0.$$ Because $∇ f$ is continuous near $x^∗$, there is a scalar $T > 0$ such that $$\forall t ∈ [0, T ], \quad p^T ∇ f (x^∗ + tp) < 0.$$
Similarly, in another proof (second-order necessary condition), the authors assume $∇^2 f (x^∗)$ is not positive semidefinite (so $\exists p\in\mathbb{R}^n, p^T ∇^2 f (x^∗)p < 0$), and use the (assumed) continuity of $∇^2 f$ near $x^*$ to conclude the existence of a scalar $T > 0$ such that $\forall t ∈ [0, T ], p^T ∇^2 f (x^∗+tp)p < 0$.
On the surface, the argument roughly seems to be: if $f$ is continuous at $x^*$, and a certain property $P$ holds for $f(x^*)$, then $P$ holds for $f(x)$ when $x$ is near $x^*$. But of course this is not true in general (for example, take $f$ to be $f(x)=x$, $x^*=0$, $P(f(x^*))$ to be $f(x^*)\geq0$).
I would appreciate some help justifying statements such as "Because $∇ f$ is continuous near $x^∗$, there is a scalar $T > 0$ such that $\forall t ∈ [0, T ], p^T ∇ f (x^∗ + tp) < 0 $", specifically using the $\epsilon$-$\delta$ definition of a continuous function in a metric space (what's the appropriate $\epsilon$ here?).
The essence of this step is the following lemma: if $\phi : X \to \Bbb{R}$ is continuous at a point $x_0 \in X$ and $\phi(x_0) < 0$, then there exists some $\delta > 0$ such that $\phi(x) < 0$ for all $x \in B(x_0; \delta)$. I hope you can justify why, for fixed $p$ and continuously differentiable $f$, the map $$x \mapsto p^\top \nabla f(x)$$ is continuous.
To prove this lemma, take $\varepsilon = |\phi(x_0)| = -\phi(x_0) > 0$. By the $\varepsilon$-$\delta$ definition of continuity, there must exist some $\delta > 0$ such that \begin{align*} d(x, x_0) < \delta &\implies |\phi(x) - \phi(x_0)| < \varepsilon = -\phi(x_0) \\ &\implies \phi(x_0) < \phi(x) - \phi(x_0) < -\phi(x_0) \\ &\implies 2\phi(x_0) < \phi(x) < 0. \end{align*} Hence, $\phi(x) < 0$ for all $x \in B(x_0; \delta)$.
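To connect the lemma back to the book's step, here is a sketch of how $\varepsilon$ and $T$ can be made explicit. It assumes the Euclidean norm on $\mathbb{R}^n$ and takes $X$ to be the open neighborhood of $x^*$ on which $f$ is continuously differentiable. The particular value $T = \delta/(2\|p\|)$ is just one convenient choice, not something taken from the book. Put $x_0 = x^*$ and let the lemma's function be $x \mapsto p^\top \nabla f(x)$, written $\phi$ below. Its continuity follows from the Cauchy-Schwarz inequality and the continuity of $\nabla f$: $$|\phi(x) - \phi(y)| = |p^\top(\nabla f(x) - \nabla f(y))| \le \|p\| \, \|\nabla f(x) - \nabla f(y)\|.$$ Since $\phi(x^*) = -\|\nabla f(x^*)\|^2 < 0$, the lemma applies with $\varepsilon = \|\nabla f(x^*)\|^2$ and produces some $\delta > 0$ (shrunk, if necessary, so that the Euclidean ball of radius $\delta$ around $x^*$ lies inside $X$). Because $p \neq 0$, we may set $T = \delta/(2\|p\|) > 0$, and then \begin{align*} t \in [0, T] &\implies \|(x^* + tp) - x^*\| = t\|p\| \le T\|p\| = \tfrac{\delta}{2} < \delta \\ &\implies p^\top \nabla f(x^* + tp) < 0. \end{align*} The second-order step works the same way with the function $x \mapsto p^\top \nabla^2 f(x)\, p$ and $\varepsilon = -p^\top \nabla^2 f(x^*)\, p > 0$; there, continuity follows from $|p^\top (A - B)\, p| \le \|p\|^2 \|A - B\|$ for the operator norm.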