The criterion of sufficient decrease (Armijo condition) requires $\lambda \in \mathbb{R}$ such that:
\begin{equation} \varphi(\lambda) = f(x + \lambda d) < f(x) + \alpha \lambda \nabla^T f(x) d = \varphi(0) + \alpha \lambda \varphi'(0), \tag{*} \end{equation}
where $\alpha \in (0, 1)$. If $f$ is a quadratic function, then $\varphi$ is a parabola. Prove that if the minimizer $\lambda$ of this parabola is feasible in ($*$), we must have $\alpha \in (0, 1/2)$.
For this question, I think I have the graphical intuition, but I am having trouble translating it into algebraic terms. I will put below some of what I have so far and the questions I'm having.
It's said that $f(x)$ is a quadratic function, and I have previously found that the optimal step for a quadratic function is: $$\lambda^* = - \frac{d^T \nabla f(x)}{d^T \nabla^2 f(x) d}$$
Furthermore, since $\varphi(\lambda)$ is quadratic and the right-hand side of ($*$) is a line, I see that when $\alpha = {1 \over 2}$ the line passes through the minimum point of the parabola. So for this minimum point to be 'available', $\alpha$ ought to be in $(0,{1 \over 2})$, because otherwise the slope of the line would leave the minimum point above the line, and we don't want that.
Considering $f(x)$ is a quadratic function, its general form is: $$f(x) = {1 \over 2} x^T A x+b^Tx$$ and its gradient is $$\nabla f(x) = Ax+b.$$ At this point I'm struggling with how to manipulate this information to get the proof. I have tried to substitute those values and isolate $\alpha$, like this: $$\alpha = {f(x+\lambda d) - f(x) \over \lambda \nabla^T f(x) d}$$ but couldn't get anywhere.
Also, I tried to substitute the optimal $\lambda^*$, like this:
$$f(x) + \alpha \left[- { d^T \nabla f(x) \over d^T \nabla ^2 f(x) d}\right] \nabla ^T f(x) d$$
In the numerator we get the identity, but what should we do with the denominator?
In the end, I don't know how to complete the proof, and I'm stuck.
For a rigorous treatment of this subject, pick up a book on optimization; Boyd and Vandenberghe's *Convex Optimization* covers it in Chapter 9 (Unconstrained minimization). Here I will instead follow your graphical intuition.
First, we assume that $f(x)$ is strictly convex, that is, $\nabla^2f(x)$ is positive definite. I presume you found the optimal $\lambda^*$ by expanding $f(x+\lambda d)$ to second order,
$$ \varphi(\lambda)=f(x+\lambda d)=f(x)+\lambda\nabla^Tf(x)d+{\lambda^2\over2}d^T\nabla^2f(x)d $$
and finding the minimum point of the above. This is indeed a parabolic function of $\lambda$. In order to get any decrease at all, we require that
$$ \varphi(\lambda)-\varphi(0)=\lambda\nabla^Tf(x)d+{\lambda^2\over2}d^T\nabla^2f(x)d < 0 $$
Calculating the Hessian is expensive, so we determine sufficiency of decrease using only the gradient: keep just the linear term on the right-hand side, scaled by a multiplier $\alpha>0$:
$$ \lambda\nabla^Tf(x)d+{\lambda^2\over2}d^T\nabla^2f(x)d \le \alpha\lambda\nabla^Tf(x)d $$
Since $d$ is usually chosen such that $\nabla^T f(x)d < 0$ (a descent direction), this condition ensures decrease for suitable values of $\alpha$. The right-hand side is a linear function of $\lambda$. It intersects the parabola at $\lambda=0$ and at
$$\lambda = 2(\alpha-1){\nabla^Tf(x)d \over d^T\nabla^2f(x)d}$$
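Explicitly: equating the parabola and the line and dividing out the common factor $\lambda \ne 0$ gives

$$ {\lambda\over2} d^T\nabla^2f(x)d = (\alpha-1)\nabla^Tf(x)d \quad\Longrightarrow\quad \lambda = 2(\alpha-1){\nabla^Tf(x)d \over d^T\nabla^2f(x)d}. $$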
Therefore, $\lambda$ must be chosen such that
$$ 0 < \lambda \le 2(\alpha-1){\nabla^Tf(x)d \over d^T\nabla^2f(x)d} $$
We want $\lambda>0$ so that we don't take a step backwards. For the upper bound to be valid, that is, to stay positive (recall $\nabla^Tf(x)d<0$ and $d^T\nabla^2f(x)d>0$), we require $\alpha < 1$.
On the other hand, for a quadratic function, the optimal $\lambda^*$ is, as you have already discovered,
$$ \lambda^* = -{\nabla^Tf(x)d \over d^T\nabla^2f(x)d} $$
We don't want $\lambda^*$ to be rejected, thus we want it to be within the range given above. Therefore, we require
$$ -{\nabla^Tf(x)d \over d^T\nabla^2f(x)d} \le 2(\alpha-1){\nabla^Tf(x)d \over d^T\nabla^2f(x)d} $$ Dividing both sides by ${\nabla^Tf(x)d \over d^T\nabla^2f(x)d}$, which is negative since $\nabla^Tf(x)d<0$ and $d^T\nabla^2f(x)d>0$, flips the inequality to $-1 \ge 2(\alpha-1)$, so the upper bound for $\alpha$ is $$ \alpha \le {1\over 2} $$
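As a quick numerical sanity check, here is a small sketch (the matrix $A$, vector $b$, and point $x$ below are arbitrary choices, not given in the problem): it verifies that the exact line-search minimizer $\lambda^*$ satisfies the sufficient-decrease condition precisely when $\alpha \le 1/2$.

```python
import numpy as np

# Arbitrary strictly convex quadratic f(x) = 1/2 x^T A x + b^T x,
# with A symmetric positive definite.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

x = np.array([2.0, 1.0])
grad = A @ x + b            # gradient of f at x
d = -grad                   # steepest-descent direction, so grad @ d < 0

def f(y):
    return 0.5 * y @ A @ y + b @ y

# Exact minimizer of phi(lambda) = f(x + lambda*d) along d
lam_star = -(d @ grad) / (d @ A @ d)

def armijo_holds(alpha, lam):
    """Sufficient-decrease test: phi(lam) <= phi(0) + alpha*lam*phi'(0)."""
    return f(x + lam * d) <= f(x) + alpha * lam * (grad @ d)

# Alphas strictly below/above 1/2 (avoiding floating-point ties
# at the boundary alpha = 1/2, where equality holds exactly).
for alpha in (0.1, 0.25, 0.75, 0.9):
    print(alpha, armijo_holds(alpha, lam_star))
```

For $\alpha < 1/2$ the condition holds at $\lambda^*$; for $\alpha > 1/2$ it is rejected, matching the bound derived above.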