I did it on a "by cases" manner, as follows
"There are four possible cases,
- $x$ does not occur free in neither $\alpha$ nor $\beta$.
Then, a structure will only model $\forall x \alpha$ if it models $\alpha$, and it will only model $\forall x (\alpha \to \beta)$ if it models $\alpha \to \beta$. If it models those two, then it must also model $\beta$, otherwise we'd have a true implicant with a false consequent. And since the structure models $\beta$ and $x$ does not occur free in it, then the structure also models $\forall x \beta$.
- $x$ occurs free only in $\alpha$.
If a structure models $\forall x \alpha$, then for every element $c$ of the universe, $\alpha (c)$ holds, and for the structure to model $\forall x (\alpha \to \beta)$, for every element of the universe $\alpha (c) \to \beta$ must hold. Since the structure models $\forall x \alpha$, the implicant is always true thus the consequent must also be true, thus $\beta$ holds, and since $x$ does not occur free in it, so does $\forall x \beta$.
- $x$ occurs free only in $\beta$.
In that case, $\forall x (\alpha \to \beta)$ will only hold if the implicant is false, or false whenever $\beta(c)$ is false, or $\beta$ is always true. Since $\forall x \alpha$ and thus (as $x$ does not occur free in it) $\alpha$ holds, it must be that $\beta(c)$ must always hold and thus $\forall x \beta$ holds.
- $x$ occurs free in both.
If the structure models $\forall x \alpha$ then $\alpha(c)$ holds for every $c$ in the universe, and thus for the structure to model $\alpha(c) \to \beta(c)$ for all $c$ in the universe, it must model $\beta(c)$ for every $c$, otherwise we'd have a false consequent with a true implicant. Since it models $\beta(c)$ for every $c$, it must also model $\forall x \beta$."
Now, this is lengthy, and for such a simple question. Is there a simpler way? [Or if the above does not suffice to prove it, why doesn't it?]
Given any structure $M$ such that $M \vDash \forall x\ ( α \to β )$ and $M \vDash \forall x\ ( α )$:
Given any element $c$ in $M$:
$M[x:=c] \vDash α \to β$, by semantics of $\forall$.
$M[x:=c] \vDash α$, by semantics of $\forall$.
Thus $M[x:=c] \vDash β$, by semantics of $\to$.
Therefore $M \vDash \forall x\ ( β )$.
Here "$[x:=c]$" denotes interpreting every free occurrence of "$x$" as the element $c$, and "$M[x:=c]$" denotes the composition, namely first using $[x:=c]$ to interpret some variables and then using $M$ to interpret the rest. The reason this proof works is that the cases are all bundled up in this notation. It is completely rigorous, because this notation can be rigorously defined recursively.
Some textbooks may allow wffs to have elements from the model, in which case you might write something like "$M \vDash ( α \to β )[c/x]$" instead of "$M[x:=c] \vDash α \to β$". In any case, this is just a presentation issue.