In the Introduction to Lambda Calculus, on pages 23-24, the author introduces one-step $\beta$-reduction and $\beta$-reduction, which I write as $\to_1$ and $\to$ respectively (cannot find the symbols in the document on Detexify). In Definition 4.2(i), $\to_1$ is explicitly defined as a compatible (by 4.1) relation on $\Lambda$. Next $\to$ is defined, and below the author claims that "$\to$ is a compatible relation by definition".
But consider the requirement that $M \to N \implies ZM \to ZN$. How is this relation provable from the given definitions? For this, we would require to somehow use the similar property from $\to_1$, i.e., $X \to_1 Y \implies ZX \to_1 ZY$. The definitions for $A \to B$ provide no information of the form $ (\ldots) \implies A' \to_1 B'$, so how is it possible for us to start from $M \to N$ and get back anything about $\to_1$ from just the definitions as axioms alone?
As I promote my comment to an answer, I’d considered it useful to digress into structural induction a bit in the middle.
So, if there is an inductive definition of some relation $R$ with a set of clauses of form $$\phi\wedge R(t_1)\wedge\ldots\wedge R(t_n)\Rightarrow R(t),\qquad{(*)}$$ where $\phi$ doesn’t contain any $R$s, then it means cases $R(x)$ holds are precisely those which could be obtained by a proof, using clauses of the definition as inference rules $\frac{\phi\quad R(t_1)\quad\cdots\quad R(t_n)}{R(t)}$.
With this, a structural induction comes in handy: to prove claim $\forall x.\;R(x)\Rightarrow A(x)$, it’s sufficient to prove, for each definition clause $(*)$, that $\phi\wedge A(t_1)\wedge A(t_2)\wedge\ldots\wedge A(t_n)\Rightarrow A(t)$, where $t_1,\ldots,t_n,t$ are exactly what they are in that clause.
This kind of induction is founded on ordinary mathematical induction for $\mathbb N$ applied with respect to proof length/depth, and it’s less verbose to justify it once somewhere, and then use it all the way instead—as I’ll do below.
In your case, we have clauses
C1. $M\twoheadrightarrow_{\beta} M$;
C2. $M\to_\beta N\Rightarrow M\twoheadrightarrow_{\beta} N$;
C3. $M\twoheadrightarrow_{\beta} N\wedge N\twoheadrightarrow_{\beta} L\Rightarrow M\twoheadrightarrow_{\beta} L$.
The structural induction proof of compatibility of $\twoheadrightarrow_{\beta}$ then is as follows:
Show that $ZM\twoheadrightarrow_{\beta} ZM$, $MZ\twoheadrightarrow_{\beta} MZ$, $(\lambda x.M)\twoheadrightarrow_{\beta} (\lambda x.M)$.
C1 three times, done.
Show that, if $M\to_\beta N$, then $ZM\twoheadrightarrow_{\beta} ZN$, $MZ\twoheadrightarrow_{\beta} NZ$, $(\lambda x.M)\twoheadrightarrow_{\beta} (\lambda x.N)$.
By compatibility of $\to_\beta$, we’ve got $ZM\to_\beta ZN$, $MZ\to_\beta NZ$, $(\lambda x.M)\to_\beta (\lambda x.N)$. Applying C2 to these, we’re done with this case.
(Now we actually have premises with $\twoheadrightarrow_{\beta}$ in them, hooray!) Show that, if
(a) $ZM\twoheadrightarrow_{\beta} ZN$, $MZ\twoheadrightarrow_{\beta} NZ$, $(\lambda x.M)\twoheadrightarrow_{\beta} (\lambda x.N)$, and
(b) $ZN\twoheadrightarrow_{\beta} ZL$, $NZ\twoheadrightarrow_{\beta} LZ$, $(\lambda x.N)\twoheadrightarrow_{\beta} (\lambda x.L)$, then
(c) $ZM\twoheadrightarrow_{\beta} ZL$, $MZ\twoheadrightarrow_{\beta} LZ$, $(\lambda x.M)\twoheadrightarrow_{\beta} (\lambda x.L)$.
Here we apply C3 for corresponding statements of (a) and (b) to get each statement of (c).
And that’s it, now we know for sure $\beta$-reduction is compatible! (Well, I liked a sketch in your comment more, but for the sake of clarity, this all had to be done.)