I am currently reading a text book on distributed computing systems that includes a short introduction to $\lambda$-calculus. There is an example of evaluating the sequence $(((if \space \space true) \space \space 4) \space \space 5)$ which is written below.
$\space\space(((\lambda b.\lambda t.\lambda e.((b \space t) \space e) \space \lambda x.\lambda y.x) \space 4) \space 5) $
$\space\space((\lambda t.\lambda e ((\lambda x.\lambda y.x \space \space t) \space e) \space 4) \space 5) $
$\space\space(\lambda e ((\lambda x.\lambda y.x \space \space 4) \space \space e) \space \space 5) $
$\space\space((\lambda x.\lambda y.x \space \space 4) \space \space 5) $
$\space\space(\lambda y.4 \space \space 5) $
$\space\space 4$
The author has used $\beta$-reduction on each line and I can follow up until the second to last reduction. Could someone explain how we get from line 4 to line 6?
In line 4. there is a $\beta$-redex $(\lambda x \lambda y. x \ 4)$, where $\lambda x \lambda y. x$ is applied to $4$. This means that $(\lambda x \lambda y. x \ 4)$ rewrites$-$via a $\beta$-step$-$to $(\lambda y. x)\{4 / x\} $ (which is $(\lambda y. x)$ where the free occurrence of $x$ is replaced by $4$), i.e. $ \lambda y. 4$.
Since the $\beta$-redex $(\lambda x \lambda y. x \ 4)$ is inside a context (i.e. it is a sub-term of a bigger term), then from line 4., if we put the sub-term $(\lambda x \lambda y. x \ 4)$ into its context, we have: \begin{align} ((\lambda x \lambda y. x \ 4) \ 5) &\to_\beta (\lambda y. x \ 5)\{4 / x\} = (\lambda y. 4 \ 5) \\ &\to_\beta 4\ \{5/y\} = 5 \end{align} where, again, $(\lambda y. 4 \ 5) $ is a $\beta$-redex and then rewrites$-$via a $\beta$-step$-$to $4\ \{5/y\}$ (which is $4$ where the free occurrences of $y$ are replaced by $5$), i.e. $4$ because there are not free occurrences of $y$ in $4$.