Let $X_1,\dots, X_n$ be iid random variables with common distribution function $F$, and set $M_n:=\max\{X_1,\dots, X_n\}$. By independence, the distribution of $M_n$ is $$ P(M_n\le x)=F^n(x). $$
In the textbook, it says
we proceed by looking at the behaviour of $F^n$ as $n\to \infty$. But this alone is not enough: for any $z<z_+$, where $z_+$ is the upper end point of $F$ (the smallest value of $z$ such that $F(z)=1$), we have $F(z)<1$, so as $n\to \infty$, $$ F^n(z)\to 0, $$ and the distribution of $M_n$ degenerates to a point mass at $z_+$.
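As a quick numeric check of this degeneration (using the standard exponential CDF $F(z)=1-e^{-z}$, chosen here purely for illustration), $F^n(z)$ collapses to $0$ for any fixed $z$, even when $F(z)$ is very close to $1$:

```python
import math

# F(z) = 1 - exp(-z): standard exponential CDF (upper endpoint z_+ = infinity)
F = lambda z: 1 - math.exp(-z)

z = 5.0  # any fixed point; F(5) is about 0.9933, strictly below 1
for n in [10, 100, 1000, 10000]:
    # Since F(z) < 1, the n-th power F(z)^n tends to 0 as n grows
    print(n, F(z) ** n)
```

By $n=10000$ the probability $P(M_n\le 5)$ is already astronomically small, matching the point-mass claim above.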
This difficulty is avoided by allowing a linear transformation of $M_n$: $$ M_n^*=\frac{M_n-b_n}{a_n} $$ for sequences of constants $\{a_n>0\}$ and $\{b_n\}$. Appropriate choices of these sequences stabilize the location and scale of $M_n^*$.
Question: why is $M_n^*$ any better? It seems that we still have $$ P(M_n^*\le z)=F^n(a_nz+b_n)\to 0? $$ (Because $0\le F(a_nz+b_n)\le 1$, doesn't $[F(a_nz+b_n)]^n\to 0$ as $n\to \infty$?)
While it is true that for each $n\in\mathbb N$ we have $0\leq F(a_nx+b_n)\leq 1$, that is not enough to conclude that $F(a_nx+b_n)^n\to 0$ as $n\to\infty$: the quantity $F(a_nx+b_n)$ can increase to $1$ as $n\to\infty$, and it can do so fast enough that the $n$-th power converges to a nondegenerate limit.
Consider the following example: Let $F$ be the distribution function given by $$F(x)=\begin{cases}1-x^{-1} & x\geq 1 \\ 0 & x<1 \end{cases}$$ Then choose $b_n=0$ and $a_n=n$ to obtain, for $x>0$ and sufficiently large $n$: $$F(a_nx+b_n)^n=F(nx)^n=(1-n^{-1}x^{-1})^n \to \exp(-x^{-1}),$$ using the standard limit $(1+c/n)^n\to e^c$ with $c=-x^{-1}$. You might recognize the limit as the Fréchet distribution. Note that while each $F(a_nx)$ is between $0$ and $1$, the value $F(a_nx)=1-n^{-1}x^{-1}$ is increasing to $1$ as $n\to\infty$.
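The example above can also be checked by simulation. The sketch below (parameters $n$ and the number of replications are arbitrary choices, not from the text) draws blocks of Pareto samples with $F(x)=1-x^{-1}$, rescales each block maximum by $a_n=n$, and compares the empirical CDF of $M_n/n$ with the Fréchet limit $\exp(-x^{-1})$:

```python
import math
import random

random.seed(0)

# Pareto with F(x) = 1 - 1/x for x >= 1; inverse-CDF sampling gives X = 1/U
def pareto_sample():
    return 1.0 / random.random()

n = 500      # block size (arbitrary, for illustration)
reps = 5000  # number of simulated block maxima

# Normalized maxima M_n / n (i.e. a_n = n, b_n = 0)
maxima = [max(pareto_sample() for _ in range(n)) / n for _ in range(reps)]

# Empirical CDF of M_n/n versus the Frechet limit exp(-1/x)
for x in [0.5, 1.0, 2.0, 5.0]:
    empirical = sum(m <= x for m in maxima) / reps
    print(x, empirical, math.exp(-1 / x))
```

With moderate $n$ the two columns should already agree to a couple of decimal places, illustrating that $F(nx)^n$ stabilizes rather than collapsing to $0$.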