In Steele's The Cauchy-Schwarz Master Class, Exercise 2.10 is about Akerberg's Refinement of AM-GM. (It is a refinement because AM-GM follows from it by iteration.)
Problem. For $a_1, \dots, a_n \geq 0$ and $n \geq 2$ prove $$ a_n \left(\frac{a_1+\dots+a_{n-1}}{n-1} \right)^{n-1} \leq \left(\frac{a_1+\dots+a_n}{n} \right)^n. $$ The book proposes to use (a consequence of Bernoulli) $$y \left(n-y^{n-1} \right) = ny-y^n \leq n-1 \quad \text{for} \quad y \geq 0.$$ By setting $$y^{n-1} = \frac{a_n}{\overline{a}} \quad \text{with} \quad \overline{a} = \frac{a_1+\dots+a_n}{n} $$ we are immediately done.
However, how is $y \left(n-y^{n-1} \right) \leq n-1$ related to the given problem? Where is the motivation to look at that? How would one think of the choice of $y$?
Motivation depends on the thought process, and of course several different thought processes could motivate the same substitution or approach.
For example, if one observes that the inequality to prove is homogeneous (i.e. invariant under scaling), one possibility is simply to scale so that $a_1+a_2+\cdots+a_{n-1} = n-1$. The inequality to prove then becomes $$a_n \leqslant \left(\frac{n-1+a_n}n \right)^n = \left(1+\frac{a_n-1}n \right)^n$$ which of course follows directly from Bernoulli's inequality. The scaling $a_1+a_2+\cdots+a_n = n$ would also lead to a similar direct application.
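As a sketch of that last step, write $x = (a_n-1)/n$ (a symbol introduced here only for illustration) and assume $a_1+\cdots+a_{n-1} > 0$, since otherwise the left-hand side of the original inequality is zero and there is nothing to prove. Replacing every $a_i$ by $\lambda a_i$ with $\lambda > 0$ multiplies both sides of the original inequality by $\lambda^n$, so the normalization loses nothing, and Bernoulli applies because $x \geqslant -1/n > -1$: $$\left(1+\frac{a_n-1}{n}\right)^n = (1+x)^n \geqslant 1+nx = a_n.$$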
In the case you mention, instead of scaling, perhaps the author noted that the problem really involves only two quantities, since $a_1+a_2 + \cdots + a_{n-1}$ always appears as a single block. Expressing those two quantities through $\overline a$ and $y$ reduces the inequality to prove to $y(n-y^{n-1}) \leqslant n-1$, which in turn is a simple application of Bernoulli. In the written proof the order is reversed, because the implications are clearer that way; unfortunately, the motivation is then not always obvious.
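To make the reduction concrete, here is one way the bookkeeping might go. The symbol $A$ is introduced here only for illustration and is not taken from the book, and $\overline a > 0$ is assumed (if $\overline a = 0$ then every $a_i$ vanishes and the inequality is trivial). Let $A = \frac{a_1+\cdots+a_{n-1}}{n-1}$, so that $(n-1)A = n\overline a - a_n$ and the goal reads $a_n A^{n-1} \leqslant \overline a^{\,n}$. With $y^{n-1} = a_n/\overline a$, $$y\left(n-y^{n-1}\right) = y\cdot\frac{n\overline a-a_n}{\overline a} = (n-1)\,\frac{yA}{\overline a},$$ and therefore $$y\left(n-y^{n-1}\right) \leqslant n-1 \iff yA \leqslant \overline a \iff y^{n-1}A^{n-1} \leqslant \overline a^{\,n-1} \iff a_n A^{n-1} \leqslant \overline a^{\,n}.$$ The one-variable inequality itself is Bernoulli in the form $y^n = (1+(y-1))^n \geqslant 1+n(y-1)$, which is valid for $y \geqslant 0$ and rearranges to $ny-y^n \leqslant n-1$.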