My textbook (Burden and Faires, Numerical Analysis, Section 5.4) starts with:
The first step in deriving a Runge-Kutta method is to determine values for $a_1, \alpha_1, \beta_1$ with the property that $a_1 f(t + \alpha_1, y + \beta_1)$ approximates:
\begin{align*} T^{(2)}(t,y) &= f(t,y) + \frac{h}{2} f'(t,y) \\ \end{align*}
Can someone explain this setup? What is this $T$ function? Where did that definition come from and what does it represent? And why should this be approximated by $a_1 f(t + \alpha_1, y + \beta_1)$?
Here is the rest of the derivation, FWIW:
By the chain rule:
\begin{align*} f'(t,y) &= \frac{df}{dt}(t,y) = \frac{\partial f}{\partial t}(t,y) + \frac{\partial f}{\partial y}(t,y) \cdot f(t, y) \\ \end{align*}
we have:
\begin{align*} T^{(2)}(t,y) &= f(t,y) + \frac{h}{2} \frac{\partial f}{\partial t}(t,y) + \frac{h}{2} \frac{\partial f}{\partial y}(t,y) \cdot f(t,y) \\ \end{align*}
Taylor expansion of $f(t + \alpha_1, y + \beta_1)$ yields:
\begin{align*} a_1 f(t + \alpha_1, y + \beta_1) &= a_1 f(t,y) + a_1 \alpha_1 \frac{\partial f}{\partial t}(t,y) + a_1 \beta_1 \frac{\partial f}{\partial y}(t,y)\\ &+ a_1 \cdot R_1(t + \alpha_1, y + \beta_1) \\ \end{align*}
where, for some $\xi$ between $t$ and $t+\alpha_1$ and some $\mu$ between $y$ and $y+\beta_1$:
\begin{align*} R_1(t + \alpha_1, y + \beta_1) &= \frac{\alpha_1^2}{2} \frac{\partial^2 f}{\partial t^2} (\xi, \mu) + \alpha_1 \beta_1 \frac{\partial^2 f}{\partial t \partial y} (\xi, \mu) + \frac{\beta_1^2}{2} \frac{\partial^2 f}{\partial y^2}(\xi, \mu) \\ \end{align*}
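Setting
\begin{align*} a_1 f(t + \alpha_1, y + \beta_1) &= T^{(2)}(t,y) = f(t,y) + \frac{h}{2} f'(t,y) \\ \end{align*}
and matching the coefficients of $f$, $\frac{\partial f}{\partial t}$ and $\frac{\partial f}{\partial y}$ (neglecting the remainder $R_1$) yields: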
\begin{align*} a_1 &= 1 \\ \alpha_1 &= \frac{h}{2} \\ \beta_1 &= \frac{h}{2} f(t,y) \\ \end{align*}
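Spelled out, these values come from comparing the chain-rule form of $T^{(2)}(t,y)$ with the Taylor expansion of $a_1 f(t + \alpha_1, y + \beta_1)$ term by term. This is my own restatement of that step, not a quote from the book:
\begin{align*} f(t,y):&\quad a_1 = 1 \\ \frac{\partial f}{\partial t}(t,y):&\quad a_1 \alpha_1 = \frac{h}{2} \\ \frac{\partial f}{\partial y}(t,y):&\quad a_1 \beta_1 = \frac{h}{2} f(t,y) \end{align*}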
So:
\begin{align*} T^{(2)}(t,y) &= f\left( t + \frac{h}{2}, y + \frac{h}{2} f(t,y) \right) - R_1 \left( t + \frac{h}{2}, y + \frac{h}{2} f(t,y) \right) \\ \end{align*}
This yields the midpoint method:
\begin{align*} w_0 &= \alpha \\ w_{i+1} &= w_i + h f\left( t_i + \frac{h}{2}, w_i + \frac{h}{2} f(t_i, w_i) \right) \\ \end{align*}
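To make the recurrence concrete, here is a minimal Python sketch of it. The function name `midpoint_method` and the test problem $y' = y$, $y(0) = 1$ are my own choices for illustration and are not from the book. Halving $h$ should cut the error at $t = 1$ by roughly a factor of 4, as expected for a second-order method.

```python
import math

def midpoint_method(f, t0, y0, h, n):
    """Advance y' = f(t, y), y(t0) = y0 by n midpoint steps of size h."""
    t, w = t0, y0
    for _ in range(n):
        # w_{i+1} = w_i + h * f(t_i + h/2, w_i + (h/2) * f(t_i, w_i))
        w = w + h * f(t + h / 2, w + (h / 2) * f(t, w))
        t = t + h
    return w

# Hypothetical test problem: y' = y, y(0) = 1, exact solution y(t) = e^t.
f = lambda t, y: y

# Integrate to t = 1 with step h and then with step h/2.
err_h = abs(midpoint_method(f, 0.0, 1.0, 0.1, 10) - math.e)
err_h2 = abs(midpoint_method(f, 0.0, 1.0, 0.05, 20) - math.e)

# For a second-order method the error ratio should be close to 4.
ratio = err_h / err_h2
print(err_h, err_h2, ratio)
```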
$T^{(2)}$ appears to be the two-term Taylor approximation of the difference quotient $$ \frac{y(t+h)-y(t)}{h}=y'(t)+\frac{h}{2}y''(t)+\frac{h^2}{6}y'''(t)+\dots $$ Since $y'=f(t,y)$ and $y''=f'(t,y)$, the first two terms are exactly $T^{(2)}(t,y)$. À la Taylor, you can also combine those two terms into $y'(t+\frac h2)$, up to an $O(h^2)$ error, which hints at where the midpoint method comes into play.
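A quick numerical sanity check of this reading, as a sketch only: the example ODE $y' = -2ty^2$ with solution $y(t) = 1/(1+t^2)$, the evaluation point $t = 0.5$, and all function names are my own assumptions. Both the difference quotient and the midpoint evaluation $f(t + \frac h2, y + \frac h2 f(t,y))$ should differ from $T^{(2)}(t,y)$ by $O(h^2)$, so halving $h$ should shrink each gap by about 4.

```python
def y(t):
    # Exact solution of the assumed example ODE y' = -2 t y^2, y(0) = 1.
    return 1.0 / (1.0 + t * t)

def f(t, y_val):
    return -2.0 * t * y_val * y_val

def f_total(t, y_val):
    # Total derivative f' = df/dt + df/dy * f, via the chain rule.
    df_dt = -2.0 * y_val * y_val
    df_dy = -4.0 * t * y_val
    return df_dt + df_dy * f(t, y_val)

def T2(t, y_val, h):
    # T^(2)(t, y) = f(t, y) + (h/2) f'(t, y)
    return f(t, y_val) + (h / 2) * f_total(t, y_val)

t = 0.5
quotient_errs, midpoint_errs = [], []
for h in (0.02, 0.01):
    y_t = y(t)
    quotient = (y(t + h) - y_t) / h                      # difference quotient
    midpoint = f(t + h / 2, y_t + (h / 2) * f(t, y_t))   # midpoint evaluation
    quotient_errs.append(abs(quotient - T2(t, y_t, h)))
    midpoint_errs.append(abs(midpoint - T2(t, y_t, h)))

# Ratios near 4 indicate that each gap scales like h^2.
quotient_ratio = quotient_errs[0] / quotient_errs[1]
midpoint_ratio = midpoint_errs[0] / midpoint_errs[1]
print(quotient_ratio, midpoint_ratio)
```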