Question 1:
How does one come up with the equation in the red box below?
It looks like some kind of product rule, but I'm not sure how to apply Ito's lemma here.

Bjork doesn't seem to explain it fully, and I can't find the Heath book. My prof gave another proof which I got.

Question 2:
Should the encircled s's be u instead?

Cross-posted: https://quant.stackexchange.com/questions/16688/getting-a-stochastic-differential
A less heuristic proof is the following. Define the function $Y(t,T,\mathcal{P})$ such that, for each partition $\mathcal{P}$ (of size $n$) of the interval $[t,T]$, we have
$$ Y(t,T,\mathcal{P}) := -\sum\limits_{i=1}^{n} f(t,s_{i})(s_{i + 1} - s_{i}) = -\sum\limits_{i=1}^{n} f(t,s_{i})\Delta s_i\,. $$
Since $Y(t,T,\mathcal{P})$ is linear in each $f(t,s_{j})$, the first-order terms survive and all second-order terms vanish: $$ \begin{eqnarray*} \sum\limits_{j=1}^{n}\frac{\partial}{\partial f_{t, s_{j}}} Y(t,T,\mathcal{P}) ~\mathrm df(t,s_{j}) = -\sum\limits_{i=1}^{n} 1\cdot\mathrm df(t,s_{i})\Delta s_i\,,\tag{1}\newline \sum\limits_{j=1}^{n}\frac{1}{2}\frac{\partial^2}{\partial f^2_{t, s_{j}}} Y(t,T,\mathcal{P}) ~\mathrm d\langle f\rangle_{t,s_{j}} = -\sum\limits_{i=1}^{n} 0\cdot \mathrm d\langle f\rangle_{t,s_{i}}\Delta s_i = 0\,,\tag{2}\newline \sum\limits_{j<r=1}^{n}\frac{\partial^2}{\partial f_{t, s_{j}}\partial f_{t, s_{r}}} Y(t,T,\mathcal{P}) ~\mathrm d\langle f, f\rangle_{t,s_{j},s_{r}} = -\sum\limits_{i<r=1}^{n} 0\cdot ~\mathrm d\langle f, f\rangle_{t,s_{i},s_{r}}\Delta s_i = 0\,.\tag{3} \end{eqnarray*} $$
Therefore, by Ito's Lemma, $(1)$, $(2)$ and $(3)$ imply that $$ \mathrm dY(t,T,\mathcal{P}) = \frac{\partial}{\partial t}Y(t,T,\mathcal{P})~\mathrm dt - \sum\limits_{i=1}^{n} \mathrm df(t,s_{i})\Delta s_i\,. $$
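If it helps, the discrete identity can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are mine, not Bjork's: the forward curve is the hypothetical model $f(t,s) = f_0(s) + \sigma W_t$ (parallel shifts driven by one simulated Brownian path), the time grid coincides with the maturity grid, and $Y$ is a left Riemann sum over $[t_k, T]$. With those choices the increment of $Y$ over one step should equal $f(t_k,t_k)\Delta t - \sum_{i>k}\Delta f(t_k,s_i)\Delta s$ up to rounding, where the first term comes from the lower limit of the sum moving forward.

```python
import math
import random


def discrete_check(n=200, T=1.0, sigma=0.01, seed=0):
    """Compare Y(t_{k+1}) - Y(t_k) with f(t_k,t_k)*dt - sum_i df(t_k,s_i)*ds.

    Assumptions (illustrative only): f(t, s) = f0(s) + sigma * W_t, and the
    time grid equals the maturity grid, so dt == ds.
    """
    rng = random.Random(seed)
    ds = T / n
    s = [i * ds for i in range(n)]            # grid points s_0, ..., s_{n-1}
    f0 = [0.02 + 0.01 * si for si in s]       # hypothetical initial forward curve

    # One Brownian path sampled on the same grid: W[k] = W(t_k).
    W = [0.0]
    for _ in range(n - 1):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(ds)))

    def f(k, i):
        """Forward rate f(t_k, s_i) under the assumed model."""
        return f0[i] + sigma * W[k]

    # Y[k] = -sum_{i >= k} f(t_k, s_i) * ds, a left Riemann sum over [t_k, T].
    Y = [-sum(f(k, i) for i in range(k, n)) * ds for k in range(n)]

    lhs = [Y[k + 1] - Y[k] for k in range(n - 1)]
    rhs = [
        f(k, k) * ds
        - sum(f(k + 1, i) - f(k, i) for i in range(k + 1, n)) * ds
        for k in range(n - 1)
    ]
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = discrete_check()
    print("max discrepancy:", max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The two sides agree because the sums telescope, so this checks the bookkeeping of the partition argument and not the convergence of the stochastic integrals.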
This means that, for each partition $\mathcal{P}^{'}$ (of size $m$) of the interval $[0,t]$, we have $$ \begin{array}{rcl} \displaystyle \sum\limits_{k=0}^{m} \Delta Y_{k}(s_k, T, \mathcal{P}) &=& \displaystyle \sum\limits_{k=0}^{m}\left(\frac{\partial}{\partial t}Y(s_k, T, \mathcal{P})\right)\Delta s_k - \sum\limits_{i=1}^{n} \left(\sum\limits_{k=0}^{m} \Delta f_k(s_k,s_{i})\right)\Delta s_i\,. \\ &&\\ \mbox{So, }\displaystyle \,\,\lim\limits_{\|\mathcal{P}\|\rightarrow 0}\sum\limits_{k=0}^{m} \Delta Y_{k} &=&\displaystyle \lim\limits_{\|\mathcal{P}\|\rightarrow 0} \sum\limits_{k=0}^{m}\left(\frac{\partial}{\partial t} Y(s_k, T, \mathcal{P})\right)\Delta s_k - \lim\limits_{\|\mathcal{P}\|\rightarrow 0}\sum\limits_{i=1}^{n} \left(\sum\limits_{k=0}^{m} \Delta f_k(s_k,s_{i})\right)\Delta s_i \\ &=&\displaystyle \sum\limits_{k=0}^{m}\frac{\partial}{\partial t} \left(\lim\limits_{\|\mathcal{P}\|\rightarrow 0} Y(s_k, T, \mathcal{P})\right)\Delta s_k - \sum\limits_{k=0}^{m} \Big(\int\limits_{t}^{T}\Delta f_k(s_k,s)~\mathrm ds\Big)_k \\ &=&\displaystyle \sum\limits_{k=0}^{m}\left(\frac{\partial}{\partial t} Y(s_k, T)\right)\Delta s_k - \int\limits_{t}^{T}\sum\limits_{k=0}^{m} \Big(\Delta f_k(s_k,s)\Big)_k~\mathrm ds \\ &=&\displaystyle \sum\limits_{k=0}^{m} \left(\frac{\partial}{\partial t} Y(s_k, T)\right)\Delta s_k - \int\limits_{t}^{T}\Big( f(t, s) -f(0, s) \Big)~\mathrm ds\,. 
\\ \therefore\,\, \sum\limits_{k=0}^{m} \Delta Y_{k}(s_k, T) &=&\displaystyle \sum\limits_{k=0}^{m} \left(\frac{\partial}{\partial t} Y(s_k, T)\right)\Delta s_k - \int\limits_{t}^{T}\Big( f(t, s) -f(0, s) \Big)~\mathrm ds \\ &&\\ \mbox{Consequently, }\quad\quad&& \\ Y(t,T) -Y(0,T)&=&\displaystyle \lim\limits_{\|\mathcal{P^{'}}\|\rightarrow 0}\sum\limits_{k=0}^{m} \Delta Y_{k}(s_k, T) \\ &=&\displaystyle \lim\limits_{\|\mathcal{P^{'}}\|\rightarrow 0}\sum\limits_{k=0}^{m}\left(\frac{\partial}{\partial t} Y(s_k, T)\right)\Delta s_k - \int\limits_{t}^{T}\Big( f(t, s) -f(0, s) \Big)~\mathrm ds \\ &=&\displaystyle \int\limits_{0}^{t} \left(\frac{\partial}{\partial t}Y(s,T)\right)\mathrm ds - \int\limits_{t}^{T}\Big( f(t, s) -f(0, s) \Big)~\mathrm ds\,. \end{array} $$ Or, if you prefer the SDE form,
$$ \mathrm dY(t,T) = \frac{\partial}{\partial t}Y(t,T)~\mathrm dt - \int\limits_{t}^{T}\mathrm df(t,s)~\mathrm ds\,. $$
Admittedly, this is not as airtight as it could be. For instance, I assumed above that $f(t,s)$, viewed as a differentiable function of $t$ and $s$ rather than as a process, is smooth enough to allow interchanging the limiting operations "$\frac{\partial}{\partial t}$" and "$\|{\mathcal{P}}\|\rightarrow 0$", and that this suffices for the pertinent Ito integrals to be well-defined and to agree.