I am a 17-year-old student and I was reading up on epidemic modelling for a math project, specifically the SIR model, and I came across this:
"This" refers to the assumptions on which the Markov chain is based.
I do not understand how the probability equation could lead to a differential equation. Please help explain this to me, or point me to relevant sources from which I could glean some information. Thank you!
"Formally" refers to the fact that it is nontrivial to justify the passage from the stochastic dynamics to the deterministic one. In fact, this passage might be wrong, because of the product term $S_tI_t$.
Rigorously, one considers the mean functions $$\sigma(t)=E(S_t),\qquad \iota(t)=E(I_t).$$ Taking expectations in the description of the Markov chain which you recall yields $$\sigma(t+h)=\sigma(t)-\alpha hE(S_tI_t)+o(h),\qquad\iota(t+h)=\iota(t)+\alpha hE(S_tI_t)-\rho h\iota(t)+o(h),$$ that is, after subtracting $\sigma(t)$ and $\iota(t)$, dividing by $h$ and letting $h\to0$, $$\sigma'(t)=-\alpha E(S_tI_t),\qquad\iota'(t)=\alpha E(S_tI_t)-\rho\iota(t).$$ And now comes the nonrigorous (approximation) part: if one knows that $$E(S_tI_t)\approx E(S_t)E(I_t),\tag{$\ast$}$$ then one can replace the $E(S_tI_t)$ terms on both right-hand sides by $\sigma(t)\iota(t)$, and one indeed gets the Kermack-McKendrick deterministic model (1) for the mean functions $(\sigma,\iota)$.
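To see what the approximation $(\ast)$ discards, one can use the general identity for the expectation of a product (a standard fact, not something taken from the text quoted in the question): $$E(S_tI_t)=E(S_t)E(I_t)+\operatorname{Cov}(S_t,I_t).$$ Plugging this into the exact equations for the means gives $$\sigma'(t)=-\alpha\,\sigma(t)\iota(t)-\alpha\operatorname{Cov}(S_t,I_t),\qquad\iota'(t)=\alpha\,\sigma(t)\iota(t)-\rho\,\iota(t)+\alpha\operatorname{Cov}(S_t,I_t),$$ so $(\ast)$ amounts to assuming that the covariance term is negligible compared with $\sigma(t)\iota(t)$.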
The approximation $(\ast)$ is usually justified by a law-of-large-numbers type argument: when the populations are large, their fluctuations are negligible relative to their size, so that, with high probability, the (random) point $(S_t,I_t)$ is (suitably) close to the (deterministic) point $(E(S_t),E(I_t))$.
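If you want to check this numerically, here is a minimal Python sketch, under assumptions that are mine and not from the source you quote. It simulates the Markov chain with infection rate $\alpha S_tI_t$ and recovery rate $\rho I_t$ (a Gillespie-type simulation), estimates $E(S_t)$, $E(I_t)$ and $E(S_tI_t)$ by averaging over many runs, and compares them with a forward-Euler solution of the deterministic model. The function names, the scaling $\alpha=2/n$, the value $\rho=1$, the initial split (10% infected) and the time horizon are all illustrative choices.

```python
import random

def simulate_sir(s0, i0, alpha, rho, t_end, rng):
    """One run of the SIR Markov chain; returns (S, I) at time t_end."""
    s, i, t = s0, i0, 0.0
    while i > 0:
        infection_rate = alpha * s * i   # S -> S-1, I -> I+1
        recovery_rate = rho * i          # I -> I-1
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)      # exponential waiting time to next event
        if t > t_end:
            break
        if rng.random() * total < infection_rate:
            s, i = s - 1, i + 1
        else:
            i -= 1
    return s, i

def solve_ode(s0, i0, alpha, rho, t_end, steps=10000):
    """Forward Euler for sigma' = -alpha*sigma*iota, iota' = alpha*sigma*iota - rho*iota."""
    s, i = float(s0), float(i0)
    h = t_end / steps
    for _ in range(steps):
        new_infections = alpha * s * i
        s, i = s - h * new_infections, i + h * (new_infections - rho * i)
    return s, i

def compare(n, runs=2000, seed=0):
    """Compare Monte Carlo means of the chain with the deterministic model."""
    rng = random.Random(seed)
    s0, i0 = n - n // 10, n // 10            # assumed initial condition
    alpha, rho, t_end = 2.0 / n, 1.0, 1.0    # assumed parameters
    samples = [simulate_sir(s0, i0, alpha, rho, t_end, rng) for _ in range(runs)]
    mean_s = sum(s for s, _ in samples) / runs
    mean_i = sum(i for _, i in samples) / runs
    mean_si = sum(s * i for s, i in samples) / runs
    ode_s, ode_i = solve_ode(s0, i0, alpha, rho, t_end)
    return {
        "mean_s": mean_s, "mean_i": mean_i, "ode_s": ode_s, "ode_i": ode_i,
        # relative size of E(SI) - E(S)E(I), i.e. how far (*) is from exact
        "rel_cov_gap": abs(mean_si - mean_s * mean_i) / (mean_s * mean_i),
        # relative distance between the simulated mean of I and the ODE value
        "rel_mean_gap_i": abs(mean_i - ode_i) / ode_i,
    }

if __name__ == "__main__":
    for n in (20, 1000):
        print(n, compare(n))
```

Running it for a small and a large population (here `n = 20` and `n = 1000`) lets you compare how `rel_cov_gap` and `rel_mean_gap_i` behave as the population grows, which is the law-of-large-numbers effect described above.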