$\mathbb E = \{ \mathcal E_i \}_{i=0}^{N-1}$ and $\mathbb I = \{ \mathcal I_i \}_{i=0}^{N-1}$ are two time-varying sequences. For each $I_k$, we can estimate it with equation $\eqref{m1}$:
$$\label{m1}\tag{1} \hat I_k = P(\mathcal E_k, \mathcal E_{k-1}, \dots, \mathcal E_{0}) $$
This means that every $I_k$ is estimated from the $k+1$ variables $\mathcal E_k, \mathcal E_{k-1}, \dots, \mathcal E_{0}$. However, we can also do this another way, as equation $\eqref{m2}$ does:
$$\label{m2}\tag{2} \hat I_k = P(\mathcal E_k | \hat I_{k-1}) $$
Equation $\eqref{m2}$ is a recursive way to estimate $I_k$. We do not need to consider all $k+1$ variables $\mathcal E_k, \mathcal E_{k-1}, \dots, \mathcal E_{0}$. Instead, we estimate $I_k$ from the new observation $\mathcal E_k$ and the previous estimate $\hat I_{k-1}$. I suspect this second approach is better, especially when $k$ is large, because the number of inputs per step does not grow with $k$.
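
To make the comparison concrete, here is a minimal sketch in Python. It assumes (this is my own illustrative choice, not part of the setup above) that $I_k$ is a constant observed through noisy $\mathcal E_k$ and that $P$ is the sample mean. The names `batch_estimate` and `recursive_estimate` are hypothetical. Note that the recursive version also needs the step index $k$ in addition to $\hat I_{k-1}$.

```python
import random


def batch_estimate(observations):
    """Equation (1) style: use all of E_0..E_k at once (here: their mean)."""
    return sum(observations) / len(observations)


def recursive_estimate(prev_estimate, new_observation, k):
    """Equation (2) style: update the previous estimate with E_k only.

    k is the 0-based index of the new observation.
    """
    if k == 0:
        return new_observation
    return prev_estimate + (new_observation - prev_estimate) / (k + 1)


random.seed(0)
true_value = 5.0  # assumed constant I_k, for illustration only
observations = [true_value + random.gauss(0.0, 1.0) for _ in range(1000)]

estimate = 0.0
max_gap = 0.0  # largest difference between the two estimates over all k
for k, e_k in enumerate(observations):
    estimate = recursive_estimate(estimate, e_k, k)
    batch = batch_estimate(observations[: k + 1])
    max_gap = max(max_gap, abs(estimate - batch))

print(f"final recursive estimate: {estimate}")
print(f"largest gap to batch estimate: {max_gap}")
```

For this particular choice of $P$ the two forms agree up to floating-point rounding, and the recursive one does constant work per step. That need not hold for an arbitrary $P$, which is exactly what I am unsure about.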
My question is: is equation $\eqref{m2}$ really better than equation $\eqref{m1}$? Is there any theory related to this?