Consider a discrete-time, time-invariant MIMO system with a multidimensional hidden state (or simply state), given by the recursion
$$ h_{t+1}=Ah_{t}+Bx_t+\eta_t $$ $$ y_t=Ch_t+Dx_t+\xi_t $$
where $h_t$ is the hidden state, $x_t$ is the input, $y_t$ is the output, and $\eta_t,\xi_t$ are noise vectors.
Show that if $x_t,\eta_t,\xi_t$ are zero vectors of appropriate dimension for $t \leq 0$, then the output at time $t$ is
$$ y_t = \sum_{i=1}^{t-1}CA^i(Bx_{t-i}+\eta_{t-i})+CA^th_0+Dx_t+\xi_t $$
For more information, see page 3 of the following paper by Elad Hazan et al.:
Learning Linear Dynamical Systems via Spectral Filtering
To derive this, I started with the following:
$$ t=0 $$ $$ h_{1}=Ah_{0}+Bx_0+\eta_0=Ah_{0} $$ $$ y_1=Ch_1+Dx_1+\xi_1=C(Ah_{0})+Dx_1+\xi_1=CAh_{0}+Dx_1+\xi_1 $$
which is consistent with $$ y_1 = \sum_{i=1}^{1-1}CA^i(Bx_{1-i}+\eta_{1-i})+CA^1h_0+Dx_1+\xi_1=CAh_{0}+Dx_1+\xi_1 $$
$$ t=1 $$ $$ h_{2}=Ah_{1}+Bx_1+\eta_1=A(Ah_{0})+Bx_1+\eta_1=A^2h_{0}+Bx_1+\eta_1 $$ $$ y_2=Ch_2+Dx_2+\xi_2=C(A^2h_{0}+Bx_1+\eta_1)+Dx_2+\xi_2 $$ $$ y_2=CA^2h_{0}+CBx_1+C\eta_1+Dx_2+\xi_2 $$
which is not consistent with $$ y_2 = \sum_{i=1}^{2-1}CA^i(Bx_{2-i}+\eta_{2-i})+CA^2h_0+Dx_2+\xi_2=CABx_1+CA\eta_1+CA^2h_0+Dx_2+\xi_2 $$
Could you please tell me why the paper's formula has an extra factor of $A$ in the $x_1$ and $\eta_1$ terms, while the result from the recursive formulas does not?
I think the paper might have a typo with the indexing in the equation, namely it should be
$$ y_t = C\,A^t h_0 + D\,x_t + \xi_t + \sum_{i=0}^{t-1} C\,A^i (B\,x_{t-i-1} + \eta_{t-i-1}), \tag{1} $$
which gives
\begin{align} y_0 &= C\,h_0 + D\,x_0 + \xi_0, \\ y_1 &= C\,A\,h_0 + D\,x_1 + \xi_1 + C(B\,x_0 + \eta_0), \\ y_2 &= C\,A^2\,h_0 + D\,x_2 + \xi_2 + C(B\,x_1 + \eta_1) + C\,A(B\,x_0 + \eta_0). \end{align}
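As a sketch of where equation $(1)$ comes from, one can unroll the state recursion from the question step by step (no assumptions beyond the two system equations):

\begin{align} h_t &= A\,h_{t-1} + (B\,x_{t-1} + \eta_{t-1}) \\ &= A^2 h_{t-2} + A(B\,x_{t-2} + \eta_{t-2}) + (B\,x_{t-1} + \eta_{t-1}) \\ &\;\;\vdots \\ &= A^t h_0 + \sum_{i=0}^{t-1} A^i (B\,x_{t-i-1} + \eta_{t-i-1}). \end{align}

Substituting this into $y_t = C\,h_t + D\,x_t + \xi_t$ gives $(1)$. Note that the most recent term, $B\,x_{t-1} + \eta_{t-1}$, is multiplied by $A^0 = I$, which is why the sum has to start at $i=0$ with the index $t-i-1$.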
Equation $(1)$ can also be rewritten by reversing the order of the summation, which makes it easier to exclude the contribution of $x_0$ and $\eta_0$:
$$ y_t = C\,A^t h_0 + D\,x_t + \xi_t + \sum_{i=0}^{t-1} C\,A^{t-i-1} (B\,x_i + \eta_i), \tag{2} $$
which also agrees with section 4.1.3 of these notes. In your case, where $x_0$ and $\eta_0$ are zero, the $i=0$ term vanishes, so you could use
$$ y_t = C\,A^t h_0 + D\,x_t + \xi_t + \sum_{i=1}^{t-1} C\,A^{t-i-1} (B\,x_i + \eta_i). \tag{3} $$
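If you want to convince yourself numerically, here is a minimal NumPy sketch that compares the recursion against equations $(2)$ and $(3)$ and against the formula as quoted in the question. The dimensions, the random matrices, the seed and the function names (`simulate`, `closed_form`, `paper_form`) are all illustrative assumptions, not anything taken from the paper.

```python
import numpy as np


def simulate(A, B, C, D, h0, xs, etas, xis):
    """Outputs y_0..y_{T-1} from the recursion h_{t+1} = A h_t + B x_t + eta_t."""
    h, ys = h0, []
    for x, eta, xi in zip(xs, etas, xis):
        ys.append(C @ h + D @ x + xi)  # y_t = C h_t + D x_t + xi_t
        h = A @ h + B @ x + eta        # advance the state to h_{t+1}
    return ys


def closed_form(A, B, C, D, h0, xs, etas, xis, t, start=0):
    """Equation (2) for start=0, equation (3) for start=1."""
    y = C @ np.linalg.matrix_power(A, t) @ h0 + D @ xs[t] + xis[t]
    for i in range(start, t):
        y = y + C @ np.linalg.matrix_power(A, t - i - 1) @ (B @ xs[i] + etas[i])
    return y


def paper_form(A, B, C, D, h0, xs, etas, xis, t):
    """Formula as quoted in the question: sum over i = 1..t-1 of C A^i (...)."""
    y = C @ np.linalg.matrix_power(A, t) @ h0 + D @ xs[t] + xis[t]
    for i in range(1, t):
        y = y + C @ np.linalg.matrix_power(A, i) @ (B @ xs[t - i] + etas[t - i])
    return y


# Arbitrary illustrative sizes: state n, input m, output p, horizon T.
rng = np.random.default_rng(0)
n, m, p, T = 3, 2, 2, 6
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
h0 = rng.standard_normal(n)
xs = rng.standard_normal((T, m))
etas = 0.1 * rng.standard_normal((T, n))
xis = 0.1 * rng.standard_normal((T, p))

# Convention from the question: input and noise are zero at t = 0.
xs[0] = 0.0
etas[0] = 0.0
xis[0] = 0.0

args = (A, B, C, D, h0, xs, etas, xis)
ys = simulate(*args)
matches_eq2 = all(np.allclose(ys[t], closed_form(*args, t)) for t in range(T))
matches_eq3 = all(np.allclose(ys[t], closed_form(*args, t, start=1)) for t in range(T))
matches_paper = all(np.allclose(ys[t], paper_form(*args, t)) for t in range(T))
print(matches_eq2, matches_eq3, matches_paper)
```

With generic random matrices, the first two checks should agree with the recursion, while the quoted formula should start to disagree at $t=2$, which is exactly the mismatch you found by hand.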