A linear, discrete-time, stationary state-space model is a pair of real-valued stochastic processes $\{X_t \}_{t \in \mathbb{N}},\{Y_t\}_{t \in \mathbb{N}}$ that obey the recursive equations $$ \begin{cases} X_{t+1} = F X_t + v_t \ &t=0,1,2\dots \\ Y_t = H X_t + w_t \ &t=0,1,2\dots \end{cases} $$ where:
- $F \in \mathbb{R}, H \in \mathbb{R}$;
- $v_t, w_t$ are random variables (additive noise) which admit a PDF (Probability Density Function).
The noise terms are zero-mean and satisfy $$ \begin{aligned} &\forall t_1 \neq t_2 : \ v_{t_1} \perp v_{t_2}, \ w_{t_1} \perp w_{t_2} \\ &\forall t_1, t_2 : \ v_{t_1} \perp w_{t_2} \\ &\forall t : \ E[v_t^2] = Q, \ E[w_t^2] = R \end{aligned} $$ where the symbol $X \perp Y$ means that $X$ and $Y$ are independent and $Q,R$ are assumed to be positive real numbers. The initial condition of the recursion, $X_0 = x_0$, is a fixed real number.

It seems to me that it is always possible in principle (at least numerically) to calculate $$ f( x_t | Y_{1:t-1}) $$ where $f(\cdot | Y_{1:t-1})$ denotes the conditional PDF of $X_t$ given the observations, $X_{0:t-1} = (X_{t-1}, X_{t-2}, \dots, x_0)$, and $Y_{1:t} = (Y_t, Y_{t-1},\dots,Y_1)$.
As an example take $t=3$. Unrolling the recursion from $x_0$ gives
$$P(X_3 < x_3, Y_1 < y_1, Y_2 < y_2) = $$
$$P(F^3x_0 + F^2 v_0 + F v_1 + v_2 < x_3, \ H F x_0 + Hv_0 + w_1 < y_1, \ H F^2 x_0 + H F v_0 + H v_1 + w_2 < y_2)$$
From the independence of $v_0,v_1,v_2,w_1,w_2$ this is equal to
$$ \int_A f_{V_0}(v_0)f_{V_1}(v_1)f_{V_2}(v_2)f_{W_1}(w_1)f_{W_2}(w_2) \, dv_0 \, dv_1 \, dv_2 \, dw_1 \, dw_2 $$
where $A:= \{ (v_0,v_1,v_2,w_1,w_2) \in \mathbb{R}^5 \mid F^3x_0 + F^2 v_0 + F v_1 + v_2 < x_3, \ H F x_0 + Hv_0 + w_1 < y_1, \ H F^2 x_0 + H F v_0 + H v_1 + w_2 < y_2 \} $
So it is possible to obtain the joint density $f(x_3,y_1,y_2)$ by taking the mixed partial derivative of this joint CDF with respect to $x_3,y_1,y_2$.
At this point $$f( x_3 | Y_1 = y_1, Y_2 = y_2) = \frac{f(x_3,y_1,y_2)}{f(y_1,y_2)} $$ We know the numerator; for the denominator we simply integrate the obtained joint density over $x_3$, and we are finished.
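To make the "at least numerically" part concrete, here is a minimal sketch of how I would approximate $f(x_3 | Y_1 = y_1, Y_2 = y_2)$ on a grid, following the recursion exactly as defined at the top ($X_1 = F x_0 + v_0$, and so on). Instead of differentiating the joint CDF, it integrates the noise densities over the hidden states $x_1, x_2$, which should give the same joint density after a change of variables. Everything specific here is an assumption for illustration only: the function names, the Laplace noise, the grid, and the values of $F$, $H$, $x_0$, $Q$, $R$, $y_1$, $y_2$.

```python
import numpy as np

def laplace_pdf(z, var):
    """Zero-mean Laplace density with variance `var` (illustrative noise choice)."""
    b = np.sqrt(var / 2.0)  # Laplace variance is 2 * b**2
    return np.exp(-np.abs(z) / b) / (2.0 * b)

def conditional_density_x3(y1, y2, grid, f_v, f_w, F=0.9, H=1.0, x0=0.0):
    """Approximate f(x_3 | Y_1 = y1, Y_2 = y2) on a uniform grid (Riemann sums)."""
    dx = grid[1] - grid[0]
    # f(x_1, y1): density of X_1 = F*x0 + v_0 times the density of w_1 = y1 - H*x_1.
    a1 = f_v(grid - F * x0) * f_w(y1 - H * grid)
    # Transition kernel K[i, j] = f_V(grid[i] - F * grid[j]).
    K = f_v(grid[:, None] - F * grid[None, :])
    # Integrate out x_1, then multiply by the density of w_2 = y2 - H*x_2.
    a2 = (K @ a1) * dx * f_w(y2 - H * grid)
    # Integrate out x_2: this is the joint density f(x_3, y1, y2) on the grid.
    joint = (K @ a2) * dx
    # Denominator f(y1, y2): integrate the joint density over x_3.
    evidence = joint.sum() * dx
    return joint / evidence

# Hypothetical numbers, chosen only to exercise the function.
Q, R = 1.0, 0.5
grid = np.linspace(-15.0, 15.0, 1201)
post = conditional_density_x3(
    y1=0.8, y2=1.1, grid=grid,
    f_v=lambda z: laplace_pdf(z, Q),
    f_w=lambda z: laplace_pdf(z, R),
)
dx = grid[1] - grid[0]
print("integral of the conditional density:", post.sum() * dx)
print("conditional mean of X_3:", (grid * post).sum() * dx)
```

The noise densities are passed in as plain functions, so swapping in Gaussian densities for `f_v` and `f_w` only requires changing the two lambdas. The grid has to be wide enough that the densities are negligible at its edges, otherwise the Riemann sums lose mass.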
Is this reasoning correct? If it is, why do I often see it assumed that $f( x_t | Y_{1:t-1})$ is a normal density? Even if both $v_t$ and $w_t$ are Gaussian, I don't think this follows. Is there a motivation (maybe coming from some central limit argument) behind this assumption?