Improper Lebesgue prior normalization in Bayesian filtering


Suppose we have a Conditionally Gaussian Linear State Space Model (CGLSSM) in which $Y_t=(X_t,S_t)_{t \in \mathbb{N}}$ is the Markov chain of hidden states. For each $t \in \mathbb{N}$, $S_t \in \{0,1\}$ is a discrete variable whose evolution is described by the known transition probability $\mathbb{P}(S_{t+1}\mid S_t)$, and the evolution of $X_t \in \mathbb{R}^d$ is given by the recursion $X_{t+1}=F X_{t} + U_t$, where $(U_t)_{t \in \mathbb{N}}$ is a sequence of mutually independent variables such that $U_t \sim \mathcal{N}(0,\Sigma)$ and $F$ is a $d\times d$ matrix.

The observations $(Z_t)_{t \in \mathbb{N}}$ are given by the relation $Z_t = H X_t + V_t + M(S_t) W_t$, where $(V_t)_{t \in \mathbb{N}}$ is a sequence of mutually independent Gaussian variables such that $V_t \sim \mathcal{N}(0,C)$, and $(W_t)_{t \in \mathbb{N}}$ is a sequence of mutually independent variables, each $W_t$ following an improper (flat) Lebesgue law on $\mathbb{R}^k$, where $k \leq m$.

The observations take values $Z_t \in \mathbb{R}^m$ with $m<d$. The matrix $M(S_t)$ depends on the regime: $M(0)$ is the zero $m \times k$ matrix, while $M(1)$ is a fixed $m \times k$ matrix.
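For concreteness, the model above can be simulated with a short script. All numerical choices below (dimensions, matrices, transition probabilities) are hypothetical, and since the improper Lebesgue noise has no proper sampler, a very wide uniform is used as a stand-in for simulation purposes only:

```python
import numpy as np

# Hypothetical small instance: d = 2 hidden states, m = 1 observation, k = 1.
rng = np.random.default_rng(0)
d, m, k, T = 2, 1, 1, 100

F = np.array([[0.9, 0.1], [0.0, 0.9]])      # state transition matrix
Sigma = 0.1 * np.eye(d)                      # state noise covariance
H = np.array([[1.0, 0.0]])                   # observation matrix
C = 0.05 * np.eye(m)                         # observation noise covariance
M1 = np.ones((m, k))                         # M(1); M(0) is the zero matrix

# Hypothetical transition law P(S_{t+1} | S_t); row i is the law given S_t = i.
P = np.array([[0.95, 0.05], [0.50, 0.50]])

X = np.zeros((T, d))
S = np.zeros(T, dtype=int)
Z = np.zeros((T, m))
for t in range(T):
    if t > 0:
        X[t] = F @ X[t - 1] + rng.multivariate_normal(np.zeros(d), Sigma)
        S[t] = rng.choice(2, p=P[S[t - 1]])
    Z[t] = H @ X[t] + rng.multivariate_normal(np.zeros(m), C)
    if S[t] == 1:
        # Wide-uniform stand-in for the improper Lebesgue outlier noise W_t
        Z[t] += M1 @ rng.uniform(-100, 100, size=k)
```

Forcing `S[t] = 0` for all `t` in this loop reproduces the clean-observation experiment discussed below.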

This model is designed to be robust to observations contaminated by arbitrarily large outliers lying in the image of $M(1)$.

Now, when I generate a sequence $(X_0,\dots,X_t)$ from this model and generate observations from it according to $Z_t = H X_t + V_t$, i.e. without any Lebesgue nuisance (or, equivalently, by forcing $S_t = 0$ for all $t$), I expect the filtering recursions to give me $\mathbb{P}(S_t\mid Z_0,\dots,Z_t) \approx \mathbb{P}(S_t)$, the stationary law of $(S_t)_t$. In other words, since the observations do not allow one to distinguish between $S_t = 0$ and $S_t = 1$, I expect all the information about $S_t$ to come from the prior.
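The target law in this experiment is the stationary distribution of the 2-state chain, which has a simple closed form (the transition matrix below is a hypothetical example):

```python
import numpy as np

# Hypothetical transition law P(S_{t+1} | S_t); row i is the law given S_t = i.
P = np.array([[0.95, 0.05],
              [0.50, 0.50]])

# For a 2-state chain the stationary law is pi = (p10, p01) / (p01 + p10),
# where p01 = P[0, 1] and p10 = P[1, 0].
p01, p10 = P[0, 1], P[1, 0]
pi = np.array([p10, p01]) / (p01 + p10)
print(pi)  # the law that P(S_t | Z_0,...,Z_t) should approach
```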

However, this is not the case when I do the computations: in practice, $\mathbb{P}(S_t\mid Z_0,\dots,Z_t)$ is very sensitive to the choice of the transition law $\mathbb{P}(S_{t+1}\mid S_t)$. The behavior of $t \mapsto \mathbb{P}(S_t\mid Z_0,\dots,Z_t)$ is erratic and seems independent of the behavior of the observations.

Since the only difference from an ordinary CGLSSM is the Lebesgue law, I would like to ask whether, in this practical context, someone knows how to choose the "normalization constant" of the Lebesgue law, which I suspect is the cause of my problem, so as to obtain $\mathbb{P}(S_t\mid Z_0,\dots,Z_t) \approx \mathbb{P}(S_t)$ when $Z_t$ is drawn under $S_t = 0$ for all $t$.
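To make the suspected issue concrete, here is a minimal one-step sketch, assuming $m = k = 1$ with $M(1)$ full rank and hypothetical numerical values: integrating the flat noise $W_t$ out of the observation density leaves a likelihood for $S_t = 1$ that is just the arbitrary Lebesgue constant $c$, so the discrete posterior scales directly with this choice.

```python
import numpy as np

# Hypothetical one-step quantities: scalar innovation z - z_pred and its
# predictive variance s2 under S_t = 0 (proper Gaussian case).
z, z_pred, s2 = 0.3, 0.0, 1.0
lik0 = np.exp(-0.5 * (z - z_pred) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
prior0, prior1 = 0.95, 0.05  # hypothetical predicted probabilities of S_t

posts = []
for c in (1e-2, 1.0, 1e2):   # three arbitrary Lebesgue "normalizations"
    lik1 = c                  # S_t = 1: flat likelihood, only c survives
    post1 = prior1 * lik1 / (prior0 * lik0 + prior1 * lik1)
    posts.append(post1)
    print(f"c = {c:g}: P(S_t = 1 | Z) = {post1:.3f}")
```

The posterior probability of $S_t = 1$ sweeps from near 0 to near 1 as $c$ varies, even though the observation itself is unchanged, which matches the sensitivity described above.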