How to find the distribution of the following random variable?


I am studying a state-space model which is described as follows: \begin{align*} \boldsymbol{l}_t &= \boldsymbol{F}_t\boldsymbol{l}_{t-1}+\boldsymbol{g}_t\varepsilon_t, \quad \varepsilon_t\sim \mathcal{N}(0,1),\\ z_t &= \boldsymbol{a}_t^T\boldsymbol{l}_{t}+b_t+\sigma_t\epsilon_t, \quad \epsilon_t\sim \mathcal{N}(0,1). \end{align*} We initially assume that $\boldsymbol{l}_0\sim \mathcal{N}(\boldsymbol{\mu}_0,\boldsymbol{\Sigma}_0)$. My goal is to find the distribution $$p(z_t|z_{1:t-1}) \sim \mathcal{N}(\mu_t,\Sigma_t).$$ We assume that the Kalman filter provides the filtered Gaussian distributions $$p(\boldsymbol{l}_{t-1}|z_{1:t})\sim \mathcal{N}(\boldsymbol{f}_t, \boldsymbol{S}_t).$$
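For concreteness, here is a small simulation sketch of this model in NumPy (the parameter values and dimensions are arbitrary placeholders I chose for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 5                      # state dimension, number of steps

mu0 = np.zeros(d)                # l_0 ~ N(mu0, Sigma0)
Sigma0 = np.eye(d)
l = rng.multivariate_normal(mu0, Sigma0)

zs = []
for t in range(1, T + 1):
    F = np.eye(d)                # F_t (held constant here for simplicity)
    g = np.full(d, 0.1)          # g_t
    a = np.ones(d)               # a_t
    b, sigma = 0.0, 0.5          # b_t, sigma_t

    l = F @ l + g * rng.standard_normal()          # l_t = F_t l_{t-1} + g_t eps_t
    z = a @ l + b + sigma * rng.standard_normal()  # z_t = a_t^T l_t + b_t + sigma_t eps_t
    zs.append(z)

print(len(zs))  # 5
```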

For time $t=1$ it is easy; I have computed $$\mu_1 = \boldsymbol{a}_1^T\boldsymbol{\mu}_0, \quad \Sigma_1 = \boldsymbol{a}_1^T\boldsymbol{\Sigma}_0\boldsymbol{a}_1 + \sigma_1^2.$$ I am not sure how to find the result for $t>1$; any hints would be much appreciated. This calculation is done in this paper (in the Supplemental); I have attached a relevant picture of the result below. (Ignore the superscript $(i)$.)

1 Answer

It is given that

\begin{align} p(\boldsymbol{l}_{t-1}|z_{1:t}) \sim \mathcal{N}(\boldsymbol{f}_t, \boldsymbol{S}_t), \tag{1} \\ \boldsymbol{l}_t = \boldsymbol{F}_t\,\boldsymbol{l}_{t-1} + \boldsymbol{g}_t\,\varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0,1), \tag{2} \\ z_t = \boldsymbol{a}_t^\top\,\boldsymbol{l}_{t} + b_t + \sigma_t\,\epsilon_t, \quad \epsilon_t\sim \mathcal{N}(0,1). \tag{3} \end{align}

In order to find $p(z_t|z_{1:t-1}) \sim \mathcal{N}(\mu_t,\Sigma_t)$, one can shift $(1)$ in time so that it conditions on the same information $z_{1:t-1}$, yielding $p(\boldsymbol{l}_{t-2}|z_{1:t-1})\sim \mathcal{N}(\boldsymbol{f}_{t-1}, \boldsymbol{S}_{t-1})$. However, from $(3)$ it can be seen that $\boldsymbol{l}_t$ is needed to compute $z_t$. Obtaining $p(\boldsymbol{l}_{t}|z_{1:t-1})$ from $p(\boldsymbol{l}_{t-2}|z_{1:t-1})$ therefore requires two prediction steps using $(2)$.

For these prediction steps it is useful to recall that the mean $\mu$ and variance $\Sigma$ of a distribution $p(x)$ are defined as $\mu = E[x]$ and $\Sigma = \text{Var}(x) = E[(x - \mu)\,(x - \mu)^\top]$ respectively, where $E[\cdot]$ denotes the expected value. A related concept is the covariance, defined as $\text{Cov}(x,y) = E[(x - E[x])\,(y - E[y])^\top]$. Using these definitions and the assumption that $\varepsilon_{t-1}$ is uncorrelated with $\boldsymbol{l}_{t-2}$ and with the past observations, it can be shown that $p(\boldsymbol{l}_{t-1}|z_{1:t-1})$ has mean $\mu$ and variance $\Sigma$ given by

\begin{align} \mu =& E[\boldsymbol{F}_{t-1}\,\boldsymbol{l}_{t-2} + \boldsymbol{g}_{t-1}\,\varepsilon_{t-1} \mid z_{1:t-1}] \\ =& \boldsymbol{F}_{t-1}\,E[\boldsymbol{l}_{t-2} \mid z_{1:t-1}] + \boldsymbol{g}_{t-1}\,E[\varepsilon_{t-1}] \\ =& \boldsymbol{F}_{t-1}\,\boldsymbol{f}_{t-1}, \\ \Sigma =& E[(\boldsymbol{F}_{t-1}\,(\boldsymbol{l}_{t-2} - \boldsymbol{f}_{t-1}) + \boldsymbol{g}_{t-1}\,\varepsilon_{t-1})\, (\boldsymbol{F}_{t-1}\,(\boldsymbol{l}_{t-2} - \boldsymbol{f}_{t-1}) + \boldsymbol{g}_{t-1}\,\varepsilon_{t-1})^\top \mid z_{1:t-1}] \\ =& \boldsymbol{F}_{t-1}\, \text{Var}(\boldsymbol{l}_{t-2} \mid z_{1:t-1})\, \boldsymbol{F}_{t-1}^\top + \boldsymbol{F}_{t-1}\, \text{Cov}(\boldsymbol{l}_{t-2},\varepsilon_{t-1})\, \boldsymbol{g}_{t-1}^\top + \boldsymbol{g}_{t-1}\, \text{Cov}(\varepsilon_{t-1},\boldsymbol{l}_{t-2})\, \boldsymbol{F}_{t-1}^\top + \boldsymbol{g}_{t-1}\,\text{Var}(\varepsilon_{t-1})\,\boldsymbol{g}_{t-1}^\top \\ =& \boldsymbol{F}_{t-1}\,\boldsymbol{S}_{t-1}\,\boldsymbol{F}_{t-1}^\top + \boldsymbol{g}_{t-1}\,\boldsymbol{g}_{t-1}^\top. \end{align}
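As a sanity check, one such prediction step can be sketched in NumPy (a minimal illustration; the function name and the one-dimensional test values are made up):

```python
import numpy as np

def predict(f, S, F, g):
    """One prediction step: if l|z ~ N(f, S) and the next state is
    F l + g * eps with eps ~ N(0, 1) independent of l, then the
    next state given z is N(F f, F S F^T + g g^T)."""
    return F @ f, F @ S @ F.T + np.outer(g, g)

# One-dimensional example: f = 1, S = 1, F = 2, g = 3
mean, cov = predict(np.array([1.0]), np.array([[1.0]]),
                    np.array([[2.0]]), np.array([3.0]))
print(mean, cov)  # [2.] [[13.]]  (2*1, 2*1*2 + 3*3)
```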

Similarly, the next prediction step can be shown to give

$$ p(\boldsymbol{l}_{t}|z_{1:t-1}) \sim \mathcal{N}(\boldsymbol{F}_{t}\,\boldsymbol{F}_{t-1}\,\boldsymbol{f}_{t-1}, \boldsymbol{F}_{t}(\boldsymbol{F}_{t-1}\,\boldsymbol{S}_{t-1}\,\boldsymbol{F}_{t-1}^\top + \boldsymbol{g}_{t-1}\,\boldsymbol{g}_{t-1}^\top)\boldsymbol{F}_{t}^\top + \boldsymbol{g}_{t}\,\boldsymbol{g}_{t}^\top) $$

To simplify the final expressions I define this last distribution as $p(\boldsymbol{l}_{t}|z_{1:t-1}) \sim \mathcal{N}(\hat{\boldsymbol{l}}_{t|1:t-1}, \Sigma_{t|1:t-1})$ with

$$ \left\{ \begin{array}{l} \hat{\boldsymbol{l}}_{t|1:t-1} := \boldsymbol{F}_{t}\,\boldsymbol{F}_{t-1}\,\boldsymbol{f}_{t-1}, \\ \Sigma_{t|1:t-1} := \boldsymbol{F}_{t}(\boldsymbol{F}_{t-1}\,\boldsymbol{S}_{t-1}\,\boldsymbol{F}_{t-1}^\top + \boldsymbol{g}_{t-1}\,\boldsymbol{g}_{t-1}^\top)\boldsymbol{F}_{t}^\top + \boldsymbol{g}_{t}\,\boldsymbol{g}_{t}^\top. \end{array}\right. \tag{4} $$

The distribution that you are interested in can be obtained by substituting the moments $(4)$ of $p(\boldsymbol{l}_{t}|z_{1:t-1})$ into $(3)$, yielding

$$ p(z_t|z_{1:t-1}) \sim \mathcal{N}(\boldsymbol{a}_t^\top\,\hat{\boldsymbol{l}}_{t|1:t-1} + b_t,\; \boldsymbol{a}_t^\top\,\Sigma_{t|1:t-1}\,\boldsymbol{a}_t + \sigma_t^2). \tag{5} $$
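Putting the two prediction steps of $(4)$ and the observation map of $(5)$ together, the predictive moments can be computed numerically as follows (an illustrative sketch under the assumptions above; the function name and test values are my own):

```python
import numpy as np

def predictive_z(f, S, F_prev, g_prev, F, g, a, b, sigma):
    # Two prediction steps, Eq. (4): start from l_{t-2}|z_{1:t-1} ~ N(f, S)
    m = F_prev @ f
    P = F_prev @ S @ F_prev.T + np.outer(g_prev, g_prev)
    m = F @ m
    P = F @ P @ F.T + np.outer(g, g)
    # Observation map, Eq. (5): z_t = a^T l_t + b + sigma * eps
    return a @ m + b, a @ P @ a + sigma**2

# One-dimensional example: identity dynamics, no process noise,
# f = 1, S = 1, a = 1, b = 0, sigma = 1  =>  mean 1, variance 2
mu, var = predictive_z(np.array([1.0]), np.array([[1.0]]),
                       np.eye(1), np.array([0.0]),
                       np.eye(1), np.array([0.0]),
                       np.array([1.0]), 0.0, 1.0)
print(mu, var)  # 1.0 2.0
```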

However, the expressions from the paper seem to use one less prediction step, so perhaps $(1)$ should instead read $p(\boldsymbol{l}_{t}|z_{1:t}) \sim \mathcal{N}(\boldsymbol{f}_t, \boldsymbol{S}_t)$. Using this in place of $(1)$ requires only one prediction step, changing $(4)$ to

$$ \left\{ \begin{array}{l} \hat{\boldsymbol{l}}_{t|1:t-1} := \boldsymbol{F}_{t}\,\boldsymbol{f}_{t-1}, \\ \Sigma_{t|1:t-1} := \boldsymbol{F}_{t}\,\boldsymbol{S}_{t-1}\,\boldsymbol{F}_{t}^\top + \boldsymbol{g}_{t}\,\boldsymbol{g}_{t}^\top. \end{array}\right. \tag{6} $$
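Under this reading, the computation simplifies to a single prediction step followed by the observation map, which can be sketched as (again an illustrative sketch, not the paper's code; names and test values are my own):

```python
import numpy as np

def predictive_z_onestep(f, S, F, g, a, b, sigma):
    # One prediction step, Eq. (6): start from l_{t-1}|z_{1:t-1} ~ N(f, S)
    m = F @ f
    P = F @ S @ F.T + np.outer(g, g)
    # then the observation map, Eq. (5)
    return a @ m + b, a @ P @ a + sigma**2

# One-dimensional example: f = 2, S = 1, F = 1, g = 1, a = 1,
# b = 0.5, sigma = 2  =>  mean 2.5, variance (1 + 1) + 4 = 6
mu, var = predictive_z_onestep(np.array([2.0]), np.array([[1.0]]),
                               np.eye(1), np.array([1.0]),
                               np.array([1.0]), 0.5, 2.0)
print(mu, var)  # 2.5 6.0
```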

Using $(6)$ instead of $(4)$ in $(5)$ agrees almost completely with the expressions in the paper. The only apparent differences are the time index on $\boldsymbol{S}_{t-1}$ and the $b_t$ term, which should definitely be included unless it is assumed to be zero somewhere.