I was reading about the Kalman filter and want to derive some of its formulas. Namely, it considers a random process (I am omitting the constant shift term, since it is trivial)
$$x_{k+1} = {\bf F}_k x_k + v_k$$
where $x_k$ is the value of a random vector at step $k$, ${\bf F}_k$ is a non-random transition matrix and $v_k$ is a random noise vector.
I want to derive the formula for the covariance of $x_{k+1}$
$${\bf C}_{k+1} = {\rm cov}(x_{k+1}, x^T_{k+1})$$
The answer is
$${\bf C}_{k+1} = {\bf F}_k {\bf C}_k {\bf F}^T_k + {\bf Q}_k$$
where ${\bf Q}_k$ is the noise covariance matrix
$${\bf Q}_{k} = {\rm cov}(v_{k}, v^T_{k})$$
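(For what it's worth, I have checked the claimed formula numerically with a quick Monte Carlo simulation, a minimal sketch in Python with an ${\bf F}_k$ and ${\bf Q}_k$ of my own choosing, and it does seem to hold; what I am missing is the derivation itself.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary transition matrix and noise covariance (my own choice, for testing)
F = np.array([[1.0, 0.1],
              [0.0, 0.9]])
Q = np.array([[0.05, 0.01],
              [0.01, 0.02]])

# Draw many samples of x_k with a known covariance C_k, and of the noise v_k
C_k = np.array([[1.0, 0.3],
                [0.3, 2.0]])
n = 200_000
x_k = rng.multivariate_normal(np.zeros(2), C_k, size=n)
v_k = rng.multivariate_normal(np.zeros(2), Q, size=n)

# One step of the process: x_{k+1} = F_k x_k + v_k
x_next = x_k @ F.T + v_k

# Compare the empirical covariance with the claimed C_{k+1} = F C_k F^T + Q
C_empirical = np.cov(x_next, rowvar=False)
C_formula = F @ C_k @ F.T + Q
print(np.allclose(C_empirical, C_formula, atol=0.02))  # True
```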
Can anybody please outline the derivation of this formula for me, or give a reference to such a derivation (preferably a detailed one)?
I have stumbled at the point of calculating the expectation $E(x_{k+1})$, because I am not sure how to deal with the two sources of randomness: the distribution of $x_k$ itself and the noise $v_k$.
I was thinking about using the classic formula for the distribution of $x_{k+1}$
$$p(x_{k+1}) = \int P(x_{k+1}|x_{k}) p(x_k)dx_k$$
and then find the expectation from this distribution. So, I wrote
$$P(x_{k+1}|x_{k}) = \delta(x_{k+1} - {\bf F}_k x_k - v_k)$$
Then
$$p(x_{k+1}) = \int \delta(x_{k+1} - {\bf F}_k x_k - v_k) p(x_k)dx_k$$
Without the random contribution $v_k$ this would be a simple transformation of variables with the result
$$p(x_{k+1}) = \frac{1}{|\det {\bf F}_k|}p({\bf F}^{-1}_k x_{k+1})$$
but I am not sure how to deal with the random noise $v_k$, since it is an additional source of randomness...
Any help is appreciated (preferably a detailed explanation of the method to be used, or a reference to a good book/paper/lecture notes).
Because $x_k$ and $v_k$ are independent, it follows that \begin{equation} \text{cov}(x_{k+1},x_{k+1}^T) = {\bf F}_k \text{cov}(x_{k},x_{k}^T) {\bf F}_k^T + \text{cov}(v_{k},v_{k}^T) \end{equation}
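Explicitly, assuming for simplicity that $x_k$ and $v_k$ are zero mean (otherwise replace each variable by its centered version), the expansion is
\begin{align}
{\bf C}_{k+1} &= E\left[x_{k+1} x_{k+1}^T\right] = E\left[({\bf F}_k x_k + v_k)({\bf F}_k x_k + v_k)^T\right] \\
&= {\bf F}_k\, E\left[x_k x_k^T\right] {\bf F}_k^T + {\bf F}_k\, E\left[x_k v_k^T\right] + E\left[v_k x_k^T\right] {\bf F}_k^T + E\left[v_k v_k^T\right] \\
&= {\bf F}_k {\bf C}_k {\bf F}_k^T + {\bf Q}_k,
\end{align}
where the two cross terms vanish because $x_k$ and $v_k$ are independent and zero mean: $E[x_k v_k^T] = E[x_k]\,E[v_k]^T = 0$.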
See the properties section here: https://en.wikipedia.org/wiki/Variance#Properties (the Bienaymé formula) and observe that it generalizes in a straightforward manner to the covariance operator: the covariance of the sum (or difference) of uncorrelated random vectors is the sum of their covariances.
A good book on the subject of linear estimation and Kalman filtering is "Linear Estimation" by Thomas Kailath, Ali Sayed, and Babak Hassibi.
Now, regarding your computations above, observe that the mean of $x$ at any time depends on the assumed mean of the initial condition $x_0$ and on the means of the noise $v$. If $v_k$ has zero mean for all $k$ and the initial condition $x_0$ has zero mean, then the state process has zero mean at every $k$. Note that by unrolling the recursion you can write \begin{equation} x_{k+1} = ({\bf F}_{k}{\bf F}_{k-1}\dots {\bf F}_{0}) x_0 + ({\bf F}_{k}{\bf F}_{k-1}\dots{\bf F}_{1}) v_0 + ({\bf F}_{k}{\bf F}_{k-1}\dots{\bf F}_{2}) v_1 + \dots + v_k \end{equation} and therefore,
\begin{equation}\small E[x_{k+1}] = ({\bf F}_{k}\dots {\bf F}_{0}) E[x_0] + ({\bf F}_{k}\dots{\bf F}_{1}) E[v_0] + ({\bf F}_{k}\dots{\bf F}_{2}) E[v_1] + \dots + E[v_k] \end{equation}
where the expectation operator here is with respect to all sources of randomness, that is, with respect to the density \begin{equation} p(x_0, v_0, \dots, v_k). \end{equation}
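As a side note, the same unrolled expression gives a closed-form covariance. Introducing the shorthand ${\bf \Phi}_{k,i} = {\bf F}_k {\bf F}_{k-1} \cdots {\bf F}_i$ and assuming $x_0, v_0, \dots, v_k$ are mutually independent,
\begin{equation}
{\bf C}_{k+1} = {\bf \Phi}_{k,0}\, {\bf C}_0\, {\bf \Phi}_{k,0}^T + \sum_{i=0}^{k-1} {\bf \Phi}_{k,i+1}\, {\bf Q}_i\, {\bf \Phi}_{k,i+1}^T + {\bf Q}_k,
\end{equation}
which is exactly what you obtain by applying the one-step recursion ${\bf C}_{k+1} = {\bf F}_k {\bf C}_k {\bf F}_k^T + {\bf Q}_k$ repeatedly.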
If you want to compute the density $p(x_{k+1})$, you have to marginalize the joint density \begin{equation} p(x_{k+1}, x_k, \dots, x_1, x_0) = p(x_0)\prod_{i=1}^{k+1} p(x_i|x_{i-1}) \end{equation} in which \begin{equation} p(x_i |x_{i-1}) = p_{v_{i-1}}(x_i - {\bf F}_{i-1}x_{i-1}) \end{equation} where $p_{v_{i-1}}$ denotes the probability density function of the random vector $v_{i-1}$. (This is exactly how your $\delta$-function attempt should be read: averaging the $\delta$ over the distribution of $v_k$ turns it into the density $p_{v_k}$.)
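In the standard setting where $x_0$ and all the $v_k$ are Gaussian, every marginal $p(x_k)$ remains Gaussian (a linear map plus independent Gaussian noise preserves Gaussianity), so the marginalization reduces to propagating the mean and covariance. A minimal sketch in Python, with matrices of my own choosing for illustration:

```python
import numpy as np

def propagate_gaussian(mean0, cov0, Fs, Qs):
    """Propagate the Gaussian marginal of x_k through x_{k+1} = F_k x_k + v_k.

    Returns the means and covariances of x_0, x_1, ..., x_K, using
    m_{k+1} = F_k m_k and C_{k+1} = F_k C_k F_k^T + Q_k.
    """
    means, covs = [mean0], [cov0]
    for F, Q in zip(Fs, Qs):
        means.append(F @ means[-1])
        covs.append(F @ covs[-1] @ F.T + Q)
    return means, covs

# Example: constant (arbitrary) F and Q over 10 steps, starting from N(0, I)
F = np.array([[1.0, 0.1],
              [0.0, 0.9]])
Q = 0.05 * np.eye(2)
means, covs = propagate_gaussian(np.zeros(2), np.eye(2), [F] * 10, [Q] * 10)
print(covs[-1])  # covariance of the Gaussian marginal p(x_10)
```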
Hope this clarifies it for you.