The Kalman filter yields the optimal estimate of the evolution of the hidden state variable $\mathbf{X}_{t}$ from a sequence of observations $\left\{ \mathbf{Y}_{t}\right\} _{t=1\ldots T}$, where
\begin{align}
\mathbf{X}_{t+1} &=A\mathbf{X}_{t}+\boldsymbol{\varepsilon}_{t}\\
\mathbf{Y}_{t} &=C\mathbf{X}_{t}+\boldsymbol{\delta}_{t}
\end{align}
and $\boldsymbol{\varepsilon}_{t}$ and $\boldsymbol{\delta}_{t}$ represent the system and measurement noise (independent, Gaussian, etc.). The Kalman filter even admits time-dependent matrices $A\to A_{t}$, $C\to C_{t}$, etc.
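For reference, here is a minimal sketch of the standard recursion for the model above, using the same timing convention ($\mathbf{Y}_{t}$ observes $\mathbf{X}_{t}$, then the state is propagated to $t+1$). The names `Q`, `R`, `x0` and `P0` are my own assumptions: the covariances of $\boldsymbol{\varepsilon}_{t}$ and $\boldsymbol{\delta}_{t}$ and the prior mean and covariance of the first state. The function name `kalman_filter` is likewise only illustrative.

```python
import numpy as np


def kalman_filter(Y, A, C, Q, R, x0, P0):
    """Filter Y[0..T-1] for X_{t+1} = A X_t + eps_t, Y_t = C X_t + delta_t.

    Q and R are the (assumed) covariances of eps_t and delta_t;
    x0 and P0 are the (assumed) prior mean and covariance of the first state.
    Returns filtered means, filtered covariances and Kalman gains.
    """
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    means, covs, gains = [], [], []
    for y in Y:
        # Measurement update: condition the predicted state on y.
        S = C @ P @ C.T + R              # innovation covariance
        K = np.linalg.solve(S, C @ P).T  # gain K = P C^T S^{-1} (P, S symmetric)
        x = x + K @ (y - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        means.append(x)
        covs.append(P)
        gains.append(K)
        # Time update: propagate mean and covariance through the dynamics.
        x = A @ x
        P = A @ P @ A.T + Q
    return np.array(means), np.array(covs), np.array(gains)


if __name__ == "__main__":
    # Scalar toy problem: constant state observed with unit-variance noise.
    one = np.array([[1.0]])
    Y = np.array([[1.0], [1.0]])
    means, covs, gains = kalman_filter(
        Y, A=one, C=one, Q=0 * one, R=one, x0=np.zeros(1), P0=one
    )
    print(means.ravel(), covs.ravel(), gains.ravel())
```

Time-dependent matrices $A_{t}$, $C_{t}$ would simply be indexed inside the loop.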
My question is the following:
I'm interested in a system with a modified observation equation, such as
\begin{align}
\mathbf{X}_{t+1} &=A\mathbf{X}_{t}+\boldsymbol{\varepsilon}_{t}\\
\mathbf{Y}_{t} &=C\left(\mathbf{X}_{t}-\mathbf{X}_{t-\tau}\right)+\boldsymbol{\delta}_{t}
\end{align}
or, slightly more generally,
\begin{align}\mathbf{X}_{t+1} &=A\mathbf{X}_{t}+\boldsymbol{\varepsilon}_{t}\\
\mathbf{Y}_{t} &=C_{0}\mathbf{X}_{t}+C_{1}\mathbf{X}_{t-\tau}+\boldsymbol{\delta}_{t}
\end{align}
for a fixed value of $\tau\in\mathbb{N}$. The system dynamics and the observation are still linear and the noise is still Gaussian; only the observation “mixes” different times. Can the Kalman filter be applied to this situation as well? What does the Kalman gain matrix look like in this case? Does anyone know of relevant literature? Any help is appreciated.
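To make the setup concrete, here is a small simulation sketch of the more general model with observation $C_{0}\mathbf{X}_{t}+C_{1}\mathbf{X}_{t-\tau}$. The function name, the noise covariances `Q` and `R`, the zero initial state and the seed are all assumptions of mine, chosen only so that the example runs.

```python
import numpy as np


def simulate_lagged_observations(A, C0, C1, Q, R, tau, T, seed=0):
    """Simulate X_{t+1} = A X_t + eps_t and Y_t = C0 X_t + C1 X_{t-tau} + delta_t.

    Row s of X holds the state at time t = s - tau, so that X_{t-tau} is
    available for every observation time t = 0, ..., T-1.
    The chain starts from a zero state (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], C0.shape[0]
    X = np.zeros((T + tau, n))
    for s in range(T + tau - 1):
        X[s + 1] = A @ X[s] + rng.multivariate_normal(np.zeros(n), Q)
    delta = rng.multivariate_normal(np.zeros(m), R, size=T)
    # X[tau:] are the current states X_t, X[:T] the lagged states X_{t-tau}.
    Y = X[tau:] @ C0.T + X[:T] @ C1.T + delta
    return X, Y


if __name__ == "__main__":
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    C = np.array([[1.0, 0.0]])
    # The first model in the question is the special case C0 = C, C1 = -C.
    X, Y = simulate_lagged_observations(
        A, C, -C, Q=0.1 * np.eye(2), R=0.01 * np.eye(1), tau=3, T=50
    )
    print(X.shape, Y.shape)
```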
What I tried so far:
I tried simply introducing the augmented state vector
$$\mathbf{Z}_{t}=\left[\begin{array}{c}
\mathbf{X}_{t}\\
\mathbf{X}_{t-\tau}
\end{array}\right]$$
such that one could write
$$\mathbf{Z}_{t+1} =\left[\begin{array}{cc}
A & 0\\
0 & A
\end{array}\right]\mathbf{Z}_{t}+\left[\begin{array}{c}
\boldsymbol{\varepsilon}_{t}\\
\boldsymbol{\varepsilon}_{t-\tau}
\end{array}\right]$$
with the usual observation equation
$$\mathbf{Y}_{t}=\left[\begin{array}{cc}
C, & -C\end{array}\right]\mathbf{Z}_{t}+\boldsymbol{\delta}_{t}.$$
The problem, however, seems to be that in this case the noise in the equation for $\mathbf{Z}_{t}$ is no longer white: it has a non-trivial temporal correlation (non-Markovian noise). So this doesn't seem to be a viable approach ...
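To make the correlation explicit, write $\mathbf{W}_{t}$ for the stacked noise in the equation for $\mathbf{Z}_{t}$ and $Q$ for the covariance of $\boldsymbol{\varepsilon}_{t}$ (both symbols are my own notation, and I assume zero-mean noise). Since $\boldsymbol{\varepsilon}_{t}$ appears in the first block of $\mathbf{W}_{t}$ and again in the second block of $\mathbf{W}_{t+\tau}$, one finds for $\tau\geq1$
$$\mathbf{W}_{t}=\left[\begin{array}{c}
\boldsymbol{\varepsilon}_{t}\\
\boldsymbol{\varepsilon}_{t-\tau}
\end{array}\right],\qquad\mathbb{E}\left[\mathbf{W}_{t+\tau}\mathbf{W}_{t}^{\top}\right]=\left[\begin{array}{cc}
\mathbb{E}\left[\boldsymbol{\varepsilon}_{t+\tau}\boldsymbol{\varepsilon}_{t}^{\top}\right] & \mathbb{E}\left[\boldsymbol{\varepsilon}_{t+\tau}\boldsymbol{\varepsilon}_{t-\tau}^{\top}\right]\\
\mathbb{E}\left[\boldsymbol{\varepsilon}_{t}\boldsymbol{\varepsilon}_{t}^{\top}\right] & \mathbb{E}\left[\boldsymbol{\varepsilon}_{t}\boldsymbol{\varepsilon}_{t-\tau}^{\top}\right]
\end{array}\right]=\left[\begin{array}{cc}
0 & 0\\
Q & 0
\end{array}\right],$$
which is non-zero whenever $Q\neq0$, so the driving noise of $\mathbf{Z}_{t}$ is correlated at lag $\tau$.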