Let $(X_t) = (X^1_t, \ldots, X^d_t)$ be a time series in $\mathbb{R}^d$ with covariance matrix $C_t := \big(\mathrm{Cov}[X^i_t, X^j_t]\big)_{i,j}$.
Suppose that $X\equiv(X_t)$ satisfies some reasonable stationarity/ergodicity assumption, say weak-sense stationarity.
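For reference, under weak-sense stationarity $C_t = C$ does not depend on $t$, and with synchronous samples the natural estimator is the sample covariance of the observation vectors. The following is only a sketch under a hypothetical model: a stable bivariate VAR(1) $X_t = A X_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, I)$, for which $C$ solves the discrete Lyapunov equation $C = A C A^\top + I$. The names `simulate_var1` and `stationary_cov_var1` and the matrix `A` are illustrative choices, not part of the question.

```python
import numpy as np

def simulate_var1(A, n, rng, burn_in=500):
    """Simulate X_t = A X_{t-1} + eps_t, eps_t ~ N(0, I); rows are X_t."""
    d = A.shape[0]
    eps = rng.standard_normal((burn_in + n, d))
    x = np.zeros(d)
    out = np.empty((n, d))
    for t in range(burn_in + n):
        x = A @ x + eps[t]
        if t >= burn_in:          # discard the transient from x = 0
            out[t - burn_in] = x
    return out

def stationary_cov_var1(A):
    """Solve the discrete Lyapunov equation C = A C A^T + I for C."""
    d = A.shape[0]
    # vec(A C A^T) = (A kron A) vec(C), so (I - A kron A) vec(C) = vec(I)
    vec_c = np.linalg.solve(np.eye(d * d) - np.kron(A, A), np.eye(d).ravel())
    return vec_c.reshape(d, d)

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2], [0.1, 0.6]])   # hypothetical; eigenvalues 0.7 and 0.4
X = simulate_var1(A, 200_000, rng)        # synchronous samples (X^1_t, X^2_t)

C_hat = np.cov(X, rowvar=False)           # empirical covariance matrix
C_true = stationary_cov_var1(A)           # population covariance C
print(np.round(C_hat, 3))
print(np.round(C_true, 3))
```

With synchronous data the two printed matrices should agree up to sampling error; this is the baseline the question asks about when synchrony is lost.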
Is it then typically possible, for a fixed $t$, to reliably estimate $C_t$ from empirical covariances if the components $X^i$ of $X$ are not sampled synchronously?
That is, if there are time delays between the different components of the data, so that instead of
$$(X^1_{t^1_j}, \ldots, X^d_{t^d_j}) \quad \text{with} \quad t^1_j = \cdots = t^d_j \quad \forall\, j,$$
the data available for estimating $C_t$ are
$$(X^1_{t^1_j}, \ldots, X^d_{t^d_j}) \quad \text{with} \quad t^k_j \neq t^l_j \quad (k \neq l) \quad \text{for most } j \ ?$$
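To illustrate why this is not automatic, here is a sketch for the simplest special case, a constant known delay $\tau$ (an assumption; the question allows arbitrary, varying time stamps). The diagonal entries $C^{ii}$ only involve one component each and are unaffected, but naively treating $(X^1_s, X^2_{s+\tau})$ as a simultaneous pair estimates the lagged cross-covariance $\mathrm{Cov}[X^1_s, X^2_{s+\tau}]$ rather than $C^{12}$. The hypothetical model is $X^1$ an AR(1) with coefficient $\phi$ and unit innovations, and $X^2_t = X^1_t + \eta_t$ with independent noise $\eta_t \sim N(0,1)$, so that $C^{12} = 1/(1-\phi^2)$ while the lagged pairing targets $\phi^\tau/(1-\phi^2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n, tau = 0.8, 400_000, 3   # hypothetical AR coefficient, sample size, delay

# X^1: AR(1) with unit innovations, started from its stationary distribution
x1 = np.empty(n)
x1[0] = rng.standard_normal() / np.sqrt(1 - phi**2)
eps = rng.standard_normal(n)
for t in range(1, n):
    x1[t] = phi * x1[t - 1] + eps[t]
x2 = x1 + rng.standard_normal(n)   # X^2_t = X^1_t + independent noise

v = 1 / (1 - phi**2)               # Var[X^1_t], which also equals C^{12}

# Synchronous pairs (X^1_t, X^2_t)
sync = np.cov(x1, x2)[0, 1]
# Delayed pairs (X^1_t, X^2_{t+tau}), naively treated as simultaneous
lagged = np.cov(x1[:-tau], x2[tau:])[0, 1]

print("synchronous:", sync, "target C^12:", v)
print("delayed:    ", lagged, "Cov[X^1_t, X^2_{t+tau}]:", phi**tau * v)
```

In this toy model the delayed estimate is shrunk by the factor $\phi^\tau$, so its bias depends on the (unknown, in general) autocorrelation structure of the process.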
References welcome.