I am facing the following statistical problem, which interests me mainly because of an economic application I am working on:
Suppose that you want to estimate some parameter, say the mean $\mu$, of a distribution. For convenience, assume that this distribution is normal. You receive one signal about the mean in each of $t=1,2,...$ periods. The signals are drawn from two different distributions, with frequencies $\beta$ and $1-\beta$, respectively. Again for convenience, suppose that the first type of signal is distributed $$ s_1\sim N(\mu,\sigma_1^2). $$ The second type of signal is instead distributed according to $$ s_2\sim N(\mu+b,\sigma_2^2). $$ Suppose that your model is "occasionally" misspecified in the following sense. While you can tell whether a signal was drawn from the first or the second distribution, you hold an incorrect belief about the "bias" $b$ of the second signal, namely $\tilde{b}\neq b$. As a result, even as the number of signals tends to infinity, your belief about $\mu$ will not converge to the truth.
But here is the question: What value will it converge to, if at all?
I know from previous literature (e.g., Berk (1966) or White (1982)) that if one only observed signals drawn from the distribution one misinterprets, then, under appropriate assumptions, the estimate of $\mu$ would converge to the value that minimizes the Kullback–Leibler divergence between the correctly and the incorrectly specified model. What is not obvious to me, however, is what happens in the case I described, in which I sometimes use a correct and sometimes an incorrect model to interpret my observations.
If anyone either has an immediate answer to this or knows of any related literature, I would be very grateful.
Note that I use the case of normally distributed signals only for clarity; in principle, I would be interested in a general result.
Cheers!
You can view this as observing a vector $(X, Y)$, where $X$ is $1$ or $2$ with probabilities $\beta$ and $1-\beta$, and the distribution of $Y\mid X$ is given by your $s_X$.
In this view, your model is not occasionally misspecified; it is always misspecified, just in a way that only affects inference when $X = 2$. As such, the usual results about KL divergence apply, computing the KL divergence for the joint distribution of $(X,Y)$.
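For the normal case, minimizing the joint KL divergence gives a closed form: the misspecified MLE of $\mu$ is a precision-weighted average of the two signal types (with $s_2$ debiased by $\tilde{b}$ instead of $b$), so it converges to $$\mu^* = \mu + \frac{(1-\beta)(b-\tilde{b})/\sigma_2^2}{\beta/\sigma_1^2 + (1-\beta)/\sigma_2^2}.$$ Here is a quick simulation sketch checking this (all parameter values are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: true mean, true bias, believed (wrong) bias
mu, b, b_tilde = 1.0, 0.5, 0.2
sigma1, sigma2 = 1.0, 1.5
beta = 0.7          # frequency of the first (unbiased) signal type
n = 200_000

# Simulate: x records which distribution each signal came from
x = rng.random(n) < beta
s = np.where(x,
             rng.normal(mu, sigma1, n),
             rng.normal(mu + b, sigma2, n))

# Misspecified MLE of mu: precision-weighted average of type-1 signals
# and type-2 signals debiased by b_tilde (instead of the true b)
w1, w2 = 1 / sigma1**2, 1 / sigma2**2
mu_hat = (w1 * s[x].sum() + w2 * (s[~x] - b_tilde).sum()) \
         / (w1 * x.sum() + w2 * (~x).sum())

# Pseudo-true value from minimizing the KL divergence of the joint model
mu_star = mu + (1 - beta) * (b - b_tilde) * w2 / (beta * w1 + (1 - beta) * w2)

print(mu_hat, mu_star)  # both approximately 1.048 with these values
```

The simulated estimate matches the pseudo-true value, consistent with applying the Berk/White result to the joint distribution of $(X,Y)$ as described above.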