I have a process generating sequences of observations $X_1, X_2, \dots, X_n$, where $n$ may vary between different sequences (not all sequences have the same length, but they will all be finite).
Let $\mathcal{X}$ denote the space of all possible observations.
Let $p(x)$ denote the probability that an observation, sampled uniformly at random from across all trajectories, equals $x \in \mathcal{X}$.
We can assume that any element $x \in \mathcal{X}$ can appear at only a single, unique "time step" within a sequence, so the same observation can never show up multiple times in a single sequence.
I want to build an estimator for the following quantity:
$$\sum_{x \in \mathcal{X}} \left( \frac{p(x)}{\mathbb{E} \left[ n \mid x \text{ was observed} \right]} \right) = \mathbb{E}_{x \sim p} \left[ \frac{1}{\mathbb{E} \left[ n \mid x \text{ was observed} \right]} \right]$$
In words: I want an estimator for $1$ divided by the expected length $n$ of a full sequence, given that the sequence contains a particular observation $x$, averaged over all $x$ and weighted by how likely each $x$ is to be observed.
To build an estimator for this, I have in mind to simply:
- Generate multiple sequences $X_1^{(i)}, X_2^{(i)}, \dots, X_{n(i)}^{(i)}$, where the superscript $(i)$ indicates that a sample comes from the $i^{th}$ sequence.
- Use $\frac{\sum_{i=1}^M n(i)}{\sum_{i=1}^M n(i)^2}$ as my estimator, where $M$ is the number of sequences generated, and $n(i)$ is the observed sequence length of the $i^{th}$ sequence.
The rationale behind this is that every sequence $i$ gives me $n(i)$ different samples $X_j^{(i)}$, which were all sampled according to $p$ (giving the outer expectation $\mathbb{E}_{x \sim p}$), and which all yield the same value $n(i)$ as an estimate of $\mathbb{E} \left[ n \mid x \text{ was observed} \right]$. Averaging $n(i)$ over all $\sum_{i=1}^M n(i)$ samples gives $\frac{\sum_{i=1}^M n(i)^2}{\sum_{i=1}^M n(i)}$, and taking the reciprocal of this average yields the estimator above.
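To make the construction concrete, here is a minimal sketch of the estimator on a toy process. The process is purely an assumption for illustration: the observation at time step $t$ is $t$ itself (which satisfies the uniqueness assumption above), and the sequence length is drawn uniformly from $\{1, \dots, K\}$.

```python
import random

def estimate(lengths):
    """The proposed estimator: sum of n(i) divided by sum of n(i)^2."""
    return sum(lengths) / sum(n * n for n in lengths)

# Toy process (an assumption for illustration only): the observation at
# time step t is just t, so each x appears at a unique time step, and the
# sequence length is uniform on {1, ..., K}.
def simulate_lengths(M, K, rng):
    return [rng.randint(1, K) for _ in range(M)]

rng = random.Random(0)
lengths = simulate_lengths(100_000, 2, rng)
print(estimate(lengths))
```

Since only the lengths $n(i)$ enter the estimator, the simulation never needs to materialize the observations themselves.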
Now to my question: my intuition says that this estimator is not unbiased, because it uses many data points from each individual sequence, and within a sequence those data points are not independent. However, my intuition also says that the estimator is consistent, because with infinitely many full sequences we recover the correct probabilities for individual observations as well as for full sequences.
How can either or both of these properties be (hopefully easily?) shown to be (in)correct?