Let $X_i$ and $X_j$ be two random variables. Let $$\hat{P}(X_i = i \mid X_j = j) = \frac{\operatorname{count}(X_i = i\text{ and }X_j = j)}{\operatorname{count}(X_j = j)}$$
where $\operatorname{count}(\cdot)$ is the number of observations satisfying the given condition in a finite sample of observations. What is the expected value of $\hat{P}(X_i = i \mid X_j = j)$? Is it equal to the conditional probability $P(X_i = i \mid X_j = j)$? What would a proof strategy look like? The way I see it is the following:
Let $n$ be the sample size. Clearly, in a finite sample it is possible that $\operatorname{count}(X_j = j) = 0$ even if $P(X_j = j) \neq 0$. So, should the expectation be taken conditioned on $\operatorname{count}(X_j = j) \neq 0$? If yes, then one can take the expectation over all possible values of $\operatorname{count}(X_j = j)$ (between $1$ and $n$), and similarly over all possible values of $\operatorname{count}(X_i = i\text{ and }X_j = j)$ conditioned on $\operatorname{count}(X_j = j)$. This gives a binomial-expansion type of expression. Can this strategy yield the proof? I can't really get anywhere with it.
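For what it is worth, the double sum I have in mind can be evaluated numerically. Here is a sketch in Python, with made-up values for $n$, $p = P(X_j = j)$ and $q = P(X_i = i \mid X_j = j)$:

```python
# Carry out the "expectation over all possible counts" strategy numerically,
# with made-up values: n = sample size, p = P(X_j = j), q = P(X_i = i | X_j = j).
from math import comb

def cond_expectation(n, p, q):
    """E[ P_hat | count(X_j = j) >= 1 ], via the double binomial sum."""
    total = 0.0
    for k in range(1, n + 1):                           # k = count(X_j = j)
        p_k = comb(n, k) * p**k * (1 - p)**(n - k)      # P(count(X_j = j) = k)
        for u in range(k + 1):                          # u = count(X_i = i and X_j = j)
            p_u = comb(k, u) * q**u * (1 - q)**(k - u)  # binomial(k, q) pmf
            total += p_k * p_u * (u / k)
    return total / (1 - (1 - p)**n)                     # condition on count >= 1

print(cond_expectation(10, 0.3, 0.6))  # comes out equal to q = 0.6 (up to rounding)
```

Numerically the conditional expectation seems to come out equal to $q$ for every choice of parameters I tried, but I don't see how to prove it.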
If your answer is that the expectation is not defined due to the possibility that $\operatorname{count}(X_j = j) = 0$, can we at least discuss the conditional expectation given that $\operatorname{count}(X_j = j) \neq 0$?
Any useful references are also welcome.
The question is rather loosely phrased but, in the end, it seems that what you mean to ask can be formalized as follows.
Let $(X_t,Y_t)_{t\geqslant1}$ denote an i.i.d. sample from a given joint distribution $P_{X,Y}$. Fix some pair $(x,y)$ and consider, for every $T\geqslant1$, $$U_T(x,y)=\sum_{t=1}^T\mathbf 1_{X_t=x,Y_t=y}\qquad V_T(y)=\sum_{t=1}^T\mathbf 1_{Y_t=y}$$ Then you are asking for the properties of $$\frac{U_T(x,y)}{V_T(y)}$$ possibly for large values of $T$, and possibly only on the event $[V_T(y)\ne0]$.
The most classical version of the law of large numbers indicates that $$\lim_{T\to\infty}\frac{U_T(x,y)}T=P(X=x,Y=y)\qquad\lim_{T\to\infty}\frac{V_T(y)}T=P(Y=y)$$ Thus, if $P(Y=y)\ne0$ then, for every $T$ large enough, $V_T(y)\ne0$ and, furthermore, $$\lim_{T\to\infty}\frac{U_T(x,y)}{V_T(y)}=P(X=x\mid Y=y)$$ All the limits above hold in the almost sure sense. The expected values, by contrast, have no easy form, as witnessed by the fact that, for every $T$, $$E\left(\frac{U_T(x,y)}{V_T(y)}\right)$$ simply does not exist when $P(V_T(y)=0)\ne0$, that is, when $P(Y=y)\ne1$, since the ratio is then $0/0$ with positive probability.
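The law of large numbers statement is easy to see in a small simulation; the following Python sketch uses a made-up joint distribution on $\{0,1\}^2$:

```python
# Simulate U_T / V_T for an i.i.d. sample of a made-up joint distribution of
# (X, Y) on {0,1} x {0,1}, illustrating the law-of-large-numbers limit.
import random

random.seed(0)
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}  # P(X=x, Y=y)
x, y = 1, 1
p_xy = joint[(x, y)]                    # P(X=1, Y=1) = 0.4
p_y = joint[(0, 1)] + joint[(1, 1)]     # P(Y=1)      = 0.7

def sample():
    """Draw one (X, Y) pair from the joint pmf by inverse transform."""
    r, acc = random.random(), 0.0
    for pair, prob in joint.items():
        acc += prob
        if r < acc:
            return pair
    return pair  # guard against floating-point round-off

T = 100_000
U = V = 0                               # U_T(x, y) and V_T(y)
for _ in range(T):
    xt, yt = sample()
    if yt == y:
        V += 1
        if xt == x:
            U += 1

print(U / V, p_xy / p_y)                # empirical ratio vs. exact P(X=1 | Y=1) = 4/7
```

Note also that $P(V_T(y)=0)=(1-P(Y=y))^T$ is positive for every finite $T$ whenever $P(Y=y)\ne1$, which is exactly the source of the problem with the unconditional expectation.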
Assuming that $P(V_T(y)=0)=0$ for every $T$ guarantees that the ratios do exist, almost surely, and then, since they are bounded by $1$, the almost sure convergence mentioned above implies, by dominated convergence, that, indeed, $$\lim_{T\to\infty}E\left(\frac{U_T(x,y)}{V_T(y)}\right)=P(X=x\mid Y=y)$$ Regarding the conditional expectation given $[V_T(y)\ne0]$, one can in fact be fully explicit, even for finite $T$: conditionally on $V_T(y)=k$ with $k\geqslant1$, the $k$ observations $t$ such that $Y_t=y$ independently satisfy $X_t=x$ with probability $P(X=x\mid Y=y)$, hence $U_T(x,y)$ is binomial with parameters $k$ and $P(X=x\mid Y=y)$, and therefore, for every $T\geqslant1$, $$E\left(\frac{U_T(x,y)}{V_T(y)}\,\middle|\,V_T(y)\ne0\right)=P(X=x\mid Y=y)$$ In this sense, the estimator is conditionally unbiased.
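One can probe the conditional expectation given $[V_T(y)\ne0]$ by simulation as well (again with made-up parameters $p=P(Y=y)$ and $q=P(X=x\mid Y=y)$):

```python
# Monte Carlo estimate of E[ U_T / V_T | V_T != 0 ] for a few small T,
# with made-up parameters p = P(Y = y) and q = P(X = x | Y = y).
import random

random.seed(1)
p, q = 0.3, 0.6

def one_ratio(T):
    """Return U_T / V_T for one simulated sample, or None on the event V_T = 0."""
    U = V = 0
    for _ in range(T):
        if random.random() < p:          # Y_t = y
            V += 1
            if random.random() < q:      # X_t = x, given Y_t = y
                U += 1
    return U / V if V else None

for T in (2, 5, 20):
    # keep only the samples on the event [V_T != 0]
    ratios = [r for r in (one_ratio(T) for _ in range(50_000)) if r is not None]
    print(T, sum(ratios) / len(ratios))  # each average is close to q = 0.6
```

For each $T$, the average of the surviving ratios is close to $q$, already for very small sample sizes.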