Let $X_i$ and $X_j$ be two random variables. Let $$\hat{P}(X_i = i \mid X_j = j) = \frac{\operatorname{count}(X_i = i\text{ and }X_j = j)}{\operatorname{count}(X_j = j)}$$
where $\operatorname{count}(\cdot)$ is the number of observations satisfying the given condition in a finite sample of observations. What is the expected value of $\hat{P}(X_i = i \mid X_j = j)$? Is it equal to the conditional probability $P(X_i = i \mid X_j = j)$? What would a proof strategy look like? The way I see it is the following:
Let $n$ be the sample size. Clearly, in a finite sample it is possible that $\operatorname{count}(X_j = j) = 0$ even if $P(X_j = j) \neq 0$. So, should the expectation be taken conditioned on $\operatorname{count}(X_j = j) \neq 0$? If yes, then one can take the expectation over all possible values of $\operatorname{count}(X_j = j)$ (between $1$ and $n$), and similarly over all possible values of $\operatorname{count}(X_i = i\text{ and }X_j = j)$ conditioned on $\operatorname{count}(X_j = j)$. This gives a binomial-expansion type of expression. Can this strategy yield the proof? I can't really get anywhere with it.
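For what it is worth, the double sum I have in mind can be evaluated numerically. Here is a sketch in Python, with made-up values for $n$, $p = P(X_j = j)$ and $q = P(X_i = i \mid X_j = j)$:

```python
# Carry out the "expectation over all possible counts" strategy numerically,
# with made-up values: n = sample size, p = P(X_j = j), q = P(X_i = i | X_j = j).
from math import comb

def cond_expectation(n, p, q):
    """E[ P_hat | count(X_j = j) >= 1 ], via the double binomial sum."""
    total = 0.0
    for k in range(1, n + 1):                           # k = count(X_j = j)
        p_k = comb(n, k) * p**k * (1 - p)**(n - k)      # P(count(X_j = j) = k)
        for u in range(k + 1):                          # u = count(X_i = i and X_j = j)
            p_u = comb(k, u) * q**u * (1 - q)**(k - u)  # binomial(k, q) pmf
            total += p_k * p_u * (u / k)
    return total / (1 - (1 - p)**n)                     # condition on count >= 1

print(cond_expectation(10, 0.3, 0.6))  # comes out equal to q = 0.6 (up to rounding)
```

Numerically the conditional expectation seems to come out equal to $q$ for every choice of parameters I tried, but I don't see how to prove it.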
If your answer is that the expectation is not defined due to the possibility that $\operatorname{count}(X_j = j) = 0$, can we at least discuss the conditional expectation given that $\operatorname{count}(X_j = j) \neq 0$?
Any useful references are also welcome.
The question is rather loosely phrased but, in the end, it seems that what you mean to ask can be formalized as follows.
Let $(X_t,Y_t)_{t\geqslant1}$ denote an i.i.d. sample from a given joint distribution $P_{X,Y}$. Fix some pair $(x,y)$ and consider, for every $T\geqslant1$, $$U_T(x,y)=\sum_{t=1}^T\mathbf 1_{X_t=x,Y_t=y}\qquad V_T(y)=\sum_{t=1}^T\mathbf 1_{Y_t=y}$$ Then you are asking for the properties of $$\frac{U_T(x,y)}{V_T(y)}$$ possibly for large values of $T$, and possibly only on the event $[V_T(y)\ne0]$.
The most classical version of the law of large numbers indicates that $$\lim_{T\to\infty}\frac{U_T(x,y)}T=P(X=x,Y=y)\qquad\lim_{T\to\infty}\frac{V_T(y)}T=P(Y=y)$$ Thus, if $P(Y=y)\ne0$ then, for every $T$ large enough, $V_T(y)\ne0$ and, furthermore, $$\lim_{T\to\infty}\frac{U_T(x,y)}{V_T(y)}=P(X=x\mid Y=y)$$ All the limits above hold in the almost sure sense. The expected values, by contrast, have no easy form, as witnessed by the fact that, for every $T$, $$E\left(\frac{U_T(x,y)}{V_T(y)}\right)$$ simply does not exist when $P(V_T(y)=0)\ne0$, that is, when $P(Y=y)\ne1$, since the ratio is then $0/0$ with positive probability.
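The law of large numbers statement is easy to see in a small simulation; the following Python sketch uses a made-up joint distribution on $\{0,1\}^2$:

```python
# Simulate U_T / V_T for an i.i.d. sample of a made-up joint distribution of
# (X, Y) on {0,1} x {0,1}, illustrating the law-of-large-numbers limit.
import random

random.seed(0)
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}  # P(X=x, Y=y)
x, y = 1, 1
p_xy = joint[(x, y)]                    # P(X=1, Y=1) = 0.4
p_y = joint[(0, 1)] + joint[(1, 1)]     # P(Y=1)      = 0.7

def sample():
    """Draw one (X, Y) pair from the joint pmf by inverse transform."""
    r, acc = random.random(), 0.0
    for pair, prob in joint.items():
        acc += prob
        if r < acc:
            return pair
    return pair  # guard against floating-point round-off

T = 100_000
U = V = 0                               # U_T(x, y) and V_T(y)
for _ in range(T):
    xt, yt = sample()
    if yt == y:
        V += 1
        if xt == x:
            U += 1

print(U / V, p_xy / p_y)                # empirical ratio vs. exact P(X=1 | Y=1) = 4/7
```

Note also that $P(V_T(y)=0)=(1-P(Y=y))^T$ is positive for every finite $T$ whenever $P(Y=y)\ne1$, which is exactly the source of the problem with the unconditional expectation.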
Assuming that $P(V_T(y)=0)=0$ for every $T$ guarantees that the ratios do exist, almost surely, and then, since they are bounded by $1$, the almost sure convergence mentioned above implies, by dominated convergence, that, indeed, $$\lim_{T\to\infty}E\left(\frac{U_T(x,y)}{V_T(y)}\right)=P(X=x\mid Y=y)$$ Regarding the conditional expectation given $[V_T(y)\ne0]$, one can in fact be fully explicit, even for finite $T$: conditionally on $V_T(y)=k$ with $k\geqslant1$, the $k$ observations $t$ such that $Y_t=y$ independently satisfy $X_t=x$ with probability $P(X=x\mid Y=y)$, hence $U_T(x,y)$ is binomial with parameters $k$ and $P(X=x\mid Y=y)$, and therefore, for every $T\geqslant1$, $$E\left(\frac{U_T(x,y)}{V_T(y)}\,\middle|\,V_T(y)\ne0\right)=P(X=x\mid Y=y)$$ In this sense, the estimator is conditionally unbiased.
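One can probe the conditional expectation given $[V_T(y)\ne0]$ by simulation as well (again with made-up parameters $p=P(Y=y)$ and $q=P(X=x\mid Y=y)$):

```python
# Monte Carlo estimate of E[ U_T / V_T | V_T != 0 ] for a few small T,
# with made-up parameters p = P(Y = y) and q = P(X = x | Y = y).
import random

random.seed(1)
p, q = 0.3, 0.6

def one_ratio(T):
    """Return U_T / V_T for one simulated sample, or None on the event V_T = 0."""
    U = V = 0
    for _ in range(T):
        if random.random() < p:          # Y_t = y
            V += 1
            if random.random() < q:      # X_t = x, given Y_t = y
                U += 1
    return U / V if V else None

for T in (2, 5, 20):
    # keep only the samples on the event [V_T != 0]
    ratios = [r for r in (one_ratio(T) for _ in range(50_000)) if r is not None]
    print(T, sum(ratios) / len(ratios))  # each average is close to q = 0.6
```

For each $T$, the average of the surviving ratios is close to $q$, already for very small sample sizes.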