Estimating probability of event given marginal information for discrete random variables

Question


34 Views Asked by Bumbble Comm At 03 Apr 2026 - 9:36

Let $X$ take one of two mutually exclusive values $A$ and $B$, with $\mathbb{P}(X=A)=\alpha$ and $\mathbb{P}(X=B)=\beta\ (=1-\alpha)$, and suppose we want to estimate $\alpha$. However, we are only given samples from $(X,Y)$ in which $Y$ is observed, taking values $C_k$, but $X$ is not (we do not know whether $X=A$ or $X=B$). The conditional distributions $C_{A}=Y|X=A$ and $C_B=Y|X=B$ satisfy

$\mathbb{P}(C_A=C_k)=p_k=1/N,$ (uniform) where $k=1,...,N$

$\mathbb{P}(C_B=C_k)=q_k$ for $k=1,...,N.$

Obviously, if $q_k$ is close to $p_k$ for all $k$, we cannot estimate $\alpha$, since the samples of $Y$ are then (nearly) identically distributed whether $X=A$ or $X=B$. But if $q_k$ and $p_k$ differ substantially, a good estimate should be possible. Is anyone aware of a documented solution to this problem, or able to come up with a good estimator?

I expect this is a well-documented problem, but I am not that comfortable with sampling-bias statistics. The quality of the estimate should depend on the number of samples $m$ and on the differences between the probabilities $p_k$ and $q_k$.

P.S. If anyone believes further clarification is needed, please let me know. I am trying to formalize the problem of estimating how many samples come from one of two datasets, where the two sets take the same values but with different probabilities.


**Bumbble Comm** · Answer 1 · 2021-08-25 23:17:10

I've thought about this more, and in the limit of infinitely many samples you can of course recover $\alpha$ exactly. The empirical probability obtained from the samples, $r_k:=\mathbb{P}_e(Y=C_k)$, converges to the true probability $\mathbb{P}(Y=C_k)$ by the law of large numbers (LLN).

$$\mathbb{P}(Y=C_k)=\mathbb{P}(X=A)\cdot\mathbb{P}(Y=C_k|X=A)+\mathbb{P}(X=B)\cdot\mathbb{P}(Y=C_k|X=B)$$

$$\mathbb{P}(Y=C_k)=\alpha/N+(1-\alpha)q_k$$

So if you take the estimate $r_k\approx\alpha/N+(1-\alpha)q_k$ and solve for $\alpha$, you should get a good estimate for any $k$ with $q_k\neq 1/N$:

$$\alpha\approx \frac{r_k-q_k}{\frac{1}{N}-q_k}.$$

The probability that $\alpha$ differs from this depends on the convergence behavior of the LLN for the discrete random variables $(X,Y)$. I guess this kind of thing is well known, but I've forgotten what the optimal bounds are here.
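To make the formula concrete, here is a minimal Python sketch, assuming the $q_k$ are known and the observed values are encoded as integer indices $0,\dots,N-1$. The function names `estimate_alpha` and `simulate_y` are hypothetical, chosen only for illustration. Besides the per-$k$ estimate from the formula above, the sketch also returns a pooled least-squares fit over all $k$; that pooling is an addition of this sketch, not part of the derivation above, and is just one plausible way to combine the $N$ separate estimates.

```python
import numpy as np


def estimate_alpha(samples, q):
    """Estimate alpha from observed Y indices (0..N-1), given known q_k.

    Returns (per_k, pooled):
      per_k  - (r_k - q_k) / (1/N - q_k) for each k (NaN where q_k == 1/N)
      pooled - least-squares fit of r - q = alpha * (1/N - q) over all k
               (an assumption of this sketch, not from the original answer)
    """
    q = np.asarray(q, dtype=float)
    n_values = q.size
    samples = np.asarray(samples)

    # Empirical frequencies r_k of each observed value.
    r = np.bincount(samples, minlength=n_values) / samples.size

    # d_k = p_k - q_k with p_k = 1/N (uniform under X = A).
    d = 1.0 / n_values - q
    if np.allclose(d, 0.0):
        raise ValueError("q is uniform, so alpha is not identifiable")

    # Per-k estimate; undefined wherever q_k equals 1/N.
    with np.errstate(divide="ignore", invalid="ignore"):
        per_k = np.where(d != 0, (r - q) / d, np.nan)

    # Pooled estimate: minimizes sum_k (r_k - q_k - alpha * d_k)^2.
    pooled = float(np.dot(r - q, d) / np.dot(d, d))
    return per_k, pooled


def simulate_y(alpha, q, m, rng):
    """Draw m samples of Y from the mixture alpha/N + (1 - alpha) * q_k."""
    q = np.asarray(q, dtype=float)
    mixture = alpha / q.size + (1.0 - alpha) * q
    return rng.choice(q.size, size=m, p=mixture)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [0.7, 0.1, 0.1, 0.1]  # illustrative values only
    samples = simulate_y(alpha=0.3, q=q, m=200_000, rng=rng)
    per_k, pooled = estimate_alpha(samples, q)
    print("per-k estimates:", per_k)
    print("pooled estimate:", pooled)
```

Running it with different `m` and with `q` closer to or farther from uniform shows the behavior described in the question: the estimates tighten as $m$ grows and become unstable as $q_k$ approaches $1/N$, since the denominator $\frac{1}{N}-q_k$ shrinks.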
