Given two mutually exclusive events $A$ and $B$ with $\mathbb{P}(X=A)=\alpha$ and $\mathbb{P}(X=B)=\beta=1-\alpha$, suppose we want to estimate $\alpha$. However, we only observe samples of $Y$ (without knowing whether $X=A$ or $X=B$), where $Y$ takes values $C_k$ and the conditional distributions $C_{A}:=Y\mid X=A$ and $C_B:=Y\mid X=B$ satisfy
$\mathbb{P}(C_A=C_k)=p_k=1/N$ (uniform) for $k=1,...,N,$
$\mathbb{P}(C_B=C_k)=q_k$ for $k=1,...,N.$
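To make the setup concrete, here is a quick simulation sketch of the model. The particular values of $N$, $\alpha$, and $q$ are arbitrary illustrative choices, not part of the problem statement:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not given in the problem)
N = 10                         # number of values C_1, ..., C_N
alpha = 0.3                    # P(X = A), the quantity to estimate
p = np.full(N, 1.0 / N)        # P(C_A = C_k): uniform
q = rng.dirichlet(np.ones(N))  # P(C_B = C_k): some other pmf

def sample_Y(m):
    """Draw m samples of Y from the mixture alpha * p + (1 - alpha) * q."""
    x_is_A = rng.random(m) < alpha            # latent X, never observed
    return np.where(x_is_A,
                    rng.integers(0, N, size=m),   # Y | X=A is uniform
                    rng.choice(N, size=m, p=q))   # Y | X=B has pmf q
```

The observer only sees the output of `sample_Y`, i.e. the indices $k$ standing in for the values $C_k$; the latent indicator `x_is_A` is discarded.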
Obviously, if $q_k$ is close to $p_k$ for all $k$, we cannot estimate $\alpha$, since the samples of $Y$ are (nearly) identically distributed whether $X=A$ or $X=B$. But if $q_k$ and $p_k$ differ substantially, a good estimate should be possible. Is anyone aware of a documented solution to this problem, or can anyone suggest a good estimator?
I expect this is a well-documented problem, but I am not very comfortable with sample-bias statistics. The quality of the estimate should depend on the number of samples $m$ and on the differences between the probabilities $p_k$ and $q_k$.
P.S. If anyone feels further clarification is needed, please let me know. I am trying to formalize the problem of estimating how many samples in a pooled collection came from each of two datasets, where the two datasets take values with different probabilities.
I've thought about this more, and in the limit you can of course recover $\alpha$ exactly. The empirical frequency $r_k:=\mathbb{P}_e(Y=C_k)$ computed from the samples converges to the true probability $\mathbb{P}(Y=C_k)$ by the law of large numbers, and by the law of total probability
$$\mathbb{P}(Y=C_k)=\mathbb{P}(X=A)\cdot\mathbb{P}(Y=C_k|X=A)+\mathbb{P}(X=B)\cdot\mathbb{P}(Y=C_k|X=B)$$
$$\mathbb{P}(Y=C_k)=\alpha/N+(1-\alpha)q_k$$
So if you take $r_k\approx\alpha/N+(1-\alpha)q_k$ and solve for $\alpha$, you should get a good estimate:
$$\alpha\approx \frac{r_k-q_k}{\frac{1}{N}-q_k}.$$ (Note this degenerates for any $k$ with $q_k\approx 1/N$, so one would want to combine the per-$k$ estimates sensibly.) How far this is from the true $\alpha$ depends on the convergence behavior of the LLN for the discrete random variables $(X,Y)$, i.e. on concentration bounds for the empirical frequencies $r_k$. I guess this kind of thing is well known, but I've forgotten what the optimal bounds are here.
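The estimator above can be checked numerically. One natural way to combine the per-$k$ equations $r_k\approx\alpha(1/N - q_k)+q_k$ is least squares over all $k$; the sketch below does exactly that. The parameters ($N=10$, $\alpha=0.3$, a random $q$) are illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (assumed values)
N, alpha = 10, 0.3
q = rng.dirichlet(np.ones(N))  # P(C_B = C_k)

# Draw m samples of Y: latent X first, then Y | X
m = 200_000
x_is_A = rng.random(m) < alpha
y = np.where(x_is_A,
             rng.integers(0, N, size=m),  # Y | X=A uniform on N values
             rng.choice(N, size=m, p=q))  # Y | X=B with pmf q

# Empirical frequencies r_k
r = np.bincount(y, minlength=N) / m

# Least-squares fit of r_k = alpha * (1/N - q_k) + q_k over all k,
# which avoids dividing by a single (1/N - q_k) that may be near zero
d = 1.0 / N - q
alpha_hat = np.dot(r - q, d) / np.dot(d, d)
print(alpha_hat)  # estimate of alpha
```

For this sample size the estimate lands close to the true $\alpha=0.3$; the per-$k$ version $(r_k-q_k)/(1/N-q_k)$ is much noisier whenever $q_k$ happens to be near $1/N$.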