I'm trying to take a Bayesian approach to hypothesis testing, but I need a bit of help formalizing what claims I can actually make.
Let's assume I have two datasets $X$ and $Y$ that each consist of $N$ i.i.d. draws from unknown distributions $P$ and $Q$ respectively. My ultimate goal is to find some quantity which tells me how plausible it is that $P\neq_d Q$. Here is my approach:
We imagine that there is some deterministic function $f$ which outputs $1$ on a sample from $P$ w.p. $p$ (otherwise $0$) and outputs $1$ on a sample from $Q$ w.p. $q$ (otherwise $0$). We also define $F(X):=\sum_{x\in X}f(x)$, i.e. the total number of samples in a dataset $X$ for which $f(x)=1$. Hence
$$F(X)\sim \text{Binomial}(N, p)\text{ and }F(Y)\sim \text{Binomial}(N, q)$$
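To make the setup concrete, here is a minimal simulation sketch. Everything specific in it is an assumption for illustration only: $P=\mathcal{N}(0,1)$, $Q=\mathcal{N}(0.5,1)$, the threshold function $f(x)=\mathbb{1}[x>0]$, and $N=200$ are all hypothetical choices, not part of the actual problem.

```python
import random

def f(x: float) -> int:
    """Hypothetical deterministic test function: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def F(dataset) -> int:
    """Total number of samples in the dataset for which f(x) = 1."""
    return sum(f(x) for x in dataset)

def draw(mean: float, size: int, rng: random.Random) -> list:
    """Draw `size` i.i.d. samples from a unit-variance Gaussian (assumed P or Q)."""
    return [rng.gauss(mean, 1.0) for _ in range(size)]

rng = random.Random(0)   # fixed seed so the run is reproducible
N = 200                  # hypothetical sample size
X = draw(0.0, N, rng)    # stand-in for draws from P
Y = draw(0.5, N, rng)    # stand-in for draws from Q

n, m = F(X), F(Y)        # F(X) ~ Binomial(N, p), F(Y) ~ Binomial(N, q)
print(f"F(X) = {n}, F(Y) = {m}")
```

Under these assumed distributions, $p = \Pr_{x\sim P}[x>0]$ and $q = \Pr_{x\sim Q}[x>0]$, so the two counts are exactly the binomial observations described above.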
If $P =_d Q$ then of course $p=q$ and $F(X)=_d F(Y)$. What I have calculated is the probability that $q>p$ given $F(X)=n$ and $F(Y)=m$, assuming a uniform prior over $p$ and $q$.
The solution is rather long and not that important for the fundamental question I have, but here it is:
$$Pr[q>p|F(X)=n, F(Y)=m]=(N+1)^2 n! (m+n+1)! (N-m)! \binom{N}{m} \binom{N}{n} \, _3\tilde{F}_2(n+1,m+n+2,n-N;n+2,n+N+3;1)$$
Here $_p\tilde{F}_q$ is the regularized generalized hypergeometric function.
So my actual question is: does this tell us, from a Bayesian point of view, anything about whether $P=_dQ$ or not?
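As a way to sanity-check a closed form like this numerically, here is a minimal sketch that computes the same posterior probability through a different route. It assumes the uniform priors on $p$ and $q$ are independent, so that $p\mid n\sim\text{Beta}(n+1,N-n+1)$ and $q\mid m\sim\text{Beta}(m+1,N-m+1)$. Since a Beta CDF with integer parameters is a binomial tail, $\Pr[q>x]=\sum_{k=0}^{m}\binom{N+1}{k}x^k(1-x)^{N+1-k}$, and integrating this against the posterior density of $p$ gives a finite sum of Beta integrals. This is not the hypergeometric expression itself, and the function name `prob_q_gt_p` is just a hypothetical helper.

```python
from fractions import Fraction
from math import comb, factorial

def prob_q_gt_p(N: int, n: int, m: int) -> Fraction:
    """Exact Pr[q > p | F(X) = n, F(Y) = m], assuming independent uniform priors.

    Posterior density of p: (N+1) * C(N, n) * x^n * (1-x)^(N-n).
    Posterior tail of q:    sum_{k=0}^{m} C(N+1, k) * x^k * (1-x)^(N+1-k).
    Their product integrates term by term to Beta integrals with integer arguments.
    """
    if not (0 <= n <= N and 0 <= m <= N):
        raise ValueError("need 0 <= n <= N and 0 <= m <= N")
    denom = factorial(2 * N + 2)
    total = Fraction(0)
    for k in range(m + 1):
        # Integral of x^(n+k) * (1-x)^(2N+1-n-k) over [0, 1]
        beta_integral = Fraction(
            factorial(n + k) * factorial(2 * N + 1 - n - k), denom
        )
        total += (N + 1) * comb(N, n) * comb(N + 1, k) * beta_integral
    return total

# Example with arbitrary illustrative counts
alpha = prob_q_gt_p(N=20, n=7, m=12)
print(float(alpha))
```

Because the arithmetic is done with `Fraction`, the result is exact, so it can be compared directly against a numerical evaluation of the hypergeometric form. Two quick consistency checks: `prob_q_gt_p(N, n, n)` should be $1/2$ by symmetry, and `prob_q_gt_p(N, n, m) + prob_q_gt_p(N, m, n)` should be $1$.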
Intuitively if we knew the true values of $p$ and $q$ we could say
$$p\neq q \implies P\neq_d Q $$
but if we only know $Pr[q > p] = \alpha$, can we write some bound of the form $Pr[P\neq_d Q]\geq \alpha$? I would think not, since there is no well-defined probability space here. So is there any rigorous way to reason about whether two distributions are equal or not, given only some statistics calculated on finite samples? Are we stuck being frequentists and computing a $p$-value? :(
To be a true Bayesian, would we need to put a prior over some set of possible distributions that $P$ and $Q$ can take on? If they are high-dimensional and unknown, are we basically completely out of luck when it comes to rigorous two-sample tests, even if we had access to the $f$ which optimally distinguishes between them?
Anyway, I'm just a naive student trying to build some intuition for fundamental ideas in statistics, so any older, wiser MSE member should feel free to roast me.
Thanks
P.S. Happy to provide more details on the long result in eq. 2 if anyone is interested (if this is a known result in some textbook, I would also appreciate a reference).