Suppose that, given a vector of parameters $\mathbf{x}$, we know the conditional probability $P(S|\mathbf{x})$ that some statement $S$ is true. However, $\mathbf{x}$ is itself a random variable with probability density $\rho(\mathbf{x})$. It then seems reasonable (to me) that the probability of $S$ given the density $\rho$ is $$ P(S|\rho) = \int_\Omega P(S|\mathbf{x}) \, \rho(\mathbf{x}) \, d\mathbf{x} $$ where $\Omega = \operatorname{supp} \rho$.

If, however, $P(S|\mathbf{x})$ is only known numerically, we may wish to sample $N$ values of $\mathbf{x}$ and compute the average over the samples. Denote by $\mu(\mathbf{x})$ the probability density from which the sample points are drawn, with $\operatorname{supp} \mu \supseteq \Omega$. It then seems reasonable (to me) that $$ P(S|\rho) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N P(S|\mathbf{x}_n) \, \frac{\rho(\mathbf{x}_n)}{\mu(\mathbf{x}_n)} $$ where $\mathbf{x}_1, \dots, \mathbf{x}_N$ are the sample points. An approximation can then be obtained by using some large but finite $N$; I suppose this would be a 'Monte Carlo' approach.
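As a purely numerical sanity check (not a proof), the following minimal Python sketch compares three estimates of $P(S|\rho)$ for a scalar $x$. All the concrete choices are hypothetical and are made only for illustration: $P(S|x)$ is taken to be the logistic function $1/(1+e^{-x})$, $\rho$ is the standard normal density, and the sampling density $\mu$ is a wider normal $N(0, 2^2)$, so that $\operatorname{supp}\mu \supseteq \Omega$. The three estimates are: the first equation evaluated by trapezoidal quadrature, a direct simulation (draw $x \sim \rho$, then declare $S$ true with probability $P(S|x)$), and the second equation with a finite $N$.

```python
import math
import random

# Hypothetical choice: P(S|x) is the logistic function of a scalar x.
def p_s_given_x(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def normal_pdf(x: float, sigma: float) -> float:
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# rho: density of the parameter x (assumed standard normal).
def rho(x: float) -> float:
    return normal_pdf(x, 1.0)

# mu: sampling density (assumed N(0, 2^2)); its support covers that of rho.
MU_SIGMA = 2.0

def mu(x: float) -> float:
    return normal_pdf(x, MU_SIGMA)

def quadrature_estimate(lo: float = -10.0, hi: float = 10.0, m: int = 20001) -> float:
    """First equation: integrate P(S|x) rho(x) dx with the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / (m - 1)
    total = 0.0
    for i in range(m):
        x = lo + i * h
        weight = 0.5 if i in (0, m - 1) else 1.0  # trapezoid end-point weights
        total += weight * p_s_given_x(x) * rho(x)
    return total * h

def direct_simulation(n: int, rng: random.Random) -> float:
    """Draw x ~ rho, then make S true with probability P(S|x); return the frequency of S."""
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if rng.random() < p_s_given_x(x):
            hits += 1
    return hits / n

def importance_sampling(n: int, rng: random.Random) -> float:
    """Second equation with finite N: mean of P(S|x_n) rho(x_n) / mu(x_n) over x_n ~ mu."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, MU_SIGMA)
        total += p_s_given_x(x) * rho(x) / mu(x)
    return total / n

if __name__ == "__main__":
    rng = random.Random(0)
    n = 200_000
    print("quadrature :", quadrature_estimate())
    print("direct     :", direct_simulation(n, rng))
    print("importance :", importance_sampling(n, rng))
```

For these particular choices the exact value is $1/2$, because $\rho$ is symmetric about $0$ and the logistic function satisfies $P(S|-x) = 1 - P(S|x)$. All three printed estimates should therefore lie close to $0.5$, with the two sampling estimates fluctuating at roughly the $1/\sqrt{N}$ scale.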
Above, I have used the phrase 'it seems reasonable (to me)' before each of the displayed equations. I have two questions:
- Are these equations correct?
- How would they be proved?
I think the first equation should follow straightforwardly from the definition of a probability density, but I can't see how yet. I'm not sure how I would prove the second, but some change to the notation will be needed to account for the 'almost sure' nature of the convergence. Any advice would be appreciated.