Probability of complement as fraction


I am trying to work through the paper "Repairing Neural Networks by Leaving the Right Past Behind" (arxiv), and I am really struggling with the mathematics. The paper states that the key idea is that they can express:

$$ p(\mathcal{D}\setminus\mathcal{C} | \theta) = p(\mathcal{D}|\theta) / p(\mathcal{C}|\theta), \quad\forall\mathcal{C} \subset \mathcal{D} $$

This is possible due to the "i.i.d. modelling assumption".

I tried to understand this formulation with my (limited) intuition, through reformulations, and even by writing down small sets and computing the conditional probabilities by hand, but nothing matches the formulation above.

Under what conditions is the above formulation correct, and what is the intuition behind it?


The symbols $\mathcal D$ and $\mathcal C$ represent sets of data drawn by independent and identically distributed sampling from some distribution with parameter $\theta$.

Therefore, the data in $\cal D\smallsetminus C$ are conditionally independent of the data in $\cal D\cap C$ given $\theta$, since these two parts of $\cal D$ are disjoint.

Further, the statement quantifies over all $\cal C\subset D$, which means that $\cal C = D\cap C$.

And so we have this:

$$\begin{align}p(\mathcal D\mid\theta) &=p(\mathcal{(D\smallsetminus C)\cup(D\cap C)}\mid\theta)&&\text{since }\mathcal{(D\smallsetminus C)\cup(D\cap C)=D}\\ &=p(\mathcal{D\smallsetminus C}\mid\theta)\cdot p(\mathcal{D\cap C}\mid\theta)&&\text{by independence} \textit{ of the data} \text{ given } \theta\\ &=p(\mathcal{D\smallsetminus C}\mid\theta)\cdot p(\mathcal C\mid\theta)&&\text{when }\mathcal{C\subset D}\\[2ex]\therefore\quad p(\mathcal{D\smallsetminus C}\mid\theta) &= p(\mathcal D\mid\theta) / p(\mathcal C\mid\theta)&&\forall \mathcal{C\subset D}\end{align}$$
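The derivation can be checked numerically. Here is a minimal sketch (not from the paper), assuming each data point is an i.i.d. Bernoulli($\theta$) observation, so the likelihood of a data set is the product of the per-point likelihoods:

```python
import math

theta = 0.3

def likelihood(data, theta):
    """p(data | theta) for i.i.d. Bernoulli(theta) observations:
    the product of per-point likelihoods."""
    p = 1.0
    for x in data:
        p *= theta if x == 1 else (1.0 - theta)
    return p

D = [1, 0, 1, 1, 0]    # the full data set D
C = [1, 1]             # a subset C of D (two of the ones)
D_minus_C = [0, 1, 0]  # D \ C

lhs = likelihood(D_minus_C, theta)                 # p(D\C | theta)
rhs = likelihood(D, theta) / likelihood(C, theta)  # p(D | theta) / p(C | theta)
assert math.isclose(lhs, rhs)
```

The division works precisely because every factor contributed by a point of $\mathcal C$ appears in the product for $\mathcal D$ and cancels.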

That is all.


$p(\mathcal E\mid\theta)$ is the probability of obtaining those data points over the $\lvert\mathcal E\rvert$ trials, given the same parameter $\theta$ in each trial.

To clarify: the sets themselves are not events; the individual data points are. A set of data corresponds to the conjunction of events from separate trials, so a union of data sets corresponds to the conjunction of those events.

Thus for independent sets of data, $\mathcal E$ and $\mathcal F$, we will have $p(\mathcal{E\cup F}\mid\theta)=p(\mathcal E\mid\theta)\cdot p(\mathcal F\mid\theta)$ .
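As a miniature worked example (not from the paper): suppose $\mathcal D=\{x_1,x_2,x_3\}$ and $\mathcal C=\{x_3\}$, with per-point likelihood $f(\cdot\mid\theta)$. Then

$$p(\mathcal D\mid\theta)=f(x_1\mid\theta)\,f(x_2\mid\theta)\,f(x_3\mid\theta),\qquad p(\mathcal C\mid\theta)=f(x_3\mid\theta),$$

so dividing cancels the factor for $x_3$ and leaves $p(\mathcal{D\smallsetminus C}\mid\theta)=f(x_1\mid\theta)\,f(x_2\mid\theta)$, exactly the claimed ratio.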