Can independence go one way? I.e., so that P(A|B) = P(A), but P(B|A) ≠ P(B)


As I understand it, independence of A and B can be informally established by asking whether learning something about one of those events tells you something new about the other. This must be borne out mathematically, however. For example:

If P(A|B) = P(A), then A and B are independent.

And if P(A & B) equals P(A) x P(B), then A and B are independent.

Both of the above imply that P(B|A) = P(B).
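As a finite sanity check of that symmetry (a hypothetical die example of my own, not part of the original question): with a fair six-sided die, take A = "even" and B = "at most 4". Then P(A|B) = P(A) = 1/2, and sure enough P(B|A) = P(B) = 2/3.

```python
from fractions import Fraction

# Hypothetical finite example: a fair six-sided die.
omega = set(range(1, 7))
A = {n for n in omega if n % 2 == 0}   # "even"
B = {n for n in omega if n <= 4}       # "at most 4"

def pr(event):
    return Fraction(len(event), len(omega))

def pr_given(event, given):
    return Fraction(len(event & given), len(given))

print(pr_given(A, B), pr(A))        # 1/2 1/2  -> A is independent of B
print(pr_given(B, A), pr(B))        # 2/3 2/3  -> so B is independent of A
print(pr(A & B) == pr(A) * pr(B))   # True: the product definition agrees
```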

It’s this last statement that confuses me, at least in application to certain cases. For example:

You devise a way to randomly choose a number from all real numbers, uniformly distributed. The probability that the chosen number is prime is 0, given that 0% of the reals are prime. Similarly, choosing the number 2 from the set of all real numbers has probability 0. Likewise, choosing a 2 from the set of all prime numbers has probability 0. Given that 2 is a prime number, it seems that choosing a 2 and choosing a prime number must be dependent events, at least in one direction (namely, if I know I’ve chosen a 2, then I’m certain I’ve chosen a prime number). Here’s what I mean:

P(2|prime number) = P(2) = 0 (passes for independence)

P(prime number|2) = 1 (i.e., not 0, or P(prime number), and so fails for independence)

But we can also test as follows:

P(2 & prime number) = P(2) x P(prime number) = 0

P(2 & prime number) = P(2) x P(prime number|2) = 0

P(2 & prime number) = P(prime number) x P(2|prime number) = 0

Everything here comes out to 0, as I suppose it should. This also aligns with my understanding that anything with probability 0 is independent from any other event. (Right?) And yet, I’m stuck with the intuition that:

If I learn I got a 2, I know I got a prime number, whereas learning I got a prime is insufficient for updating my beliefs about getting a 2 (provided I really do believe that the probability of pulling a 2 from the primes is 0); the same goes for learning I got an even number, a natural number, an integer, and so on. Yet I learn all of those things if I learn I got a 2.

I’ve thought of other examples, though all of them deal with some single event occurring out of a set of infinitely many possible outcomes. E.g., pulling from the natural numbers: P(2|even) = P(2) = 0; but P(even|2) = 1 (rather than P(even) = 1/2). So I imagine there’s something I’m naive about in the domain of infinite possible outcomes.

What am I missing?

There are 4 answers below.

Answer (score 0):

First problem: Choose a number from all real numbers, uniformly distributed? There is no such distribution on $\Bbb R$.
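One standard way to see why (filling in the detail): uniformity would force every unit interval $[n, n+1)$ to carry the same mass $c$, and countable additivity would then give $$1 = P(\Bbb R) = \sum_{n\in\Bbb Z} P([n,n+1)) = \sum_{n\in\Bbb Z} c,$$ which is $0$ if $c=0$ and infinite if $c>0$, so no choice of $c$ works.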

Second problem: If you have any probability space $(\Omega,\mathcal F,P)$ and an event $A\in\mathcal F$, then a probability space $(A,\mathcal F|_A,P')$ is only induced (via $P'(S)=\frac{P(S)}{P(A)}$) if $P(A)>0$.

Answer (score 4):

From the definition of conditional probability, we have that $$P(A\mid B)P(B) = P(A\cap B) = P(B\mid A)P(A).$$ Rearranging a bit (and assuming $P(A)>0$ and $P(B)>0$, so that the divisions make sense), we see that $$ \frac{P(A\mid B)}{P(A)} = \frac{P(B\mid A)}{P(B)}. $$ $A$ is independent from $B$ iff the left-hand side is equal to $1$, and $B$ is independent from $A$ iff the right-hand side is equal to $1$. They're equal, so independence is symmetric.

Answer (score 6):
  1. The symmetry of independence is manifest in its definition $P(A\cap B) = P(A)P(B)$ which makes no reference to conditional probability.

  2. Conditional probability is problematic when the conditioning event has probability zero, as evidenced by the division by zero that would occur in the definition $P(A\mid B) = \frac{P(A\cap B)}{P(B)}.$ For it to have meaning, we must be in a situation where a limit of $B$ approaching a null event is understood.

  3. Your example is problematic in other ways. There is no uniform distribution on the real line, nor is there a uniform distribution on the prime numbers. It does not make sense to say the probability that a prime is even is zero. There is a related concept of density, but there is a good reason why density is not regarded as the same thing as probability.
  4. A less problematic variant of your question would be to look at something like a standard normal $X$ and then consider the events $X>0$ and $X=2.$ It does seem a little strange that we have independence since $$P(X>0,X=2) = P(X=2) = 0 =P(X>0)P(X=2) $$ whereas clearly $X=2$ implies $X>0,$ and we want to write something like $$P(X>0\mid X=2)=1$$ even though it's undefined as a conditional probability (and indeed, we can write something like it... see the caveat to point (2) above regarding there being a well-defined limit.) This is probably best viewed as a counterintuitive fact about independence when events have probability zero or one. An event with probability zero or one is always independent of any other event. As a particular extreme case, note this means an event with probability zero or one is independent of itself!
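To make the limiting sense in point (2) concrete, here is a small numeric sketch (my own illustration, not from the answer above): condition a standard normal $X$ on a shrinking window around $2$ and watch $P(X>0 \mid |X-2|<\varepsilon)$, computing the normal CDF exactly via the error function.

```python
import math

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pr_pos_given_near_2(eps):
    """P(X > 0 | 2 - eps < X < 2 + eps) for a standard normal X."""
    window = Phi(2 + eps) - Phi(2 - eps)             # P(|X - 2| < eps)
    overlap = Phi(2 + eps) - Phi(max(0.0, 2 - eps))  # P(X > 0 and |X - 2| < eps)
    return overlap / window

for eps in (3.0, 1.0, 0.1):
    print(eps, pr_pos_given_near_2(eps))
# For eps = 1.0 and 0.1 the result is exactly 1.0
```

As soon as the window $(2-\varepsilon, 2+\varepsilon)$ lies entirely to the right of $0$, the conditional probability is exactly $1$, which is the limiting sense in which one can write $P(X>0\mid X=2)=1$ even though $P(X=2)=0$.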
Answer (score 8):

What a great question!

OK, OK, so you're facing a bit of flak for the setup... but in this case we can somewhat make it rigorous!

Consider a geometric distribution $\text{Pr}(k) = p~(1-p)^{k-1}$ over the natural numbers $\mathbb N^+ = \{1,2,3,\dots\}$. We can build a table of the probabilities that you are interested in for the various $p$ values.

These numbers were obtained in Haskell. If you want to replicate: `cabal install primes`, then

```haskell
import Data.Numbers.Primes (primes)

smallPrimes :: [Int]
smallPrimes = takeWhile (< 10000000) primes

-- Pr(k is prime) under the geometric distribution; summing the reversed
-- list adds the smallest terms first, which helps limit round-off error.
prprime :: Double -> Double
prprime p = sum . reverse . map (\k -> p * (1 - p) ^ (k - 1)) $ smallPrimes
```

Even so, I'd expect some round-off error, but:

   p    | Pr(2)         | Pr(prime)     | Pr(2 & prime) | Pr(2 | prime) | Pr(prime | 2)
--------+---------------+---------------+---------------+---------------+---------------
 0.5    | 0.25          | 0.41468250985 | 0.25          | 0.60287085677 | 1.0
 0.333  | 0.222111      | 0.47466583522 | 0.222111      | 0.46793129718 | 1.0
 0.1    | 0.09          | 0.41253315631 | 0.09          | 0.21816428237 | 1.0
 0.03   | 0.0291        | 0.30889192395 | 0.0291        | 0.09420770743 | 1.0
 0.01   | 0.0099        | 0.24179453736 | 0.0099        | 0.04094385303 | 1.0
 0.003  | 0.002991      | 0.19183171475 | 0.002991      | 0.01559179098 | 1.0
 0.001  | 0.000999      | 0.16020745005 | 0.000999      | 0.00623566507 | 1.0
 0.0003 | 0.00029991    | 0.13500470068 | 0.00029991    | 0.00222147820 | 1.0
 0.0001 | 0.00009999    | 0.11779050920 | 0.00009999    | 0.00084887994 | 1.0

As you go to lower and lower $p$, the distribution becomes flatter and flatter among the first several numbers. We can see some immediate trends, including, as you surmise, that $\text{Pr}(k\text{ prime})$ decreases slowly, as it should, and that $\text{Pr}(k=2\mid k\text{ prime})$ decreases quickly, as it should. What you also notice is that the fact that "2 is prime" shows up directly in the rightmost column, and it forces $\text{Pr}(k=2 \cap k \text{ prime}) = \text{Pr}(k = 2)$ directly, which means that unless $\text{Pr}(k\text{ prime}) = 1$, you are totally right to conclude that these are dependent events.

What you can see from this table is also where you're going wrong in the argument. When you say that $\text{Pr}(k = 2) \approx \text{Pr}(k=2\mid k\text{ prime})$, that is not borne out by the data above: the $\approx$ sign is consistently off by a factor of $1/\text{Pr}(k\text{ prime})$, so it starts off wrong by a factor of about 2.4 and is wrong by almost a factor of 10 by the bottom of the table. Since we know this probability keeps getting smaller, we know the approximation keeps getting worse. In general you get a higher probability of finding $k=2$ if you know that you happened to land on the primes, and the two events are not truly independent for any assignment of probabilities that stays relatively even over the first $N$ numbers before tapering off.
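For readers without a Haskell toolchain, here is a self-contained Python sketch of the same computation (my own translation, using a simple sieve in place of the `primes` package; for $p = 0.5$ a cutoff of 200 already suffices, since the geometric tail $0.5^k$ is negligible far earlier):

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(range(i*i, n + 1, i))
    return [i for i, is_p in enumerate(sieve) if is_p]

def geom(p, k):
    """Pr(k) = p * (1-p)^(k-1): the geometric distribution on 1, 2, 3, ..."""
    return p * (1 - p)**(k - 1)

p = 0.5
pr_2 = geom(p, 2)                                     # Pr(2)
pr_prime = sum(geom(p, q) for q in primes_up_to(200)) # Pr(prime)
pr_2_and_prime = pr_2                                 # 2 is prime, so the events nest
print(pr_2, pr_prime, pr_2_and_prime / pr_prime, pr_2_and_prime / pr_2)
# -> 0.25  0.4146825...  0.6028708...  1.0  (matches the p = 0.5 row)
```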