What is the conceptual motivation for Bayes' Theorem over this other equation?

Question


77 Views Asked by Bumbble Comm At 28 Mar 2026 - 8:14

According to Wikipedia...

With Bayesian probability interpretation, [Bayes'] theorem expresses how a degree of belief, expressed as a probability, should rationally change to account for the availability of related evidence.

Suppose I want to relate $\Pr(W\ |\ C)$, a degree of rational belief in $W$ after evidence $C$ becomes available, back to $\Pr(W)$, a degree of rational belief in $W$ before evidence $C$ becomes available.

Bayes' Theorem is just a double-application of the conditional probability formula, thus...

$\boldsymbol{\Pr(W\ |\ C)}\ =\ \frac{\Pr(W\ \cap\ C)}{\Pr(C)} = \frac{\boldsymbol{\Pr(W)}\Pr(C\ |\ W)}{\Pr(C)}$

Alternatively, however, I could apply the conditional probability formula once, and then apply the inclusion-exclusion formula to the numerator to obtain...

$\boldsymbol{\Pr(W\ |\ C)} = \frac{\boldsymbol{\Pr(W)}\ +\ \Pr(C)\ -\ \Pr(W\ \cup\ C)}{\Pr(C)}$

This formula also relates $\Pr(W\ |\ C)$ back to $\Pr(W)$. My question is, what warrants embracing the top relation (Bayes' Theorem) and rejecting the bottom relation as the correct mathematical representation of belief updating?
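That the two relations agree numerically can be checked with a minimal sketch. The joint distribution over $(W, C)$ below is hypothetical, chosen only for illustration, and `prob` is a helper introduced here, not part of any library.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (W, C); the four masses are illustrative only.
joint = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}

def prob(event):
    """Sum the joint mass over outcomes (w, c) for which event(w, c) is true."""
    return sum(p for (w, c), p in joint.items() if event(w, c))

p_w = prob(lambda w, c: w)               # Pr(W)
p_c = prob(lambda w, c: c)               # Pr(C)
p_w_and_c = prob(lambda w, c: w and c)   # Pr(W ∩ C)
p_w_or_c = prob(lambda w, c: w or c)     # Pr(W ∪ C)
p_c_given_w = p_w_and_c / p_w            # Pr(C | W)

# Top relation (Bayes' Theorem) and bottom relation (inclusion-exclusion).
bayes = p_w * p_c_given_w / p_c
inclusion_exclusion = (p_w + p_c - p_w_or_c) / p_c

print(bayes, inclusion_exclusion)
```

Both expressions compute the same quantity, $\Pr(W \cap C)/\Pr(C)$, so any difference between them lies in which inputs are convenient to obtain, not in the result.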

My first attempt at an answer was that the bottom relation is unhelpful because $\Pr(W\ \cup\ C)$ is not algebraically independent of $\Pr(W)$, and thus the appearance of $\Pr(W)$ may be "artificial," analogous to a more blatantly artificial introduction of $\Pr(W)$ such as $\boldsymbol{\Pr(W\ |\ C)}\ =\ \frac{\Pr(W\ \cap\ C)}{\Pr(C)}\ +\ \boldsymbol{\Pr(W)}\ -\ \boldsymbol{\Pr(W)}$. However, the $\Pr(C\ |\ W)$ that appears in the top relation is also not algebraically independent of $\Pr(W)$, the conditional probability formula already applied being the bridge between the two.

Another thought I had is that the top and bottom relations may actually be comparably good mathematical representations of belief updating, and the top relation is merely preferred because it acts on $\Pr(W)$ via multiplication, whereas the bottom relation acts on $\Pr(W)$ via addition and multiplication. However, this feels quite weak to explain the ubiquity of the top relation, as it is, in general, far from true that the most useful mathematical expression is always the simplest.

Why should I use Bayes' Theorem as the unique intuition for belief updating, and not the other expression?


Bumbble Comm On 11 May 2023 - 5:00

To me, the formula is better written as $\mathbb P[W|C]=\mathbb P[W]\cdot \frac{\mathbb P[C|W]}{\mathbb P[C]}$, and this is what I would mean by "update of belief", in the sense that $\frac{\mathbb P[C|W]}{\mathbb P[C]}$ is the multiplicative coefficient applied to your belief $\mathbb P[W]$ upon observing $C$.

We can push this slightly further. Suppose that $C_1,\dots, C_n$ are independent events that are also independent given $W$. Then
\begin{align*}
\mathbb P[W|C_1,\dots,C_n] &= \frac{\mathbb P[W,C_1,\dots, C_n]}{\mathbb P[C_1,\dots, C_n]}\\
&=\mathbb P[W] \cdot \frac{\prod_{k=1}^n\mathbb P[C_k|W,C_1,\dots,C_{k-1}]}{\prod_{k=1}^n\mathbb P[C_k|C_1,\dots, C_{k-1}]}\\
&= \mathbb P[W] \cdot \prod_{k=1}^n \frac{\mathbb P[C_k| W]}{\mathbb P[C_k]}
\end{align*}
which is essentially the iterative update of belief.

Note that independence together with conditional independence of $C_1,\dots, C_n$ given $W$ is quite a strong constraint, one that typically holds only within the realm of mathematics.
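To make the iterative update concrete, here is a minimal sketch for $n=2$. The joint distribution over $(W, C_1, C_2)$ is hypothetical and was chosen by hand so that $C_1$ and $C_2$ are independent both marginally and given $W$; `prob` is a helper introduced only for this example.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (W, C1, C2), chosen by hand so that
# C1 and C2 are independent marginally and also independent given W.
# (They are not independent given "not W"; the formula does not need that.)
joint = {
    (True, True, True): F(8, 25),
    (True, True, False): F(2, 25),
    (True, False, True): F(2, 25),
    (True, False, False): F(1, 50),
    (False, True, True): F(1, 25),
    (False, True, False): F(4, 25),
    (False, False, True): F(4, 25),
    (False, False, False): F(7, 50),
}

def prob(event):
    """Sum the joint mass over outcomes o = (w, c1, c2) where event(o) is true."""
    return sum(p for o, p in joint.items() if event(o))

p_w = prob(lambda o: o[0])

# Direct computation of P[W | C1, C2] from the joint distribution.
direct = prob(lambda o: o[0] and o[1] and o[2]) / prob(lambda o: o[1] and o[2])

# Iterative update: multiply the prior by P[C_k | W] / P[C_k] for each k.
belief = p_w
for k in (1, 2):
    p_ck = prob(lambda o: o[k])
    p_ck_given_w = prob(lambda o: o[0] and o[k]) / p_w
    belief *= p_ck_given_w / p_ck

print(direct, belief)
```

If either independence assumption is dropped, the two numbers will generally differ, which is one way to see how restrictive the assumptions are.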

**Bumbble Comm** · Accepted Answer

A practical issue when using conditional probability to update beliefs is the calculation of $\Pr(C)$. Your suggested alternative makes this more complicated by also requiring calculation of $\Pr(W \cup C)$.

One approach is to use the law of total probability over a partition of hypotheses $W_1, W_2, \dots$ and say $\Pr(C)=\sum_i \Pr(W_i)\Pr(C \mid W_i)$

so Bayes' theorem becomes $\Pr(W_k\mid C)=\frac{\Pr(W_k)\Pr(C \mid W_k)}{\sum_i \Pr(W_i)\Pr(C \mid W_i)}$

and this is attractive since it is immediately clear this is non-negative and sums over $k$ to $1$. There is a similar extension to densities and integrals. With a constant denominator across $k$, you can then say things like "posteriors are proportional to priors times likelihood".
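As a minimal sketch of this normalized form, assuming the hypotheses $W_i$ are mutually exclusive and exhaustive: the function name `posterior` and the numbers below are hypothetical, chosen only to illustrate the computation.

```python
from fractions import Fraction as F

def posterior(priors, likelihoods):
    """Return Pr(W_k | C) for each k, given Pr(W_k) and Pr(C | W_k).

    Assumes the hypotheses W_k are mutually exclusive and exhaustive.
    """
    # Prior times likelihood, one term per hypothesis.
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    # Pr(C) by the law of total probability; the same for every k.
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

# Hypothetical priors Pr(W_k) and likelihoods Pr(C | W_k) for three hypotheses.
priors = [F(1, 2), F(3, 10), F(1, 5)]
likelihoods = [F(1, 10), F(1, 2), F(9, 10)]

post = posterior(priors, likelihoods)
print(post, sum(post))
```

Only priors and likelihoods are needed as inputs, and the denominator falls out as a by-product of normalizing, which is why "posterior is proportional to prior times likelihood" is a usable recipe.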

As well as $\Pr(C)$, your version also needs to handle $\Pr(W \cup C)$, which can be written as $\Pr(W)+\Pr(W^c \cap C)$ or $\Pr(C)+\Pr(W \cap C^c)$. Usually neither of these is simple to compute or has a natural interpretation.
