Here is a famous example of Simpsons Paradox about Kidney Stone treatments (https://en.wikipedia.org/wiki/Simpson%27s_paradox):
StoneSize TreatmentA TreatmentB
Small stones 93% (81/87) 87% (234/270)
Large stones 73% (192/263) 69% (55/80)
Both 78% (273/350) 83% (289/350)
As we can see here, Treatment A works better than Treatment B for both small stones and large stones, yet Treatment B has a higher overall average compared to Treatment A.
I am interested in knowing if there is a mathematical proof to understand why this paradox happens and what kinds of situations results in this paradox happening more than often and more strongly (i.e. Treatment A dominates Treatment B by larger margins on the individual level, yet Treatment B dominates A by an equally large margin).
Here is a Python simulation in which simulates random clinical trials in which the effectiveness of Treatment A and Treatment B are tested on large and small kidney stones, and the prevalence of Simpsons Paradox is tested in each set of simulations.
In each simulation, the probability of treatment working is Uniform~(0,1), thus in n people, np will have successful treatments:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
sample_sizes = np.array([87, 270, 263, 80])
percentage_simpsons_paradox = []
# Run the simulation
for i in range(100):
simpsons_paradox_count = 0
for j in range(100):
numerators_A = np.random.binomial(sample_sizes[:2], np.random.rand(2))
numerators_B = np.random.binomial(sample_sizes[2:], np.random.rand(2))
success_rate_A = numerators_A / sample_sizes[:2]
success_rate_B = numerators_B / sample_sizes[2:]
if all(success_rate_A > success_rate_B) and (np.sum(numerators_B) / np.sum(sample_sizes[2:])) > (np.sum(numerators_A) / np.sum(sample_sizes[:2])):
simpsons_paradox_count += 1
percentage_simpsons_paradox.append(simpsons_paradox_count / 100 * 100)
plt.plot(range(1, 101), percentage_simpsons_paradox)
plt.title("Percentage of Simpson's Paradox Occurrences in Each Iteration")
plt.xlabel("Iteration Number")
plt.ylabel("Percentage of Simpson's Paradox Occurrences")
plt.show()
As we can see here, under these simulation condition/assumptions, Simpsons Paradox happens roughly 2.5% of the time.
But is there some mathematical proof which can show how frequently we can expect Simpsons Paradox to occur in a given simulation situation (i.e. simulation with certain conditions, parameters of success having certain probability distribution) ... and which types of simulation situations will produce more extreme and/or more frequent manifestations of Simpsons Paradox?
