Probability that the man is taller than the woman in a random pair (or comparing random values from distinct normal distributions)


General Question

Suppose two populations follow normal distributions for some variable, with population $A$ given by mean and standard deviation $\mu_A$ and $\sigma_A$, and population $B$ given by mean and standard deviation $\mu_B$ and $\sigma_B$.

For a pair of individuals, one from $A$ and one from $B$, denote their values of the variable by $a$ and $b$.

If such a pair is selected at random, what is the probability that $a > b$?

Example

For example, according to a 2016 study, the heights in the global populations of men and women are given by the following measures:

|       | Mean     | Standard deviation |
|-------|----------|--------------------|
| Men   | 178.4 cm | 7.59 cm            |
| Women | 164.7 cm | 7.07 cm            |

Using the figures from the study, for a randomly chosen man and woman, what is the probability that the man is taller than the woman?

Auxiliary Questions

  1. Does the area of overlap of the two distributions have any meaningful relation to the general solution?
  2. Does the height of one curve at the mean of the other curve have any meaningful relation to the general solution?
Answer

The question will probably be closed as a duplicate, so let me quickly give some hints on how to understand the answer to the linked, similar earlier question.

Let $X$ denote the height of the man in the pair and $Y$ the height of the woman. In the general setting, $X$ is a random variable with distribution $N(\mu_A, \sigma_A)$ and $Y$ is a random variable with distribution $N(\mu_B, \sigma_B)$.

The answer relies on two ideas: a clever trick that is handy in many situations, and a "magic" theorem that holds for normal distributions but not for most others.

  1. (Handy trick). Instead of computing the probability of the complicated event $X > Y$, which involves two random variables, we first define the new random variable $Z = X - Y$. The event we are interested in ($X > Y$) is equivalent to the simpler event $Z > 0$. So now we only have to deal with one random variable ($Z$), and we want to know the probability that it is bigger than 0. Of course, we can compute that as soon as we know the distribution of $Z$.

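  2. (Magic theorem). If two independent random variables $U$ and $V$ are each drawn from a normal distribution, then their sum $U + V$ is a random variable that is also drawn from a (different, but still) normal distribution! Independence matters here: without it (or some other condition such as joint normality), the sum need not be normal.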

In this case we can take $U = X$ and $V = -Y$ to see that $Z = X - Y$ follows a normal distribution as well. Of course we need to compute the mean and standard deviation of this new distribution before we can compute the probability that $Z > 0$, but it is intuitively obvious that the mean of this new distribution must be $\mu_A - \mu_B$. The rule for getting the standard deviation of $Z$ from $\sigma_A$ and $\sigma_B$ is less obvious, but you can find it in the answers to the other question.
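For reference, the rule is that variances add for independent random variables, so $Z = X - Y$ has standard deviation $\sqrt{\sigma_A^2 + \sigma_B^2}$ and

$$P(X > Y) = P(Z > 0) = \Phi\left(\frac{\mu_A - \mu_B}{\sqrt{\sigma_A^2 + \sigma_B^2}}\right),$$

where $\Phi$ is the standard normal CDF. Below is a minimal Python sketch of that formula, assuming the man's and woman's heights are independent. The function names are hypothetical, and the Monte Carlo estimate is only there as a sanity check on the closed form.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_a_greater_b(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float) -> float:
    """P(a > b) for independent a ~ N(mu_a, sigma_a) and b ~ N(mu_b, sigma_b).

    Assumes independence: Z = a - b is then normal with mean mu_a - mu_b and
    standard deviation sqrt(sigma_a**2 + sigma_b**2), and P(a > b) = P(Z > 0).
    """
    mu_z = mu_a - mu_b
    sigma_z = math.hypot(sigma_a, sigma_b)  # sqrt(sigma_a**2 + sigma_b**2)
    return normal_cdf(mu_z / sigma_z)


def simulate(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float,
             n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(a > b) from n independently drawn pairs."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(mu_a, sigma_a) > rng.gauss(mu_b, sigma_b) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # Means and standard deviations (cm) quoted in the question.
    exact = prob_a_greater_b(178.4, 7.59, 164.7, 7.07)
    approx = simulate(178.4, 7.59, 164.7, 7.07)
    print(f"closed form: {exact:.4f}")
    print(f"simulation:  {approx:.4f}")
```

With the study's figures, $Z$ has mean 13.7 cm and standard deviation about 10.37 cm, so the closed form gives roughly 0.907. Using `math.erf` avoids a SciPy dependency; `scipy.stats.norm.cdf` would give the same values.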

EDIT: Some remarks on the auxiliary questions. Both concern different questions from this one.

  1. The area of the overlap can be interpreted like this. Suppose you randomly select ONE person instead of two. Someone tells you their height and you have to guess whether they are a man or a woman. You consistently make the less likely guess, i.e. if the person is 2 meters tall you guess female, and if the person is 165 cm you guess male. Then, assuming equal numbers of men and women, the area of the overlap is twice the probability that you are right.

  2. The height of a probability density curve hardly ever has a meaningful interpretation on its own; probabilities are always areas under the curve. So the area under the male curve to the right of the mean of the female curve is the probability that a randomly selected man is taller than the average woman, and you can work out the meaning of the other three variations on this theme.
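Both auxiliary quantities can also be computed directly. The following is a minimal Python sketch, assuming normal densities with the study's figures. `overlap_area` integrates $\min(f_A, f_B)$ with a plain midpoint rule (an illustrative choice, not a library routine), and `prob_a_above_mean_b` is the area under one curve to the right of the other curve's mean. All function names are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def overlap_area(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float,
                 steps: int = 20_000) -> float:
    """Area under min(f_A, f_B), approximated with the midpoint rule.

    The integration range extends 10 standard deviations beyond both means,
    which captures essentially all of the probability mass.
    """
    lo = min(mu_a - 10 * sigma_a, mu_b - 10 * sigma_b)
    hi = max(mu_a + 10 * sigma_a, mu_b + 10 * sigma_b)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += min(normal_pdf(x, mu_a, sigma_a), normal_pdf(x, mu_b, sigma_b))
    return total * width


def prob_a_above_mean_b(mu_a: float, sigma_a: float, mu_b: float) -> float:
    """Area under the A curve to the right of B's mean, i.e. P(a > mu_b)."""
    return 1.0 - normal_cdf((mu_b - mu_a) / sigma_a)


if __name__ == "__main__":
    men, women = (178.4, 7.59), (164.7, 7.07)  # (mean, sd) in cm
    print(f"overlap area: {overlap_area(*men, *women):.4f}")
    print(f"P(random man taller than average woman): {prob_a_above_mean_b(*men, women[0]):.4f}")
```

With the study's figures this gives an overlap of roughly 0.35 and about 0.964 for the probability that a random man is taller than the average woman. Neither number equals the roughly 0.907 asked for in the main question.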