Sorry if this question is a bit vague.
Let $F: [0, 1] \to [0, 1]$ be the cdf and $f = F': [0, 1] \to \mathbb{R}$ the pdf of a probability distribution whose domain is the unit interval.
The distribution has a mean $\mu = \int_0^1 x f(x) dx$ and a median $m = F^{-1}(1/2)$.
I found some empirical evidence and an intuitive argument for the following proposition:
The mean is always at least as close to $1/2$ as the median, or more precisely $|\mu - 1/2| \leq |m - 1/2|$, with equality only when both distances are $0$ (i.e. when $m = \mu = 1/2$).
Then I found a counterexample.
I will give both the intuition and the counterexample below but my question is:
Is some version of this true? I.e. are there some 'natural' extra conditions on $F$ that make it true after all, explaining why in practice it seems more often true than not? Is it perhaps true for all beta distributions? Some broader class? Or am I just fooling myself here?
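To make the empirical evidence concrete, here is a small numerical sanity check (my own illustration, pure Python) for one distribution where the proposition does hold, Beta(2, 5), whose pdf is $f(x) = 30x(1-x)^4$:

```python
# Check |mu - 1/2| <= |m - 1/2| for Beta(2, 5), pdf f(x) = 30 x (1 - x)^4.
def f(x):
    return 30.0 * x * (1.0 - x) ** 4

N = 20_000
xs = [i / N for i in range(N + 1)]
fs = [f(x) for x in xs]

# cdf by trapezoidal accumulation, mean by a crude Riemann sum
cdf, acc = [0.0], 0.0
for i in range(1, N + 1):
    acc += (fs[i - 1] + fs[i]) / (2 * N)
    cdf.append(acc)
mu = sum(x * fx for x, fx in zip(xs, fs)) / N
m = next(x for x, c in zip(xs, cdf) if c >= 0.5)  # first grid point past F = 1/2

print(abs(mu - 0.5) <= abs(m - 0.5))  # True: the mean is closer to 1/2
```

Here $\mu = 2/7 \approx 0.286$ and $m \approx 0.264$, so the mean is indeed the closer of the two.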
Intuition:
The interval is divided into the subintervals $[0, m]$ and $[m, 1]$. The mean initially lies on the boundary and both subintervals try to pull it to their side. Now the strength with which either subinterval can pull depends on two factors: the total probability mass it contains and its width. The first factor is the same for both by the definition of the median. So the winner is the wider subinterval, which is obviously the one containing $1/2$.
Counterexample:
Start with an example $F$ for which the proposition is true and $\mu$ and $m$ are both greater than 1/2 (so that $\mu < m$). Then make a new distribution by squeezing the old distribution into $[0, 1/2]$ i.e. we define
$G(x) = F(2x)$ for $x \leq 1/2$ and $G(x) = 1$ for $x \geq 1/2$.
Or in pdf form:
$g(x) = 2f(2x)$ for $x \leq 1/2$ and $g(x) = 0$ for $x \geq 1/2$.
Then writing $\mu'$ and $m'$ for the mean and median of $G$ we find that $\mu' = \mu/2, m' = m/2$. But since $\mu < m$ we have that $\mu' < m'$ and since both $\mu', m'$ are smaller than $1/2$ we find that $\mu'$ is further away from $1/2$ than $m'$ is.
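The counterexample can be confirmed numerically. Here is a sketch (my own choice of starting distribution: Beta(5, 2), with pdf $f(x) = 30x^4(1-x)$, for which $1/2 < \mu < m$):

```python
# Squeeze Beta(5, 2) into [0, 1/2] via g(x) = 2 f(2x) and compare distances to 1/2.
def f(x):
    return 30.0 * x ** 4 * (1.0 - x)  # Beta(5, 2) pdf

def g(x):
    return 2.0 * f(2.0 * x) if x <= 0.5 else 0.0  # squeezed pdf

def mean_and_median(pdf, N=20_000):
    xs = [i / N for i in range(N + 1)]
    ps = [pdf(x) for x in xs]
    cdf, acc = [0.0], 0.0
    for i in range(1, N + 1):
        acc += (ps[i - 1] + ps[i]) / (2 * N)  # trapezoidal cdf
        cdf.append(acc)
    mu = sum(x * p for x, p in zip(xs, ps)) / N  # crude Riemann sum for the mean
    m = next(x for x, c in zip(xs, cdf) if c >= 0.5)
    return mu, m

mu, m = mean_and_median(f)
mu2, m2 = mean_and_median(g)
print(abs(mu - 0.5) <= abs(m - 0.5))    # True: proposition holds for F
print(abs(mu2 - 0.5) <= abs(m2 - 0.5))  # False: it fails for G
```

Numerically $\mu' \approx 0.357$ and $m' \approx 0.368$, i.e. $\mu' = \mu/2$ and $m' = m/2$ as claimed, and $\mu'$ is now the one further from $1/2$.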
Googling and tracking references I found an old article that gives a beautiful answer to this question:
Mean, median, mode II, by W. R. van Zwet, Statistica Neerlandica 33, issue 1, March 1979.
The setting there is that of a cumulative distribution function $F$ defined on $\mathbb{R}$ that has a probability density function $f$ and for which the mean $\mu$ is finite; Van Zwet also writes $m$ for the median. The result from the first page that we need is: if $F(m - x) + F(m + x) \geq 1$ for all $x \geq 0$, then $\mu \leq m$.
In my situation (I am the OP), with $F$ restricted to $[0, 1]$, we see for distributions with $m > 1/2$ that the condition $F(m - x) + F(m + x) \geq 1$ holds for $x = 0$ and, trivially, for all $x \geq 1 - m$ (there $F(m + x) = 1$), so the condition comes down to $F$ being not too crazy on the rather small interval $[m, 1]$. (Of course when we are dealing with $F$ on $[0, 1]$ with $m < 1/2$ instead, we can apply the same criterion to the distribution of $1 - X$, i.e. to $x \mapsto 1 - F(1 - x)$.)
This 'Van Zwet property'
$F(m - x) + F(m + x) \geq 1$ for all $x \geq 0$
seems like a nice example of an 'extra condition' of the type I was looking for. At least we have the following: if we have some a priori knowledge (based on the niceness of the distribution) that $m$ and $\mu$ are on the same side of $1/2$, then $F$ having the Van Zwet property whenever $m > 1/2$, and the distribution of $1 - X$ having it whenever $m < 1/2$, implies that $\mu$ is indeed the one lying closer to $1/2$.
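For a concrete feel, here is a numerical check (my own sketch, reusing Beta(5, 2) as an example with $m > 1/2$; its cdf has the closed form $F(z) = 6z^5 - 5z^6$) that the Van Zwet property holds on a grid:

```python
# Check F(m - x) + F(m + x) >= 1 for Beta(5, 2), which has median m > 1/2.
def F(z):
    # Beta(5, 2) cdf: the integral of 30 t^4 (1 - t) dt is 6 z^5 - 5 z^6
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 6.0 * z ** 5 - 5.0 * z ** 6

# locate the median by bisection on F(m) = 1/2
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if F(mid) < 0.5 else (lo, mid)
m = (lo + hi) / 2

worst = min(F(m - x) + F(m + x) for x in [i / 10_000 for i in range(10_001)])
print(m > 0.5, worst >= 1.0 - 1e-9)
```

The minimum of $F(m - x) + F(m + x)$ over the grid is attained at $x = 0$, where it equals $1$, consistent with the property.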
My original question was triggered by an application that came in two flavours: in one case $F$ is a beta distribution, and in the other case $F$ is the distribution of a random variable $X$ on $[0, 1]$ for which $\operatorname{logit}(X)$ follows a normal distribution.
I'll show that both families of distributions do indeed satisfy the Van Zwet property, thus mostly explaining the observation that for these distributions $\mu$ is closer to $1/2$ than $m$ is. (The other part of the explanation, showing that $m$ and $\mu$ are always both $\geq 1/2$ or both $\leq 1/2$ for these distributions, I'll postpone to a future question.) In both cases I will only look at the case $m > 1/2$ (so that the Van Zwet property implies $\mu \leq m$).
It seems that the '$\operatorname{logit}(X)$ is normal' case is the easier of the two. Define the logistic function (the inverse of logit) $g$ by $g(y) = \frac{1}{\exp(-y) + 1}$.
We have $X = g(Y)$ where $Y := \operatorname{logit}(X)$ is normally distributed with mean and median $\tau$ for some $\tau$. Since $g$ is monotonic, we see that $m = g(\tau)$, and hence $m > 1/2$ implies $\tau > 0$.
Let $\Phi$ be the cdf of $Y$ and let $\phi$ be its pdf. From the monotonicity of $g$ we also see that $F(g(y)) = \Phi(y)$, or equivalently $F(z) = \Phi(\operatorname{logit}(z))$.
Now let's look at $F(m - x) + F(m + x)$ for some fixed $x \geq 0$ with $m + x < 1$ (for $x \geq 1 - m$ the inequality holds trivially since $F(m + x) = 1$). By monotonicity we have $\operatorname{logit}(m - x) = \tau - a$ and $\operatorname{logit}(m + x) = \tau + b$ for some $a, b \geq 0$, and hence $F(m - x) + F(m + x) = \Phi(\tau - a) + \Phi(\tau + b)$. Now by the symmetry of the normal distribution around its mean we have $\Phi(\tau - a) + \Phi(\tau + a) = 1$, so in order to obtain $\Phi(\tau - a) + \Phi(\tau + b) \geq 1$ it suffices to show that $b \geq a$.
This can be seen from the graph of logit, remembering that $\tau > 0$ (or equivalently $m > 1/2$): the derivative $\operatorname{logit}'(z) = \frac{1}{z(1 - z)}$ satisfies $\operatorname{logit}'(m + t) \geq \operatorname{logit}'(m - t)$ for $0 \leq t \leq x$ when $m \geq 1/2$ (because $(m + t)(1 - m - t) \leq (m - t)(1 - m + t)$ there), and integrating this inequality over $[0, x]$ gives $b \geq a$.
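This can also be checked numerically; the following sketch (with the arbitrary choices $\tau = 0.8$ and standard deviation $1$ for $Y$) confirms both $b \geq a$ and the resulting inequality $\Phi(\tau - a) + \Phi(\tau + b) \geq 1$ on a grid of $x$:

```python
import math

def logit(z):
    return math.log(z / (1.0 - z))

def g(y):  # the inverse of logit, the function g from the text
    return 1.0 / (math.exp(-y) + 1.0)

tau = 0.8      # any tau > 0, i.e. median m > 1/2
m = g(tau)

def Phi(y):    # cdf of Y ~ N(tau, 1)
    return 0.5 * (1.0 + math.erf((y - tau) / math.sqrt(2.0)))

# grid of x inside (0, 1 - m); for x >= 1 - m the property holds trivially
xs = [(1.0 - m) * i / 1000 for i in range(1, 1000)]
ab = [(tau - logit(m - x), logit(m + x) - tau) for x in xs]

print(all(b >= a for a, b in ab))                              # True
print(all(Phi(tau - a) + Phi(tau + b) >= 1.0 for a, b in ab))  # True
```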
The beta case seems harder. I don't really understand the relationship between the parameters $\alpha, \beta$ on the one hand and the median $m$ on the other, but it seems that for the Van Zwet criterion to work for us we need some prior knowledge that $m > 1/2$ implies $\alpha > \beta$. As I remarked before, we would be interested anyway in having some prior knowledge that $m$ and $\mu$ are on the same side of $1/2$, and such knowledge would do the job, as $\mu > 1/2$ implies $\alpha > \beta$ through $\mu = \alpha/(\alpha + \beta)$. But at the moment I do not know how to prove that.
Anyway, if we do know that $m > 1/2$ and that $\alpha > \beta$ we can show that $m > \mu$ and hence, by $\mu = \alpha/(\alpha + \beta)$, that $m > \mu > 1/2$. The argument goes as follows.
Consider the expression $F(m - x) + F(m + x)$ as a function $H$ of $x$. Its derivative $h$ equals $f(m + x) - f(m - x)$, which equals some constant times $(m + x)^{\alpha - 1}(n - x)^{\beta - 1} - (m - x)^{\alpha - 1}(n + x)^{\beta - 1}$, where we introduce the shorthand $n = 1 - m$. From $m > 1/2$ we know that $m > n$.
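As a sanity check on this algebra, the following sketch (my running example Beta(5, 2), i.e. $\alpha = 5$, $\beta = 2$, where the constant is $1/B(\alpha, \beta) = 30$ and the cdf is $6z^5 - 5z^6$) compares the claimed formula for $h$ with a finite-difference derivative of $H$:

```python
import math

alpha, beta = 5, 2
C = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))  # = 30

def F(z):  # Beta(5, 2) cdf: the integral of 30 t^4 (1 - t) is 6 z^5 - 5 z^6
    return 0.0 if z <= 0.0 else 1.0 if z >= 1.0 else 6 * z ** 5 - 5 * z ** 6

m = 0.7356        # median of Beta(5, 2), found numerically
n = 1.0 - m

def H(x):
    return F(m - x) + F(m + x)

def h(x):  # claimed closed form of H'(x)
    return C * ((m + x) ** (alpha - 1) * (n - x) ** (beta - 1)
                - (m - x) ** (alpha - 1) * (n + x) ** (beta - 1))

eps = 1e-6
for x in [0.05, 0.1, 0.2, 0.25]:  # sample points inside (0, n)
    fd = (H(x + eps) - H(x - eps)) / (2 * eps)  # central difference
    assert abs(fd - h(x)) < 1e-4, (x, fd, h(x))
print("closed form for h matches the numerical derivative of H")
```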
Now we only need to consider the function $H$ on the interval $[0, n]$ since we know that $H(x) \geq 1$ for $x > n$. Also, we know that $H(0) = 1$ and $H(n) > 1$, so it would certainly be sufficient if on the interval $H$ would go up and then down again without any other twists and turns. In other words: we know $h(0) = 0$ and $h(n) < 0$ and it would be sufficient if we could show that $h(x)$ had exactly one zero in the open interval $(0, n)$. This is what we will do.
By the above expression for $h$, $h(x) = 0$ is equivalent to $\left(\frac{m + x}{m - x}\right)^{\alpha - 1} = \left(\frac{n + x}{n - x}\right)^{\beta - 1}$, and it is easy to see, just from staring at the graphs of the simpler functions $\frac{m + x}{m - x}$ and $\frac{n + x}{n - x}$, that this equation does indeed have exactly one solution in the interval $(0, n)$ for any positive numbers $m > n$ and exponents $\alpha > \beta$, regardless of what further mysterious relations they have among each other.
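The 'exactly one zero' claim can be probed numerically. This sketch (again with the example values $\alpha = 5$, $\beta = 2$, $m = 0.7356$) counts the sign changes of $h$ on an interior grid of $(0, n)$; the normalizing constant is dropped since it does not affect the sign:

```python
# Count sign changes of h on (0, n) for alpha = 5, beta = 2, m = 0.7356.
alpha, beta = 5, 2
m = 0.7356
n = 1.0 - m

def h(x):  # h up to the positive beta normalizing constant
    return ((m + x) ** (alpha - 1) * (n - x) ** (beta - 1)
            - (m - x) ** (alpha - 1) * (n + x) ** (beta - 1))

xs = [n * i / 10_000 for i in range(1, 10_000)]  # interior grid of (0, n)
signs = [h(x) > 0 for x in xs]
changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
print(changes)  # 1
```

On this grid $h$ starts positive, crosses zero once (between $x = 0.22$ and $x = 0.23$ for these values), and stays negative up to $n$, as the argument predicts.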
This is certainly nice. But the question of how to rule out the case where $m$ and $\mu$ are on different sides of $1/2$ for these distributions still bugs me. I'll probably come back to it in a separate MSE question.