Is the median of CDFs again a CDF?

89 Views Asked by At

I am working on the following barycenter problem: Suppose we are given $N$ probability measures on $\mathbb{R}$ with cumulative distribution functions $F_1,\dots,F_N$ and we are interested in the minimization problem $$ \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{N} |y-F_i(x)|$$

for $x\in\mathbb{R}$. If we set $F(x) = \min \{ \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{N} |y-F_i(x)| \}$, i.e. $F(x)$ is the median of the $F_i(x)$ (for $N$ odd) and in case we have an interval of numbers that minimize the sum of distances (if $N$ is even), we take the smallest of these numbers, so that we always have $F(x)=F_i(x)$ for some $i \in \{1,\dots,N\}$. I would like to show that $F$ is again a cdf so that it defines a probability measure. I have already shown right-continuity and that $\lim_{x \to \infty}F(x) = 1, \lim_{x \to -\infty}F(x) = 0$, but I have struggles showing that $F$ is non-decreasing. It is clear to me that this is the case on intervals on which the $F_i$ do not cross each other (the $F_i(x)$ do not change their order), but I don't know how to formally show monotonicity when the $F_i$ cross each other.

1

There are 1 best solutions below

0
On BEST ANSWER

Lemma: monotonicity of order statistics. Let's fix some notation. We take two finite sequences $x_1,..,x_n$ and $y_1,..,y_n$ s.t.

$$\ x_i\le y_i, i=1...n \tag{1}$$

Let's take now two permutations of the set of indexes $\{1,..,n\}$, $p$ and $q$ s.t.:

$x_{p(1)}\le x_{p(2)} \le x_{p(n)}$

and

$y_{q(1)}\le y_{q(2)} \le y_{q(n)}$

We can than prove that $y_{q(k)}\ge x_{p(k)}$ for every $k=1,...n$ (these if I am not mistaken are called the order statistics of the samples)

Proof: Fix $k$ and note that $y_{q(k)}\ge x_{q(k)}$ by property $(1)$. If $1\le k'<k$, $y_{q(k)}\ge y_{q(k')}\ge x_{q(k')}$, so that we have $y_{q(k)}\ge x_{q(k')}$ for $1\le k'\le k$. Told another way $y_{q(k)}$ is greater than $k$ values of the $x-$sequence. In particular it is greater than their maximum. The worst case scenario (when this maximum is as small as possible) is when these are the smallest $k$ values in the $x$-sequence and in this case their maximum is $x_{p(k)}$. This proves the claim. c.v.d.

Application to our problem: For $n=2m+1$ odd, the medians of the two sequences are $x_{p(m+1)}$ and $y_{q(m+1)}$. Than your monotonicity result follows by taking in the claim $k=m+1$, the sequence $x_i$ as $\{F_i(x)\}_{i=1..n}$ the sequence $y_i$ as $\{F_i(y)\}_{i=1..n}$ for $y>x$. For $n$ even and your definition of median I think the indexes can be adjusted accordingly as well.

I hope the reasoning works.