Denote by $\mathcal M$ the set of probability measures over $[0,1]$ that average to $p$. This is a convex set and I am trying to characterize its extreme points. For any $x$, $y$ such that $0\leq x< p< y\leq 1$ define \begin{align*} \lambda_x^y = \frac{p-y}{x-y}\cdot\delta_x+\frac{x-p}{x-y}\cdot\delta_y \end{align*} where for any measurable $A\subseteq [0,1]$ and $a\in[0,1]$, $\delta_a(A)=\mathbf 1(a\in A)$. Since $\delta_x$ (resp. $\delta_y$) averages to $x$ (resp. $y$) we get that $\lambda_x^y$ averages to $\frac{p-y}{x-y}\cdot x+\frac{x-p}{x-y}\cdot y=p$ and so is in $\mathcal M$.
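As a quick numerical sanity check (my own, not part of the argument), the two weights of $\lambda_x^y$ are positive, sum to $1$, and give mean $p$ whenever $x < p < y$:

```python
# Weights of lambda_x^y on the two-point support {x, y}.
def lam_weights(x, y, p):
    return (p - y) / (x - y), (x - p) / (x - y)

x, p, y = 0.2, 0.5, 0.9          # any values with 0 <= x < p < y <= 1 work
wx, wy = lam_weights(x, y, p)
assert wx > 0 and wy > 0                   # both weights positive since x < p < y
assert abs(wx + wy - 1) < 1e-12            # lambda_x^y is a probability measure
assert abs(wx * x + wy * y - p) < 1e-12    # and it averages to p
```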
I am very tempted to say that the extreme points of $\mathcal M$ are exactly the set $\{ \lambda_x^y : 0\leq x< p < y \leq 1 \}\cup \{ \delta_p \}$, without being able to prove it. If I am not mistaken, I am supposed to prove that (leaving aside the $\delta_p$ case)
- If $\lambda_x^y=a \mu+(1-a) \nu$ for some $a\in(0,1)$ and $\mu,\nu\in\mathcal M$, then $\mu=\nu=\lambda_x^y$.
- For any $\mu\in\mathcal M$ there is a probability measure $\kappa$ over $[0,p]\times[p,1]$ such that $\mu(A)=\int \lambda_x^y(A) d\kappa(x,y)+\mu(\{ p \})\delta_p(A)$ for all measurable $A\subseteq [0,1]$.
For the first point, it is quite clear by positivity of the measures $\mu$ and $\nu$ that their supports are contained in $\{ x,y \}$; then there is only one probability measure with such a support that averages to $p$, and it is $\lambda_x^y$.
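To spell out the uniqueness step: if $\mu=\alpha\delta_x+(1-\alpha)\delta_y$ averages to $p$, then the weight $\alpha$ is forced, since
$$ \alpha x + (1-\alpha) y = p \iff \alpha = \frac{y-p}{y-x} = \frac{p-y}{x-y}, $$
which is exactly the weight of $\delta_x$ in $\lambda_x^y$.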
For the second statement, however, it is less clear to me. Maybe one way to proceed is computing the integral \begin{align*} \int \lambda_x^y(A) d\kappa(x,y)&=\int \left( \frac{p-y}{x-y}\cdot\delta_x(A)+\frac{x-p}{x-y}\cdot\delta_y(A) \right) d\kappa(x,y)\\ &=\int_{\left(A\cap [0,p]\right)\times [p,1]}\frac{p-y}{x-y} d\kappa(x,y)+\int_{[0,p]\times\left(A\cap [p,1]\right)}\frac{x-p}{x-y}d\kappa(x,y) \end{align*} It now feels easier to separate the cases $A\subseteq [0,p]$ and $A\subseteq [p,1]$, so that one of the two integrals vanishes, but I still don't know how to proceed.
Actually, thinking about how I could use it in my research, it would be much more powerful to not enforce $x\leq p \leq y$ but just that $p$ is in the convex hull of $\{ x,y \}$; then we get the repetition $\lambda_x^y=\lambda_y^x$, but the set of extreme points is the same. This makes $\kappa$ a probability measure over $[0,1]^2$, where we have to be careful to assign probability $0$ to sets $A\times B\subseteq [0,1]^2$ such that the convex hull of $A\cup B$ does not contain $p$.
Here is the last thing I tried: denote $a_x^y=\frac{p-y}{x-y}$; for $\mu \in\mathcal M$ and $A$ measurable \begin{align*} \mu(A)&=\int_{[0,p)} \delta_x(A) d\mu(x)+\int_{(p,1]} \delta_y(A) d\mu(y)+\mu(\{ p \})\delta_p(A)\\ &=\int_{[0,p)\times (p,1]} \left(\frac{a_x^y}{a_x^y}\delta_x(A) + \frac{1-a_x^y}{1-a_x^y}\delta_y(A) \right)d\mu\otimes\mu(x,y)+\mu(\{ p \})\delta_p(A)\\ \end{align*} It feels like we can define $\kappa$ as a function of $\mu\otimes \mu$ and $a_x^y$, but I cannot finish the argument.
Indeed the second claim is true. In order to prove this, we introduce some notation:
1. We will use a binary splitting process to establish the desired fact. This will involve splitting the interval $[0, 1]$ into pieces, and we will track this process using binary trees. More precisely, we will consider the binary trees $\mathsf{T}$ such that
Each node of $\mathsf{T}$ is a tagged interval of the form $(I, a)$, where $a \in I$.
Each internal node $(I, a)$ of $\mathsf{T}$ has exactly two children $(I_0, a_0)$ and $(I_1, a_1)$, where $I_0 = \{x \in I : x < a\}$ and $I_1 = \{x \in I : x \geq a\}$.
So, the tagged point $a$ is used to split the interval $I$.
Here is an example of a binary tree:
$$ \small \begin{gathered} ([0, 1], 0.7) \\ \swarrow \hspace{4em} \searrow \\ ([0, 0.7), 0.3) \quad ([0.7, 1], 0.8) \\ \hspace{5.5em} \swarrow \hspace{4em} \searrow \\ \hspace{5.5em} ([0.7, 0.8), 0.72) \quad ([0.8, 1], 0.95) \end{gathered} $$
Since knowing the initial interval (the interval in the root) and all the tagged points in $\mathsf{T}$ is enough to reconstruct the entire $\mathsf{T}$, we will usually abbreviate by omitting the interval part whenever no confusion arises. For instance, the above example can be abbreviated as
$$ \small \begin{gathered} 0.7 \\ \swarrow \quad \searrow \\ 0.3 \qquad 0.8 \\ \hspace{3.25em} \swarrow \quad \searrow \\ \hspace{3.5em} 0.72 \qquad 0.95 \end{gathered} $$
2. Now we recursively define $\nu[\mathsf{T}]$ as follows:
If $\mathsf{T}$ is a single node tagged $a$, then $\nu[a] := \delta_a$.
If $\mathsf{T} = \Bigl( {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] \mathsf{T}_0 \qquad \mathsf{T}_1 \end{gathered}} \Bigr) $ where $\mathsf{T}_i$ has root tagged $a_i$ for each $i = 0, 1$ (in particular, $a_0 < a < a_1$), then
$$ \nu[\mathsf{T}] = \frac{a_1 - a}{a_1 - a_0} \nu[\mathsf{T}_0] + \frac{a - a_0}{a_1 - a_0} \nu[\mathsf{T}_1]. $$
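For concreteness, here is a small sketch of this recursion (my own encoding, not from the answer): a tree is either a bare tag $a$ (a leaf) or a triple $(a, \mathsf{T}_0, \mathsf{T}_1)$, and `nu` returns the purely atomic measure $\nu[\mathsf{T}]$ as a dict mapping atoms to weights.

```python
def root(T):
    """Tag at the root: a bare number is a leaf, a triple (a, T0, T1) an internal node."""
    return T[0] if isinstance(T, tuple) else T

def nu(T):
    """Compute nu[T] as {atom: weight}, following the recursive definition."""
    if not isinstance(T, tuple):
        return {T: 1.0}                      # nu[a] = delta_a
    a, T0, T1 = T
    a0, a1 = root(T0), root(T1)
    w0 = (a1 - a) / (a1 - a0)                # weight of nu[T0]
    w1 = (a - a0) / (a1 - a0)                # weight of nu[T1]
    out = {}
    for sub, w in ((T0, w0), (T1, w1)):
        for atom, mass in nu(sub).items():
            out[atom] = out.get(atom, 0.0) + w * mass
    return out

# The example tree from above; the mean of nu[T] recovers the root tag 0.7.
T = (0.7, 0.3, (0.8, 0.72, 0.95))
m = sum(atom * w for atom, w in nu(T).items())
assert abs(m - 0.7) < 1e-12
```

In particular, `nu((p, x, y))` reproduces the two-point measure $\lambda_x^y$ from the question.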
This notation is related to OP's notation in that, if $x < p < y$, then
$$ \nu \Bigl[ {\scriptsize\begin{gathered} p \\[-5pt] \swarrow \ \searrow \\[-5pt] x \qquad y \end{gathered}} \Bigr] = \lambda^{y}_{x}. $$
Then the following lemma holds:

**Lemma.** If $\mathsf{T}$ is a finite binary tree whose root is tagged $p$, then $\nu[\mathsf{T}]$ is a finite convex combination of measures of the form $\lambda_x^y$ with $x < p < y$, together possibly with $\delta_p$; that is, $\nu[\mathsf{T}]$ admits a representation as in the second claim with a finitely supported $\kappa$.

This follows by structural induction together with the following reduction formula:
$$ \nu \Biggl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_0 \qquad a_1 \\[-5pt] \swarrow \ \searrow \hspace{3em} \\[-5pt] a_{00} \qquad a_{01} \hspace{3em} \end{gathered}} \hspace{-1.8em} \Biggr] = \alpha \nu \Bigl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_{00} \qquad a_1 \end{gathered}} \Bigr] + (1-\alpha) \nu \Bigl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_{01} \qquad a_1 \end{gathered}} \Bigr], $$
where $\alpha$ is given by
$$ \alpha = \frac{(a_1 - a_{00})(a_{01} - a_0)}{(a_1 - a_0)(a_{01} - a_{00})} \in (0, 1). $$
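One can verify this identity numerically; here is a quick check with concrete tags $a_{00} < a_0 < a_{01} < a < a_1$ (my own hypothetical values, any admissible choice works):

```python
# Concrete tags for the reduction formula (hypothetical values).
a00, a0, a01, a, a1 = 0.1, 0.3, 0.4, 0.5, 0.9

def two_leaf(a, lo, hi):
    """nu of the depth-1 tree with root tag a and leaf tags lo < a < hi."""
    return {lo: (hi - a) / (hi - lo), hi: (a - lo) / (hi - lo)}

# Left-hand side: root a with left subtree (a0; a00, a01) and right leaf a1.
w0, w1 = (a1 - a) / (a1 - a0), (a - a0) / (a1 - a0)
lhs = {a1: w1}
for atom, m in two_leaf(a0, a00, a01).items():
    lhs[atom] = lhs.get(atom, 0.0) + w0 * m

# Right-hand side: the convex combination with the stated alpha.
alpha = (a1 - a00) * (a01 - a0) / ((a1 - a0) * (a01 - a00))
assert 0 < alpha < 1
rhs = {}
for tree, w in ((two_leaf(a, a00, a1), alpha), (two_leaf(a, a01, a1), 1 - alpha)):
    for atom, m in tree.items():
        rhs[atom] = rhs.get(atom, 0.0) + w * m

assert set(lhs) == set(rhs)
assert all(abs(lhs[k] - rhs[k]) < 1e-12 for k in lhs)
```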
3. The final ingredient is the observation that any probability measure $\mu \in \mathcal{M}$ can be approximated by measures of the form $\nu[\mathsf{T}]$. This follows from a binary splitting process of the kind often adopted in proofs of the Skorokhod embedding theorem.
Here, we only outline the key idea. We will construct a sequence of binary trees $(\mathsf{T}_n)_{n=0}^{N}$ by running the following algorithm:
We begin with the binary tree $\mathsf{T}_0$ which consists only of the root $([0, 1], p)$.
Suppose that $\mathsf{T}_n$ is defined and satisfies $a = \mu[x \mid x \in I] = \frac{1}{\mu(I)} \int_{I} x \, \mu(\mathrm{d}x)$ for each node $(I, a)$ of $\mathsf{T}_n$.
If $\mu(I) = \mu(\{a\})$ holds for all leaves $(I, a)$ of $\mathsf{T}_n$, we set $N = n$ and halt the algorithm.
Otherwise, pick a leaf $(I, a)$ of $\mathsf{T}_n$ and set
$$ a_0 = \mu[x \mid x \in I \cap [0, a)] \qquad\text{and}\qquad a_1 = \mu[x \mid x \in I \cap [a, 1]]. $$
The two assumptions, $a = \mu[x \mid x \in I]$ and $\mu(I) \neq \mu(\{a\})$, show that both $\mu(I \cap [0, a)) > 0$ and $\mu(I \cap [a, 1]) > 0$ hold, and so both $a_0$ and $a_1$ are well-defined. Moreover, with this choice, we have
$$ \nu \Bigl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_{0} \qquad a_{1} \end{gathered}} \Bigr] = \frac{\mu(I \cap [0, a))}{\mu(I)} \delta_{a_0} + \frac{\mu(I \cap [a, 1])}{\mu(I)} \delta_{a_1}. $$
Then we set $\mathsf{T}_{n+1}$ as the binary tree obtained by inserting two children $(I \cap [0, a), a_0)$ and $(I \cap [a, 1], a_1)$ under the node $(I, a)$.
If the algorithm does not halt in finite time, set $N = \infty$.
Then it can be proved that $\nu[\mathsf{T}_n]$ converges weakly to $\mu$ as $n \to N$. If we denote by $\kappa_n$ the probability measure obtained by applying the lemma to $\nu[\mathsf{T}_n]$, then any limit point of $(\kappa_n)_{n=0}^{N}$ as $n \to N$ provides the desired probability measure associated with $\mu$.
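To make the algorithm concrete, here is a sketch for a finitely supported $\mu$ (my own implementation under that simplifying assumption, not from the answer); in that case the algorithm halts, with a leaf reached once an interval carries a single atom, and reconstructs $\mu$ exactly:

```python
def split(mu, pred=lambda x: True):
    """Run the splitting on {x in supp(mu) : pred(x)}; return (root tag, nu[T])."""
    pts = {x: m for x, m in mu.items() if pred(x)}
    if len(pts) == 1:                         # mu(I) = mu({a}): leaf, nu = delta_a
        (a,) = pts
        return a, {a: 1.0}
    tot = sum(pts.values())
    a = sum(x * m for x, m in pts.items()) / tot        # conditional mean = tag
    a0, nu0 = split(mu, lambda x: pred(x) and x < a)    # child on I ∩ [0, a)
    a1, nu1 = split(mu, lambda x: pred(x) and x >= a)   # child on I ∩ [a, 1]
    w0, w1 = (a1 - a) / (a1 - a0), (a - a0) / (a1 - a0)
    out = {}
    for nu, w in ((nu0, w0), (nu1, w1)):
        for atom, m in nu.items():
            out[atom] = out.get(atom, 0.0) + w * m
    return a, out

# Example: a three-atom mu with mean p = 0.3 is reconstructed exactly.
mu = {0.1: 0.25, 0.3: 0.5, 0.5: 0.25}
p, rebuilt = split(mu)
assert abs(p - 0.3) < 1e-9
assert all(abs(rebuilt[x] - mu[x]) < 1e-9 for x in mu)
```

Each recursive call strictly shrinks the support, since the conditional mean of two or more atoms lies strictly between the smallest and largest, so both children are nonempty and the recursion terminates.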