Denote by $\mathcal M$ the set of probability measures over $[0,1]$ that average to $p$. This is a convex set and I am trying to characterize its extreme points. For any $x$, $y$ such that $0\leq x< p< y\leq 1$ define \begin{align*} \lambda_x^y = \frac{p-y}{x-y}\cdot\delta_x+\frac{x-p}{x-y}\cdot\delta_y \end{align*} where for any measurable $A\subseteq [0,1]$ and $a\in[0,1]$, $\delta_a(A)=\mathbf 1(a\in A)$. Since $\delta_x$ (resp. $\delta_y$) averages to $x$ (resp. $y$) we get that $\lambda_x^y$ averages to $\frac{p-y}{x-y}\cdot x+\frac{x-p}{x-y}\cdot y=p$ and so is in $\mathcal M$.
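As a quick numerical sanity check (my own, not part of the argument), the two weights of $\lambda_x^y$ are positive, sum to $1$, and give mean $p$ whenever $x < p < y$:

```python
# Weights of lambda_x^y on the two-point support {x, y}.
def lam_weights(x, y, p):
    return (p - y) / (x - y), (x - p) / (x - y)

x, p, y = 0.2, 0.5, 0.9          # any values with 0 <= x < p < y <= 1 work
wx, wy = lam_weights(x, y, p)
assert wx > 0 and wy > 0                   # both weights positive since x < p < y
assert abs(wx + wy - 1) < 1e-12            # lambda_x^y is a probability measure
assert abs(wx * x + wy * y - p) < 1e-12    # and it averages to p
```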
I am very tempted to say that the extreme points of $\mathcal M$ are exactly the set $\{ \lambda_x^y : 0\leq x< p < y \leq 1 \}\cup \{ \delta_p \}$, without being able to prove it. If I am not mistaken, I am supposed to prove that (leaving aside the $\delta_p$ case)
- If $\lambda_x^y=a \mu+(1-a) \nu$ for some $a\in(0,1)$ and $\mu,\nu\in\mathcal M$, then $\mu=\nu=\lambda_x^y$.
- For any $\mu\in\mathcal M$ there is a probability measure $\kappa$ over $[0,p]\times[p,1]$ such that $\mu(A)=\int \lambda_x^y(A) d\kappa(x,y)+\mu(\{ p \})\delta_p(A)$ for all measurable $A\subseteq [0,1]$.
For the first point, it is quite clear by positivity of the measures $\mu$ and $\nu$ that their supports are contained in $\{ x,y \}$; then there is only one probability measure with such a support that averages to $p$, and it is $\lambda_x^y$.
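To spell out the uniqueness step: if $\mu=\alpha\delta_x+(1-\alpha)\delta_y$ averages to $p$, then the weight $\alpha$ is forced, since
$$ \alpha x + (1-\alpha) y = p \iff \alpha = \frac{y-p}{y-x} = \frac{p-y}{x-y}, $$
which is exactly the weight of $\delta_x$ in $\lambda_x^y$.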
For the second statement, however, it is less clear to me. Maybe one way to proceed is computing the integral \begin{align*} \int \lambda_x^y(A) d\kappa(x,y)&=\int \left( \frac{p-y}{x-y}\cdot\delta_x(A)+\frac{x-p}{x-y}\cdot\delta_y(A) \right) d\kappa(x,y)\\ &=\int_{\left(A\cap [0,p]\right)\times [p,1]}\frac{p-y}{x-y} d\kappa(x,y)+\int_{[0,p]\times\left(A\cap [p,1]\right)}\frac{x-p}{x-y}d\kappa(x,y) \end{align*} It now feels easier to separate the cases $A\subseteq [0,p]$ and $A\subseteq [p,1]$, so that one of the two integrals vanishes, but I still don't know how to proceed.
Actually, thinking about how I could use it in my research, it would be much more powerful to not enforce $x\leq p \leq y$ but just that $p$ is in the convex hull of $\{ x,y \}$; then we get the repetition $\lambda_x^y=\lambda_y^x$, but the set of extreme points is the same. This makes $\kappa$ a probability measure over $[0,1]^2$, where we have to be careful to assign probability $0$ to sets $A\times B\subseteq [0,1]^2$ such that the convex hull of $A\cup B$ does not contain $p$.
Here is the last thing I tried: denote $a_x^y=\frac{p-y}{x-y}$; for $\mu \in\mathcal M$ and $A$ measurable \begin{align*} \mu(A)&=\int_{[0,p)} \delta_x(A) d\mu(x)+\int_{(p,1]} \delta_y(A) d\mu(y)+\mu(\{ p \})\delta_p(A)\\ &=\int_{[0,p)\times (p,1]} \left(\frac{a_x^y}{a_x^y}\delta_x(A) + \frac{1-a_x^y}{1-a_x^y}\delta_y(A) \right)d\mu\otimes\mu(x,y)+\mu(\{ p \})\delta_p(A)\\ \end{align*} It feels like we can define $\kappa$ as a function of $\mu\otimes \mu$ and $a_x^y$, but I cannot finish the argument.
Indeed the second claim is true. In order to prove this, we introduce some notation:
1. We will use a binary splitting process to establish the desired fact. This will involve splitting the interval $[0, 1]$ into pieces, and we will track this process using binary trees. More precisely, we will consider the binary trees $\mathsf{T}$ such that
Each node of $\mathsf{T}$ is a tagged interval of the form $(I, a)$, where $a \in I$.
Each internal node $(I, a)$ of $\mathsf{T}$ has exactly two children $(I_0, a_0)$ and $(I_1, a_1)$, where $I_0 = \{x \in I : x < a\}$ and $I_1 = \{x \in I : x \geq a\}$.
So, the tagged point $a$ is used to split the interval $I$.
Here is an example of a binary tree:
$$ \small \begin{gathered} ([0, 1], 0.7) \\ \swarrow \hspace{4em} \searrow \\ ([0, 0.7), 0.3) \quad ([0.7, 1], 0.8) \\ \hspace{5.5em} \swarrow \hspace{4em} \searrow \\ \hspace{5.5em} ([0.7, 0.8), 0.72) \quad ([0.8, 1], 0.95) \end{gathered} $$
Since knowing the initial interval (the interval in the root) and all the tagged points in $\mathsf{T}$ is enough to reconstruct the entire $\mathsf{T}$, we will usually abbreviate by omitting the interval part whenever no confusion arises. For instance, the above example can be abbreviated as
$$ \small \begin{gathered} 0.7 \\ \swarrow \quad \searrow \\ 0.3 \qquad 0.8 \\ \hspace{3.25em} \swarrow \quad \searrow \\ \hspace{3.5em} 0.72 \qquad 0.95 \end{gathered} $$
2. Now we recursively define $\nu[\mathsf{T}]$ as follows:
If $\mathsf{T}$ is a single node tagged $a$, then $\nu[a] := \delta_a$.
If $\mathsf{T} = \Bigl( {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] \mathsf{T}_0 \qquad \mathsf{T}_1 \end{gathered}} \Bigr) $ where $\mathsf{T}_i$ has root tagged $a_i$ for each $i = 0, 1$ (in particular, $a_0 < a < a_1$), then
$$ \nu[\mathsf{T}] = \frac{a_1 - a}{a_1 - a_0} \nu[\mathsf{T}_0] + \frac{a - a_0}{a_1 - a_0} \nu[\mathsf{T}_1]. $$
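For concreteness, here is a small sketch of this recursion (my own encoding, not from the answer): a tree is either a bare tag $a$ (a leaf) or a triple $(a, \mathsf{T}_0, \mathsf{T}_1)$, and `nu` returns the purely atomic measure $\nu[\mathsf{T}]$ as a dict mapping atoms to weights.

```python
def root(T):
    """Tag at the root: a bare number is a leaf, a triple (a, T0, T1) an internal node."""
    return T[0] if isinstance(T, tuple) else T

def nu(T):
    """Compute nu[T] as {atom: weight}, following the recursive definition."""
    if not isinstance(T, tuple):
        return {T: 1.0}                      # nu[a] = delta_a
    a, T0, T1 = T
    a0, a1 = root(T0), root(T1)
    w0 = (a1 - a) / (a1 - a0)                # weight of nu[T0]
    w1 = (a - a0) / (a1 - a0)                # weight of nu[T1]
    out = {}
    for sub, w in ((T0, w0), (T1, w1)):
        for atom, mass in nu(sub).items():
            out[atom] = out.get(atom, 0.0) + w * mass
    return out

# The example tree from above; the mean of nu[T] recovers the root tag 0.7.
T = (0.7, 0.3, (0.8, 0.72, 0.95))
m = sum(atom * w for atom, w in nu(T).items())
assert abs(m - 0.7) < 1e-12
```

In particular, `nu((p, x, y))` reproduces the two-point measure $\lambda_x^y$ from the question.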
This notation is related to OP's notation in that, if $x < p < y$, then
$$ \nu \Bigl[ {\scriptsize\begin{gathered} p \\[-5pt] \swarrow \ \searrow \\[-5pt] x \qquad y \end{gathered}} \Bigr] = \lambda^{y}_{x}. $$
Then the following lemma holds:

**Lemma.** If $\mathsf{T}$ is a finite binary tree whose root is tagged $p$, then $\nu[\mathsf{T}]$ is a finite convex combination of measures of the form $\lambda_x^y$ with $x < p < y$, together possibly with $\delta_p$; that is, $\nu[\mathsf{T}]$ admits a representation as in the second claim with a finitely supported $\kappa$.

This follows by structural induction together with the following reduction formula:
$$ \nu \Biggl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_0 \qquad a_1 \\[-5pt] \swarrow \ \searrow \hspace{3em} \\[-5pt] a_{00} \qquad a_{01} \hspace{3em} \end{gathered}} \hspace{-1.8em} \Biggr] = \alpha \nu \Bigl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_{00} \qquad a_1 \end{gathered}} \Bigr] + (1-\alpha) \nu \Bigl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_{01} \qquad a_1 \end{gathered}} \Bigr], $$
where $\alpha$ is given by
$$ \alpha = \frac{(a_1 - a_{00})(a_{01} - a_0)}{(a_1 - a_0)(a_{01} - a_{00})} \in (0, 1). $$
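One can verify this identity numerically; here is a quick check with concrete tags $a_{00} < a_0 < a_{01} < a < a_1$ (my own hypothetical values, any admissible choice works):

```python
# Concrete tags for the reduction formula (hypothetical values).
a00, a0, a01, a, a1 = 0.1, 0.3, 0.4, 0.5, 0.9

def two_leaf(a, lo, hi):
    """nu of the depth-1 tree with root tag a and leaf tags lo < a < hi."""
    return {lo: (hi - a) / (hi - lo), hi: (a - lo) / (hi - lo)}

# Left-hand side: root a with left subtree (a0; a00, a01) and right leaf a1.
w0, w1 = (a1 - a) / (a1 - a0), (a - a0) / (a1 - a0)
lhs = {a1: w1}
for atom, m in two_leaf(a0, a00, a01).items():
    lhs[atom] = lhs.get(atom, 0.0) + w0 * m

# Right-hand side: the convex combination with the stated alpha.
alpha = (a1 - a00) * (a01 - a0) / ((a1 - a0) * (a01 - a00))
assert 0 < alpha < 1
rhs = {}
for tree, w in ((two_leaf(a, a00, a1), alpha), (two_leaf(a, a01, a1), 1 - alpha)):
    for atom, m in tree.items():
        rhs[atom] = rhs.get(atom, 0.0) + w * m

assert set(lhs) == set(rhs)
assert all(abs(lhs[k] - rhs[k]) < 1e-12 for k in lhs)
```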
3. The final ingredient is the observation that any probability measure $\mu \in \mathcal{M}$ can be approximated by measures of the form $\nu[\mathsf{T}]$. This follows from a binary splitting process of the kind often adopted in proofs of the Skorokhod embedding theorem.
Here, we only outline the key idea. We will construct a sequence of binary trees $(\mathsf{T}_n)_{n=0}^{N}$ by running the following algorithm:
We begin with the binary tree $\mathsf{T}_0$ which consists only of the root $([0, 1], p)$.
Suppose that $\mathsf{T}_n$ is defined and satisfies $a = \mu[x \mid x \in I] = \frac{1}{\mu(I)} \int_{I} x \, \mu(\mathrm{d}x)$ for each node $(I, a)$ of $\mathsf{T}_n$.
If $\mu(I) = \mu(\{a\})$ holds for all leaves $(I, a)$ of $\mathsf{T}_n$, we set $N = n$ and halt the algorithm.
Otherwise, pick a leaf $(I, a)$ of $\mathsf{T}_n$ and set
$$ a_0 = \mu[x \mid x \in I \cap [0, a)] \qquad\text{and}\qquad a_1 = \mu[x \mid x \in I \cap [a, 1]]. $$
The two assumptions, $a = \mu[x \mid x \in I]$ and $\mu(I) \neq \mu(\{a\})$, show that both $\mu(I \cap [0, a)) > 0$ and $\mu(I \cap [a, 1]) > 0$ hold, and so both $a_0$ and $a_1$ are well-defined. Moreover, with this choice, we have
$$ \nu \Bigl[ {\scriptsize\begin{gathered} a \\[-5pt] \swarrow \ \searrow \\[-5pt] a_{0} \qquad a_{1} \end{gathered}} \Bigr] = \frac{\mu(I \cap [0, a))}{\mu(I)} \delta_{a_0} + \frac{\mu(I \cap [a, 1])}{\mu(I)} \delta_{a_1}. $$
Then we set $\mathsf{T}_{n+1}$ as the binary tree obtained by inserting two children $(I \cap [0, a), a_0)$ and $(I \cap [a, 1], a_1)$ under the node $(I, a)$.
If the algorithm does not halt in finite time, set $N = \infty$.
Then it can be proved that $\nu[\mathsf{T}_n]$ converges weakly to $\mu$ as $n \to N$. If we denote by $\kappa_n$ the probability measure obtained by applying the lemma to $\nu[\mathsf{T}_n]$, then any limit point of $(\kappa_n)_{n=0}^{N}$ as $n \to N$ provides the desired probability measure associated with $\mu$.
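To make the algorithm concrete, here is a sketch for a finitely supported $\mu$ (my own implementation under that simplifying assumption, not from the answer); in that case the algorithm halts, with a leaf reached once an interval carries a single atom, and reconstructs $\mu$ exactly:

```python
def split(mu, pred=lambda x: True):
    """Run the splitting on {x in supp(mu) : pred(x)}; return (root tag, nu[T])."""
    pts = {x: m for x, m in mu.items() if pred(x)}
    if len(pts) == 1:                         # mu(I) = mu({a}): leaf, nu = delta_a
        (a,) = pts
        return a, {a: 1.0}
    tot = sum(pts.values())
    a = sum(x * m for x, m in pts.items()) / tot        # conditional mean = tag
    a0, nu0 = split(mu, lambda x: pred(x) and x < a)    # child on I ∩ [0, a)
    a1, nu1 = split(mu, lambda x: pred(x) and x >= a)   # child on I ∩ [a, 1]
    w0, w1 = (a1 - a) / (a1 - a0), (a - a0) / (a1 - a0)
    out = {}
    for nu, w in ((nu0, w0), (nu1, w1)):
        for atom, m in nu.items():
            out[atom] = out.get(atom, 0.0) + w * m
    return a, out

# Example: a three-atom mu with mean p = 0.3 is reconstructed exactly.
mu = {0.1: 0.25, 0.3: 0.5, 0.5: 0.25}
p, rebuilt = split(mu)
assert abs(p - 0.3) < 1e-9
assert all(abs(rebuilt[x] - mu[x]) < 1e-9 for x in mu)
```

Each recursive call strictly shrinks the support, since the conditional mean of two or more atoms lies strictly between the smallest and largest, so both children are nonempty and the recursion terminates.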