Intuition for a convex combination of probability measures.


In the paper Risk Preferences and their Robust Representation by S. Drapeau and M. Kupper, the authors say that if $\mu$ and $\nu$ are probability distributions (commonly referred to as lotteries) on $(\mathbb{R},\mathcal{B})$, a convex combination of the two such as $\lambda \mu + (1- \lambda) \nu$ can be interpreted as some additional randomization, since it corresponds to sampling either the lottery $\mu$ or $\nu$ depending on the outcome of a binary lottery with probability $\lambda$ or $1 - \lambda$.

I am confused by this interpretation. I understand that, if $A$ is a set in $\mathcal{B}$, the definition is $$ (\lambda \mu + (1-\lambda) \nu) (A)= \lambda \mu(A) + (1-\lambda) \nu (A), $$ so my natural intuition is that we are simply deciding which of the two probability measures to give the most relevance (weight).

Can somebody explain to me, perhaps with an example, why my interpretation of this is not the correct one?


Let $X$ and $Z$ be random variables on $(\Omega,\mathscr{F},P)$, with $Z\sim \textrm{Bernoulli}(\lambda)$. The above convex combination has the following probabilistic interpretation: for $A \in \mathcal{B}(\mathbb{R})$, $$P(X\in A)=\underbrace{P(X \in A\mid Z=1)}_{\mu(A)}\underbrace{P(Z=1)}_{\lambda}+\underbrace{P(X\in A\mid Z=0)}_{\nu(A)}\underbrace{P(Z=0)}_{1-\lambda}.$$ Therefore the unconditional law $P_X(A):=P(X \in A)$ is obtained by first specifying the conditional laws of $X$ given the latent variable $Z$.

It is not false that we are deciding which probability to give more relevance: by Bayes' rule, for a fixed set $A$ that is non-null under $P_X$, $\mu$ and $\nu$, and with $\mu,\nu$ known, we have $$P(Z=1\mid X\in A)\propto\mu(A)\lambda,\qquad P(Z=0\mid X\in A)\propto\nu(A)(1-\lambda).$$ So, conditional on $X\in A$, a higher $\lambda$ means a higher probability of the latent state $\{Z=1\}$ (and a lower probability of the other latent state).
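Both the total-probability decomposition and the Bayes proportionality can be checked numerically. A minimal sketch, assuming two hypothetical Gaussian lotteries $\mu = N(0,1)$ and $\nu = N(3,1)$, the event $A = (-\infty, 1]$, and $\lambda = 0.3$ (these specific choices are mine, not from the answer):

```python
from statistics import NormalDist

# Hypothetical concrete lotteries: the conditional laws of X given Z.
mu = NormalDist(0, 1)   # law of X given Z = 1
nu = NormalDist(3, 1)   # law of X given Z = 0
lam = 0.3               # P(Z = 1)

# Event A = (-inf, 1]; mu(A) and nu(A) are the conditional probabilities.
mu_A = mu.cdf(1.0)
nu_A = nu.cdf(1.0)

# Law of total probability: unconditional P(X in A).
p_A = lam * mu_A + (1 - lam) * nu_A

# Bayes' rule: posterior probabilities of the latent state given X in A.
post_z1 = lam * mu_A / p_A
post_z0 = (1 - lam) * nu_A / p_A
assert abs(post_z1 + post_z0 - 1) < 1e-12
```

Here `post_z1` comes out around $0.94$: even though $\lambda = 0.3$, observing $X \in A$ makes the latent state $\{Z=1\}$ far more likely, because $\mu$ puts much more mass on $A$ than $\nu$ does.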


What it means is that with probability $\lambda$, you perform experiment $\mu$, and with probability $1 - \lambda$ you perform experiment $\nu$.

To get a sample $X$ from $\lambda \mu + (1 - \lambda)\nu$, first generate $U \sim \mathrm{Bernoulli}(\lambda)$; if $U = 1$, let $X$ be a sample from $\mu$, otherwise let $X$ be a sample from $\nu$. This is an instance of composing stochastic kernels (see https://en.wikipedia.org/wiki/Transition_kernel), which is the formalism for "multistage experiments": we specify only the conditional distributions and from these obtain a unique joint distribution.
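That two-stage recipe is straightforward to sketch in code. A minimal illustration, assuming two hypothetical uniform lotteries $\mu = \mathrm{Unif}(0,1)$ and $\nu = \mathrm{Unif}(2,3)$ (my own choice of lotteries, not from the answer):

```python
import random

def sample_mixture(lam, sample_mu, sample_nu, rng):
    """Draw one sample from lam*mu + (1-lam)*nu via the two-stage experiment."""
    # Stage 1: binary lottery U ~ Bernoulli(lam).
    if rng.random() < lam:
        return sample_mu()   # Stage 2a: U = 1, so sample from mu
    return sample_nu()       # Stage 2b: U = 0, so sample from nu

rng = random.Random(0)
lam = 0.25
draws = [sample_mixture(lam, rng.random, lambda: 2 + rng.random(), rng)
         for _ in range(100_000)]

# Empirical check: mu puts all its mass on [0, 1] and nu none, so
# P(X <= 1) under the mixture should be close to lam * 1 + (1 - lam) * 0.
freq = sum(x <= 1 for x in draws) / len(draws)
```

With 100,000 draws the empirical frequency `freq` lands within about $0.01$ of $\lambda = 0.25$, matching the value $(\lambda \mu + (1-\lambda)\nu)\big((-\infty,1]\big)$ computed directly from the definition.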