Question on Conditional Probabilities, Joint Probabilities and Optimal Transport Distances

98 Views Asked by Bumbble Comm At 23 Feb 2026 - 1:42

I am working on a data-driven problem in which I want to measure an optimal transport distance (Wasserstein distance) between two empirical multi-dimensional probability distributions. I then want to reason mathematically about the relationships between the probabilities of these distributions. Apologies in advance for any crudeness: I am well outside my comfort zone here, but by choice.

Let's say we have two random variables $X,Y \in \mathbb{R}^2$ with $X \sim p(x_1,x_2)$ and $Y \sim q(y_1,y_2)$, so $X$ and $Y$ are distributed according to unknown measures $p$ and $q$ respectively.

We compute the Wasserstein distance between $p$ and $q$.

$$W_1(p,q)= \sup_{f \in \mathcal{F}} \left( \int f(x) d p(x) - \int f(x) d q(x) \right)$$

where $\mathcal{F}$ is the family of all continuous functions $f:\mathbb{R}^2 \to \mathbb{R}$ with Lipschitz constant less than or equal to 1.
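For intuition about what this quantity measures, here is a minimal sketch of the one-dimensional empirical case. It assumes two samples of the same size with equal weights, and it uses the primal (transport) form of $W_1$ rather than the dual form above: in 1D, the optimal plan matches sorted samples. The function name `w1_1d` and the sample values are hypothetical, chosen only for illustration.

```python
def w1_1d(xs, ys):
    """W1 between two 1D empirical measures with equally many, equally weighted samples.

    Assumption: in one dimension the optimal transport plan pairs the
    i-th smallest point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Hypothetical samples: the second is the first shifted by 5.
samples_p = [3.0, 0.0, 1.0]
samples_q = [6.0, 8.0, 5.0]

w1_value = w1_1d(samples_p, samples_q)
print(w1_value)
```

Here a pure shift by 5 gives a distance of 5, which matches the "amount of earth times distance moved" picture.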

What I think is true, and what I am trying to show, is the following: if we take conditional distributions derived from $p$ and $q$ by conditioning on the same value of the same variable, then the Wasserstein distance between these conditional measures must be less than or equal to $W_1(p,q)$.

To write this down, I would define two new measures $r = p(x_1 \mid x_2=c)$ and $v = q(y_1 \mid y_2=c)$ for some common value $c$. The claim is then that $$W_1(r,v) \le W_1(p,q).$$

My intuition comes from visualizing a pair of 2D histograms, i.e. two bumps in 2D for $p$ and $q$, and thinking of the Wasserstein distance as the earth mover's distance. The main point is that the conditional distributions $r$ and $v$ are obtained by cutting the 2D histograms along $x_2=c$ and $y_2=c$ respectively, so each one is a 1-dimensional histogram (or distribution). $W_1(r,v)$ is therefore a Wasserstein distance between measures on a lower-dimensional space. It seems logical that the amount of earth we have to move to match the 1-dimensional measures $r$ and $v$ would be much less than the amount needed to match the 2-dimensional measures $p$ and $q$, because $r$ and $v$ are just a part of the larger $p$ and $q$.
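One way to probe the conjectured inequality $W_1(r,v) \le W_1(p,q)$ is to evaluate both sides on a tiny discrete example. The sketch below is only a sanity check under stated assumptions: both measures have the same number of equally weighted atoms, the ground cost is Euclidean, and the optimal plan is found by brute force over permutations (feasible only for a handful of atoms). The atom locations and the helper names `w1_uniform` and `conditional_atoms` are hypothetical.

```python
from itertools import permutations
from math import dist


def w1_uniform(atoms_a, atoms_b):
    """Exact W1 between two equal-weight discrete measures with the same atom count.

    Assumption: with uniform weights and equal counts, an optimal plan
    can be taken to be a one-to-one matching, so we try every permutation.
    """
    if len(atoms_a) != len(atoms_b):
        raise ValueError("this sketch assumes the same number of atoms")
    n = len(atoms_a)
    return min(
        sum(dist(a, atoms_b[j]) for a, j in zip(atoms_a, perm)) / n
        for perm in permutations(range(n))
    )


def conditional_atoms(atoms, c):
    """First coordinates (as 1-tuples) of the atoms whose second coordinate equals c."""
    return [(x1,) for (x1, x2) in atoms if x2 == c]


# Hypothetical atoms of p and q, each carrying weight 1/2.
p_atoms = [(0.0, 0.0), (10.0, 1.0)]
q_atoms = [(10.0, 0.0), (0.0, 1.0)]
c = 0.0

joint = w1_uniform(p_atoms, q_atoms)
cond = w1_uniform(conditional_atoms(p_atoms, c), conditional_atoms(q_atoms, c))
print("W1(p, q) =", joint)
print("W1(r, v) =", cond)
```

On this particular toy example the conditional distance (10) comes out larger than the joint distance (1), which suggests the inequality may need additional assumptions on $p$ and $q$ to hold.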

Does this make any sense to anyone? Again, sorry for the crudeness. I am trying to work out how to formulate this idea. Does anyone have an idea whether my thinking is valid, or whether I am way off?

Thanks.
