Conditional distribution of $(X_i)_i$ given $\sum\limits_i X_i$ when $(X_i)_i$ is i.i.d.

Suppose $X_1,X_2,\ldots, X_n$ are i.i.d. random variables. Is there some way to determine the distribution of $(X_1,\ldots,X_n)$ given $S_n := X_1+\ldots+X_n$?

In the discrete case this is easy, using just the definition of conditional expected value, but what about the continuous case?

Best answer

Expanding on my first comment...

General formula

Let $S$ be a random variable with PDF $f_S(s)$ for $s \in \mathbb{R}$. Let $A$ be an event with $P[A]>0$. Then, for every $s$ with $f_S(s)>0$: $$ P[A|S=s] = \frac{f_{S|A}(s|A)P[A]}{f_S(s)} $$ For intuition about this formula, it is easy to verify that for any interval $[a,b]$: \begin{align} \int_{-\infty}^{\infty} P[A|S=s]f_S(s)\,ds &= P[A] \\ \int_{a}^{b} P[A|S=s] f_S(s)\,ds &= P[A \cap \{ S \in [a,b]\}] \end{align} Both identities follow by substituting the formula: the integrand becomes $f_{S|A}(s|A)P[A]$, and $\int_a^b f_{S|A}(s|A)\,ds = P[S \in [a,b]\,|\,A]$.

Application to your problem

Let $\{X_i\}_{i=1}^n$ be i.i.d. and let $S=\sum_{i=1}^n X_i$. We want to find $P[(X_1, \ldots, X_n) \leq (x_1, \ldots, x_n) | S=s]$ for all relevant values of $s \in \mathbb{R}$, where the vector inequality is understood componentwise. For notational simplicity define $X=(X_1, \ldots, X_n)$, $x=(x_1, \ldots, x_n)$, $A_x = \{X\leq x\}$. Note that, by independence: $$P[A_x] = P[X\leq x] = P[X_1\leq x_1]\cdots P[X_n\leq x_n] $$ Assume $S$ has PDF $f_S(s)$ and that $P[A_x]>0$. Applying the above formula with $A = A_x$ gives: \begin{align*} P[A_x | S=s] &= \frac{f_{S|A_x}(s|A_x)P[A_x]}{f_S(s)} \\ &= \frac{f_{S|A_x}(s|A_x)P[X_1\leq x_1]\cdots P[X_n\leq x_n]}{f_S(s)} \end{align*} You can find $f_S(s)$ as the $n$-fold convolution of the common PDF of the $X_i$ with itself (assuming that PDF exists). You can find $f_{S|A_x}(s|A_x)$ by: \begin{align} f_{S|A_x}(s|A_x) &= \frac{d}{ds} P[S \leq s| A_x] \\ &= \frac{1}{P[A_x]}\frac{d}{ds}P[\{S\leq s \} \cap A_x]\\ &= \frac{\frac{d}{ds} P[X_1\leq x_1, \ldots, X_n \leq x_n, X_1+\cdots+X_n\leq s]}{P[X_1\leq x_1]\cdots P[X_n\leq x_n]} \end{align} Substituting this back, the product $P[X_1\leq x_1]\cdots P[X_n\leq x_n]$ cancels, leaving: $$P[A_x | S=s] = \frac{\frac{d}{ds} P[X_1\leq x_1, \ldots, X_n \leq x_n, X_1+\cdots+X_n\leq s]}{f_S(s)}$$
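To make the recipe concrete, here is a minimal numerical sketch for $n=2$. It rests on assumptions that are not part of the answer above: the $X_i$ are nonnegative with a PDF, and Exp(1) is used as the example distribution. For $n=2$, the derivative $\frac{d}{ds}P[X_1\leq x_1, X_2\leq x_2, X_1+X_2\leq s]$ reduces to $\int f(u)f(s-u)\,du$ over those $u$ with $u \leq x_1$ and $s-u \leq x_2$, and $f_S(s)$ is the same integral over all of $[0,s]$. The function names are hypothetical, chosen only for illustration.

```python
import math

def exp_pdf(u):
    """PDF of Exp(1). Assumed example distribution, not from the answer."""
    return math.exp(-u) if u >= 0 else 0.0

def midpoint_integral(g, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def conditional_cdf(x1, x2, s, pdf=exp_pdf):
    """Approximate P[X1 <= x1, X2 <= x2 | X1 + X2 = s] for n = 2.

    Assumes X1, X2 are i.i.d., nonnegative, with density `pdf`, and s > 0.
    """
    integrand = lambda u: pdf(u) * pdf(s - u)
    # Numerator: d/ds P[X1 <= x1, X2 <= x2, X1 + X2 <= s],
    # i.e. the convolution integrand restricted to u <= x1 and s - u <= x2.
    lo = max(0.0, s - x2)
    hi = min(x1, s)
    numerator = midpoint_integral(integrand, lo, hi)
    # Denominator: f_S(s), the convolution of the PDF with itself at s.
    denominator = midpoint_integral(integrand, 0.0, s)
    return numerator / denominator

if __name__ == "__main__":
    print(conditional_cdf(0.5, 2.0, 1.0))
```

For Exp(1) the integrand $f(u)f(s-u) = e^{-s}$ is constant in $u$, so given $S=s$ the variable $X_1$ is uniform on $[0,s]$, and the call above should return a value close to $0.5$. Other nonnegative densities can be passed through the `pdf` argument.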

Computing probabilities with the conditional CDF

Notice that the above gives the conditional cumulative distribution function (CDF) rather than the conditional PDF. Conditional PDFs given $X_1 + ... +X_n=s$ are hard to define since, as you note, this restricts $(X_1, ..., X_n)$ to a multidimensional set of measure zero in $\mathbb{R}^n$. So real-valued functions defined over that measure-zero set would integrate to 0, not to 1. One would need to use PDFs with multidimensional impulses, which are tricky. However, there is no need to use impulses: Working with the conditional CDF gives all you need.

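For example, suppose $n=2$ and let $B$ be the line segment in $\mathbb{R}^2$ between the points $(0,1)$ and $(1,0)$. How do we compute $P[(X_1, X_2) \in B | S=1]$? We just observe:

$$P[(X_1, X_2) \in B | S=1] = P[\underbrace{X_1 \leq 1, X_2 \leq 1}_{A_{(1,1)}} | S=1] $$

This holds because, on the line $X_1+X_2=1$, the constraints $X_1 \leq 1$ and $X_2 \leq 1$ are equivalent to $0 \leq X_1 \leq 1$, which describes exactly the segment $B$.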

A 2-d picture to illustrate the above equation would be great, but I do not know how to post a picture on stackexchange.
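
As a concrete instance of the segment example, assume (purely for illustration, since the question does not fix a distribution) that $X_1, X_2$ are i.i.d. standard normal with PDF $\phi$. For $n=2$, the derivative $\frac{d}{ds}P[X_1\leq 1, X_2\leq 1, X_1+X_2\leq s]$ at $s=1$ equals $\int_0^1 \phi(u)\phi(1-u)\,du$, and $f_S(1) = \int_{-\infty}^{\infty}\phi(u)\phi(1-u)\,du$. Since $\phi(u)\phi(1-u) = \frac{1}{2\pi}e^{-1/4}e^{-(u-1/2)^2}$, the constant factors cancel:

$$P[(X_1, X_2) \in B | S=1] = \frac{\int_0^1 e^{-(u-1/2)^2}\,du}{\int_{-\infty}^{\infty} e^{-(u-1/2)^2}\,du} = \frac{\sqrt{\pi}\,\operatorname{erf}(1/2)}{\sqrt{\pi}} = \operatorname{erf}(1/2) \approx 0.52$$

This agrees with the known fact that, for i.i.d. standard normals, $X_1$ given $S=s$ is normal with mean $s/2$ and variance $1/2$.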