Conditional probability $P(X \mid X^2)$ for a uniform distribution


Assume $X \sim U[-1,1]$ is uniformly distributed. We can tell intuitively that $$P(X=k \mid X^2=x^2) = \begin{cases} 1/2 & \text{if } k=-x \\ 1/2 & \text{if } k=x \end{cases}$$

But I want to confirm this using regular conditional probability. However, when I try the definition $P(X\in A \mid Y=y):=\lim\limits_{U \downarrow \{Y=y\}}\frac{P(A\cap U)}{P(U)}$ (over sets $U$ shrinking to $\{Y=y\}$), the conditional probability comes out as $0$. For example, trying to compute $P(X=1/2 \mid X^2=1/4):=\lim\limits_{\epsilon \to 0}\frac{P(X=1/2 \;\cap\; X^2 \in [1/4-\epsilon,1/4+\epsilon])}{P(X^2 \in [1/4-\epsilon,1/4+\epsilon])}$ gives $0$, because $X$ has a continuous distribution and the numerator vanishes for every $\epsilon$. Maybe I got something wrong. Please help me compute this conditional probability formally.
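As a quick sanity check of the intuition (a minimal Monte Carlo sketch; the shell half-width `eps` and the sample size are arbitrary choices of mine), one can condition on a thin shell around a value of $X^2$ and look at how the mass splits between the two roots:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10_000_000)   # X ~ U[-1, 1]

target, eps = 0.25, 1e-4                      # condition on X^2 ≈ 0.25, roots at ±0.5
hit = np.abs(x**2 - target) <= eps            # the event {X^2 in [0.25-eps, 0.25+eps]}
frac_pos = np.mean(x[hit] > 0)                # share of conditioned samples near +0.5

print(f"{hit.sum()} samples hit; P(X > 0 | X^2 ≈ 0.25) ≈ {frac_pos:.3f}")
# prints roughly 0.5: each root carries half the conditional mass
```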


3 Answers

**Best Answer**

I think what you're trying to do is to find the conditional law of $X$, and this question is closely related to the notion of a probability kernel.
Theorem
Let $(E_1, \mathcal{E}_1 )$ and $(E_2, \mathcal{E}_2 )$ be two measurable spaces. Suppose further that $(E_2, \mathcal{E}_2 )$ is regular (remark: $\mathbb{R}$ with its Borel $\sigma$-algebra is regular).
Let $m$ be a finite positive measure on $(E_1 \times E_2, \mathcal{E}_1 \otimes \mathcal{E}_2)$, with $\mu$ as its first marginal.
Then there is a unique (in the almost-sure sense) probability kernel $p$ from $(E_1, \mathcal{E}_1 )$ to $(E_2, \mathcal{E}_2 )$ such that $m= \mu \otimes p$, that is,
$$\int_{E_1 \times E_2} g\,dm= \int_{E_1} \mu(dx) \int_{E_2} p(x, dy)\,g(x,y)$$
for every measurable function $g:E_1\times E_2 \longrightarrow \mathbb{R_+}$.
In particular, we have:
Corollary:
$\mathbb{E}(f(Y) \mid X) = \int_{\mathbb{R}} f(y)\, p(X,dy)$ for all positive measurable functions $f$, a.s.,
where $p$ is the kernel associated with $m= \mathcal{L}(X,Y)$.

Remark 1 In fact, there are a lot of technical questions lurking in these statements (measurability, regularity, almost-sureness, spaces of measures, etc.). If you can, please don't worry too much about them, because they would be a heavy burden on you.
Remark 2 Roughly speaking, $p$ represents conditional law.
Remark 3 $P(X=1/2 \mid X^2=1/4)$ doesn't mean anything on its own because $P(X^2=1/4)=0$. We could assign it any value we want, so "calculating it" also doesn't mean anything. But $\mathcal{L}(X \mid X^2)$ is a different thing. That's why we introduce the notion of a probability kernel.


Back to your question: what you're trying to find is the probability kernel for $(X^2,X)$. You can find it by intuition or any other means and check it against the uniqueness in the above theorem (yeah, this is the right way of *calculating it* in my opinion), then use it in the form of the above corollary whenever you want.
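For concreteness, here is how that check plays out in this example (a short verification I'm adding, via the substitution $y=x^2$): guess the kernel $p(y,\cdot)=\frac12 \delta_{-\sqrt y}+\frac12 \delta_{\sqrt y}$, and note that $\mu=\mathcal{L}(X^2)$ has density $\frac{1}{2\sqrt y}$ on $(0,1)$. Then for every measurable $g \geqslant 0$,
$$\int g\,dm = \mathbb{E}\left[g(X^2,X)\right] = \int_{-1}^{1} \tfrac12\, g(x^2,x)\,dx = \int_0^1 \frac{1}{2\sqrt y}\cdot\frac{g(y,\sqrt y)+g(y,-\sqrt y)}{2}\,dy = \int_{(0,1)} \mu(dy)\int p(y,dx')\,g(y,x'),$$
so by uniqueness this $p$ is the conditional law of $X$ given $X^2$: it puts mass $1/2$ on each of $\pm\sqrt y$, matching the intuition in the question.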

**Appendix**
$\mathcal{L}(X)$: law of $X$.

Disclaimer: any errors in these statements are mine, and maybe due to my reluctance to check the question of almost-sureness.

**Answer**

Since $U(-1,1)$ is a continuous distribution, $\Pr(X=k \mid X^2=x^2)$ is not well defined pointwise. Instead you first find the conditional CDF given $u \leqslant X^2 \leqslant u+\Delta u$, for some small $\Delta u$ (you can assume it's positive), then take the derivative w.r.t. $k$ and let $\Delta u \to 0$ to get your conditional distribution.

So in this case your unconditional probability is $$ \Pr(u \leqslant X^2 \leqslant u+\Delta u)=\sqrt{u+\Delta u} - \sqrt{u}. \tag 1 $$

Then you need to calculate $$ \Pr(X \leqslant k \text{ and } u \leqslant X^2 \leqslant u+\Delta u).\tag 2 $$

This is a piecewise linear function of $k$. I will leave the details to you.

Finally you calculate the ratio of (2) to (1) and let $\Delta u \to 0$: the limit is a step function of $k$ with jumps of $1/2$ at $k=\pm\sqrt u$, whose (generalized) derivative is your conditional probability mass function.
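If it helps, here is a small Python sketch of that limit (my own check; the value $u=0.25$ and the probe points $k$ are arbitrary choices). The probabilities can be computed exactly for $U[-1,1]$, and the ratio visibly converges to a step function with jumps of $1/2$ at $\pm\sqrt u = \pm 0.5$:

```python
import numpy as np

def mass_below(lo, hi, k):
    """Length of [lo, hi] ∩ (-inf, k]."""
    return max(0.0, min(hi, k) - lo)

def cond_cdf(k, u, du):
    """Exact P(X <= k | u <= X^2 <= u + du) for X ~ U[-1, 1]."""
    a, b = np.sqrt(u), np.sqrt(u + du)
    # {u <= X^2 <= u + du} is the union [-b, -a] ∪ [a, b]; X has density 1/2
    num = 0.5 * (mass_below(-b, -a, k) + mass_below(a, b, k))
    den = 0.5 * 2 * (b - a)
    return num / den

u = 0.25                                 # conditioning on X^2 near 0.25, roots at ±0.5
for du in (1e-1, 1e-3, 1e-6):
    row = [round(cond_cdf(k, u, du), 3) for k in (-0.9, -0.55, -0.4, 0.4, 0.9)]
    print(du, row)
# rows approach [0, 0, 0.5, 0.5, 1]: a step function with jumps of 1/2 at ±0.5
```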

**Update:**

I found this interesting: https://www.probabilitycourse.com/chapter4/4_3_2_delta_function.php

It basically says that the "symbolic derivative" of the Heaviside step function $u(x)$ is the Dirac delta function $\delta(x)$, where

$$ u(x) = \begin{cases} 1, & x\geqslant 0 \\ 0, & x < 0 \end{cases} $$

and $$ \delta(x) = \begin{cases} \infty, & x=0 \\ 0, & \text{elsewhere} \end{cases} $$

And any discrete distribution has a generalized PDF in terms of the delta function: $$ f_X(x) = \sum \Pr(X=x_k) \delta (x-x_k). $$

Now back to the OP's question. We look at the joint distribution of $(X,Y)$ on $[-1,1] \times [0,1]$, where $X \sim U[-1,1]$ and $Y=X^2$ a.s.

On one hand, $$ f_{X|Y}(x | y) = \frac 12 \left( \delta(x+\sqrt y) + \delta(x-\sqrt y) \right) $$

$$ f_Y(y) = \frac{1}{2\sqrt y}, \qquad 0 < y \leqslant 1 $$

On the other hand, $$ F_{X,Y} (x,y) = \Pr(X\leqslant x, X^2\leqslant y)= \Pr\left(-\sqrt y \leqslant X\leqslant \min(x,\sqrt y)\right) \\ = \begin{cases} 0, & x < -\sqrt y\\ \frac{x+\sqrt{y}}{2}, & -\sqrt y \leqslant x \leqslant \sqrt y\\ \sqrt y, & x>\sqrt y \end{cases}\\ = \frac{x+\sqrt y}{2}\, u(x+\sqrt y) - \frac{x-\sqrt y}{2}\, u(x-\sqrt y) $$

We want to show that

$$ f_{X,Y} (x,y) = f_{X|Y} (x|y) f_Y(y) $$

But the link I provided does not discuss the multivariate case. Getting $f_{X,Y} (x,y)$ involves taking the derivative of $F_{X,Y}(x,y)$ w.r.t. $x$ and $y$, which in turn seems to require the derivative of the delta function. I only found this: https://en.wikipedia.org/wiki/Unit_doublet but there is little information.
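That said, with the closed form above the doublet turns out not to be needed: differentiate first in $x$ and use the identity $t\,\delta(t)=0$ to drop the two delta terms,
$$\frac{\partial F_{X,Y}}{\partial x}(x,y) = \tfrac12\, u(x+\sqrt y) - \tfrac12\, u(x-\sqrt y),$$
then differentiate in $y$, where each Heaviside factor contributes a delta scaled by $\frac{1}{2\sqrt y}$:
$$f_{X,Y}(x,y) = \frac{1}{4\sqrt y}\left(\delta(x+\sqrt y)+\delta(x-\sqrt y)\right) = f_{X|Y}(x|y)\,f_Y(y),$$
which is exactly the factorization we wanted.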

**Answer**

I'm not convinced something like $P(X=1\mid X^2=1)$ is well-defined if $X\sim \mathcal{U}(-2,2)$, and it's not because $P(X^2=1)=0$; it is possible (in fact, common) to condition on events with vanishing probabilities in higher-dimensional spaces. For example, if $(X,Y)\sim f_{XY}$ is a continuous random vector supported on $\mathbb{R}^2$, then $$P(X<0\mid Y=X^2)=\frac{\int_{-\infty}^{0}f(t,t^2)\sqrt{4t^2+1}\,dt}{\int_{-\infty}^{\infty}f(t,t^2)\sqrt{4t^2+1}\,dt}$$ even though $P(Y=X^2)=0$.

Here is why I don't believe something like $P(X=1\mid X^2=1)$ is well-defined when $X \sim \mathcal{U}(-2,2)$. Notice how conditioning on $\{X^2=1\}$ is equivalent to conditioning on the event that $X$ belongs to the finite set $\{-1,1\}$. With that in mind, define intervals $I_1,I_2\subseteq (-2,2)$ by $I_1=(-1-\epsilon_1,-1+\epsilon_1)$ and $I_2=(1-\epsilon_2,1+\epsilon_2)$, where $\epsilon_1,\epsilon_2$ are very small positive real numbers. Notice that $$P(X\in I_2\mid X\in I_1 \cup I_2)=\frac{\epsilon_2}{\epsilon_1 +\epsilon_2}.$$ Now if $P(X=1\mid X^2=1)$ were well-defined, we would certainly have $$P(X=1\mid X^2=1)=\lim_{(\epsilon_1,\epsilon_2)\rightarrow (0^+,0^+)}\bigg(\frac{\epsilon_2}{\epsilon_1+\epsilon_2}\bigg).$$ But this limit doesn't exist.
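The path dependence is easy to see numerically (a tiny sketch; the paths $\epsilon_1=t$, $\epsilon_2=ct$ are my arbitrary choice): along each such path the ratio is constant in $t$ but equals $c/(1+c)$, a different value for every $c$.

```python
# Approach (0+, 0+) along eps1 = t, eps2 = c*t: the ratio eps2/(eps1 + eps2)
# equals c/(1+c) for every t, so its value depends on the path taken.
for c in (0.25, 1.0, 4.0):
    t = 1e-9
    eps1, eps2 = t, c * t
    print(c, eps2 / (eps1 + eps2))   # prints ≈ 0.2, 0.5, 0.8 — no single limit
```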

My original intuition made me think that if $X\sim f_X$ is a continuous random variable supported on $\mathbb{R}$ and if $S\subseteq \mathbb{R}$ is a finite set, then for any $s\in S$ we would have $$P(X=s\mid X\in S)=\frac{f_X(s)}{\sum_{x\in S}f_X(x)},$$ but I no longer believe this is the case, based on my previous comments. Any thoughts?