Distribution of $Y= F(X)$ when $F$ is not continuous?


Let $X$ be a real-valued random variable with cdf $F$. Consider the random variable $$Y= F(X).$$ It is well-known that if $F$ is continuous then $$Y \sim \mathrm{Unif}([0,1]).$$

Question: When $F$ is not necessarily continuous, is there a name for the random variable $Y = F(X)$? Does $Y$ have any useful properties? Could its distribution perhaps be used to measure "how far $F$ is from being continuous"?


Here are some of my thoughts:

If $F$ is not continuous then it has some jump-point, i.e. there exists $x$ s.t. $$F(x-) := \lim_{x'\uparrow x}F(x') < F(x);$$ in particular $Y$ cannot take on any of the values in $]F(x-), F(x)[$. Moreover, since $\Pr(X = x) = F(x)-F(x-),$ $Y$ has a mass-point at $F(x)$. Between jump-points, $Y$ should intuitively be distributed uniformly.

Example: Consider $X$ s.t. $X=0$ a.s. Then the cdf is given by $$F(x) = 0$$ for $x<0$ and $$F(x) = 1$$ for $x \ge 0$. So $$Y = F(X) = F(0) = 1$$ with probability $1$. In particular, $Y$ is not uniform on $[0,1]$.
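To make the mass-point observation concrete, here is a minimal Python sketch for a discrete $X$ with finitely many atoms. The helper name `law_of_F_of_X` and the dictionary representation of the pmf are illustrative assumptions, not standard notation. It computes the exact law of $Y = F(X)$: each atom $x$ of $X$ is sent to the value $F(x)$ and keeps its mass $\Pr(X = x)$.

```python
from fractions import Fraction


def law_of_F_of_X(pmf):
    """Exact pmf of Y = F(X) for a discrete X given as {atom: probability}.

    Hypothetical helper for illustration only.
    """
    law = {}
    running = Fraction(0)
    for x in sorted(pmf):
        running += pmf[x]  # after this line, running == F(x)
        # Y takes the value F(x) exactly when X == x (up to null atoms).
        law[running] = law.get(running, Fraction(0)) + pmf[x]
    return law


# The example above: X = 0 almost surely, so Y = 1 almost surely.
point_mass = law_of_F_of_X({0: Fraction(1)})

# A fair die: Y takes the values 1/6, 2/6, ..., 1, each with mass 1/6.
fair_die = law_of_F_of_X({k: Fraction(1, 6) for k in range(1, 7)})

print(point_mass)
print(fair_die)
```

In both cases $Y$ is purely discrete and puts no mass on the open gaps $]F(x-), F(x)[$.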


There are 3 best solutions below

3
On BEST ANSWER

Let $X$ be a random variable with CDF $F$. Since $F$ is monotone, it can have at most countably many jumps. Suppose the jump points can be listed in increasing order, and let $ -\infty < t_1 < t_2 < \dots$ be the points where $F(t_k) \neq F(t_k -)$. Write $F(t_k) = x_k$ and $F(t_k-) = y_k$.

Let $Z=F(X)$. Clearly $Z \in [0,1]$, so $F_Z(s) = 0$ if $s < 0$ and $F_Z(s) = 1$ if $s \ge 1$. Moreover $\mathbb P(F(X) \in [y_k,x_k))=0$ for any $k \in \mathbb N_+$.

Take now any $s \in [0,y_1)$. We have: $$ F_Z(s) = \mathbb P(F(X) \le s) = s $$ where we used the monotonicity and continuity of $F$ on $(-\infty,t_1)$ (you already know that $F(X)$ is uniform in that case).

Take now for example $s \in [y_1,x_1)$. We have: $$ \mathbb P(F(X) \le s) = \mathbb P(F(X) <y_1) + \mathbb P( F(X) \in [y_1,s]) = \mathbb P(X <t_1) = y_1$$ because the second term vanishes. And if $s \in [x_1,y_2)$ then: $$ \mathbb P(F(X) \le s) = \mathbb P(F(X) \le x_1) + \mathbb P(F(X) \in (x_1,s]) = x_1 + (s-x_1) = s $$ (where again we used the fact that $F(X)$ is uniform on a segment where $F$ is continuous).

We can now tackle the general case. Take $s \in [y_k,x_k)$, getting: $$ \mathbb P(F(X) \le s) = \mathbb P(F(X) <y_k) + \mathbb P(F(X) \in [y_k,s]) = \mathbb P(F(X) < y_k) = y_k$$

And for $s \in [x_k,y_{k+1})$ we have:

$$ \mathbb P(F(X) \le s) = \mathbb P(F(X) \le x_k) + \mathbb P(F(X) \in (x_k,s]) = x_k + (s-x_k) = s$$

In other words:

$$ F_Z(s) = \begin{cases} 0 & s<0 \\ s & s \in [F(t_k),F(t_{k+1}-)) , k \in \mathbb N \\ F(t_k-) & s \in [F(t_k-),F(t_k)) , k \in \mathbb N \\ 1 & \text{otherwise} \end{cases} $$

where we set $t_0 = - \infty$ (so that $F(t_0) = F(t_0-) = 0$) to shorten the notation.

So heuristically, $Z$ is uniform on every segment where $F$ is continuous. Across the gap left by a jump of $F$, the cdf $F_Z$ stays flat at the last value it reached, then jumps at the start of the next segment of continuity and continues uniformly from there.
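The piecewise description above can be sketched in Python and checked on one concrete case. Everything named here is an illustrative assumption: the function `cdf_of_Z`, the choice $X = \max(U, 1/2)$ with $U$ uniform on $(0,1)$ (so $F$ has a single jump at $t_1 = 1/2$ with $y_1 = 0$, $x_1 = 1/2$), and the use of an evenly spaced grid in place of random samples so that the check is deterministic.

```python
def cdf_of_Z(s, jumps):
    """Piecewise cdf of Z = F(X); `jumps` lists the pairs (F(t_k-), F(t_k))."""
    if s < 0:
        return 0.0
    if s >= 1:
        return 1.0
    for y_k, x_k in jumps:
        if y_k <= s < x_k:  # s falls in the gap left by a jump of F
            return y_k
    return s  # s lies in the image of a segment where F is continuous


def F(x):
    """cdf of the illustrative X = max(U, 1/2), U uniform on (0, 1)."""
    if x < 0.5:
        return 0.0
    return min(x, 1.0)


n = 1000
# Deterministic stand-in for uniform samples: midpoints of n equal cells of (0, 1).
z_values = [F(max((i + 0.5) / n, 0.5)) for i in range(n)]


def empirical_cdf(s):
    """Fraction of the Z-values that are <= s."""
    return sum(z <= s for z in z_values) / n


jumps = [(0.0, 0.5)]  # F jumps from 0 to 1/2 at t_1 = 1/2
for s in (0.25, 0.5, 0.75, 0.9):
    print(s, cdf_of_Z(s, jumps), empirical_cdf(s))
```

In this example $F_Z$ is flat at $0$ on $[0, 1/2)$, jumps to $1/2$ at $s = 1/2$, and is the identity afterwards, which is what the formula predicts.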

2
On

We have that $\Pr [Y\leqslant c]=\Pr [X\in F^{-1}((-\infty ,c])]$, and if we set $H:=\overline{\operatorname{img}(F)}$ then for $c \geqslant 0$ we find that $$ \Pr [X\in F^{-1}((-\infty ,c])]=\Pr [X\in F^{-1}([0,c]\cap H)]=\Pr [X\in F^{-1}([0 ,b])]\tag1 $$ where $b:=\max ([0 ,c]\cap H)$ (the maximum exists because $H$ is closed and contains $0$). Here we used the fact that $\operatorname{img}(F)\subset [0,1]$ and consequently that $F^{-1}((-\infty ,c])=F^{-1}([0 ,b])$.

Now if $c\in \operatorname{img}(F)$ then it is easy to see that $\Pr [Y\leqslant c]=c$, and from the continuity from above and from below of a probability measure we can see that the statement also holds when $c\in H$. Therefore, in the general case we have that $$ \Pr [Y\leqslant c]=\begin{cases} \max ([0,c]\cap H),& c\geqslant 0\\ 0,& \text{ otherwise } \end{cases}\tag2 $$
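Formula $(2)$ is easy to check numerically in the simplest setting. The following is a minimal sketch assuming $X$ is discrete with finitely many atoms, so that $H$ is just $\{0\}$ together with the cumulative sums of the atom probabilities. The function names are hypothetical, and `cdf_Y_direct` is a brute-force comparison that sums the mass of all atoms $x$ with $F(x) \leqslant c$.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate


def closure_of_image(probs):
    """H for a discrete X; `probs` are atom masses in increasing atom order."""
    return [Fraction(0)] + list(accumulate(probs))


def cdf_Y_formula(c, H):
    """Right-hand side of (2): the largest element of H that is <= c."""
    if c < 0:
        return Fraction(0)
    return H[bisect_right(H, c) - 1]


def cdf_Y_direct(c, probs):
    """Pr[F(X) <= c], computed by summing the atoms x with F(x) <= c."""
    cdf_at_atoms = accumulate(probs)
    return sum((p for p, Fx in zip(probs, cdf_at_atoms) if Fx <= c), Fraction(0))


probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
H = closure_of_image(probs)  # [0, 1/4, 1/2, 1]

for c in (Fraction(-1), Fraction(3, 10), Fraction(1, 2), Fraction(9, 10), Fraction(1)):
    print(c, cdf_Y_formula(c, H), cdf_Y_direct(c, probs))
```

Exact rational arithmetic via `fractions.Fraction` is used so that the comparison `Fx <= c` at the jump values is not affected by floating-point rounding.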

0
On

If $F$ is not continuous, the resulting random variable $F(X)$ is a mixed (hybrid) random variable: it is continuous and uniform on the regions where $F$ is continuous, and it is discrete at the jumps. You can use the Dirac delta function to fully characterize the density of $F(X)$ even when there are discontinuities: you add a Dirac delta at each value $F(x)$ where $F$ jumps, weighted by the size of the jump $F(x)-F(x-)$.

https://en.wikipedia.org/wiki/Dirac_delta_function

In probability theory and statistics, the Dirac delta function is often used to represent a discrete distribution, or a partially discrete, partially continuous distribution, using a probability density function (which is normally used to represent fully continuous distributions).

https://en.wikipedia.org/wiki/Dirac_delta_function#Applications
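
As a sketch of what such a generalized density could look like, write $x_k = F(t_k)$ and $y_k = F(t_k-)$ for the values of $F$ at and just before its jump points $t_k$ (the notation of the accepted answer; the indicator notation $\mathbf 1_A$ is an added assumption). Then, formally,

$$ f_Y(y) = \mathbf 1_{[0,1]\setminus \bigcup_k (y_k,x_k)}(y) + \sum_k (x_k - y_k)\,\delta(y - x_k). $$

The first term is the uniform part, which loses total mass $\sum_k (x_k - y_k)$ to the gaps; the second term puts exactly that mass back as atoms at the points $x_k$, so the total is $1$. For the example in the question ($X = 0$ a.s., a single jump with $y_1 = 0$ and $x_1 = 1$) this reduces to $f_Y(y) = \delta(y-1)$.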