Statistics, PDF and CDF pre-university (A level)


This is a question from an A level textbook on continuous random variables. It states that the CRV $T$ has pdf $f(t)=0.5$ for $1<t<3$, and then asks us to find the CDF (easy peasy: $F(t)=\frac{1}{2}(t-1)$) and to show that the probability that two independent observations are both less than 2.5 is $\frac{9}{16}$ (again, fine: $F(2.5)\times F(2.5)$). The third part, however, is a bit weird. I'll quote it exactly:

"$S$ is the larger of two independent observations of $T$. By considering the CDF of $S$ show that $S$ has pdf $g(s)=\frac{s-1}{2}$" and then the details about the range it applies to.

I couldn't get anywhere with this and cheated by looking up their worked solution, which amounted to

$P(S=s)=P(T=s)\times P(T\leq s)\times 2=0.5\times \frac{s-1}{2}\times 2=\frac{s-1}{2}$

In other words, you pick a value of $T=s$ and the next one needs to be smaller than it ($T\leq s$) but it could be the other way around (hence $\times 2$).

Now, I'm really concerned about $P(T=s)$, surely this is zero?

Your help will be much appreciated.


You raise an excellent point. Their derivation here is fine, but looks unrigorous unless interpreted correctly.

This procedure is analogous to a calculation of the form “$\mathrm{d}u=5u\,\mathrm{d}x$” when doing a substitution in an integral. Strictly speaking, neither the LHS nor the RHS makes sense unless you interpret it in a certain way. Here, I’m interpreting $P(S=s)$ as $g(s)\,\mathrm{d}s=\mathrm{d}F_S(s)$, where $F_S$ is the cumulative distribution function of $S$ (this is all similar to Riemann-Stieltjes integration and some ideas from measure theory, e.g. the Radon-Nikodym derivative: beyond A level, but interesting further reading, perhaps...). Their use of “$P(T=s)$” rather than “$\operatorname{pdf}_T(s)$” is possibly adding to the confusion, since strictly the probability is zero (as you say) while the probability density (the analogue of the “$5u$” in $5u\,\mathrm{d}x$) is not zero.

It might make more sense if you determine the cumulative distribution function for $S$ first. Unfortunately, to do so, we’ll still need to mess around with the same “zero” expression, but view it as moving from the discrete case (lots of sums) to the continuous case. It’s better to use density function notation though, since that’s what we are integrating (the distinction between the probability and the density is very important). For $1\le s\le 3$: $$\begin{align}P(S\le s)&=\int_1^s2\operatorname{pdf}_T(s')\operatorname{cdf}_T(s')\,\mathrm{d}s'\\&=\int_1^s2(1/2)(1/2)(s'-1)\,\mathrm{d}s'\\&=\frac{1}{4}(s-1)^2\end{align}$$

Now differentiate this expression to find $g(s)$.
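
Carrying that last step out explicitly (a short worked check, using the range $1<s<3$ from the question):

$$g(s)=\frac{\mathrm{d}}{\mathrm{d}s}\left[\frac{1}{4}(s-1)^2\right]=\frac{1}{2}(s-1)=\frac{s-1}{2},\qquad 1<s<3,$$

which matches the pdf the textbook asks for. It also integrates to 1, as a pdf should: $\int_1^3 \frac{s-1}{2}\,\mathrm{d}s=\frac{1}{4}(3-1)^2=1$.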


Another approach.

The larger of the two observations $s_1$ and $s_2$ satisfies $\max(s_1,s_2)\leq s$ exactly when both $s_1$ and $s_2$ are less than or equal to $s$.

Since the observations are independent, you obtain the cdf by calculating $[P(T\leq s)]^2$. Let $G(s)$ denote the corresponding cdf, then

$$G(s)=\begin{cases}0, & s<1,\\ \left(\frac{1}{2}(s-1)\right)^2, & 1\leq s\leq 3,\\ 1, & s>3.\end{cases}$$

To obtain the pdf, differentiate the cdf with respect to $s$.
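
For anyone who wants a numerical sanity check of this cdf, here is a minimal simulation sketch. It assumes $T$ is uniform on $(1,3)$, as the pdf in the question implies. The function names `simulate_max_cdf` and `exact_max_cdf` are made up for this illustration, and the trial count and seed are arbitrary choices.

```python
import random


def simulate_max_cdf(s, n_trials=200_000, seed=0):
    """Estimate P(max(T1, T2) <= s) for independent T1, T2 ~ Uniform(1, 3)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        t1 = rng.uniform(1.0, 3.0)
        t2 = rng.uniform(1.0, 3.0)
        if max(t1, t2) <= s:
            hits += 1
    return hits / n_trials


def exact_max_cdf(s):
    """Closed form: ((s - 1) / 2) ** 2 on [1, 3], 0 below 1, 1 above 3."""
    if s < 1:
        return 0.0
    if s > 3:
        return 1.0
    return ((s - 1) / 2) ** 2


if __name__ == "__main__":
    # Compare the simulated and exact cdf at a few points in the range.
    for s in (1.5, 2.0, 2.5, 3.0):
        print(s, round(simulate_max_cdf(s), 3), exact_max_cdf(s))
```

At $s=2.5$ the exact value is $\frac{9}{16}$, the same number as in the second part of the question, and the simulated estimate should land close to it.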