Expected number of years that have record high or low rainfall


An exercise from Introduction to Probability by Joe Blitzstein:

Let $X_{1}, X_{2}, \ldots$ be the annual rainfalls in Boston (measured in inches) in the years 2101, 2102, ..., respectively. Assume that annual rainfalls are i.i.d. draws from a continuous distribution. A rainfall value is a record high if it is greater than those in all previous years (starting with 2101), and a record low if it is lower than those in all previous years.

(a) In the 22nd century (the years 2101 through 2200, inclusive), find the expected number of years that have either a record low or a record high rainfall.

My attempt is as follows.

Let $R_{j}$ be the indicator that the $j^{th}$ year is a record high or a record low, and let $p_{j} = P(R_{j} = 1)$.

$p_{j} = P(X_{j} > \max(X_{1}, X_{2}, \ldots, X_{j-1})) + P(X_{j} < \min(X_{1}, X_{2}, \ldots, X_{j-1}))$

$p_{j} = \prod_{k=1}^{j-1} P(X_{j} > X_{k}) + \prod_{k=1}^{j-1} P(X_{j} < X_{k})$

$P(X_{k} \leq X_{j}) = \int_{0}^{\infty} P(X_{k}\leq x) f(x) dx $

$P(X_{k} \leq X_{j}) = \int_{0}^{\infty} F(x) \, dF(x)$

$P(X_{k} \leq X_{j}) = \left. \frac{F^{2}(x)}{2} \right|_{0}^{\infty} = \frac{1}{2}$.

$\implies p_{j} = \frac{1}{2^{j-2}}$.

Sanity check: for $j=2$, $p_{2} = 1$, which is correct because the second year is certain to be either a record high or a record low.

I found a solution on the internet, which takes a different approach.

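The random variables $X_{1}, X_{2}, \ldots, X_{j}$ can be ordered by value, and by symmetry all $j!$ orderings are equally likely. The orderings of interest are those in which $X_{j}$ is the largest (record high) or the smallest (record low); there are $(j-1)!$ of each. So, for $j \geq 2$,

$p_{j} = \frac{2 \, (j-1)!}{j!} = \frac{2}{j}$.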

Sanity Check: for j=2, p = 1.
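
Both formulas agree at $j=2$, so that sanity check cannot tell them apart. A quick simulation for larger $j$ can. The sketch below is only an illustration: it assumes Uniform(0, 1) draws (the record events depend only on the relative order of the values, so any continuous distribution should give the same frequencies), and the function name `estimate_record_prob` is made up for this example.

```python
import random

def estimate_record_prob(j, trials=100_000, seed=0):
    """Estimate P(year j is a record high or a record low) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: i.i.d. Uniform(0, 1) draws stand in for the rainfall distribution.
        xs = [rng.random() for _ in range(j)]
        previous, last = xs[:-1], xs[-1]
        if last > max(previous) or last < min(previous):
            hits += 1
    return hits / trials

# Print the simulated frequency next to both candidate formulas.
for j in range(2, 7):
    est = estimate_record_prob(j)
    print(f"j={j}: simulated={est:.3f}  2/j={2 / j:.3f}  1/2^(j-2)={1 / 2 ** (j - 2):.3f}")
```

The two candidate formulas already differ at $j=4$ ($\frac{1}{2}$ versus $\frac{1}{4}$), so a few rows of output are enough to see which one the simulation supports.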

Why do the two approaches give different answers? What am I doing wrong?


There are 1 best solutions below


Based on the hint by @lulu in the comments, here is the answer.

Consider one of the two terms in the expression for $p_{j}$ from my approach, taking $j=3$ (by symmetry, the two terms are equal).

$P(X_{3} < X_{1}, X_{3} < X_{2}) = P(X_{3}<X_{1} | X_{3}<X_{2})P(X_{3}<X_{2})$

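The mistake in my approach was treating the two events as independent, i.e. replacing $P(X_{3}<X_{1} \mid X_{3}<X_{2})$ with $P(X_{3}<X_{1}) = \frac{1}{2}$. Both events involve $X_{3}$, so they are dependent: knowing that $X_{3}<X_{2}$ makes it more likely that $X_{3}$ is small. Keeping the conditioning instead: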

$P(X_{3} < X_{1}, X_{3} < X_{2}) = P(X_{3}<X_{1} \mid X_{3}<X_{2}) \cdot \frac{1}{2}$

$P(X_{3} < X_{1}, X_{3} < X_{2}) = \left[ P(X_{3}<X_{1} \mid X_{3}<X_{2}, X_{2}<X_{1}) \, P(X_{2}<X_{1} \mid X_{3}<X_{2}) + P(X_{3}<X_{1} \mid X_{3}<X_{2}, X_{1}<X_{2}) \, P(X_{1}<X_{2} \mid X_{3}<X_{2}) \right] \cdot \frac{1}{2}$

$P(X_{3} < X_{1}, X_{3} < X_{2}) = \frac{1}{2} \cdot \left[ 1 \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{2}{3} \right] = \frac{1}{3}$, so $p_{3} = 2 \cdot \frac{1}{3} = \frac{2}{3}$, which matches $\frac{2}{j}$.
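
The value $\frac{1}{3}$ can be double-checked by brute force. The sketch below assumes only that all $3! = 6$ rank orderings of $(X_{1}, X_{2}, X_{3})$ are equally likely, which holds for i.i.d. continuous draws. The helper `prob` and the lambda names are hypothetical, introduced just for this check.

```python
from fractions import Fraction
from itertools import permutations

# Each tuple assigns ranks to (X1, X2, X3); all 3! = 6 orderings are equally likely.
orderings = list(permutations(range(3)))

def prob(event, given=lambda r: True):
    """P(event | given), computed by counting equally likely rank orderings."""
    conditioned = [r for r in orderings if given(r)]
    return Fraction(sum(1 for r in conditioned if event(r)), len(conditioned))

x3_lt_x1 = lambda r: r[2] < r[0]
x3_lt_x2 = lambda r: r[2] < r[1]

joint = prob(lambda r: x3_lt_x1(r) and x3_lt_x2(r))   # P(X3 < X1, X3 < X2)
conditional = prob(x3_lt_x1, given=x3_lt_x2)          # P(X3 < X1 | X3 < X2)
unconditional = prob(x3_lt_x1)                        # P(X3 < X1)

print(joint, conditional, unconditional)  # prints: 1/3 2/3 1/2
```

The conditional probability differs from the unconditional one, which is why multiplying the pairwise probabilities undercounts.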

The conditional probabilities in the above expressions can be computed using symmetry arguments, based on the equally likely orderings of three distinct numbers. I think it is much easier and more intuitive to approach this problem directly in terms of permutations of the random variables.
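
With $p_{j} = \frac{2}{j}$ for $j \geq 2$, part (a) follows from linearity of expectation by summing $p_{j}$ over the 100 years. The sketch below does the arithmetic exactly. One point is an assumption on my part: the first year (2101) is counted as a record, since it is vacuously above and below "all previous years". The `first_year_counts` flag, like the function name, is hypothetical and lets you switch that convention off.

```python
from fractions import Fraction

def expected_records(n_years, first_year_counts=True):
    """Expected number of record-high-or-low years, using p_j = 2/j for j >= 2."""
    # Assumption: year 1 counts as a record with probability 1 unless disabled.
    total = Fraction(1 if first_year_counts else 0)
    for j in range(2, n_years + 1):
        total += Fraction(2, j)
    return total

print(float(expected_records(100)))
```

Under the stated convention the sum is $1 + \sum_{j=2}^{100} \frac{2}{j}$, which is about 9.37.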