Can a prediction interval be interpreted as a probability?


Suppose I find a 90% prediction interval for some data distribution. This implies that if I draw a large enough sample from this distribution, then 90% of the sampled points will lie inside the prediction interval. Is this the same as saying that any randomly sampled data point from the distribution will lie inside the prediction interval with probability 0.9?


Your interpretation of a prediction interval is incorrect. A 90% prediction interval will contain 90% of the probability of the true underlying distribution on average (not always, and not "at a minimum").

What you are thinking about is a less-taught concept: a tolerance interval. A tolerance interval is specified by a confidence and a coverage. The coverage is the minimum probability that we want our interval to contain (as calculated using the true distribution), while the confidence is the probability that an interval built from a random sample from the population will actually achieve that minimum coverage.

In contrast, a 90% prediction interval is defined in reference only to the next point, not to all future points (as a tolerance interval is). So, say you collect a sample $S$ and calculate a 90% PI, then collect one more point, $p$. What the 90% tells us is that 90% of the PIs constructed in this way from random samples $S$ will contain the next point $p$.
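The "on average" reading can be checked by simulation. The sketch below is not part of the original argument. It assumes a standard normal population and uses a distribution-free interval, the minimum and maximum of a sample of n = 19 points, which contains the next point with probability (n - 1)/(n + 1) = 0.9 for any continuous distribution. The name `simulate_pi` and its parameters are illustrative only.

```python
import random
from statistics import NormalDist


def simulate_pi(n=19, trials=20000, seed=0):
    """Repeatedly draw a sample, use [min, max] as a 90% PI, then draw one more point."""
    rng = random.Random(seed)
    population = NormalDist(0.0, 1.0)  # assumed true distribution
    hits = 0
    coverages = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = min(sample), max(sample)
        next_point = rng.gauss(0.0, 1.0)
        hits += lo <= next_point <= hi
        # Probability mass of the true distribution inside this particular interval.
        coverages.append(population.cdf(hi) - population.cdf(lo))
    hit_rate = hits / trials                    # share of PIs containing the next point
    mean_coverage = sum(coverages) / trials     # average true coverage over all PIs
    share_below = sum(c < 0.9 for c in coverages) / trials  # PIs covering less than 90%
    return hit_rate, mean_coverage, share_below


if __name__ == "__main__":
    hit_rate, mean_coverage, share_below = simulate_pi()
    print(f"hit rate: {hit_rate:.3f}")
    print(f"mean true coverage: {mean_coverage:.3f}")
    print(f"share of intervals covering < 90%: {share_below:.3f}")
```

Under these assumptions the hit rate and the mean coverage should both come out close to 0.9, while a substantial share of the individual intervals (roughly 40%) contain less than 90% of the population. That is the sense in which the 90% holds on average rather than for every interval.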

What is missing from PIs is the conditional argument. Let $I$ be our prediction interval, and let $p_1$ and $p_2$ be the first and second "out of sample" points collected after we formed our interval. What we cannot say is:

$$P(p_2 \in I|p_1 \in I)=0.9$$

We also cannot say:

$$P(p_2 \in I|p_1 \notin I)=0.9$$
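A small simulation makes the failure of both statements concrete. This is a hedged sketch under assumptions not in the original: a standard normal population and the distribution-free 90% PI formed by the minimum and maximum of n = 19 points. The function name `conditional_hit_rates` is hypothetical.

```python
import random


def conditional_hit_rates(n=19, trials=40000, seed=1):
    """Estimate P(p2 in I | p1 in I) and P(p2 in I | p1 not in I)."""
    rng = random.Random(seed)
    in_total = in_hits = 0    # trials where p1 fell inside I
    out_total = out_hits = 0  # trials where p1 fell outside I
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = min(sample), max(sample)  # the interval I
        p1 = rng.gauss(0.0, 1.0)
        p2 = rng.gauss(0.0, 1.0)
        p2_inside = lo <= p2 <= hi
        if lo <= p1 <= hi:
            in_total += 1
            in_hits += p2_inside
        else:
            out_total += 1
            out_hits += p2_inside
    return in_hits / in_total, out_hits / out_total


if __name__ == "__main__":
    given_in, given_out = conditional_hit_rates()
    print(f"P(p2 in I | p1 in I)     ~ {given_in:.3f}")
    print(f"P(p2 in I | p1 not in I) ~ {given_out:.3f}")
```

For this particular interval the two estimates should land near 19/21 ≈ 0.905 and 18/21 ≈ 0.857 rather than 0.9. Observing $p_1$ carries information about how wide the realised interval happens to be, so the conditional probabilities move away from the nominal level.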

However, a tolerance interval is defined not in reference to the next point but to the entire population, so it retains its properties for all future "out of sample" observations (albeit with the given confidence).
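To illustrate the confidence/coverage pair, here is a minimal sketch of a distribution-free tolerance interval, again using the sample minimum and maximum. It is one possible construction, chosen for simplicity and not taken from the text above. For n points from a continuous distribution, the probability that [min, max] contains at least a fraction c of the population is 1 - n c^(n-1) + (n-1) c^n. The standard normal population and all function names are assumptions made for the example.

```python
import random
from statistics import NormalDist


def minmax_confidence(n, coverage):
    """Probability that [min, max] of n points contains at least `coverage` of the population."""
    return 1 - n * coverage ** (n - 1) + (n - 1) * coverage ** n


def smallest_n(coverage=0.9, confidence=0.9):
    """Smallest sample size for which [min, max] reaches the requested confidence."""
    n = 2
    while minmax_confidence(n, coverage) < confidence:
        n += 1
    return n


def simulate_tolerance(n, coverage=0.9, trials=20000, seed=2):
    """Share of simulated [min, max] intervals whose true coverage is at least `coverage`."""
    rng = random.Random(seed)
    population = NormalDist(0.0, 1.0)  # assumed true distribution
    achieved = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        true_coverage = population.cdf(max(sample)) - population.cdf(min(sample))
        achieved += (true_coverage >= coverage)
    return achieved / trials


if __name__ == "__main__":
    n = smallest_n()
    print(f"sample size needed: {n}")
    print(f"theoretical confidence: {minmax_confidence(n, 0.9):.3f}")
    print(f"simulated confidence:   {simulate_tolerance(n):.3f}")
```

With 90% coverage and 90% confidence this construction needs 38 points, twice the 19 that suffice for a 90% prediction interval of the same form. The extra data is the price of a guarantee about the whole population instead of the next point alone.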