Uniform Distribution Accuracy

A data scientist is testing his predictor (based on historical data) that would make predictions about Philadelphia’s temperatures. A prediction is judged "correct" if it falls within three standard deviations of the target value. Assume that the expected value of each prediction equals the target value. What is the accuracy of the predictor if the distribution of measurements is uniform?

Accepted answer

Let $X\sim \mathcal{U}(a,b)$ with $a<b$. Then $\mu = \mathbb{E}(X)=\frac{a+b}2$ and $\sigma^2 = Var(X)=\frac{(b-a)^2}{12}$, so that $\sigma=\frac{b-a}{2\sqrt{3}}$. First, by Chebyshev's inequality we know that $\mathbb{P}(|X-\mu|\leq 3\sigma)\geq 1-\frac{1}{3^2}=\frac89$, which is already close to $1$.

In fact, one can show by elementary inequality arguments that $$\mu-3\sigma < a$$ and $$\mu+3\sigma >b,$$ so that $\mathbb{P}(|X-\mu|\leq 3\sigma)=1$, since the interval $[\mu-3\sigma,\mu+3\sigma]$ contains the whole support $[a,b]$ of $X$.
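As a numerical sanity check of the two claims above, here is a minimal Python sketch; the function names and the example bounds $a=10$, $b=95$ are hypothetical choices for illustration. It computes the exact probability that $X\sim\mathcal{U}(a,b)$ lands in $[\mu-k\sigma,\mu+k\sigma]$ by clipping that interval to $[a,b]$, and compares it with Chebyshev's lower bound $1-1/k^2$.

```python
import math

def uniform_coverage(a: float, b: float, k: float) -> float:
    """Exact P(|X - mu| <= k*sigma) for X ~ Uniform(a, b), assuming a < b."""
    mu = (a + b) / 2
    sigma = (b - a) / math.sqrt(12)
    lo = max(a, mu - k * sigma)  # clip the interval to the support [a, b]
    hi = min(b, mu + k * sigma)
    return max(0.0, hi - lo) / (b - a)

def chebyshev_bound(k: float) -> float:
    """Chebyshev lower bound 1 - 1/k^2 on P(|X - mu| <= k*sigma), for k >= 1."""
    return 1 - 1 / k**2

# Hypothetical bounds chosen only for illustration
a, b = 10.0, 95.0
print(uniform_coverage(a, b, 3))  # exact coverage for k = 3: 1.0
print(chebyshev_bound(3))         # 8/9, about 0.889
```

Chebyshev only guarantees $8/9$ because it must hold for every distribution; the uniform distribution is concentrated enough that the exact value is $1$.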

Edit: Here is a proof that for any real $a<b$ we have $\mu-3\sigma <a$, where $\mu=(a+b)/2$ and $\sigma=(b-a)/\sqrt{12}$:

Since $\sqrt{3}>1$ and $a-b<0$ (because $a<b$), we have $\sqrt{3}(a-b)<(a-b)$. Dividing by $2$ and adding $(a+b)/2$ to both sides gives $$\frac{a+b}2+\sqrt{3}\,\frac{a-b}2<\frac{a+b}2+\frac{a-b}2=a,$$ which is the same as $$\frac{a+b}2-\frac{3(b-a)}{\sqrt{12}}<a,$$ because $\frac{3}{\sqrt{12}}=\frac{\sqrt{3}}{2}$; i.e. $\mu-3\sigma <a$, as desired.

Proving $b< \mu+3\sigma$ is similar.
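Both inequalities, including the "similar" case $b<\mu+3\sigma$, can also be read off at once by comparing $3\sigma$ with the half-width $(b-a)/2$ of the support. This is only a rearrangement of the argument above, again assuming $a<b$:

$$3\sigma=\frac{3(b-a)}{\sqrt{12}}=\frac{\sqrt{3}}{2}(b-a)\approx 0.866\,(b-a)>\frac{b-a}{2},$$

so

$$\mu-3\sigma<\frac{a+b}{2}-\frac{b-a}{2}=a \qquad\text{and}\qquad \mu+3\sigma>\frac{a+b}{2}+\frac{b-a}{2}=b.$$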

Answer

The variance of Uniform($[a,b]$) is $\frac{(b-a)^2}{12}$, so the standard deviation is $\frac{b-a}{\sqrt{12}}\approx 0.2887(b-a)$.

Does that help?
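The arithmetic behind this hint can be checked in a few lines of Python. The sketch below uses only the variance formula above: it expresses $\sigma$ and $3\sigma$ as fractions of the width $b-a$ and compares $3\sigma$ with the half-width $(b-a)/2$.

```python
import math

sd_per_width = 1 / math.sqrt(12)       # sigma / (b - a) for any uniform distribution
three_sd_per_width = 3 * sd_per_width  # 3*sigma / (b - a), which equals sqrt(3)/2

print(round(sd_per_width, 4))        # 0.2887
print(round(three_sd_per_width, 4))  # 0.866
print(three_sd_per_width > 0.5)      # True: 3*sigma exceeds the half-width (b - a)/2
```

Since $3\sigma$ is about $0.866(b-a)$ while every value of $X$ lies within $0.5(b-a)$ of the mean, the $\pm 3\sigma$ window always covers the whole support.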

Answer

The blue shows the distribution of measurements of Philadelphia temperature. We are told: "the distribution of measurements is uniform." (I've marked the $\pm 3 \sigma$ points in red.)

The orange shows a candidate distribution of predictions consistent with the statements in the problem: "expected value of each prediction equals the target value." That is all we are given about the predictions: not that it is uniform, or normal, or unimodal, or anything else.

(Do you agree that the orange distribution of predictions comports with the problem statement? Yes or no? If no, state exactly where it differs.)

Now we ask what is the accuracy of such a prediction, given the stated criterion that a "hit" occurs when the prediction is within $\pm 3 \sigma$ of the target value. Clearly for the case shown this accuracy is not $1.0$, whether or not you consider the "target" to be the mean value of the Philadelphia distribution, or a particular target value (for instance near the extremes of the blue distribution).

[Figure: wide uniform prediction]
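A minimal Python sketch of a case like the one in the figure, with purely hypothetical numbers that are not from the problem: measurements uniform on $[20, 90]$, the target taken to be their mean $55$, and predictions uniform on $[\text{target}-6\sigma, \text{target}+6\sigma]$. The mean prediction equals the target, yet only half of the predictions land within $\pm 3\sigma$.

```python
import math
import random

# Hypothetical measurement range (not from the problem): Uniform(20, 90)
a, b = 20.0, 90.0
sigma = (b - a) / math.sqrt(12)  # SD of the measurement distribution
target = (a + b) / 2             # take the target to be the mean, 55

# Hypothetical predictor: predictions ~ Uniform(target - 6*sigma, target + 6*sigma),
# whose expected value equals the target, as the problem requires
half_width = 6 * sigma

# Exact accuracy: fraction of the prediction interval inside target +/- 3*sigma
exact = (2 * 3 * sigma) / (2 * half_width)

# Monte Carlo estimate of the same quantity
rng = random.Random(0)
n = 100_000
preds = [rng.uniform(target - half_width, target + half_width) for _ in range(n)]
hits = sum(abs(p - target) <= 3 * sigma for p in preds)

print(exact)                    # 0.5
print(hits / n)                 # close to 0.5
print(sum(preds) / n - target)  # close to 0: the mean prediction is on target
```

Widening the hypothetical prediction interval further drives the accuracy toward 0 while the mean prediction stays on target.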

Here is a more extreme case:

[Figure: bimodal prediction]

The orange bimodal distribution of predictions is completely consistent with the statement of the problem, including that the mean is equal to the target (if you disagree, state precisely where it is inconsistent)... and now the prediction accuracy is $0$.
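The bimodal case can be made concrete the same way. In this hypothetical sketch (same made-up measurement range $[20, 90]$), the predictor always outputs either $\text{target}-4\sigma$ or $\text{target}+4\sigma$ with probability $1/2$ each, so its expected value is the target, but no prediction is ever within $\pm 3\sigma$.

```python
import math
import random

# Hypothetical measurement range (not from the problem): Uniform(20, 90)
a, b = 20.0, 90.0
sigma = (b - a) / math.sqrt(12)
target = (a + b) / 2

# Hypothetical bimodal predictor: target - 4*sigma or target + 4*sigma, each with
# probability 1/2, so its expected value is exactly the target
def predict(rng: random.Random) -> float:
    return target + rng.choice([-4.0, 4.0]) * sigma

rng = random.Random(1)
n = 10_000
preds = [predict(rng) for _ in range(n)]
accuracy = sum(abs(p - target) <= 3 * sigma for p in preds) / n

# Exact expected value of the two-point distribution
expected = 0.5 * (target - 4 * sigma) + 0.5 * (target + 4 * sigma)

print(expected)  # equals the target (55) up to floating-point rounding
print(accuracy)  # 0.0: every prediction misses by 4*sigma > 3*sigma
```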

Answer

To expand on some of the other answers, I'm going to restate the question a bit more precisely:

What is the accuracy of the predictor if the distribution of measurements is uniform? Assume that

  1. Accuracy is defined as the probability of getting a correct temperature i.e. P($x_i$ is "Correct")
  2. "Correct" is defined as $x_i$ being within three standard deviations ($\sigma$) from the target value
  3. $x_i \sim \text{Unif}(a,b)$, where $a$ and $b$ are two real numbers with $b > a$
  4. $E[x_i] = \mu = \text{target}$ for all $x_i$

From 1. and 2., we want $P(x_i \in [\text{target} - 3\sigma, \text{target} + 3\sigma])$. From 4., we have that $\mu = \text{target}$, so we can rewrite this as $P(x_i \in [\mu - 3\sigma, \mu + 3\sigma])$. The fact that we're using $3\sigma$ is really important here: with $0.5\sigma$, for example, the accuracy would only be $\frac{2(0.5\sigma)}{b-a} = \frac{1}{\sqrt{12}} \approx 0.289$ (which, perhaps surprisingly, still does not depend on $a$ and $b$). With $3\sigma$, however, this probability is exactly 1. To see why, check out LoveTooNap's excellent solution, which demonstrates algebraically that $\mu - 3\sigma < a$ and $\mu + 3\sigma > b$, so a prediction is correct with probability $1$.
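For a uniform distribution the same reasoning works for any multiple $k$ of $\sigma$: the interval $[\mu-k\sigma,\mu+k\sigma]$ has length $2k\sigma=k(b-a)/\sqrt{3}$, so, as long as it fits inside $[a,b]$, it captures the fraction $k/\sqrt{3}$ of the probability. The accuracy is therefore $\min(1, k/\sqrt{3})$ whatever $a$ and $b$ are, and it reaches 1 exactly when $k\ge\sqrt{3}\approx 1.732$. A minimal Python sketch comparing this closed form with a direct computation (the function names and the sample $(a, b)$ pairs are made up for illustration):

```python
import math

def accuracy_formula(k: float) -> float:
    """Closed form min(1, k / sqrt(3)) for X ~ Uniform(a, b), any a < b."""
    return min(1.0, k / math.sqrt(3))

def accuracy_direct(a: float, b: float, k: float) -> float:
    """P(|X - mu| <= k*sigma) computed directly by clipping to [a, b]."""
    mu, sigma = (a + b) / 2, (b - a) / math.sqrt(12)
    lo, hi = max(a, mu - k * sigma), min(b, mu + k * sigma)
    return max(0.0, hi - lo) / (b - a)

for k in (0.5, 1.0, 3.0):
    # Several arbitrary (a, b) pairs give the same answer for each k
    values = [accuracy_direct(a, b, k) for a, b in [(0, 1), (-40, 110), (32, 33.5)]]
    print(k, accuracy_formula(k), [round(v, 4) for v in values])
```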

If you'd rather check this numerically than algebraically and have R installed, you can run the following code, which verifies these inequalities on a grid of values of $a$ and $b$ in $[-100, 100]$ with step $0.1$.

```r
# Bounds for the uniform RV: a grid of candidate values
a = seq(-100, 100, 0.1)
b = a

# Data frame with all combinations of a and b on the grid in [-100, 100]
ab = expand.grid(a = a, b = b)

# Keep only valid pairs with b > a
ab = ab[ab$b > ab$a, ]

# For each combination of a and b, the theoretical mean (a + b) / 2
m = (ab$a + ab$b) / 2

# For each combination of a and b, the theoretical SD (b - a) / sqrt(12)
s = (ab$b - ab$a) / sqrt(12)

# Mean - 3SD and mean + 3SD for each combination of a and b
anew = m - 3 * s
bnew = m + 3 * s

# Fraction of (a, b) pairs for which mean - 3SD < a AND mean + 3SD > b
mean(anew < ab$a & bnew > ab$b)
```

It returns 1, which means that, at least numerically, we've verified for every pair $(a,b)$ with $a<b$ on this grid that $(a,b)$ is contained within $(\mu - 3\sigma, \mu + 3\sigma)$. This is a numerical illustration of LoveTooNap's solution.
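The R check above compares interval endpoints on a grid. A complementary Monte Carlo sketch in Python simulates the measurements directly and counts how many fall within $\mu \pm 3\sigma$; the seed, number of random $(a, b)$ pairs, and sample size are arbitrary illustrative choices.

```python
import math
import random

def simulated_accuracy(a: float, b: float, n: int, rng: random.Random) -> float:
    """Fraction of n draws from Uniform(a, b) that lie within mu +/- 3*sigma."""
    mu, sigma = (a + b) / 2, (b - a) / math.sqrt(12)
    draws = (rng.uniform(a, b) for _ in range(n))
    return sum(abs(x - mu) <= 3 * sigma for x in draws) / n

rng = random.Random(42)
results = []
for _ in range(20):
    # Random bounds with a < b, drawn from (-100, 100) like the R grid
    a, b = sorted(rng.uniform(-100, 100) for _ in range(2))
    results.append(simulated_accuracy(a, b, 5_000, rng))

print(min(results))  # 1.0: no draw ever falls outside mu +/- 3*sigma
```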

Answer

For those who were curious, here is the answer provided by my professor:
