Reconstructing normal distribution according to data ranges


I have temperature data that I believe follows a normal distribution. The problem is that I only know the percentages for a few wide ranges, but I need results for finer temperature classes.

So, as input I have:

    T<23°C --> 21.50%
23°<T<65°C --> 75.27%
65°<T<85°C -->  2.15%
85°<T<95°C -->  1.08%

and I know that it follows a normal distribution. I then need to estimate the percentage for each of the following ranges:

X°<T<X+5°C -->  % ; where X=-40,-35, ..., 85, 90, 95, 100,...

Since I do not have a solid mathematical question, I appreciate your help on this topic.


There are 2 best solutions below

3
On BEST ANSWER

Before we can provide the probability of the temperature being in each interval $[x, x + 5)$, we first need to derive estimates for the mean and variance of the normal distribution.

Given the limited data you provide, we will only be able to get a very crude estimate for the parameters $\mu$, $\sigma^2$ (respectively the mean and variance) required to describe the normal distribution. Given a random variable $T \sim N(\mu,\sigma^2)$ we can define its cumulative distribution function $F_{\mu,\sigma^2}$ to be

$$ F_{\mu,\sigma^2}(t) = \mathbf P[T \leq t]$$

The data you provided allows us to estimate that the values $\mu,\sigma^2$ should produce a CDF similar to:

$$ \begin{aligned} F(23) & = 0.2150 \\ F(65) & = 0.9677 \\ F(85) & = 0.9892 \\ F(95) & = 1.0000 \end{aligned} $$

Let $t_i$ denote the temperatures at which we have data above, and $p_i$ the corresponding cumulative probabilities. That is, $t_1 = 23, \, t_2 = 65, \ldots$ and $p_1 = 0.2150, \, p_2 = 0.9677, \ldots$.
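The cumulative values $p_i$ are simply running totals of the range percentages given in the question. A minimal R sketch of that step follows; the names `upper_bounds` and `range_pct` are illustrative and do not appear in the original code.

```r
# Upper bound of each temperature range from the question (degrees C)
upper_bounds <- c(23, 65, 85, 95)

# Percentage of observations falling in each range, in the same order
range_pct <- c(21.50, 75.27, 2.15, 1.08)

# Running total, converted from percent to probability
p_vals <- cumsum(range_pct) / 100

data.frame(t = upper_bounds, p = p_vals)
```

Note that the four percentages sum to exactly 100%, which is why the last cumulative value is 1.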

We will approximate $\mu,\,\sigma^2$ by minimising the squared error:

$$\sum_{i=1}^4 \left\{ F_{\mu,\sigma^2}(t_i) - p_i\right\}^2$$

To actually find the values $\mu,\sigma^2$ that minimize this we will need to use a computer solver as there is no explicit formula. The code below can be run in R; it is adapted from this post

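t_vals <- c(23, 65, 85, 95)             # temperatures with known CDF values
p_vals <- c(0.2150, 0.9677, 0.9892, 1)  # observed cumulative probabilities

# Squared error between the fitted and the observed CDF values.
# Optimising over log(sigma) keeps sigma positive.
fn <- function(t) {
  mu <- t[1]
  sigma <- exp(t[2])
  sum((p_vals - pnorm(t_vals, mu, sigma))^2)
}

start_params <- c(20, 1)  # starting guess: mu = 20, log(sigma) = 1

est <- nlm(fn, start_params)$estimate
est

mu <- est[1]
s2 <- exp(est[2])^2  # variance, sigma^2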

This returns the values $\mu \approx 35.61$, and $\sigma^2 \approx 255.42$ (or $\sigma \approx 15.98$). We can plot how well the corresponding cumulative distribution fits the observed data (see plot at the end).
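As a quick check of the fit, one could compare the fitted CDF with the four observed values and draw the two together. The sketch below hard-codes the rounded estimates quoted above ($\mu = 35.61$, $\sigma = 15.98$) instead of re-running the optimiser, so it only approximates the original plot; the names `fitted_p` and `fit_check` are illustrative.

```r
# Rounded estimates quoted in the text (an assumption: not re-fitted here)
mu <- 35.61
sigma <- 15.98

t_vals <- c(23, 65, 85, 95)
p_vals <- c(0.2150, 0.9677, 0.9892, 1)

# Fitted cumulative probabilities at the observed temperatures
fitted_p <- pnorm(t_vals, mean = mu, sd = sigma)

fit_check <- data.frame(
  t        = t_vals,
  observed = p_vals,
  fitted   = round(fitted_p, 4),
  diff     = round(fitted_p - p_vals, 4)
)
fit_check

# Fitted CDF as a curve, observed values as points
curve(pnorm(x, mean = mu, sd = sigma), from = -10, to = 100,
      xlab = "Temperature (C)", ylab = "Cumulative probability")
points(t_vals, p_vals, pch = 19)
```

The largest discrepancy is at 85°C, where the fitted curve sits about one percentage point above the observed value.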

Finally we can now use these estimates to construct the probabilities for each of the ranges you wanted:

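library(dplyr)  # provides tibble(), mutate() and %>%

sigma <- sqrt(s2)  # pnorm() expects the standard deviation, not the variance

df <- tibble(
  t = seq(0, 95, by = 5)
) %>% mutate(
  prob_less_t = pnorm(t, mean = mu, sd = sigma),
  prob_range_t_t_pl_5 = pnorm(t + 5, mean = mu, sd = sigma) - prob_less_t
)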

which gives:

       t prob_less_t prob_range_t_t_pl_5
   <dbl>       <dbl>               <dbl>
 1     0      0.0129           0.0148   
 2     5      0.0277           0.0268   
 3    10      0.0545           0.0441   
 4    15      0.0986           0.0658   
 5    20      0.164            0.0890   
 6    25      0.253            0.109    
 7    30      0.363            0.122    
 8    35      0.485            0.123    
 9    40      0.608            0.113    
10    45      0.722            0.0945   
11    50      0.816            0.0714   
12    55      0.887            0.0490   
13    60      0.937            0.0305   
14    65      0.967            0.0173   
15    70      0.984            0.00885  
16    75      0.993            0.00412  
17    80      0.997            0.00174  
18    85      0.999            0.000667 
19    90      1.000            0.000232 
20    95      1.000            0.0000732

Note: since the probability of a temperature below 0 is so small, I have not listed the rows from -40°C upwards and instead start at 0. The first column is just a row index and can be ignored, the second column is a temperature $t$, the third column is the probability that the temperature will be less than $t$, and the fourth column is the probability of being in the range $[t,t+5)$.
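If the full grid from the question is wanted after all, the same calculation can be run from -40°C up to 100°C. This is a base-R sketch that again assumes the rounded estimates $\mu = 35.61$ and $\sigma = 15.98$; the names `lower` and `bins` are illustrative.

```r
# Rounded estimates quoted in the text (an assumption: not re-fitted here)
mu <- 35.61
sigma <- 15.98

# Lower edge of each 5-degree class: -40, -35, ..., 95, 100
lower <- seq(-40, 100, by = 5)

bins <- data.frame(
  lower = lower,
  upper = lower + 5,
  # P(lower <= T < upper) as a difference of two CDF values
  prob  = pnorm(lower + 5, mean = mu, sd = sigma) -
          pnorm(lower,     mean = mu, sd = sigma)
)
bins$percent <- round(100 * bins$prob, 3)

bins
sum(bins$prob)  # should be very close to 1
```

Because the classes tile the interval from -40°C to 105°C without gaps, their probabilities should sum to almost exactly 1, which is a useful sanity check.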

[Plot: fitted normal CDF compared with the observed cumulative probabilities]

0
On

Given any two quantiles of a normal distribution, you can find its mean $\mu$ and SD $\sigma.$ The method is to standardize and then to solve two equations in the two unknowns $\mu$ and $\sigma.$ (This assumes that exact quantiles are known. If quantiles are approximated from data, then results will be only approximate.)

First, you are given $$.215 = P(X \le 23) = P\left(Z = \frac{X-\mu}{\sigma} \le \frac{23-\mu}{\sigma}\right),$$

where $Z$ is standard normal, so $\frac{23-\mu}{\sigma} = -0.7892$ (approximately from standard normal tables, or more precisely using R statistical software, in which the 'quantile function' `qnorm` is the inverse of a normal CDF).

qnorm(.215)
[1] -0.7891917

Second, you have $$ .215 + .7527 + .0215 = 0.9892 = P(X \le 85) = P(Z \le (85 - \mu)/\sigma),$$ so $(85 - \mu)/\sigma = 2.2973.$

qnorm(.9892)
[1] 2.297329

Roughly, I get $\mu = 38.853,\, \sigma = 20.087.$ Once you have $\mu$ and $\sigma,$ you can use standardization and normal tables, or you can use software, to find the probabilities of your intervals of interest (listed in your Question).
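For reference, the two equations $(23-\mu)/\sigma = z_1$ and $(85-\mu)/\sigma = z_2$ can be solved directly: subtracting them gives $\sigma = (85-23)/(z_2 - z_1)$, and then $\mu = 23 - z_1\sigma$. A short R sketch of that arithmetic, where `z1` and `z2` are illustrative names for the two standard normal quantiles:

```r
# Standard normal quantiles for the two known cumulative probabilities
z1 <- qnorm(0.215)   # corresponds to P(X <= 23)
z2 <- qnorm(0.9892)  # corresponds to P(X <= 85)

# Subtracting the two standardised equations eliminates mu
sigma <- (85 - 23) / (z2 - z1)

# Back-substitute into (23 - mu) / sigma = z1
mu <- 23 - z1 * sigma

c(mu = mu, sigma = sigma)

# Example: probability of the class 20 <= T < 25 under these estimates
pnorm(25, mean = mu, sd = sigma) - pnorm(20, mean = mu, sd = sigma)
```

This fit passes exactly through the points at 23°C and 85°C and ignores the other two, so it will not agree exactly with a least-squares fit to all four points.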