Estimate variance of normal distribution of known mean


I have made the following observations of two variables $X$ and $Y$:

$$ \begin{array}{|c|c|} \hline X & Y \\ \hline \hline 40 & 29.5 \\ \hline 38 & 27.8 \\ \hline 35 & 27.3 \\ \hline 41 & 34.1 \\ \hline 35 & 26.8 \\ \hline \end{array} $$

We assume that there is a linear dependency between these variables, $Y = aX+b + \epsilon$, where $\epsilon$ is a random variable following a normal distribution with mean $0$ and unknown variance $\sigma^2$.

Using standard least-squares linear regression, I find $a=0.91234$ and $b=-5.38636$.

I am now given an observation $X = 25$ and asked for the probability that the associated value of $Y$ exceeds $26.8$.

For this I need an estimate of the standard deviation of $\epsilon$, and I am stuck on how to obtain it.

Best answer

Using simple linear regression, I find $a\approx 0.9123$ and $b\approx -5.3864$, so the estimated regression line is $$ \widehat{y} = 0.9123 x -5.3864. $$
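As a sanity check, the coefficients can be reproduced from the closed-form least-squares formulas $a = S_{xy}/S_{xx}$ and $b = \bar y - a\bar x$, where $S_{xx}=\sum_i (x_i-\bar x)^2$ and $S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)$. The following is a minimal standard-library sketch; the helper name `fit_line` is hypothetical and only for illustration.

```python
# Observations from the table in the question
xs = [40, 38, 35, 41, 35]
ys = [29.5, 27.8, 27.3, 34.1, 26.8]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sum of squares of x and cross-products of x and y
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx          # slope
    b = y_bar - a * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = fit_line(xs, ys)
print(f"a = {a:.5f}, b = {b:.5f}")  # a = 0.91234, b = -5.38636
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.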

The variance $\sigma^2$ of $\epsilon$ is estimated by $$ \widehat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (y_i-\widehat{y}_i)^2 }{n-2}, $$ where $\widehat{y}_i=ax_i + b$ is the fitted value at $x_i$ and $n=5$ is the number of observations.

The differences $y_i-\widehat{y}_i$, $1\leq i\leq 5$, between the observed and fitted $y$-values are called $\textbf{residuals}$.

The $n-2$ in the denominator of $s^2$ is the number of degrees of freedom (df): the two parameters $a$ and $b$ must be estimated first, which costs $2$ df.
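The estimator translates directly into code. Below is a minimal standard-library sketch, assuming the coefficients are recomputed from the data rather than typed in rounded; the helper name `residual_variance` is hypothetical.

```python
import math

# Observations from the table in the question
xs = [40, 38, 35, 41, 35]
ys = [29.5, 27.8, 27.3, 34.1, 26.8]


def residual_variance(xs, ys):
    """Return s^2 = (sum of squared residuals) / (n - 2) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    # Residuals: observed minus fitted values
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    # Two parameters (a, b) were estimated, leaving n - 2 degrees of freedom
    return sum(r ** 2 for r in residuals) / (n - 2)


s2 = residual_variance(xs, ys)
s = math.sqrt(s2)
print(f"s^2 = {s2:.5f}, s = {s:.5f}")  # s^2 = 3.24777, s = 1.80216
```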

So $$ \begin{align*} s^2 &\approx \frac{(29.5-31.107)^2 + (27.8-29.282)^2 +(27.3-26.545)^2 +(34.1-32.019)^2 +(26.8-26.545)^2 }{3} \\ &\approx 3.24777, \end{align*} $$ which gives $s\approx \boxed{1.80216}$.
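As a cross-check on the hand arithmetic, the residual sum of squares also follows from the standard identity $\sum_i (y_i-\widehat{y}_i)^2 = S_{yy} - S_{xy}^2/S_{xx}$, which avoids computing each fitted value. With $\bar x = 37.8$ and $\bar y = 29.1$:

$$ \begin{align*} S_{xx} &= \sum_i (x_i-\bar x)^2 = 30.8, \qquad S_{xy} = \sum_i (x_i-\bar x)(y_i-\bar y) = 28.1, \qquad S_{yy} = \sum_i (y_i-\bar y)^2 = 35.38, \\ \sum_i (y_i-\widehat{y}_i)^2 &= S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 35.38 - \frac{28.1^2}{30.8} \approx 9.7433, \qquad s^2 \approx \frac{9.7433}{3} \approx 3.2478. \end{align*} $$

The same sums also give the slope, $a = S_{xy}/S_{xx} = 28.1/30.8 \approx 0.9123$.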

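Now, for the observation $X=25$, the fitted value is $\widehat{y}\approx 17.42214$. Treating $Y$ as normal with mean $\widehat{y}$ and standard deviation $s$ (that is, plugging in the estimates as though they were the true values), $$ P(Y> 26.8) \approx \int_{26.8}^{\infty} \frac{1}{\sqrt{2 \pi}\,(1.80216)}\, e^{-\frac{(y-17.42214)^2}{2(1.80216)^2}}\, dy. $$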

Evaluating this in R (for example in RStudio) gives $$ P(Y> 26.8) \approx \texttt{1 - pnorm(26.8, 17.42214, 1.80216)} \approx \boxed{9.769\times 10^{-8}}. $$
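The same tail probability can be computed with Python's standard library. Writing it with the complementary error function, $P(Y>c) = \tfrac12\operatorname{erfc}\big((c-\mu)/(\sigma\sqrt2)\big)$, avoids the cancellation in `1 - cdf` for far tails; in R, `pnorm(26.8, 17.42214, 1.80216, lower.tail = FALSE)` serves the same purpose. This sketch uses the same plug-in approximation and rounded estimates as the calculation above; the function name `upper_tail` is hypothetical.

```python
import math


def upper_tail(c, mu, sigma):
    """P(Y > c) for Y ~ Normal(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2)))


a, b = 0.91234, -5.38636  # rounded least-squares estimates
s = 1.80216               # residual standard error
y_hat = a * 25 + b        # fitted value at X = 25, about 17.42214

# Plug-in approximation: treat y_hat and s as the true mean and standard deviation
p = upper_tail(26.8, y_hat, s)
print(f"P(Y > 26.8) = {p:.3e}")  # P(Y > 26.8) = 9.769e-08
```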