Difficulty finding the standard error in a simple hypothesis test


I'm having some difficulty solving a very simple hypothesis testing problem.

$$\begin{array}{c|c|c|} & \text{X} & \text{Y} \\ \hline \text{1} & 1 & 1 \\ \hline \text{2} & 3 & 4 \\ \hline \text{3} & 5 & 4 \\ \hline \text{4} & 6 & 4 \\ \hline \text{5} & 9 & 9 \\ \hline \text{mean} & 4.8 & 4.4 \\ \hline \end{array}$$

I ran a linear regression on this table in software and obtained:

beta = 0.8804

Standard Error = 0.2057

I want to know how to do this by hand. I know that beta comes from:

$$ \beta = \frac{\sum x_i y_i}{\sum x_i^2} $$

But I have no idea how to find the standard error without the software. Does anyone know where the 0.2057 comes from?


There are 2 best solutions below

0
On BEST ANSWER

Let $X_k$ be the $k$th observation of $X$, where $k\in\{1,2,3,4,5\}$, so that

$X_1=1,X_2=3,X_3=5,X_4=6,X_5=9$

Let $Y_k$ be the $k$th observation of $Y$, where $k\in\{1,2,3,4,5\}$, so that

$Y_1=1,Y_2=4,Y_3=4,Y_4=4,Y_5=9$

For a linear model, the standard error is the square root of the estimated variance of the least squares estimate of the gradient (slope), which is $\beta$ in your case.

It is calculated using the following formula:

$\large SE = \sqrt{\frac{\frac{1}{n-2}\sum_{k=1}^ne_k^2}{\sum_{k=1}^n(X_k-\bar{X})^2}}$

where $n$ is the number of observations (which is $5$ in our case), $e_k$ is the residual, i.e. the difference between $Y_k$ and $\hat{Y}_k$, where $\hat{Y}_k$ is the value the linear model predicts for $X_k$, and $\bar{X}$ is the mean of $X$.

$e_k=Y_k-\hat{Y}_k$

where $\hat{Y}_k$ is calculated as follows:

$\hat{Y}_k=\beta\times X_k+ \alpha$

$\beta$ is the estimated gradient, $0.8804$, and $\alpha$ is the estimated intercept, $0.1739$, according to my free online regression calculator (at http://scistatcalc.blogspot.co.uk/2013/10/web-app-testing.html).
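Since the question asks for a calculation by hand, here is a sketch of how these two coefficients can be reproduced without the calculator. It assumes the usual least squares formulas for a model with an intercept (slope equal to the centered cross-product sum over the centered sum of squares); the raw sums are computed from the table in the question.

$$\sum X_k = 24,\quad \sum Y_k = 22,\quad \sum X_k^2 = 152,\quad \sum X_kY_k = 138$$

$$\beta = \frac{\sum X_kY_k - \frac{1}{n}\sum X_k\sum Y_k}{\sum X_k^2 - \frac{1}{n}\left(\sum X_k\right)^2} = \frac{138 - \frac{24\times 22}{5}}{152 - \frac{24^2}{5}} = \frac{32.4}{36.8} \approx 0.8804$$

$$\alpha = \bar{Y} - \beta\bar{X} = 4.4 - \frac{32.4}{36.8}\times 4.8 \approx 0.1739$$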

We thus obtain

$\hat{Y}=[1.0543,2.8151,4.5759,5.4563,8.0975]$

$\sum_{k=1}^5e_k^2=(1-1.0543)^2+(4-2.8151)^2+(4-4.5759)^2+(4-5.4563)^2+(9-8.0975)^2\\=4.6739$

and $\bar{X}=24/5=4.8$

$\sum_{k=1}^5(X_k-\bar{X})^2=(1-4.8)^2+(3-4.8)^2+(5-4.8)^2+(6-4.8)^2+(9-4.8)^2\\=36.8$

so that

$\large SE = \sqrt{\frac{(\frac{1}{5-2})4.6739}{36.8}}=0.20576$
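To check the arithmetic above, the same steps can be sketched in a few lines of Python. This is only an illustration using the standard library; the variable names (`s_xx`, `s_xy`, `sse`, `se_beta`) are my own, and it keeps full precision for the coefficients instead of the rounded values, so the last digits of the intermediate fitted values may differ slightly.

```python
import math

# Data from the question.
x = [1, 3, 5, 6, 9]
y = [1, 4, 4, 4, 9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sum of squares of X and centered cross-product sum.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares slope and intercept.
beta = s_xy / s_xx
alpha = y_bar - beta * x_bar

# Residuals e_k = Y_k - (alpha + beta * X_k) and their sum of squares.
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Standard error of the slope: sqrt( (SSE / (n - 2)) / S_xx ).
se_beta = math.sqrt((sse / (n - 2)) / s_xx)

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}")
print(f"SSE = {sse:.4f}, S_xx = {s_xx:.1f}")
print(f"SE(beta) = {se_beta:.4f}")
```

Run as-is, this should agree with the hand calculation to the precision shown there (slope about 0.8804, standard error about 0.2058).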

0
On

If your observations are $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, then the correct least squares point estimate for the slope $\beta$ should be $$\hat \beta = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2}.$$

From this, we can obtain the variance of $\hat \beta$: $${\rm Var}[\hat\beta] = \frac{\sigma^2}{\sum (x_i - \bar x)^2},$$ where $\sigma^2$ is the variance of the IID normal errors. Since this is an unknown parameter, we must estimate it: $$\begin{align*} \hat \sigma^2 &= \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat y_i)^2 \\ &= \frac{1}{n-2} \left( \sum y_i^2 - \frac{1}{n} \Bigl( \sum y_i \Bigr)^2 - \hat\beta \Bigl( \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i \Bigr) \right).\end{align*}$$

Thus the estimated variance of $\hat \beta$ is $$\widehat{\rm Var}[\hat\beta] = \frac{\hat\sigma^2}{\sum (x_i - \bar x)^2}.$$ The square root gives the standard error, and for the hypothesis $$H_0 : \beta = 0, \quad H_a : \beta \ne 0,$$ the test statistic $$T = \frac{\hat \beta - 0}{{\rm s.e.}(\hat\beta)}$$ follows Student's $t$ distribution with $\nu = n-2$ degrees of freedom under $H_0$.
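As a rough illustration of these formulas on the data from the question, the following Python sketch uses only the raw sums, so no residuals need to be computed. The variable names are hypothetical, and the comparison value 3.182 is the tabulated two-sided 5% critical value of the $t$ distribution with 3 degrees of freedom, which I am taking from standard tables and not computing here.

```python
import math

# Data from the question.
x = [1, 3, 5, 6, 9]
y = [1, 4, 4, 4, 9]
n = len(x)

# Raw sums that appear in the formulas.
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_yy = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Centered quantities expressed through the raw sums.
s_xx = sum_xx - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
s_yy = sum_yy - sum_y ** 2 / n

# Slope estimate, error variance estimate, standard error, and t statistic.
beta_hat = s_xy / s_xx
sigma2_hat = (s_yy - beta_hat * s_xy) / (n - 2)
se_beta = math.sqrt(sigma2_hat / s_xx)
t_stat = beta_hat / se_beta

# Tabulated two-sided 5% critical value for n - 2 = 3 degrees of freedom (assumed from tables).
T_CRIT_5PCT_DF3 = 3.182

print(f"beta_hat = {beta_hat:.4f}")
print(f"sigma2_hat = {sigma2_hat:.4f}")
print(f"s.e.(beta_hat) = {se_beta:.4f}")
print(f"T = {t_stat:.3f}, reject H0 at 5%: {abs(t_stat) > T_CRIT_5PCT_DF3}")
```

With these five points the statistic comes out near 4.28, which exceeds 3.182, so under the stated model assumptions $H_0 : \beta = 0$ would be rejected at the 5% level.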