Linear regression problem

Let's say researchers observe $\{E_i, D_i\}_{i=1}^n$, where $E_i$ represents a person's years of education and $D_i = \begin{cases} 0 & \text{if neither parent has a college degree} \\ 1 & \text{if at least one parent has a college degree} \end{cases}$ The researchers estimate the linear regression

$$E = B_1 + B_2D + \epsilon$$

and find that

$$\begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} = \begin{bmatrix} 10.5 \\ 4.3 \end{bmatrix}$$ and

$$\begin{bmatrix} \widehat{\mathrm{Var}}(b_1) \\ \widehat{\mathrm{Var}}(b_2) \end{bmatrix} = \begin{bmatrix} 3.8 \\ 1 \end{bmatrix}$$

(i) What is the estimate of the expected number of years of education for a person who had at least one parent attend college?

(ii) Assume $\bar{D_n}= \frac{1}{n}\sum_i^n{D_i} = 0.56$. What is the average of $E$ in this sample?

(iii) Using a normal approximation, determine a $90$% confidence interval for $B_2$.

(iv) Can a $95$% confidence interval be found for $B_2$ using a t-approximation? If yes, find it. If not, explain why not.

My attempt:

(i) From the results, the fitted line is $\hat{E} = 10.5 + 4.3 D$. If $D = 1$, then $\hat{E} = 10.5 + 4.3 = 14.8$ years.
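As a quick numerical check (a minimal sketch; `predict` is just an illustrative helper name, not anything from the problem):

```python
# Part (i): evaluate the fitted line E-hat = b1 + b2*D at D = 1.
b1, b2 = 10.5, 4.3

def predict(d):
    # Fitted years of education for parental-degree indicator d (0 or 1)
    return b1 + b2 * d

print(predict(1))
```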

(ii) Since the OLS residuals average to zero, the fitted line passes through the sample means, so $\bar{E} = b_1 + b_2\bar{D}_n = 10.5 + (4.3 \times 0.56) = 12.908$
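The same arithmetic as a sketch:

```python
# Part (ii): the OLS fit passes through the sample means,
# so E-bar = b1 + b2 * D-bar.
b1, b2, d_bar = 10.5, 4.3, 0.56
e_bar = b1 + b2 * d_bar
print(round(e_bar, 3))  # 12.908
```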

(iii) With the normal approximation, the $90$% confidence interval is (I think) given by

$$C.I. = \left[b_2 - z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}(b_2)},\; b_2 + z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}(b_2)}\right]$$

We can easily find that $z_{0.05} = 1.645$. Then,

$$C.I. = [4.3 - 1.645 \sqrt{1}, 4.3 + 1.645 \sqrt{1}]$$

$$C.I. = [2.655, 5.945]$$
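As a sketch of the computation (plugging in $z_{0.05} = 1.645$):

```python
# Part (iii): 90% normal-approximation CI for B2.
b2, var_b2 = 4.3, 1.0
z = 1.645            # z_{0.05}: upper 5% point of N(0, 1)
se = var_b2 ** 0.5   # estimated standard error of b2
ci = (round(b2 - z * se, 3), round(b2 + z * se, 3))
print(ci)  # (2.655, 5.945)
```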

(iv) A t-approximation cannot be used here because the sample size $n$ is not given, so the degrees of freedom ($n - 2$ for this two-parameter regression) cannot be determined.

Is this correct? For (i), I was not sure whether I'm supposed to compute the expectation of $E$ or to do what I showed above. I'm not quite sure of the other solutions either.


Yes, everything is correct. Note that in linear regression, the predictions $\hat{y}_i$ are called "expected" values given the inputs because they are estimates of the conditional mean of $y$: we are "expecting" our estimates to represent the true value. Hope this clears up your confusion.

If you study linear regression from a machine learning perspective, you will see that the optimal predictor $f^*(x) = \hat{y}$ under the squared loss is in fact the conditional mean, $f^*(x) = \mathbb{E}[y \mid x]$.
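To illustrate (a sketch on synthetic data, not the data from the problem): with a single binary regressor, the OLS fit reproduces the two conditional means exactly, i.e. $b_1 = \bar{E}_{D=0}$ and $b_1 + b_2 = \bar{E}_{D=1}$.

```python
# Synthetic illustration: OLS with one binary regressor recovers the
# conditional means E[y | D=0] and E[y | D=1], i.e. f*(x) = E(y | x).
import random

random.seed(0)
D = [1 if random.random() < 0.56 else 0 for _ in range(1000)]
E = [10.5 + 4.3 * d + random.gauss(0, 2) for d in D]

n = len(D)
d_bar = sum(D) / n
e_bar = sum(E) / n

# Closed-form OLS slope and intercept
b2 = sum((d - d_bar) * (e - e_bar) for d, e in zip(D, E)) \
     / sum((d - d_bar) ** 2 for d in D)
b1 = e_bar - b2 * d_bar

# Conditional (group) means
mean0 = sum(e for e, d in zip(E, D) if d == 0) / D.count(0)
mean1 = sum(e for e, d in zip(E, D) if d == 1) / D.count(1)

print(abs(b1 - mean0), abs(b1 + b2 - mean1))  # both ~0 (floating-point noise)
```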