Regression Analysis

When I have a table of values like \begin{array}{c|ccccc} x & 1 & 2 & 3 & 4 & 5 \\ y & 3 & 6 & 8 & 9 & 0 \\ y & 4 & 6 & 1 & 2 & 4 \end{array}

and know that it is a simple linear regression model, what is the value of $n$? I think it is either $5$ or $10$, but I am not sure which. I need the value to calculate the least-squares estimates. Please explain.

There are 2 best solutions below

0
On

$n$ should be 5. The question might be asking you to fit a simple linear regression using one set of $y$ values with $x$, and then to fit it again using the other set of $y$ values with the same $x$ values. So, at first, focus on the 1st and 2nd rows only:

\begin{array}{c|ccccc} x & 1 & 2 & 3 & 4 & 5 \\ y & 3 & 6 & 8 & 9 & 0 \\ \end{array}

then you will have that

$$\hat{\beta_1 }= \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}$$

and $$\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$$

Now, since $n$ equals 5, according to the table we have

$ \bar{x} =\frac{1}{5} \sum_{i=1}^{5} x_i=\frac{1}{5}(1+2+3+4+5)=3$

and

$ \bar{y} =\frac{1}{5} \sum_{i=1}^{5} y_i=\frac{1}{5}(3+6+8+9+0)=5.2$

so,

\begin{align*} \sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y}) &= \sum_{i=1}^5(x_i-3)(y_i-5.2)\\ &= (1-3)(3-5.2)+(2-3)(6-5.2)+(3-3)(8-5.2)\\ & \quad +(4-3)(9-5.2)+(5-3)(0-5.2) \\ &= -3 \end{align*}

and

\begin{align*} \sum_{i=1}^n(x_i-\bar{x})^2 &= \sum_{i=1}^5(x_i-3)^2 \\ &= (1-3)^2+(2-3)^2 +(3-3)^2+(4-3)^2+(5-3)^2 \\ &= 10 \end{align*}

so,

$$\hat{\beta_1}= \frac{-3}{10}=-0.3$$

and

\begin{align*} \hat{\beta_0} &=\bar{y}-\hat{\beta_1}\bar{x}\\ &= 5.2-(-0.3)\cdot 3 \\ &= 6.1 \end{align*}

Hence, the fitted value of $y_i$ given $x_i$, denoted $\hat{y_i}$, is given by the equation:

$$\hat{y_i}=6.1-0.3x_i$$

and each observation satisfies $y_i=\hat{y_i}+e_i$, where $e_i$ is the residual (the estimated error term). You can now repeat the same process with the other set of $y$ values. Notice that a simple linear regression model never has more than one independent variable and one dependent variable, and the number of observations of each must match. That is, our data is a set of ordered pairs $$\{(y_i,x_i)|i=1,...,n\}$$

which makes it clear that we need to have as many $y$ values as $x$ values. In practice you can often find more data for one variable than for another, in which case we only use the observations for which both variables have a value. Does this make sense?
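The hand computation above can be checked numerically. The following is a minimal sketch in plain Python, assuming the two $y$ rows are treated as two separate data sets with $n=5$ each; the function name `simple_ols` and the variable names are illustrative, not from the question.

```python
def simple_ols(x, y):
    """Return (beta0_hat, beta1_hat) for the least-squares line y = beta0 + beta1 * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


x = [1, 2, 3, 4, 5]
y_first = [3, 6, 8, 9, 0]   # first y row of the table
y_second = [4, 6, 1, 2, 4]  # second y row of the table

b0_first, b1_first = simple_ols(x, y_first)
b0_second, b1_second = simple_ols(x, y_second)

print("first row : beta0 = %.2f, beta1 = %.2f" % (b0_first, b1_first))
print("second row: beta0 = %.2f, beta1 = %.2f" % (b0_second, b1_second))
```

For the first row this reproduces $\hat{\beta_0}=6.1$ and $\hat{\beta_1}=-0.3$ up to floating-point rounding; the second row is the "same process" applied to the other set of $y$ values.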

2
On

Pooling both rows of $y$ gives ten observations $(x_{i}, y_{i\sigma})$, with $i = 1,\ldots,5$ and $\sigma = \pm$ labelling the two $y$ rows. The least-squares line $y = ax + b$ then minimizes

$\newcommand{\ds}[1]{\displaystyle{#1}} \newcommand{\half}{{1 \over 2}} \newcommand{\imp}{\Longrightarrow} \newcommand{\pars}[1]{\left( #1 \right)} \newcommand{\partiald}[3][]{\frac{\partial^{#1} #2}{\partial #3^{#1}}}$ $\ds{{\cal F}\pars{a,b} \equiv \half\sum_{i = 1}^{5}\sum_{\sigma = \pm}\pars{ax_{i} + b - y_{i\sigma}}^{2}}$

\begin{align} 0 &= \partiald{{\cal F}\pars{a,b}}{a} = \sum_{i = 1}^{5}\sum_{\sigma = \pm}\pars{ax_{i} + b - y_{i\sigma}}x_{i} = \sum_{i = 1}^{5}\pars{2x_{i}^{2}a + 2x_{i}b - x_{i}\sum_{\sigma = \pm}y_{i\sigma}} \\[3mm] 0 &= \partiald{{\cal F}\pars{a,b}}{b} = \sum_{i = 1}^{5}\sum_{\sigma = \pm}\pars{ax_{i} + b - y_{i\sigma}} = \sum_{i = 1}^{5}\pars{2x_{i}a + 2b - \sum_{\sigma = \pm}y_{i\sigma}} \end{align}

$$ \left\lbrace \begin{array}{rcrcl} \overbrace{\pars{2\sum_{i = 1}^{5}x_{i}^{2}}}^{\ds{\equiv\ S_{xx}}}\ a & + & \overbrace{\pars{2\sum_{i = 1}^{5}x_{i}}}^{\ds{\equiv\ S_{x}}}\ b & = & \overbrace{\sum_{i = 1}^{5}x_{i}\sum_{\sigma = \pm}y_{i\sigma}}^{\ds{\equiv\ S_{xy}}} \\[3mm] \underbrace{\pars{2\sum_{i = 1}^{5}x_{i}}}_{\ds{=\ S_{x}}}\ a & + & 10\,b & = & \underbrace{\sum_{i = 1}^{5}\sum_{\sigma = \pm}y_{i\sigma}}_{\ds{\equiv\ S_{y}}} \end{array}\right. $$

$$ S_{xx} = 110\,,\quad S_{x} = 30\,,\quad S_{xy} = 122\,,\quad S_{y} = 43 $$
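As a worked check, these four values can be read off the table term by term, using only the data given in the question:

\begin{align*} S_{xx} &= 2\,(1^{2}+2^{2}+3^{2}+4^{2}+5^{2}) = 2\cdot 55 = 110 \\ S_{x} &= 2\,(1+2+3+4+5) = 2\cdot 15 = 30 \\ S_{xy} &= 1\,(3+4)+2\,(6+6)+3\,(8+1)+4\,(9+2)+5\,(0+4) = 7+24+27+44+20 = 122 \\ S_{y} &= (3+6+8+9+0)+(4+6+1+2+4) = 26+17 = 43 \end{align*}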

$$\left. \begin{array}{rcrcl} 55 a & + & 15 b & = & 61 \\[1mm] 30 a & + & 10 b & = & 43 \end{array}\right\rbrace \quad\imp\quad \left\lbrace \begin{array}{rclcr} a & = & {61\times 10 - 43\times 15 \over 100} & = & -\,{7 \over 20} \\[3mm] b & = & {55\times 43 - 30\times 61 \over 100} & = & {107 \over 20} \end{array}\right. $$ which yields $$ y\pars{x} = {1 \over 100}\pars{-35x + 535}\,,\qquad\imp\qquad \begin{array}{c|ccl} x & y&& \\[2mm] 1 & 5 & = & 5.00 \\[2mm] 2 & {93 \over 20} & = & 4.65 \\[2mm] 3 & {43 \over 10} & = & 4.30 \\[2mm] 4 & {79 \over 20} & = & 3.95 \\[2mm] 5 & {18 \over 5} & = & 3.60 \end{array} $$
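The pooled fit can also be checked numerically. This is a minimal sketch in plain Python, assuming all ten $(x, y)$ pairs are treated as a single sample; it builds the sums $S_{xx}$, $S_{x}$, $S_{xy}$, $S_{y}$ and solves the $2\times 2$ normal equations by Cramer's rule. The variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]
y_rows = [[3, 6, 8, 9, 0], [4, 6, 1, 2, 4]]  # the two y rows of the table

# Pool the data: each x_i is paired with both of its y values (10 observations)
xs = [xi for row in y_rows for xi in x]
ys = [yi for row in y_rows for yi in row]
n = len(xs)

s_xx = sum(xi * xi for xi in xs)               # 2 * sum of x_i^2
s_x = sum(xs)                                  # 2 * sum of x_i
s_xy = sum(xi * yi for xi, yi in zip(xs, ys))  # sum of x_i * (y_i+ + y_i-)
s_y = sum(ys)                                  # sum of all y values

# Normal equations:  s_xx * a + s_x * b = s_xy
#                    s_x  * a + n   * b = s_y
det = s_xx * n - s_x * s_x
a = (s_xy * n - s_x * s_y) / det
b = (s_xx * s_y - s_x * s_xy) / det

print("S_xx, S_x, S_xy, S_y =", s_xx, s_x, s_xy, s_y)
print("a = %.2f, b = %.2f" % (a, b))
for xi in x:
    print("x = %d, fitted y = %.2f" % (xi, a * xi + b))
```

A quick sanity check on any such fit is that the line passes through the point of means; here $\bar{x} = 3$ and the mean of all ten $y$ values is $43/10 = 4.3$.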