Multivariate linear regression with 2 independent variables - formulae


I have regressed y on x1 and x2 in Python, but I get very different results when I do it by hand. I am using the following formulae:

http://faculty.cas.usf.edu/mbrannick/regression/Reg2IV.html

I am having hard time finding the formulae for a, b1 and b2 in the 2-variate regression:

$$ y = a + b_1x_1 + b_2x_2 $$

It is a bit silly to ask for this, but everyone seems to do it with a software package and I want to do it by hand. Any useful references would be much appreciated.

There are 3 best solutions below

BEST ANSWER

The OLS estimator of $b$ is given by $$ \hat{b} = \mathbf{(X^T X)^{-1}X^T y}, $$ where $\mathbf{X}$ is an $n\times 3$ matrix whose first column is all ones (for the intercept) and whose other two columns contain your $x_1$ and $x_2$ values, respectively. The same structure holds for any number of variables $p$ with $\mathbf{X}_{n \times p}$.
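As a quick sketch, the matrix formula above can be computed directly with NumPy. The data here is made up purely for illustration; in practice, use `np.linalg.solve` rather than forming the inverse explicitly:

```python
import numpy as np

# Hypothetical sample data: n = 5 observations of (x1, x2, y).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.0])

# Design matrix X: first column of ones for the intercept a,
# then the x1 and x2 columns.
X = np.column_stack([np.ones_like(x1), x1, x2])

# b_hat = (X^T X)^{-1} X^T y, computed by solving the linear system.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
a, b1, b2 = b_hat
```

Solving the system `X^T X b = X^T y` is numerically preferable to inverting `X^T X`, and gives the same coefficients a statistics package would report.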

ANSWER

Suppose you have $n$ data points $(x_{1i},x_{2i},y_i)$ and want to fit the model $$y = a + b_1\,x_1 + b_2\,x_2.$$ The matrix calculation is certainly the simplest route.

Otherwise, build the standard normal equations for the ordinary least-squares method; they are $$\sum_{i=1}^n y_i= n a + b_1\sum_{i=1}^n x_{1i}+b_2\sum_{i=1}^n x_{2i}$$ $$\sum_{i=1}^n x_{1i}\,y_i= a\sum_{i=1}^n x_{1i} + b_1\sum_{i=1}^n x^2_{1i}+b_2\sum_{i=1}^n x_{1i}\,x_{2i}$$ $$\sum_{i=1}^n x_{2i}\,y_i= a\sum_{i=1}^n x_{2i} + b_1\sum_{i=1}^n x_{1i}\,x_{2i}+b_2\sum_{i=1}^n x^2_{2i}$$ These are three linear equations in the three unknowns $a,b_1,b_2$.
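The three normal equations above can be assembled from the raw sums and solved as a small linear system. A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical sample data: n = 5 observations of (x1, x2, y).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.0])
n = len(y)

# Coefficient matrix of the three normal equations, built from the sums.
A = np.array([
    [n,         x1.sum(),       x2.sum()],
    [x1.sum(),  (x1**2).sum(),  (x1*x2).sum()],
    [x2.sum(),  (x1*x2).sum(),  (x2**2).sum()],
])
# Right-hand sides: sum(y), sum(x1*y), sum(x2*y).
rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)
```

This is exactly the "by hand" route: each entry of `A` and `rhs` is a sum you could compute on paper, and the final step is solving a 3x3 system.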

ANSWER

On the referenced page, the formulas for the slope and intercept of the one-variable model are given as:

$$b=\frac{\sum xy}{\sum x^2}; \ \ \ \ \ \ a=\bar{y}-b\bar{x}.$$

whereas it must be: $$b=\frac{SS_{xy}}{SS_{xx}}=\frac{\sum xy-\frac{\sum x\sum y}{n}}{\sum x^2-\frac{(\sum x)^2}{n}}; \ \ \ \ \ \ a=\bar{y}-b\bar{x}.$$
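To see the raw-score formulas in action, here is a small numeric sketch with made-up data, computing $SS_{xy}$ and $SS_{xx}$ exactly as written above:

```python
import numpy as np

# Hypothetical one-variable data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.0, 8.2, 9.8])
n = len(x)

# SS_xy = sum(xy) - sum(x)*sum(y)/n,  SS_xx = sum(x^2) - (sum(x))^2/n
ss_xy = (x * y).sum() - x.sum() * y.sum() / n
ss_xx = (x**2).sum() - x.sum()**2 / n

b = ss_xy / ss_xx          # slope
a = y.mean() - b * x.mean()  # intercept
```

These match what a least-squares fit (e.g. `np.polyfit(x, y, 1)`) returns on the same data.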

Note how similar this is to the matrix formula: $$b=(X^TX)^{-1}X^TY=\frac{X^TY}{X^TX}.$$

Once you understand the method of minimizing (setting partial derivatives to zero) the function of two parameters: $$R^2(a,b)=\sum (y-a-bx)^2,$$ it is easy to generalize it to the function of three parameters: $$R^2(a,b_1,b_2)=\sum (y-a-b_1x_1-b_2x_2)^2.$$