Finding estimates of a Linear Regression Equation - R


I'm new to statistics and R. I'm currently working through a book called "Discovering Statistics Using R". Although the book is recommended for beginners and implies you don't need any statistical background, some of the content (in my opinion) isn't covered or explained properly. It's a great book otherwise, but it doesn't cover the following:

I'm trying to look at the relationship between x1, x2 and the response variable y using linear regression.

x1=c(  5,  6,  7, 12, 4,   9,  2, 4,   1,  8)
x2=c(  1,  6, 10, 11, 1,   2,  4, 6,   1,  3)
y =c(8.5, 11, 12, 20, 9, 5.5, 11, 5, 2.3, 12)

The linear regression model relating x1 and x2 to y is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

The matrix form is $Y = X\beta$ (where $Y$ is the column vector of the $y$ values, $\beta' = (\beta_0, \beta_1, \beta_2)$, and the columns of $X$ are $(1, x_1, x_2)$).

How do I go about finding the estimates of β using the following equation?

$$\widehat{β} = (X′X)^{−1} X′Y$$

If you could point me in the right direction or give another example/reference in R so I can figure this one out, that would be great.


BEST ANSWER

Here is some R code that computes the coefficients using linear algebra $\widehat{β} = (X′X)^{−1} X′Y$, and then using R's built-in function lm. As you point out, the first column of $X$ in the model $y=X\beta$ is a column of 1s.

# Your data
x1=c(  5,  6,  7, 12, 4,   9,  2, 4,   1,  8)
x2=c(  1,  6, 10, 11, 1,   2,  4, 6,   1,  3)
y =c(8.5, 11, 12, 20, 9, 5.5, 11, 5, 2.3, 12)

# Build the design matrix X. It needs a column of 1s for the intercept.
# (matrix form of the linear regression equation is y = Xb)
x0 = rep(1, 10)
X = cbind(x0, x1, x2)

# MASS provides ginv(), the Moore-Penrose generalised inverse
library(MASS)
# Calculate coefficients using linear algebra: (X'X)^-1 X'y
ginv(t(X) %*% X) %*% t(X) %*% y

# Calculate coefficients using R's built-in linear model function
lm(y ~ x1 + x2)

Note

  • You need to install the MASS package before you can load it.
  • The R operator for matrix multiplication is %*%.
  • Transpose is accomplished using t().
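Since X'X is non-singular for this data, a plain solve() also works in place of ginv(), and you can check the linear-algebra estimate against lm() directly. A small sketch (variable names are my own):

```r
# The questioner's data
x1 <- c(5, 6, 7, 12, 4, 9, 2, 4, 1, 8)
x2 <- c(1, 6, 10, 11, 1, 2, 4, 6, 1, 3)
y  <- c(8.5, 11, 12, 20, 9, 5.5, 11, 5, 2.3, 12)

# cbind() builds the design matrix, with a column of 1s for the intercept
X <- cbind(1, x1, x2)

# solve() inverts X'X directly; fine here because X'X is non-singular
# (ginv() from MASS is the safer choice when it might not be)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Compare with lm()'s coefficients -- they agree up to rounding error
all.equal(as.vector(beta_hat), unname(coef(lm(y ~ x1 + x2))))
# should print TRUE
```

coef() extracts the fitted coefficients from an lm object in the same order as the columns of X: intercept, then x1, then x2.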
SECOND ANSWER

Affine regression: $y_i = w^T x_i + b + e_i$. Averaging over $i$ gives $\frac{1}{m} \sum_i y_i = w^T \frac{1}{m}\sum_i x_i + b + \frac{1}{m}\sum_i e_i$, i.e. $\bar y = w^T \bar x + b + \bar e$. Subtracting this from the first equation gives $y_i - \bar y = w^T(x_i - \bar x) + e_i - \bar e$. Assume $\bar e = 0$; what remains is a centered linear regression with no intercept. So under these assumptions you can either fit the affine model directly, or center the data and then fit a linear regression.
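To make the argument concrete, here is a sketch in R using the questioner's data: centering each variable and fitting without an intercept recovers the same slopes $w$ as the affine fit, and the intercept $b$ is recovered from the means (variable names are my own):

```r
# The questioner's data
x1 <- c(5, 6, 7, 12, 4, 9, 2, 4, 1, 8)
x2 <- c(1, 6, 10, 11, 1, 2, 4, 6, 1, 3)
y  <- c(8.5, 11, 12, 20, 9, 5.5, 11, 5, 2.3, 12)

# Center each variable by subtracting its mean
x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
yc  <- y  - mean(y)

# Fit the centered model with no intercept (the "- 1" drops it)
fit_centered <- lm(yc ~ x1c + x2c - 1)

# Fit the affine model for comparison
fit_affine <- lm(y ~ x1 + x2)

# The slopes w agree; b is recovered as ybar - w' xbar
coef(fit_centered)
mean(y) - sum(coef(fit_centered) * c(mean(x1), mean(x2)))
```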