Root Mean Square Error - How did he get this number?


So I am studying for a college final exam and working through a past exam paper at the moment.

The lecturer has provided us with solutions to the previous year's exam paper, though they are not very clear in some cases, might I add.

One such example is the Root Mean Square Error question.

I have a table $$\begin{array}{|c|c|} \hline X_k & Y_k \\ \hline 1 & 0.6 \\ 2 & 1.9 \\ 3 & 4.3 \\ 4 & 7.6 \\ 5 & 12.6 \\ \hline \end{array}$$

He has given the Root Mean Square Error answer without showing any work for how he got it. Can anyone here help me?

$$E\bigl(f(X_k)-Y_k\bigr)^2 = 0.86609$$

$$E(f) = \left(\frac{1}{5}(0.86609)\right)^{1/2} = 0.416195$$
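To be clear about which step is the problem: the second line is just arithmetic on the given 0.86609, which a one-liner in R confirms.

```r
# Reproduces the lecturer's second step: RMSE = sqrt(mean squared error)
sqrt(0.86609 / 5)  # ~0.416195, the value in the solutions
```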

Any idea how he gets the 0.86609 value? Once I figure out that step, the rest is straightforward.

Any help to put me in the right direction is greatly appreciated.


An unbiased estimator of the residual MSE is given by $$\hat{\sigma}^2 = \dfrac{1}{5-1-1}\sum_{i=1}^{5}(\hat{y}_i - y_i)^2 = \dfrac{1}{3}\sum_{i=1}^{5}(\hat{y}_i - y_i)^2\text{.}$$ You then take the square root of this to get the RMSE. Assuming that $\hat{y}_i$ is computed using least squares on a simple linear regression, you will get $$\hat{y}_i = 2.97x_i - 3.51$$ as your regression line. This gives $$\begin{array}{|c|c|c|} \hline X_k & Y_k & \hat{Y}_k\\ \hline 1 & 0.6 & -0.54\\ 2 & 1.9 & 2.43\\ 3 & 4.3 & 5.40\\ 4 & 7.6 & 8.37\\ 5 & 12.6 & 11.34\\ \hline \end{array}$$
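In case it helps, the least-squares coefficients themselves follow from the usual closed-form formulas. With $\bar{x} = 3$ and $\bar{y} = 5.4$ for this data,

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{5}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{5}(x_i-\bar{x})^2} = \frac{29.7}{10} = 2.97, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 5.4 - 2.97\cdot 3 = -3.51\text{,}$$

which matches the regression line above.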

Here is the R code to generate the fitted values:

> data <- matrix(c(1, 0.6,
                 2, 1.9,
                 3, 4.3,
                 4, 7.6,
                 5, 12.6), 
               ncol = 2,
               byrow=TRUE)
> colnames(data) <- c("X", "Y")
> data <- as.data.frame(data)
> data
  X    Y
1 1  0.6
2 2  1.9
3 3  4.3
4 4  7.6
5 5 12.6
> lm(Y~X, data = data)

Call:
lm(formula = Y ~ X, data = data)

Coefficients:
(Intercept)            X  
      -3.51         2.97  

> fitted(lm(Y~X, data = data))
    1     2     3     4     5 
-0.54  2.43  5.40  8.37 11.34 

Now the thing is, my answers don't match what you have.

> data <- data.frame(data, fitted(lm(Y~X, data = data)))
> data
  X    Y fitted.lm.Y...X..data...data..
1 1  0.6                          -0.54
2 2  1.9                           2.43
3 3  4.3                           5.40
4 4  7.6                           8.37
5 5 12.6                          11.34
> colnames(data)[3] <- 'Y_hat'
> data
  X    Y Y_hat
1 1  0.6 -0.54
2 2  1.9  2.43
3 3  4.3  5.40
4 4  7.6  8.37
5 5 12.6 11.34
> sum((data$Y_hat - data$Y)^2)/3
[1] 1.657
> sqrt(1.657)
[1] 1.287245

This matches the R output as well:

> summary(lm(Y~X, data = data))

Call:
lm(formula = Y ~ X, data = data)

Residuals:
    1     2     3     4     5 
 1.14 -0.53 -1.10 -0.77  1.26 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -3.5100     1.3501  -2.600  0.08039 . 
X             2.9700     0.4071   7.296  0.00532 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.287 on 3 degrees of freedom
Multiple R-squared:  0.9467,    Adjusted R-squared:  0.9289 
F-statistic: 53.23 on 1 and 3 DF,  p-value: 0.005316

So my guess is that either some information is missing or incorrect, or your lecturer has done this incorrectly.
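For reference, the whole calculation can be reproduced compactly without `lm`, which also makes it easy to see that neither the $n-2$ nor the plain $1/n$ convention recovers the 0.86609 in the solutions.

```r
# Data from the question
x <- 1:5
y <- c(0.6, 1.9, 4.3, 7.6, 12.6)

# Least-squares slope and intercept by hand
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # 2.97
b0 <- mean(y) - b1 * mean(x)                                     # -3.51

y_hat <- b0 + b1 * x
sse   <- sum((y_hat - y)^2)     # sum of squared residuals, ~4.971

sqrt(sse / (length(x) - 2))     # ~1.287, the residual standard error from summary(lm(...))
sqrt(sse / length(x))           # ~0.997, the plain RMSE -- still not 0.416195
```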