For a random sample of 10 service calls, both the number of copiers and the total service time were recorded.
Number of Copiers (x) : 4, 2, 5, 7, 1, 3, 4, 5, 2, 6
Service Time (y): 90, 60, 170, 190, 40, 80, 100, 130, 70, 150
Σy = 1080, Σx=39, Σy^2=139,000, Σx^2=185, Σxy=5030,
and the data set yields the least-squares regression line $\hat Y = 11.0334+24.8632x.$
a) Compute and interpret the coefficient of determination.
b) If the number of copiers increases by 1, estimate the average increase in service time with 95% confidence.
c) One service call requires 4 copiers to be serviced. Predict the total service time with 95% confidence.
d) Compute and interpret a 95% confidence interval for the mean service time when servicing 4 copiers.
MY WORK:
I've found SSTO=SSyy =22,360 , SSxx=33 , SSR=20277 , SSE = 2083 , MSE = 260.38 , and SSxy = 818.
a) I did SSR/SSTO=20,277/22,360=0.9068, which I believe is correct.
For b) and c) I thought of possibly substituting the values of 1 for b) and 4 for c) into the equation for the least-squares regression line, but then I got confused about the 95% confidence component to the question.
d) I am lost!
Any help is greatly appreciated, thank you.
Here is a guide to the answers in terms of the process of finding the regression line. I believe you should take an overview of the regression material in your text to try to understand the purpose of the formulas and what is meant by the various notations. Then focus specifically on using the formulas to get numerical values. I hope some of the following helps in this process.
(a) Coefficient of determination is $r^2,$ where $r$ is the correlation. In your text, you should have a formula for $r$ in terms of $x$s and $Y$s.
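As a minimal sketch of part (a), here is the computation in Python, assuming the common textbook form $r = SS_{xy}/\sqrt{SS_{xx}SS_{yy}}$ (your text may write it differently). The variable names are my own, and nothing is rounded until the values are printed.

```python
from math import sqrt

x = [4, 2, 5, 7, 1, 3, 4, 5, 2, 6]                 # number of copiers
y = [90, 60, 170, 190, 40, 80, 100, 130, 70, 150]  # service time
n = len(x)

# Corrected sums of squares and cross-products, kept unrounded
ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = ss_xy / sqrt(ss_xx * ss_yy)   # sample correlation
r2 = r ** 2                       # coefficient of determination

print(f"SSxx = {ss_xx:.1f}, SSxy = {ss_xy:.1f}, SSyy = {ss_yy:.1f}")
print(f"r = {r:.4f}, r^2 = {r2:.4f}")
```

With these data $SS_{xx}$ comes out as 32.9 rather than 33, and $r^2$ is about 0.91, so roughly 91% of the variation in service time is explained by the number of copiers.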
(b) When you get the regression line $\hat Y = b_0 + b_1 x,$ the slope $b_1$ is the answer to this question. Your book may write the y-intercept of the regression equation as $\hat \beta_0$ instead of $b_0,$ and the slope as $\hat \beta_1.$
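Because part (b) asks for 95% confidence, the slope is only the point estimate; the usual interval is $b_1 \pm t^* s_{Y|x}/\sqrt{SS_{xx}}$. The sketch below is one way to compute it, assuming the critical value $t^* = 2.306$ read from a t table for 8 degrees of freedom (hard-coded here, not computed). Variable names are mine.

```python
from math import sqrt

x = [4, 2, 5, 7, 1, 3, 4, 5, 2, 6]                 # number of copiers
y = [90, 60, 170, 190, 40, 80, 100, 130, 70, 150]  # service time
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((a - x_bar) ** 2 for a in x)
ss_yy = sum((b - y_bar) ** 2 for b in y)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = ss_xy / ss_xx            # slope of the least-squares line
sse = ss_yy - b1 * ss_xy      # residual (error) sum of squares
s = sqrt(sse / (n - 2))       # s_{Y|x}, the square root of MSE

T_STAR = 2.306                # assumed: upper 2.5% point of t with 8 df, from a table
se_b1 = s / sqrt(ss_xx)       # standard error of the slope
slope_lo = b1 - T_STAR * se_b1
slope_hi = b1 + T_STAR * se_b1

print(f"b1 = {b1:.4f}, 95% CI = ({slope_lo:.2f}, {slope_hi:.2f})")
```

The interval runs from roughly 18.5 to 31.3, which you would interpret as the plausible range for the average increase in service time per additional copier.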
(c) In the regression line, plug in $x = 4$ and find the corresponding $\hat Y.$
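Plugging in $x = 4$ gives the point prediction; for the "95% confidence" part of (c), most texts use the prediction interval $\hat Y \pm t^* s_{Y|x}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar X)^2}{SS_{xx}}}$ for a single new observation. Here is a hedged sketch of that computation, again assuming $t^* = 2.306$ from a t table for 8 degrees of freedom; `x0` and the other names are my own.

```python
from math import sqrt

x = [4, 2, 5, 7, 1, 3, 4, 5, 2, 6]                 # number of copiers
y = [90, 60, 170, 190, 40, 80, 100, 130, 70, 150]  # service time
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((a - x_bar) ** 2 for a in x)
ss_yy = sum((b - y_bar) ** 2 for b in y)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = ss_xy / ss_xx                        # slope
b0 = y_bar - b1 * x_bar                   # y-intercept
s = sqrt((ss_yy - b1 * ss_xy) / (n - 2))  # s_{Y|x}

T_STAR = 2.306   # assumed: upper 2.5% point of t with 8 df, from a table
x0 = 4           # the single service call with 4 copiers
y_hat = b0 + b1 * x0

# The leading "1 +" accounts for the scatter of one new observation
se_pred = s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
pred_lo = y_hat - T_STAR * se_pred
pred_hi = y_hat + T_STAR * se_pred

print(f"Y-hat = {y_hat:.2f}, 95% PI = ({pred_lo:.2f}, {pred_hi:.2f})")
```

The prediction interval is much wider than a confidence interval for the mean at the same $x$, because it has to cover an individual call and not just the average.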
(d) Most texts have formulas for two "intervals" connected with regression: 'confidence' and 'prediction'. The formula for the former may be written as follows:
$$\hat Y_{n+1} \pm t^*s_{Y|x}\sqrt{\frac{1}{n}+\frac{(x_{n+1} - \bar X)^2}{SS_{xx}}}.$$

This is a confidence interval for $E(Y_{n+1})$, where $(x_{n+1}, Y_{n+1})$ are the coordinates of an observation $n+1$ in addition to the $n$ observations used to get the regression line. The number $t^*$ cuts 2.5% from the upper tail of Student's t distribution with $n - 2 = 10 - 2 = 8$ degrees of freedom. If you define the residuals as $d_i = Y_i - \hat Y_i,$ then $s_{Y|x}$ is their standard deviation, computed with $n - 2$ rather than $n - 1$ in the denominator, so that $s_{Y|x}^2 = MSE.$ Also, $SS_{xx} = (n-1)S_x^2,$ where $S_x = 1.912$ (for your data) is the sample standard deviation of the $x$s.
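To make the confidence-interval formula concrete, here is a small sketch that evaluates it at $x = 4$ for your data. It assumes $t^* = 2.306$ taken from a t table for 8 degrees of freedom; the variable names are mine and not from any text.

```python
from math import sqrt

x = [4, 2, 5, 7, 1, 3, 4, 5, 2, 6]                 # number of copiers
y = [90, 60, 170, 190, 40, 80, 100, 130, 70, 150]  # service time
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((a - x_bar) ** 2 for a in x)
ss_yy = sum((b - y_bar) ** 2 for b in y)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = ss_xy / ss_xx                        # slope
b0 = y_bar - b1 * x_bar                   # y-intercept
s = sqrt((ss_yy - b1 * ss_xy) / (n - 2))  # s_{Y|x}

T_STAR = 2.306   # assumed: upper 2.5% point of t with 8 df, from a table
x_new = 4
y_hat = b0 + b1 * x_new

# Standard error of the estimated MEAN response at x_new
se_mean = s * sqrt(1 / n + (x_new - x_bar) ** 2 / ss_xx)
mean_lo = y_hat - T_STAR * se_mean
mean_hi = y_hat + T_STAR * se_mean

print(f"Y-hat = {y_hat:.2f}, 95% CI for the mean = ({mean_lo:.2f}, {mean_hi:.2f})")
```

Because $x = 4$ is very close to $\bar X = 3.9,$ the second term under the square root is tiny and the interval is nearly as narrow as it can be for these data.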
Below is Minitab output for the regression procedure using your data. I have annotated it to show a few correspondences with your computations, and you should be able to find other connections.
Now we're ready for the regression procedure.
The y-intercept of the regression line is $b_0 = \hat \beta_0 = 11.0$ and its slope is $b_1 = \hat \beta_1 = 24.9,$ in agreement with the line $\hat Y = 11.0334+24.8632x$ given in your problem. The number S in the Minitab printout is $s_{Y|x}$ in the formula mentioned earlier. The number R-sq = 91.0% indicates that $r^2 = (0.954)^2 = 0.910.$ [For a single predictor variable $x$, it is OK to ignore R-SQ(adj).] It is not surprising that the data are consistent with a y-intercept of $0$; a 'phantom' service call to repair $x = 0$ copiers would conceivably require $Y = 0$ service time.

The number MS(Resid. Err.) = 253 should correspond to your value $MSE = 260.38.$ The discrepancy may well be due to roundoff error in your computations. (In a computation of this sort, do not round off anything before you get to the final answer.)

The predicted value $\hat Y_{n+1} = 110.49$ is obtained by plugging $x = 4$ into the regression equation. The confidence interval requested in the last question is $(98.88, 122.10).$ You should read in your text how this is different from the prediction interval.
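To illustrate how much early rounding matters, the sketch below recomputes MSE twice from the sums in your question: once with the unrounded $SS_{xx} = 185 - 39^2/10 = 32.9,$ and once with $SS_{xx} = 33.$ That the 33 in your work is a rounded 32.9 is my assumption; the function name `mse_from` is made up for this example.

```python
n = 10
ss_xy = 5030 - 39 * 1080 / n      # 818, from the sums in the question
ss_yy = 139000 - 1080 ** 2 / n    # 22360

def mse_from(ss_xx):
    """MSE = (SSyy - SSxy^2 / SSxx) / (n - 2) for a given value of SSxx."""
    ssr = ss_xy ** 2 / ss_xx      # regression sum of squares
    return (ss_yy - ssr) / (n - 2)

mse_exact = mse_from(185 - 39 ** 2 / n)   # SSxx = 32.9, unrounded
mse_rounded = mse_from(33)                # SSxx rounded to 33

print(f"MSE with SSxx = 32.9: {mse_exact:.2f}")
print(f"MSE with SSxx = 33:   {mse_rounded:.2f}")
```

The unrounded version gives about 252.7, matching Minitab's 253, while the rounded one lands near 260.4, close to the value in the question.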
Here is a graph of the regression line (least squares line) drawn through a scatterplot of your data. Curved lines indicate the confidence intervals at each value of $x$; focus on where a vertical line at $x = 4$ crosses these curves, and compare with the CI in the output.
Notice that the regression line passes through $(\bar x, \bar Y) = (3.9, 108),$ the 'center of gravity' of the data cloud. Also, notice how far the point at $(5, 170)$ falls from the regression line. This was called out in the output from the regression procedure as an Unusual Observation.

Finally, there is an important distinction between correlation and regression: Correlation is *symmetrical*; the correlation between x and Y is the same as the correlation between Y and x. Regression is *not* symmetrical. Here we are doing 'regression of Y on x' (that is, seeking to predict Y-values from x-values). Regression of x on Y would give entirely different results.
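A quick numerical check of the asymmetry, as a sketch with variable names of my own: the slope for regression of Y on x is $SS_{xy}/SS_{xx},$ while the slope for regression of x on Y is $SS_{xy}/SS_{yy}.$ If the two lines coincided, the second slope would be the reciprocal of the first; instead their product equals $r^2.$

```python
x = [4, 2, 5, 7, 1, 3, 4, 5, 2, 6]                 # number of copiers
y = [90, 60, 170, 190, 40, 80, 100, 130, 70, 150]  # service time
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((a - x_bar) ** 2 for a in x)
ss_yy = sum((b - y_bar) ** 2 for b in y)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b_y_on_x = ss_xy / ss_xx             # slope when predicting Y from x
b_x_on_y = ss_xy / ss_yy             # slope when predicting x from Y
r2 = ss_xy ** 2 / (ss_xx * ss_yy)    # same value whichever way round

print(f"Y on x: {b_y_on_x:.3f} time units per copier")
print(f"x on Y, re-expressed as time units per copier: {1 / b_x_on_y:.3f}")
print(f"product of the two slopes = {b_y_on_x * b_x_on_y:.4f}, r^2 = {r2:.4f}")
```

The two lines would agree only if $r^2 = 1,$ that is, if the points fell exactly on a straight line.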