Calculating sample mean and sample variance for X,Y


The determination of the shear strength of spot welds is difficult, whereas measuring the weld diameter of spot welds is relatively simple. As a result, it would be advantageous if shear strength could be predicted from a measurement of weld diameter. We assume that $$S=\alpha+\beta W+e,$$ where $S$ is the shear strength, $W$ is the weld diameter, and $e$ is a random error, assumed to be normally distributed with zero mean. The data are as follows:

\begin{matrix}
\textbf{Weld Diameter (0.0001 in)} & \textbf{Shear Strength (psi)}\\ \hline
400 & 380 \\
800 & 790 \\
1250 & 1220 \\
1600 & 1550 \\
2000 & 1970 \\
2500 & 2440 \\
3100 & 3060 \\
3600 & 3540 \\
4000 & 3920 \\
4000 & 3930
\end{matrix}

(a) Estimate the regression coefficients $\alpha$ and $\beta$.

I have found these using Matlab's regress() function. $$ \alpha \approx -9.8350, \beta \approx 0.9849 $$
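For reference, the same estimates follow from the closed-form least-squares formulas. This is only a sketch in my own notation ($x$ for weld diameter, $y$ for shear strength), with the sums computed from the table above:

$$\hat\beta = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sum (x_i-\bar x)^2} = \frac{15449000}{15686250} \approx 0.9848753, \qquad \hat\alpha = \bar y - \hat\beta\,\bar x = 2280 - 0.9848753\,(2325) \approx -9.835.$$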

(b) Find the 90% confidence interval of the expected shear strength when the weld diameter is 2500.

This is my first time finding confidence intervals for this kind of data.

Following @bruceET's answer, I have arrived at the following Matlab code:

x = [400,800,1250,1600,2000,2500,3100,3600,4000,4000];  % weld diameters
y = [380,790,1220,1550,1970,2440,3060,3540,3920,3930];  % shear strengths
sumofx = sum(x);           % the sum of x
sumofxpower2 = sum(x.^2);  % the sum of the squared x values
meanofx = mean(x);
n = length(x);             % n = 10, the number of observations
varofd = (sumofxpower2 - (sumofx^2)/n)/(n-1);  % sample variance of the diameters
sbar0 = -9.835 + 0.9849*2500

lower = sbar0 - 1.860*sqrt((1/n) + ((2500-meanofx)^2)/((n-1)*varofd))
upper = sbar0 + 1.860*sqrt((1/n) + ((2500-meanofx)^2)/((n-1)*varofd))

For the lower limit of the 90% CI I get 2451.8, and for the upper limit I get 2453.

The result is supposed to be (2445.51, 2459.19), so I am making a mistake somewhere.

Best Answer

I have typed your data into Minitab 17 statistical software. The first output is a graph of the regression surrounded by curves that give the CI for Strength at each Diameter value. For your data the points lie almost exactly on the regression line, so the 'confidence bands' are almost indistinguishable from the line itself.

[Figure: Minitab fitted line plot of Strength versus Diameter with confidence bands; the legend reports S = 11.52.]

The value reported in the legend as $S = 11.52$ is often denoted in textbooks on regression as $S_\epsilon$ or $S_{s|d}.$ It is neither $S_d$ the SD of the Diameters nor $S_s$ the SD of the Strengths; it is the SD of the residuals about the regression line, an estimate of $SD(e),$ where $e$ is the error term in your model.
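As a rough check on where this number comes from: the residual SD is the square root of the error sum of squares divided by its $n - 2$ degrees of freedom. Using the rounded values from the Error row of the ANOVA table in the Minitab printout (SS = 1062 on 8 DF), and writing $\hat s_i$ for the fitted strength at the $i$th diameter (my notation), this gives

$$S_{s|d} = \sqrt{\frac{\sum_{i=1}^{n} (s_i - \hat s_i)^2}{n-2}} \approx \sqrt{\frac{1062}{8}} \approx 11.52.$$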

The formula for a 90% CI of the Strength at Diameter $d_0 = 2500$ is as follows:

$$\hat s_0 \pm t^* S_{s|d}\sqrt{\frac{1}{n} + \frac{(d_0 - \bar d)^2}{(n-1)S_d^2}},$$

where $\hat s_0 = -9.835 + 0.9849(2500),\,$ $t^*$ cuts 5% of the probability from the upper tail of Student's t distribution with $n - 2$ degrees of freedom, $d_0 = 2500,\,$ $\bar d$ is the sample mean of the $n$ diameters, and $S_d^2$ is the sample variance of the diameters.
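The formula can be sketched in Matlab roughly as follows. This is an illustration rather than a definitive implementation: the variable names are my own, the fit uses the closed-form least-squares formulas instead of regress(), and `tstar = 1.860` is hard-coded from a t table (the same value used in the question) instead of being computed. Note that the square root is multiplied by $S_{s|d}$, which appears to be the factor missing from the code in the question.

```matlab
% Data from the question
d = [400 800 1250 1600 2000 2500 3100 3600 4000 4000];  % weld diameters
s = [380 790 1220 1550 1970 2440 3060 3540 3920 3930];  % shear strengths
n  = numel(d);
d0 = 2500;                       % diameter at which the CI is wanted

% Least-squares slope and intercept (closed-form formulas)
dbar = mean(d);
Sdd  = sum((d - dbar).^2);       % equals (n-1) times the sample variance of d
b    = sum((d - dbar).*(s - mean(s))) / Sdd;
a    = mean(s) - b*dbar;

% Residual SD about the regression line, with n-2 degrees of freedom
res  = s - (a + b*d);
Ssd  = sqrt(sum(res.^2)/(n - 2));

% 90% CI for the expected strength at d0
tstar = 1.860;                   % assumed table value: 0.95 quantile of t, 8 df
fit   = a + b*d0;
seFit = Ssd*sqrt(1/n + (d0 - dbar)^2/Sdd);
ci    = [fit - tstar*seFit, fit + tstar*seFit];
fprintf('fit = %.2f, SE fit = %.4f, 90%% CI = (%.2f, %.2f)\n', fit, seFit, ci);
```

If the sketch is right, `fit`, `seFit`, and `ci` should agree, up to rounding, with the Fit, SE Fit, and 90% CI values in the Minitab prediction output.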

Here is a printout of some computations from Minitab.

Regression Analysis: s versus d 

Analysis of Variance

Source         DF    Adj SS    Adj MS    F-Value  P-Value
Regression      1  15215338  15215338  114652.94    0.000
  d             1  15215338  15215338  114652.94    0.000
Error           8      1062       133
  Lack-of-Fit   7      1012       145       2.89    0.425
  Pure Error    1        50        50
Total           9  15216400


Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
11.5199  99.99%     99.99%      99.99%


Coefficients

Term         Coef  SE Coef  T-Value  P-Value   VIF
Constant    -9.84     7.68    -1.28    0.236
d         0.98488  0.00291   338.60    0.000  1.00

Regression Equation

s = -9.84 + 0.98488 d


Prediction for s 

Regression Equation

s = -9.84 + 0.98488 d


Variable  Setting
d            2500


    Fit   SE Fit        90% CI              90% PI
2452.35  3.67830  (2445.51, 2459.19)  (2429.87, 2474.84)
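As a check, the CI in this row can be reproduced by hand from the formula. The values $\bar d = 2325$ and $(n-1)S_d^2 = 15686250$ are computed from the data, and $t^* \approx 1.860$ is taken from a t table for 8 degrees of freedom; none of these three numbers appears in the printout itself:

$$\text{SE Fit} = 11.5199\sqrt{\frac{1}{10} + \frac{(2500-2325)^2}{15686250}} \approx 3.678, \qquad 2452.35 \pm 1.860\,(3.678) \approx (2445.51,\ 2459.19).$$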

Notes: (1) You should look in your textbook for the relevant equations; my notation is almost surely a little different from what you will find there. In particular, your text probably uses $y$ and $x$ where I have used $s$ and $d.$ Also, maybe you should check my notation for typos.

(2) The 90% CI near the end of the printout is what you want. [The PI would be an interval for the predicted value of strength for a new observation (not used to get the regression line) at $d_0 = 2500.$]
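For comparison, the PI uses the same ingredients as the CI, with an extra 1 under the square root to account for the variability of a single new observation. A sketch with rounded values, taking $t^* \approx 1.860$ from a t table:

$$\hat s_0 \pm t^* S_{s|d}\sqrt{1 + \frac{1}{n} + \frac{(d_0 - \bar d)^2}{(n-1)S_d^2}} \approx 2452.35 \pm 1.860\,(11.52)\sqrt{1.10195} \approx (2429.9,\ 2474.8).$$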