I've got two methods to calculate a certain value for a physical problem (the stress shielding in a bone). I've got n (=20) data points to test the two methods. The first method gives an average over the n data points of x1% (=49.8%), with a standard deviation of y1% (=7.33%). The second method gives an average over the n data points of x2% (=44.2%), with a standard deviation of y2% (=6%). So the averages of the two methods differ by abs(x1-x2)% (=5.6%).
How can I tell if this difference is statistically significant, assuming that the error between the two methods is normally distributed?
I've got the feeling that this problem must already be explained somewhere, but I can't find the right search terms.
EDIT: The full data set can be found here: Data set
EDIT 2: changed data sets to data points for clarity
It is clear that the data are paired, since two types of measurement are performed on each subject (in this case, a femur). The two samples are therefore not independent, so a paired $t$-test is appropriate. The hypotheses to be tested are $$H_0 : \mu_1 - \mu_2 = 0, \quad H_a : \mu_1 - \mu_2 \ne 0,$$ where $\mu_1$ and $\mu_2$ are the population means of the respective measurement types. The test statistic is $$T = \frac{\bar x_1 - \bar x_2 - 0}{s/\sqrt{n}},$$ which under $H_0$ follows a $t$ distribution with $n-1$ degrees of freedom, $T \sim t_{n-1}$. Here $n = 20$ is the number of paired observations in the sample, $\bar x_1$ and $\bar x_2$ are the sample means of the respective measurement types, and $s$ is the sample standard deviation of the paired differences.
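A minimal Python sketch of this statistic, using only the standard library. `paired_t_statistic` and `t_from_summary` are hypothetical helper names introduced for illustration. The raw data set linked in the question is not reproduced here, so the usage line plugs in the summary values this answer reports (mean difference 0.4975 - 0.4415, s = 0.0166702, n = 20). Note that `statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation $s$ in the formula.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x1, x2):
    """Paired t statistic for two measurements taken on the same subjects.

    Returns T = mean(d) / (s / sqrt(n)) with d_i = x1[i] - x2[i];
    under H0 it follows a t distribution with n - 1 degrees of freedom.
    """
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [a - b for a, b in zip(x1, x2)]  # one difference per subject (femur)
    s = stdev(d)  # sample standard deviation of the differences (n - 1 denominator)
    # mean(d) equals mean(x1) - mean(x2), so this is the difference of means over s/sqrt(n)
    return mean(d) / (s / math.sqrt(len(d)))

def t_from_summary(mean_diff, s_diff, n):
    """Same statistic when only the mean and SD of the differences are known."""
    return mean_diff / (s_diff / math.sqrt(n))

# Summary values reported in this answer (the raw data set is not reproduced here)
T = t_from_summary(0.4975 - 0.4415, 0.0166702, 20)
print(round(T, 4))  # 15.0232
```

With the raw data, `paired_t_statistic(method1_values, method2_values)` would compute the same quantity directly from the 20 pairs (the argument names here are placeholders).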
So, for each femur you calculate the difference between the two measurements, and then the standard deviation of these differences. Then you calculate the difference of the means (equivalently, the mean of the differences). You should get $$s = 0.0166702, \quad \bar x_1 = 0.4975, \quad \bar x_2 = 0.4415, \quad T = \frac{0.4975 - 0.4415}{0.0166702/\sqrt{20}} \approx 15.0232.$$ The corresponding $p$-value for the two-sided test, with $n - 1 = 19$ degrees of freedom, is $$p = 5.356 \times 10^{-12}.$$ This furnishes strong evidence to reject $H_0$: if the true mean difference were $0$, the probability of obtaining a test statistic at least as extreme as the one observed would be only $p$. That is far below any conventional significance level, so we can be highly confident that the true mean difference is not $0$.
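To reproduce the $p$-value without a statistics package, one can use the identity that the two-sided tail of a $t_\nu$ distribution equals the regularized incomplete beta function, $P(|T| \ge t) = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$. The sketch below evaluates it with the standard continued-fraction method (modified Lentz). The helper names are hypothetical, and this is an illustration rather than a validated numerical library. With SciPy installed, `2 * scipy.stats.t.sf(abs(T), df)`, or `scipy.stats.ttest_rel(x1, x2)` on the raw paired data, should give the same result.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a  # continued fraction converges quickly here
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry: I_x(a,b) = 1 - I_{1-x}(b,a)

def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for T ~ t_df."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

# Numbers from this answer: mean difference, SD of differences, n = 20 pairs
T = (0.4975 - 0.4415) / (0.0166702 / math.sqrt(20))
p = t_two_sided_p(T, df=19)
print(f"T = {T:.4f}, p = {p:.3e}")  # p should come out close to the 5.356e-12 quoted above
```

Working in log space for the prefactor avoids underflow in $x^a (1-x)^b$, and computing the tail directly (rather than as one minus the CDF) keeps a $p$-value as small as $10^{-12}$ accurate.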