I want to compare the performance of two different stochastic methods on a problem. I have the results of method A on the problem from 50 independent runs.
However, for method B I only have the mean and standard deviation (std) of 50 runs. I want to perform a t-test and a Wilcoxon rank sum test on these methods.
Would the results of these tests, based on the mean and std, be reliable and correct?
Also, how can these tests be performed in MATLAB?
Can we perform a t-test and a Wilcoxon rank sum test based on mean and std?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
t test: Yes. Sample sizes, means and standard deviations are sufficient for doing a t test. You already have $n_2 = 50,$ $\bar X_2,$ and $S_2$ for the second sample. Because you have the data for the first sample, you have $n_1 = 50$ and you can compute $\bar X_1$ and $S_1.$
Then the pooled t statistic is
$$T = \frac{\bar X_1 - \bar X_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}},$$ where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2},$ and under the null hypothesis $H_0: \mu_1 = \mu_2$ we have $T \sim \mathsf{T}(\text{df}=n_1+n_2 - 2)$ [Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom]. So, for $n_1 = n_2 = 50$ (df $= 98$), you would reject $H_0: \mu_1 = \mu_2$ in favor of the alternative $H_a: \mu_1 \ne \mu_2$ at the 5% level of significance if $|T| \ge 1.984.$ You can get the two-sided critical value $c = 1.984$ from a printed table of Student's t distribution or from software; in R, `qt(0.975, 98)` gives about 1.984.
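As a minimal sketch (in Python rather than R or MATLAB), the pooled test above needs only the two sample sizes, means and standard deviations. The function name `pooled_t` and the summary numbers in the example are hypothetical; the critical value 1.984 is only valid for df = 98, i.e. $n_1 = n_2 = 50$.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic and its degrees of freedom,
    computed from sample sizes, means and standard deviations only."""
    df = n1 + n2 - 2
    # Pooled variance S_p^2 from the two sample variances
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df

# Two-sided 5% critical value for df = 98 (R: qt(0.975, 98))
C_98 = 1.984

# Hypothetical summaries: for method A they would be computed from its
# 50 raw runs (e.g. statistics.mean / statistics.stdev); for method B
# they are the reported mean and std.
t, df = pooled_t(12.45, 1.46, 50, 12.0, 1.5, 50)
reject = abs(t) >= C_98
print(f"T = {t:.3f}, df = {df}, reject H0 at 5%: {reject}")
```

With these made-up numbers $|T| \approx 1.52 < 1.984$, so $H_0$ would not be rejected.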
This test assumes that the two populations have (nearly) the same variance: $\sigma_1^2 \approx \sigma_2^2.$ If you have reason to believe that is not true, you can use the Welch separate-variances t test. For equal sample sizes $n_1 = n_2$ the $T$ statistic is the same as for the pooled test just described, but the degrees of freedom are at most 98 and become smaller the more the two sample variances differ. You can find the formula for the degrees of freedom on Wikipedia or in almost any elementary or intermediate statistics text. [With $n_1 = n_2 = 50,$ the Welch degrees of freedom cannot drop below 49, so they exceed 30 and you can use the approximate critical value $c = 2.0$ for a test at the 5% level.]
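The Welch-Satterthwaite degrees of freedom, $$\nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\frac{(S_1^2/n_1)^2}{n_1-1} + \frac{(S_2^2/n_2)^2}{n_2-1}},$$ also need only the summary statistics. The Python sketch below (function names and numbers are hypothetical) computes the Welch $T$ and $\nu$, and checks that for $n_1 = n_2$ the Welch $T$ equals the pooled $T$.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled t statistic (equal-variance assumption)."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch separate-variances t statistic and Welch-Satterthwaite df."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summaries with clearly unequal standard deviations
tw, dfw = welch_t(12.45, 1.0, 50, 12.0, 3.0, 50)
tp = pooled_t(12.45, 1.0, 50, 12.0, 3.0, 50)
print(f"Welch T = {tw:.3f} (pooled T = {tp:.3f}), Welch df = {dfw:.2f}")
```

Here the standard deviations differ by a factor of 3, and $\nu \approx 59.8$: well below 98, but still above 30.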
Wilcoxon rank sum test: No. There is no way to do a two-sample Wilcoxon test without access to the data for the second group, because the test statistic is built from the ranks of the individual observations in the combined sample, and the mean and standard deviation do not determine those ranks. [Unless the data for the first sample are extremely skewed or have many far outliers, the t test should be OK. (It is typical for a normal sample of size 50 to show a couple of moderate outliers.) Of course, it would be nice to be able to look at the data for the second sample, but unless you have reason to suspect otherwise, it seems safe to assume that it shares the near-normality of sample 1.]
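To illustrate why the summary statistics are not enough, the Python sketch below computes the rank-sum statistic $W$ (the sum of the ranks of sample 1 in the pooled sample, with ties given average ranks) against two made-up versions of sample 2 that have exactly the same size, mean (0) and standard deviation (1) but different shapes. All data values and the name `rank_sum` are hypothetical.

```python
import math
import statistics

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled
    sample x + y, with tied values given the average of their ranks."""
    pooled = sorted([(v, True) for v in x] + [(v, False) for v in y])
    w, i = 0.0, 0
    while i < len(pooled):
        # Find the run pooled[i..j] of values tied with pooled[i]
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        w += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1])
        i = j + 1
    return w

a = [0.7, 0.8, 2.0]        # hypothetical sample 1
r = math.sqrt(1 / 3)
b1 = [-1.0, 0.0, 1.0]      # sample 2, version 1: mean 0, sd 1
b2 = [r, r, -2 * r]        # sample 2, version 2: mean 0, sd 1

for b in (b1, b2):
    print(statistics.mean(b), round(statistics.stdev(b), 12), rank_sum(a, b))
```

Both versions of sample 2 report mean 0 and sd 1, yet $W$ is 13 for the first and 15 for the second, so $W$ cannot be recovered from the mean and std.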
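Example: I don't know whether MATLAB does t tests, but the computations here are not beyond what you can do on a simple calculator. In R, the pooled and Welch t tests are `t.test(x1, x2, var.equal = TRUE)` and `t.test(x1, x2)` (Welch is R's default), where `x1` and `x2` are the two samples; I ran both on two fake datasets simulated from normal distributions.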
The two normal samples differ slightly, but too little to be detected by either the pooled or the Welch test: both P-values exceed 0.05, and both T statistics are well below 2 in absolute value.
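The R datasets and output from that example are not shown here. As a rough stand-in, the Python sketch below mimics the situation in the question: it simulates 50 raw runs for method A, keeps only the mean, std and size for method B, and computes both T statistics from those summaries. The population means (100 and 101), sd (10) and seed are arbitrary assumptions, so its numbers will not match the R example.

```python
import math
import random
import statistics

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled t statistic from summary statistics."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / math.sqrt(v1 + v2), df

rng = random.Random(2024)  # arbitrary seed; fake data only
runs_a = [rng.gauss(100, 10) for _ in range(50)]  # method A: raw results
runs_b = [rng.gauss(101, 10) for _ in range(50)]  # method B: only its summary is used

summary_a = (statistics.mean(runs_a), statistics.stdev(runs_a), len(runs_a))
summary_b = (statistics.mean(runs_b), statistics.stdev(runs_b), len(runs_b))

t_pooled = pooled_t(*summary_a, *summary_b)
t_welch, df_welch = welch_t(*summary_a, *summary_b)
print(f"pooled T = {t_pooled:.3f} (df = 98), Welch T = {t_welch:.3f} (df = {df_welch:.1f})")
```

Compare $|T|$ with 1.984 (pooled) or roughly 2.0 (Welch) to decide at the 5% level.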