Problem: $10$ nails are manufactured on each of two machines. The average sizes (cm) and corrected empirical standard deviations are: $$\overline{x} = 0.625, \overline{y} = 0.471, s^*_x = 0.754, s^*_y = 1.269.$$Compare the variances of the production of the two machines with an F-test, and investigate the null hypothesis that there is no difference in the sizes of the production of the two machines! Use $\alpha = 0.10$ for the level of significance! Be careful which kind of t-test you use! How would you test the same hypothesis if $100$ nails were manufactured on each machine, with the same empirical data?
My solution: For both the small and the large sample case, shouldn't I use the F-test? For the small samples we fail to reject the null hypothesis of equal variances, while for the large samples we reject it, because the test statistic is $\frac{(s^*_y)^2}{(s^*_x)^2} = 2.83$, while $F_{0.05}(9,9)=3.18$ and $F_{0.05}(99,99)=1.39$. Is that correct? I don't understand why the author warns us about which kind of t-test we use. Also, both F- and t-tests require normality of the variables; should I just assume that?
Results of F-tests. Abridged from Minitab 17, for sample sizes $n = 10$ and $n = 100$.
The first test (sample sizes 10) does not reject the hypothesis of equal population variances (two-sided P-value about 0.14 > 0.10); the second test (sample sizes 100) does reject it (P-value < 0.001). So sample size clearly matters. These conclusions match yours.
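For readers without Minitab, here is a minimal sketch, assuming SciPy is available, that reproduces the two-sided F-test from the summary statistics alone (no raw data needed). The function name `f_test_two_sided` is hypothetical, not from any library. The statistic is $(s^*_y)^2/(s^*_x)^2$ as in the question, and the two-sided P-value doubles the smaller tail area.

```python
from scipy import stats

# Summary data from the problem: corrected (n-1 denominator) sample SDs
s_x, s_y = 0.754, 1.269
alpha = 0.10

def f_test_two_sided(s_x, s_y, n_x, n_y, alpha):
    """Two-sided F-test of H0: sigma_x^2 = sigma_y^2 from summary statistics."""
    f = s_y**2 / s_x**2                  # variance ratio as in the question
    df_num, df_den = n_y - 1, n_x - 1    # numerator df goes with s_y
    # upper alpha/2 critical value (relevant here because f > 1)
    crit = stats.f.ppf(1 - alpha / 2, df_num, df_den)
    # two-sided P-value: double the smaller tail area
    p = 2 * min(stats.f.sf(f, df_num, df_den), stats.f.cdf(f, df_num, df_den))
    return f, crit, p

for n in (10, 100):
    f, crit, p = f_test_two_sided(s_x, s_y, n, n, alpha)
    print(f"n={n}: F={f:.2f}, critical={crit:.2f}, P={p:.4f}, reject={p < alpha}")
```

Both sample sizes give the same statistic $F \approx 2.83$; only the degrees of freedom change, which moves the critical value from about 3.18 down to about 1.39.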
Power curves. Also from Minitab, here are power curves for $n = 10$ and $n = 100$ against the alternative that the ratio is $\sigma_1^2/\sigma_2^2 = 0.6$, at level $\alpha = 0.1$. (Power is the probability of rejecting $H_0$ when the alternative is true.)
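The power at a single point of those curves can be computed directly. This is a sketch assuming normal data and SciPy; `f_test_power` is a hypothetical helper name. It uses the fact that, when the true ratio is $\rho = \sigma_1^2/\sigma_2^2$, the sample ratio $S_1^2/S_2^2$ is distributed as $\rho \cdot F(n_1 - 1, n_2 - 1)$.

```python
from scipy import stats

def f_test_power(ratio, n1, n2, alpha):
    """Power of the two-sided F-test of H0: sigma1^2/sigma2^2 = 1
    when the true ratio sigma1^2/sigma2^2 equals `ratio` (normal data assumed)."""
    d1, d2 = n1 - 1, n2 - 1
    lo = stats.f.ppf(alpha / 2, d1, d2)       # lower critical value
    hi = stats.f.ppf(1 - alpha / 2, d1, d2)   # upper critical value
    # Under the alternative S1^2/S2^2 ~ ratio * F(d1, d2), so
    # S1^2/S2^2 < lo  <=>  F < lo / ratio, and likewise for the upper tail.
    return stats.f.cdf(lo / ratio, d1, d2) + stats.f.sf(hi / ratio, d1, d2)

for n in (10, 100):
    print(f"n={n}: power at ratio 0.6 = {f_test_power(0.6, n, n, 0.10):.3f}")
```

At ratio 1 the function returns $\alpha$ itself, a quick sanity check. At ratio 0.6 the power is roughly 0.2 for $n = 10$ but roughly 0.8 for $n = 100$.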
The tests and power curves above illustrate how difficult it is to detect differences between population variances when sample sizes are small.
Statistical practice. You also ask about the reason to 'be careful which kind of t test to use'. The pooled t test assumes equal variances and the Welch (separate variances) test does not make this assumption. Based on reliable information from many simulation studies, there is now general agreement among practicing statisticians that the Welch test should always be used unless there is solid prior evidence that population variances are truly the same. (Perhaps that information would be available if the factory has a lot of past data on the behavior of these two machines making various kinds of nails.)
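Both versions of the two-sample t test can be run from the summary statistics with SciPy's `ttest_ind_from_stats`; the sketch below assumes SciPy and treats $s^*_x, s^*_y$ as the corrected ($n-1$ denominator) standard deviations that function expects. With equal sample sizes the pooled and Welch statistics coincide algebraically; only the degrees of freedom, and hence the P-values, differ.

```python
from scipy import stats

# Summary data from the problem (s* are corrected, i.e. n-1 denominator, SDs)
xbar, ybar = 0.625, 0.471
s_x, s_y = 0.754, 1.269

for n in (10, 100):
    # equal_var=True: pooled test with df = 2n - 2
    pooled = stats.ttest_ind_from_stats(xbar, s_x, n, ybar, s_y, n, equal_var=True)
    # equal_var=False: Welch test with Welch-Satterthwaite df
    welch = stats.ttest_ind_from_stats(xbar, s_x, n, ybar, s_y, n, equal_var=False)
    print(f"n={n}: pooled t={pooled.statistic:.3f}, P={pooled.pvalue:.3f}; "
          f"Welch t={welch.statistic:.3f}, P={welch.pvalue:.3f}")
```

With these data neither version rejects equal mean sizes at $\alpha = 0.10$ (P-values roughly 0.75 for $n = 10$ and 0.30 for $n = 100$).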
Pooled t tests based on data from populations with different variances often give incorrect results, because the T statistic then does not have the distribution $\mathsf{T}(df = n_1+n_2-2).$ By contrast, a combination of theory and simulation shows that the Welch test (with its often somewhat reduced degrees of freedom) does give reliable results.
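A small Monte Carlo sketch of this point, assuming NumPy and SciPy. The settings (sample sizes 5 and 25, SDs 4 and 1, 20,000 replications, seed) are hypothetical and chosen for illustration, not taken from the problem. The population means are equal, so every rejection is a Type I error; the pooled test is badly affected when the smaller sample comes from the more variable population.

```python
import numpy as np
from scipy import stats

# Hypothetical settings: equal means, unequal variances, unequal sample sizes
rng = np.random.default_rng(2024)
reps, n1, n2 = 20_000, 5, 25
sigma1, sigma2 = 4.0, 1.0
alpha = 0.05

x = rng.normal(0.0, sigma1, size=(reps, n1))
y = rng.normal(0.0, sigma2, size=(reps, n2))

# One t test per row (one simulated data set per row); axis=1 vectorizes
p_pooled = stats.ttest_ind(x, y, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(x, y, axis=1, equal_var=False).pvalue

# Observed Type I error rates, to compare with the nominal alpha
rate_pooled = np.mean(p_pooled < alpha)
rate_welch = np.mean(p_welch < alpha)
print(f"Type I error at nominal {alpha}: pooled {rate_pooled:.3f}, Welch {rate_welch:.3f}")
```

Under these settings the Welch rate should stay near the nominal 5%, while the pooled rate is several times larger.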
Especially deprecated nowadays is the suggestion in some older elementary texts to do a preliminary F-test for equal variances and then branch to the pooled or Welch test according to its result. The F-test has such poor power (especially for smaller sample sizes) that it very often directs one to the wrong t test. Absent prior information that the population variances are equal, skip the F-test and use the Welch t test.
Finally, you should not do t tests or F tests unless you have information that the data are normal. In a textbook exercise, the context may be that everything is assumed normal unless the contrary is explicitly stated. However, in statistical practice one uses various tests and graphical procedures to see whether data are reasonably close to normal before doing F or t tests. (Appropriate alternative tests are available when data are not normal.)