Should I reject the null hypothesis or not?


EDIT: My apologies, I had a coding error. I accidentally used the same standard deviation for both samples. Now that I've fixed that, the normal and Student's confidence intervals are almost identical (neither contains 0), and the two $p$-values agree ($0.0358$). I suppose this makes the question pointless, but I'm unsure if the mods prefer to trash it, leave it with this edit, or something else.

The correct confidence intervals are:

  • Normal: (91.9098, 2682.0902)
  • Student's: (90.7354, 2683.2646)
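
As a sanity check on the corrected normal-based figures, here is a minimal sketch that backs the standard error out of the reported interval and recomputes the two-sided $p$-value. The variable names are illustrative, and the standard error is inferred from the interval rather than taken from the dataset, which is not shown here.

```python
from statistics import NormalDist

# Figures reported above (corrected normal-based results).
diff = 1387.0                          # sample difference in mean costs (male - female)
ci_low, ci_high = 91.9098, 2682.0902   # reported 95% interval

std_normal = NormalDist()              # standard normal: mean 0, sd 1
z_crit = std_normal.inv_cdf(0.975)     # two-sided 95% critical value

# Half-width = z_crit * SE, so the standard error can be recovered from the interval.
half_width = (ci_high - ci_low) / 2
se = half_width / z_crit

# Two-sided p-value for H0: no difference between the population means.
z = diff / se
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"implied SE = {se:.2f}, z = {z:.3f}, p = {p_value:.4f}")
```

The recomputed $p$-value comes out at roughly $0.036$, in line with the $0.0358$ reported, so the corrected interval and $p$-value are consistent with each other.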

Original question

I'm doing a statistics exercise where I have a dataset with the health insurance costs of a certain sample, $676$ males and $662$ females. What I'm trying to determine is whether the average costs at a population level are different for males and females.

My null hypothesis is that they aren't different.

The sample difference is $1387$ (male costs are larger). As I understand it, while not huge, my sample sizes are large enough to assume that the sampling distribution of the difference in means is approximately normal (because both samples have far more than $30$ independent observations), so I decided to do hypothesis testing using both the Normal distribution and Student's t-distribution.

I got the following results.

Normal distribution

  • $p$-value: $0.0505$
  • $95\%$ confidence interval: $(-3.1067, 2777.1067)$

According to the above, the probability of observing a sample difference at least as extreme as $1387$, if the null hypothesis were true, is higher than $5\%$, so at the $5\%$ significance level (a $95\%$ confidence level) we can't reject the null hypothesis.

Student's t:

  • $p$-value: $0.0358$
  • $95\%$ confidence interval: $(-5.6567, 2779.6567)$

According to this one, the probability of observing a sample difference at least as extreme as $1387$, if the null hypothesis were true, is lower than $5\%$, so at the $5\%$ significance level (a $95\%$ confidence level) we can reject the null hypothesis.

So they contradict each other, even though both intervals contain $0$, which means in both cases there's a chance that the two means are the same at a population level (and to be fair, in both cases it is much more likely that the male costs are higher, based on the confidence intervals.)

I'm not sure whether I should conclude that the null hypothesis can be rejected or not, because I'm not sure on what grounds I could choose either method.

As far as I've read, the criteria to safely assume normality vary a great deal, from "you can always assume it when the sample size is larger than $30$" to "z-tests are sloppy and you should NEVER use them!", so I don't really know how to proceed.

Best answer

There isn't really a perfect answer to which test you should use. The 95% confidence level is just a convention. The t distribution better represents the sampling distribution of the test statistic when the standard deviations are estimated from the data (and rejecting the null under a t test is a slightly higher bar), but the normal distribution is arguably more conventional, especially for sample sizes in the hundreds. A good case can be made for either.
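
To give a feel for how little the choice matters at these sample sizes, here is a rough sketch comparing the two 95% critical values. It uses only the Python standard library, so the t quantile is approximated with a standard large-df series expansion around the normal quantile (a Cornish-Fisher type expansion) rather than computed exactly. The helper `t_quantile_approx` is hypothetical, and the pooled degrees of freedom $676 + 662 - 2$ are an assumption; a Welch test would use a different, data-dependent value.

```python
from statistics import NormalDist

def t_quantile_approx(q: float, df: float) -> float:
    """Approximate Student's t quantile for large df (two-term series expansion).

    Illustrative only; a statistics library should be used for real work.
    """
    z = NormalDist().inv_cdf(q)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2

n_male, n_female = 676, 662
df = n_male + n_female - 2      # assumption: pooled-variance degrees of freedom

z_crit = NormalDist().inv_cdf(0.975)
t_crit = t_quantile_approx(0.975, df)
rel_diff = (t_crit - z_crit) / z_crit

print(f"z critical = {z_crit:.4f}, t critical (approx) = {t_crit:.4f}")
print(f"relative difference = {rel_diff:.4%}")
```

Under these assumptions the t critical value is larger than the normal one by less than a tenth of a percent, so the two intervals and $p$-values should be nearly indistinguishable.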

As @Henry notes in the comments, there appears to be an error in your $p$-value for the t test. If your 95% confidence interval contains 0, then the two-sided $p$-value from the same test must be greater than 0.05.
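
That consistency rule is easy to check mechanically. The sketch below applies it to the figures from the original (pre-edit) question; the function name `interval_and_p_agree` is made up for illustration, and the check assumes the interval and the two-sided $p$-value come from the same test of "difference = 0".

```python
def interval_and_p_agree(ci, p_value, alpha=0.05):
    """Return True when a (1 - alpha) interval and a two-sided p-value agree.

    Assumes both come from the same test of H0: difference = 0.
    """
    low, high = ci
    contains_zero = low <= 0.0 <= high
    rejects = p_value < alpha
    # The interval contains 0 exactly when the test fails to reject H0,
    # so these two flags should always disagree.
    return contains_zero != rejects

# Figures from the original (pre-edit) question.
reported = {
    "Normal": ((-3.1067, 2777.1067), 0.0505),
    "Student's t": ((-5.6567, 2779.6567), 0.0358),
}

results = {name: interval_and_p_agree(ci, p) for name, (ci, p) in reported.items()}
for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The normal-based pair passes the check, while the Student's t pair does not: its interval contains 0 but its reported $p$-value is below 0.05, which is the discrepancy pointed out above.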