P-value, T-test, R.

How does R determine the p-value? Suppose we have this piece of code:

x <- c(1, 2, 3, 4)
t.test(x)

And the result:

t = 3.873, df = 3, p-value = 0.03047

  1. How does R determine the p-value?
  2. What does it mean in this case?

The comment by @GregoryGrant is correct as far as it goes, and the advice to look up full details of the one-sample t test is appropriate. Whenever you start using unfamiliar software, it is a good idea to compare its output for a simple example, as you have done, with hand computations or with output for the same example from other software.

Here are some specific observations that may have given rise to your question, perhaps ones you won't find in Wikipedia or in a textbook that's not specifically keyed to the use of R (or similar software).

First, in order to test a hypothesis, you need to know what it is. By default (that is, unless otherwise specified) R uses the null hypothesis $H_0: \mu = 0$ against the two-sided alternative $H_a: \mu \ne 0.$

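So the specific formula for the t statistic is $$T = \frac{\bar X - \mu_0}{s/\sqrt{n}} = \frac{2.5 - 0}{1.291/\sqrt{4}} = 3.873,$$ to three places, where $\bar X = 2.5$ is the sample mean, $s = 1.291$ is the sample standard deviation, and $n = 4$ is the sample size.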
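As a check on the formula above, here is a minimal sketch that computes the t statistic by hand in R and compares it with what 't.test' reports. The variable names ('xbar', 's', 'mu0', 't_stat') are chosen here for illustration only.

```r
x <- c(1, 2, 3, 4)

n    <- length(x)   # sample size, 4
xbar <- mean(x)     # sample mean, 2.5
s    <- sd(x)       # sample standard deviation (n - 1 in the denominator)
mu0  <- 0           # default null value used by t.test

# One-sample t statistic: (sample mean - null value) / standard error
t_stat <- (xbar - mu0) / (s / sqrt(n))
t_stat                              # about 3.873

# The same value as stored in the t.test result
t_from_r <- unname(t.test(x)$statistic)
t_from_r
```

Both values should agree, which confirms that the default null value is 0.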

Second, the P-value is the probability, assuming $H_0$ to be true, of a result at least as extreme as the one you obtained. Here "at least as extreme" means $T$ at least as far from 0, in either direction, as the observed value. The relevant distribution is Student's t distribution with DF = n - 1 = 4 - 1 = 3.

The P-value could also have been obtained as '2*(1 - pt(3.873, 3))' which returns 0.03046595. Rounded, this is the P-value in your output. Probabilities of T below -3.873 and above +3.873 are combined to give the probability 0.03047. This is because the default two-sided test is being used.
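To make the "two tails combined" idea concrete, the following sketch adds the lower-tail and upper-tail probabilities separately and compares the sum with the P-value stored in the 't.test' result. It assumes only the base R functions 'pt' and 't.test'; the variable names are illustrative.

```r
x   <- c(1, 2, 3, 4)
res <- t.test(x)

t_obs <- unname(res$statistic)   # observed t, about 3.873
df    <- unname(res$parameter)   # degrees of freedom, 3

# Probability of T at or below -|t| and at or above +|t| under H0
lower_tail <- pt(-abs(t_obs), df)
upper_tail <- pt(abs(t_obs), df, lower.tail = FALSE)

p_two_sided <- lower_tail + upper_tail
p_two_sided      # about 0.03047
res$p.value      # same value, as reported by t.test
```

Because the t distribution is symmetric about 0, the two tail probabilities are equal, which is why doubling one tail gives the same answer.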

P-values are largely 'creatures' of the computer age. With ordinary printed t tables, you would not be able to find the P-value with any degree of precision. With the one I have at hand, I could see that 3.873 is between entries 3.182 and 4.541, and thus conclude that the P-value is somewhere between 2% and 5%.
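The table lookup described above can be reproduced with 'qt', the quantile function of the t distribution. This is a sketch under the assumption that the two table entries correspond to two-sided tail areas of 5% and 2% (upper-tail areas 0.025 and 0.01) with 3 degrees of freedom.

```r
df <- 3
two_sided_p <- c(0.05, 0.02)

# Critical values that cut off two_sided_p / 2 in the upper tail
crit <- qt(1 - two_sided_p / 2, df)
round(crit, 3)    # 3.182 and 4.541, the printed table entries

# The observed statistic falls between the two critical values,
# so the two-sided P-value lies between 2% and 5%.
t_obs <- 3.873
bracketed <- crit[1] < t_obs && t_obs < crit[2]
bracketed
```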

Because the P-value is less than 5% = 0.05, you would reject the null hypothesis at the 5% level of significance. [Of course, that conclusion is based in part on the (unlikely) assumption that random sampling from a normal distribution resulted in data 1, 2, 3, 4.]

Finally, if you look at the documentation for 't.test' in R, you will see how to specify a hypothesized value $\mu_0$ other than 0 and how to specify left-sided or right-sided alternatives. (You will also see how to include a second data vector to do two-sample t tests.)
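A brief sketch of those options follows, using the 'mu' and 'alternative' arguments of 't.test'. The null value 2 and the second vector 'y' are made up here purely for illustration and are not part of the original question.

```r
x <- c(1, 2, 3, 4)

# Two-sided test of H0: mu = 2 instead of the default mu = 0
res_mu2 <- t.test(x, mu = 2)

# Right-sided alternative Ha: mu > 0
res_greater <- t.test(x, alternative = "greater")

# Left-sided alternative Ha: mu < 0
res_less <- t.test(x, alternative = "less")

# Two-sample test with a hypothetical second data vector;
# by default R does not assume equal variances (Welch test)
y <- c(2, 4, 6, 8)
res_two <- t.test(x, y)

res_mu2$p.value
res_greater$p.value
res_less$p.value
res_two$p.value
```

With a positive observed t statistic, the right-sided P-value is half the two-sided one, and the left-sided P-value is its complement.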