I have a very generic question about applied statistics.
Suppose, to make things simple, we have a coin with unknown probability $p$ of landing heads. We want to determine whether the coin is fair - that is, whether $p=1/2$.
We can do this by flipping the coin several times, generating a sequence such as $$0,0,1,0,0,1,1,0,1,0,1,1,1,1,0.$$ Now we must determine whether that sequence of numbers is "random".
Typically, statistical testing for randomness involves so-called "suites" or "batteries", which consist of several tests applied together. For example, random.org lists $15$ different tests on this page, which are used to confirm the randomness of its number generator.
My first question is: how on earth do they justify the use of all these tests simultaneously? Surely the $15$ test statistics are correlated with one another in ways that are hopelessly complicated? I don't see how it would be possible to make sense of such a vast array of results.
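For what it's worth, the "many tests at once" worry can be made concrete with a small simulation (my own sketch, not anything random.org describes; it assumes, unrealistically, that the battery's tests are independent). Running $15$ level-$0.05$ tests and rejecting whenever any one of them fires rejects a truly random generator roughly half the time; a correction such as Bonferroni's (which does not require independence) restores the nominal level:

```python
import random

random.seed(0)
M = 15          # number of tests in the battery
ALPHA = 0.05
RUNS = 20000

# Under the null hypothesis, every valid p-value is (at least) Uniform(0,1).
# If the M tests were independent -- an idealization; real batteries are
# correlated -- "reject if any test rejects" fires far more often than ALPHA.
naive = bonferroni = 0
for _ in range(RUNS):
    pvals = [random.random() for _ in range(M)]
    naive += min(pvals) < ALPHA
    bonferroni += min(pvals) < ALPHA / M    # Bonferroni-corrected threshold
print("any-test rejection rate:", naive / RUNS)         # ~ 1 - 0.95**15 ~ 0.54
print("Bonferroni rejection rate:", bonferroni / RUNS)  # ~ 0.05
```

Real batteries are positively correlated, which softens the inflation somewhat, but the qualitative point stands: $15$ uncorrected tests reject far more often than any single one.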
Secondly, and more importantly: let's say we are free to choose any statistical test we want (we can even make one up), after the sequence of coin flips has been generated. Is it always possible to cook up some unsavory mess of a function which returns arbitrarily low $p$-values? That is, can we fabricate a statistic such that its attaining the observed value for the given sequence of coin flips has (assuming randomness) probability smaller than any given $\epsilon>0$?
If so, what does this say about the objectivity of statistical testing? There are many different statistics which could be measured for a sequence of coin flips. Some of these, undoubtedly, will return very unlikely results. Humans are free to choose both which tests to use and what $p$-values to reject at - does this have implications for the practice of statistics? How can we measure the "randomness" of such a sequence in an objective fashion, without incorporating human bias?
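The freedom-of-choice worry can itself be illustrated numerically (my own construction, not a standard procedure). Below, three statistics of a fair sequence are computed with exact $p$-values by enumerating all $2^{15}$ equally likely outcomes; each individual test rejects at most $5\%$ of the time, but "shopping around" for whichever statistic happens to look significant rejects noticeably more often:

```python
import itertools
import random
from collections import Counter

N = 15

def longest_run(seq):
    """Length of the longest run of identical flips."""
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

def switches(seq):
    """Number of adjacent positions where the flip changes."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

STATS = {"heads": sum, "longest_run": longest_run, "switches": switches}

# Exact null distributions, by enumerating all 2^N equally likely sequences.
ALL = list(itertools.product((0, 1), repeat=N))
DIST = {name: Counter(f(s) for s in ALL) for name, f in STATS.items()}
MEAN = {name: sum(v * c for v, c in d.items()) / 2**N
        for name, d in DIST.items()}

def pvalue(name, v):
    """Exact two-sided p-value P(|T - ET| >= |v - ET|) under fairness."""
    d, m = DIST[name], MEAN[name]
    return sum(c for u, c in d.items() if abs(u - m) >= abs(v - m)) / 2**N

random.seed(1)
RUNS = 2000
single = cherry = 0
for _ in range(RUNS):
    s = tuple(random.randint(0, 1) for _ in range(N))
    ps = [pvalue(name, f(s)) for name, f in STATS.items()]
    single += ps[0] < 0.05      # commit to the heads count in advance
    cherry += min(ps) < 0.05    # shop around for the smallest p-value
print("pre-registered test rejects:", single / RUNS)   # stays below 0.05
print("cherry-picked test rejects:", cherry / RUNS)    # noticeably larger
```

So even with perfectly valid individual tests, choosing the test after seeing the data biases the conclusion - though, as the EDIT below shows, not without limit.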
EDIT: Nobody has yet touched on the question of whether, for any given sequence of $1$'s and $0$'s a statistic can be constructed such that $P(\text{stat outcome})$ is arbitrarily small. I believe this demonstrates a negative answer:
Since there are $2^n$ possible sequences of length $n$, a statistic can take at most $2^n$ distinct values over the event space. Therefore, the least possible probability is the chance of getting that one outcome alone, which is $2^{-n}$. So the $p$-value cannot be made smaller than $2^{-n}$, and in particular cannot be pushed below an arbitrary $\epsilon>0$ - does this look correct?
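The bound can be checked by brute force on the example sequence above. The most damning statistic one can tailor to the data after the fact is the indicator of the observed sequence itself, and even it only reaches $2^{-n}$, because any statistic must count the observed sequence among the outcomes "at least as extreme" as itself. A sketch (the statistics used are arbitrary examples):

```python
import itertools
from fractions import Fraction

N = 15
OBS = (0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0)  # the sequence above

def pvalue(stat, x):
    """Exact p-value P(stat(X) >= stat(x)) under fairness, by enumeration."""
    t = stat(x)
    hits = sum(1 for s in itertools.product((0, 1), repeat=N) if stat(s) >= t)
    return Fraction(hits, 2**N)

# The statistic tailored to the data: 1 exactly on the observed sequence.
tailored = lambda x: int(x == OBS)
assert pvalue(tailored, OBS) == Fraction(1, 2**N)   # exactly 1/32768

# Any other statistic also counts OBS itself, so its p-value is >= 2^-N.
assert pvalue(sum, OBS) >= Fraction(1, 2**N)
print(pvalue(sum, OBS))   # 1/2: eight heads out of fifteen is unremarkable
```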
A $p$-value is actually defined (in Statistical Inference by Casella and Berger, for instance) to be a statistic, not just a number, and it is called valid if it satisfies $$ P_\theta[p({\bf X}) \leq \alpha] \leq \alpha $$ for every $\theta \in \Theta_0$. So the probability of seeing small values is genuinely low under the null hypothesis. This makes sense. If you constrain yourself to valid $p$-values, then you can't simply choose any $p$-value you like.
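As a concrete check of this definition (my own sketch, using the exact two-sided binomial test for the coin example, not anything from Casella and Berger): the validity inequality can be verified exactly over a grid of levels $\alpha$, since the null distribution of the heads count is known in closed form.

```python
from fractions import Fraction
from math import comb

N = 15

def pval(h):
    """Exact two-sided p-value for h heads under H0: p = 1/2."""
    dev = abs(2 * h - N)   # doubled deviation |h - N/2|, kept integral
    tail = sum(comb(N, k) for k in range(N + 1) if abs(2 * k - N) >= dev)
    return Fraction(tail, 2**N)

pvals = {h: pval(h) for h in range(N + 1)}
null = {h: Fraction(comb(N, h), 2**N) for h in range(N + 1)}

# Validity: P_{H0}( pval(H) <= alpha ) <= alpha, checked exactly on a grid.
for alpha in (Fraction(a, 1000) for a in range(1, 1000)):
    attained = sum(pr for h, pr in null.items() if pvals[h] <= alpha)
    assert attained <= alpha
print("valid: P(p(X) <= alpha) <= alpha for every alpha checked")
```

Note that the smallest attainable $p$-value here is $2/2^{15}$ (both tails), consistent with the $2^{-n}$ floor discussed in the question's EDIT.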
Also keep in mind that there is the notion of the information contained in a sample, and the degree to which a statistic preserves it (sufficiency).
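For instance (a standard fact, sketched here in code for a small case): for iid coin flips the heads count is sufficient for $p$ - conditional on it, the sequence is uniform over arrangements, whatever $p$ is, so nothing about $p$ is lost by reducing the data to the count.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 6  # small enough to enumerate every sequence exactly

def prob(seq, p):
    """P(X = seq) for N iid Bernoulli(p) flips."""
    h = sum(seq)
    return p**h * (1 - p)**(N - h)

# Conditional on the heads count, the sequence is uniform over arrangements,
# for every p -- i.e. the heads count preserves all information about p.
for p in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)):
    for h in range(N + 1):
        seqs = [s for s in product((0, 1), repeat=N) if sum(s) == h]
        total = sum(prob(s, p) for s in seqs)        # P(heads count = h)
        for s in seqs:
            assert prob(s, p) / total == Fraction(1, comb(N, h))
print("heads count is sufficient for p")
```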