Today, on another Stack Exchange forum, someone asked, “Are most songs diatonic?” Naturally, this made me consider how I could design an experiment to test the claim that “most songs are diatonic.” (I pose a formal hypothesis below, but colloquially I think this would answer the OP’s question, given the context.) So I’d like to share some thoughts and then get input on the experiment design.
First, I had to settle on a definition of diatonic for the experiment. For example, would a song be considered diatonic if all of its harmonies were diatonic but the melody contained accidentals? What about the converse? What about modal compositions, whose harmonies could be interpreted as borrowed from another key? How do we treat blue notes?
The questions above led me to two natural definitions. Strictly speaking, a composition could be considered diatonic only if there are zero accidentals in the entire piece. While this is a natural and very formal treatment of the term, the OP’s post suggested that he was interested in whether the harmonies were diatonic. So for this experiment, I believe it is most appropriate to treat a song as diatonic if all of its harmonies are built from the notes of a single seven-note Western major scale or natural minor scale. The question we are addressing, then, is: do most songs have harmonies that use only notes within one key signature of the Western major or natural minor scale (i.e., diatonic harmonies)? This definition also makes classifying each sample quick, since we can simply consult a chord chart or lead sheet (or an online database of songs) to determine whether it is diatonic.
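To make the harmonic definition concrete, here is a minimal Python sketch of the check one would do against a chord chart. It assumes chords are written as simple symbols such as `C`, `F#m` or `Bbmaj7` (a hypothetical format; slash chords, extensions beyond sevenths and inversions are not handled) and treats a song as diatonic when a single major scale contains every chord tone. Because a natural minor scale has the same notes as its relative major, checking the 12 major scales covers both cases. The names `chord_pitch_classes`, `diatonic_keys` and `is_diatonic` are illustrative, not from any library.

```python
# Minimal sketch: do all chord tones fit inside one major (equivalently, natural minor) scale?
# The chord-symbol format is a hypothetical simplification.

NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone offsets above the root for a few assumed chord qualities.
CHORD_INTERVALS = {
    "": (0, 4, 7),          # major triad
    "m": (0, 3, 7),         # minor triad
    "dim": (0, 3, 6),       # diminished triad
    "7": (0, 4, 7, 10),     # dominant seventh
    "maj7": (0, 4, 7, 11),  # major seventh
    "m7": (0, 3, 7, 10),    # minor seventh
    "m7b5": (0, 3, 6, 10),  # half-diminished seventh
}

MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def chord_pitch_classes(symbol):
    """Return the set of pitch classes (C = 0) in a chord symbol such as 'F#m7'."""
    root = NOTE_TO_PC[symbol[0]]
    quality = symbol[1:]
    if quality.startswith("#"):
        root, quality = root + 1, quality[1:]
    elif quality.startswith("b"):
        root, quality = root - 1, quality[1:]
    return {(root + step) % 12 for step in CHORD_INTERVALS[quality]}


def diatonic_keys(chords):
    """Tonic pitch classes of the major keys whose scale contains every chord tone."""
    tones = set()
    for symbol in chords:
        tones |= chord_pitch_classes(symbol)
    return [
        tonic
        for tonic in range(12)
        if tones <= {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
    ]


def is_diatonic(chords):
    """True if all the harmonies fit inside a single key signature."""
    return bool(diatonic_keys(chords))


print(is_diatonic(["C", "G", "Am", "F"]))  # True (C major)
print(is_diatonic(["Am", "Dm", "Em"]))     # True (A natural minor, same notes as C major)
print(is_diatonic(["C", "E", "Am", "F"]))  # False (the E chord's G# clashes with C's G)
```

Blue notes and melodic accidentals never enter this check, which matches the harmony-only definition chosen above.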
Before presenting the hypotheses to be tested, I want to address which population of music we will be sampling from. The OP indicated that he was mostly concerned with “stuff we hear in the radio, pop, classical, and traditional.” To define the population, I considered how we could design a simple random sample that draws songs from these pooled categories. There are likely databases from which we can pull random entries, but to make the method more effective, I wanted to narrow the population down and thus refine the question to radio pop music. Our question is therefore: are most radio pop songs composed of harmonies that are strictly diatonic?
With this population defined, we can use a database website such as Last.fm or AllMusic to pull random samples. Let’s assume I can write a script that pulls the required number of songs from such a database at random: for example, using a random number generator over 1–26 to pick the first letter of the artist’s name, then another random number generator bounded by the length of that letter’s artist list to choose an artist, and then a third random number generator bounded by the number of songs by that artist to choose a specific song. This gives me my sample of songs.
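The letter → artist → song procedure can be sketched in Python as below, run over a hypothetical in-memory catalog (a dict from artist name to song list) rather than any real Last.fm or AllMusic interface. The sketch also computes each song’s selection probability, because each stage is uniform only within its own level: a song’s chance of being drawn depends on how many artists share its initial and how many songs its artist has. A uniform draw from one flat list of all songs would be a true simple random sample. Function names are illustrative.

```python
import random
import string


def group_by_initial(catalog):
    """Map each letter A-Z to the sorted artists whose names start with it.
    Artists whose names start with a digit or symbol are unreachable in a 1-26 draw."""
    by_letter = {}
    for artist in sorted(catalog):
        initial = artist[0].upper()
        if initial in string.ascii_uppercase:
            by_letter.setdefault(initial, []).append(artist)
    return by_letter


def draw_song(catalog, by_letter, rng):
    """One letter -> artist -> song draw; letters with no artists are redrawn."""
    letter = rng.choice(string.ascii_uppercase)
    while letter not in by_letter:
        letter = rng.choice(string.ascii_uppercase)
    artist = rng.choice(by_letter[letter])
    return artist, rng.choice(catalog[artist])


def multistage_sample(catalog, n, seed=None):
    """Draw n distinct (artist, song) pairs, skipping repeats.
    Assumes at least n reachable songs; otherwise the loop never ends."""
    rng = random.Random(seed)
    by_letter = group_by_initial(catalog)
    chosen = set()
    while len(chosen) < n:
        chosen.add(draw_song(catalog, by_letter, rng))
    return sorted(chosen)


def selection_probability(catalog, artist, song):
    """Chance that a single draw returns (artist, song), assuming titles are unique per artist."""
    by_letter = group_by_initial(catalog)
    initial = artist[0].upper()
    if initial not in by_letter or song not in catalog.get(artist, []):
        return 0.0
    return 1 / len(by_letter) / len(by_letter[initial]) / len(catalog[artist])


# Hypothetical toy catalog: three artists, six songs.
toy = {"Alpha": ["a1", "a2", "a3"], "Atlas": ["t1"], "Bravo": ["b1", "b2"]}
print(multistage_sample(toy, 4, seed=1))
print(selection_probability(toy, "Atlas", "t1"))  # 0.25
print(selection_probability(toy, "Alpha", "a1"))  # 1/12, about 0.083
```

In the toy catalog, Atlas’s only song is drawn with probability 1/4 and each of Alpha’s songs with probability 1/12, whereas a uniform draw from the six songs would give each one 1/6.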
Next, I would like to use the sample proportion to make an inference about the population proportion. First, let’s define our hypotheses and significance level, and make sure we have enough samples.
$H_0$: $p = \frac12$, where $p$ is the proportion of radio pop songs whose harmonies are entirely diatonic.
$H_1$: $p > \frac12$, i.e., more than half of radio pop songs have harmonies that are entirely diatonic.
I’d like to set a significance level of $\alpha = 0.05$.
For my desired margin of error, I believe the minimum required sample size is 385, so I will sample 400 songs to make this inference.
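The figure of 385 is what the standard sample-size formula for estimating a proportion, $n = z^2\,p(1-p)/E^2$, gives with $z \approx 1.96$ (95% confidence), the conservative guess $p = 0.5$ and a margin of error $E = 0.05$; those inputs are an assumption here, since the post does not state the margin. A short Python sketch of the calculation follows. Note that this formula sizes a confidence interval; it does not by itself set the power of the one-sided test in the hypotheses above, which depends on how far the true proportion is from 1/2.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose normal-approximation confidence interval for a proportion
    has half-width at most `margin`; p_guess = 0.5 is the conservative (largest-n) choice."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


print(required_sample_size(0.05))  # 385
print(required_sample_size(0.03))  # 1068
```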
After sampling and analyzing 400 random songs from the population under consideration, I will perform a 1-PropZTest to infer whether more than 50% of radio pop songs have strictly diatonic harmonies.
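1-PropZTest is the graphing-calculator name for the one-sample z-test for a proportion. A minimal standard-library Python equivalent is sketched below; the count of 220 diatonic songs out of 400 is made up purely to show the mechanics and is not a result.

```python
import math
from statistics import NormalDist


def one_prop_z_test(successes, n, p0=0.5, alternative="greater"):
    """One-sample z-test for a proportion, using the null standard error sqrt(p0(1-p0)/n).
    Returns (z statistic, p-value)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    upper = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = NormalDist().cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * min(upper, 1 - upper)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical outcome: 220 of 400 sampled songs classified as diatonic.
z, p_value = one_prop_z_test(220, 400)
alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

With these made-up counts, $\hat p = 0.55$, $z = 2.0$ and the one-sided p-value is about 0.023, which is below $\alpha = 0.05$.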
I would appreciate input on whether this experiment design seems appropriate and whether there are considerations I’ve left out. The main points of contention raised on the other message board were that the sample size is “nonsense” (I can’t expand on this because no further explanation was offered) and that we don’t have an agreed-upon definition of diatonic (which I believe I’ve addressed in this post). An engineer also said that I’m assuming the distribution we draw from will be normal. However, shouldn’t the sampling distribution of sample proportions always be normal?
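On the normality question: the number of diatonic songs in the sample is binomial, so the sampling distribution of the sample proportion $\hat p$ is a rescaled binomial and is never exactly normal. The z-test relies on the normal approximation (via the central limit theorem), which is usually considered adequate when $np_0$ and $n(1-p_0)$ are both at least about 10; with $n = 400$ and $p_0 = 1/2$ that condition is easily met. The Python sketch below compares the exact binomial tail with the normal approximation; the counts are hypothetical.

```python
import math
from statistics import NormalDist


def exact_upper_tail(k, n, p0=0.5):
    """P(X >= k) for X ~ Binomial(n, p0): the exact null distribution of the count."""
    return sum(math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def normal_upper_tail(k, n, p0=0.5):
    """The same tail via the z-test's normal approximation (no continuity correction)."""
    z = (k / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


def normal_approx_ok(n, p0=0.5, threshold=10):
    """Rule of thumb: expect at least `threshold` successes and failures under H0."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold


for k, n in [(14, 20), (220, 400)]:
    print(n, normal_approx_ok(n),
          round(exact_upper_tail(k, n), 4), round(normal_upper_tail(k, n), 4))
```

With 14 of 20 songs diatonic, the exact one-sided p-value is about 0.058 while the normal approximation gives about 0.037, so the two disagree about rejecting at $\alpha = 0.05$ even though the rule of thumb is just met; at $n = 400$ the two tails are much closer.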
I look forward to input from this scientific community! Perhaps I will conduct this experiment at some point, but currently it is just an idea.
Usually, a null hypothesis is somehow distinguished among the possible hypotheses: for instance, that the means of two distributions are equal (distinguished from the continuum of hypotheses that the means differ by a certain amount), or that a die is fair (distinguished from the continuum of hypotheses that it has a certain bias). In your context, the proportion $\frac12$ isn’t distinguished from any other proportion; there’s no potential symmetry that makes it plausible that exactly half of all pop songs are diatonic. So I think it would make more sense to test symmetrically the hypothesis that the proportion is above $\frac12$ against the hypothesis that it is below $\frac12$. The required number of samples will then depend on how close the actual proportion is to $\frac12$, so you can’t determine it beforehand; but whatever it is, once you start, the initial samples will soon give you a good estimate of how many more you need to reach your significance level.
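To illustrate how the required sample size grows as the true proportion approaches $\frac12$, here is the standard normal-approximation sample-size formula for a two-sided one-sample proportion test, $n = \left(\frac{z_{1-\alpha/2}\sqrt{p_0(1-p_0)} + z_{1-\beta}\sqrt{p(1-p)}}{p - p_0}\right)^2$, as a Python sketch. Plugging in a pilot estimate for $p$ and targeting 80% power are assumptions of this sketch rather than anything the answer prescribes; also, re-running the test every time new songs arrive inflates the error rate unless a proper sequential design is used.

```python
import math
from statistics import NormalDist


def sample_size_for_power(p_true, p0=0.5, alpha=0.05, power=0.8):
    """Approximate n for a two-sided one-sample proportion z-test at level alpha
    to reach the given power when the true proportion is p_true (normal approximation)."""
    if p_true == p0:
        raise ValueError("p_true must differ from p0; at p_true == p0 no n suffices")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p_true * (1 - p_true))
    return math.ceil((numerator / (p_true - p0)) ** 2)


# Hypothetical pilot estimates: the closer to 1/2, the more songs are needed.
for p_pilot in (0.55, 0.6, 0.7):
    print(p_pilot, sample_size_for_power(p_pilot))
```

With a pilot estimate of 0.55 this gives 783 songs at $\alpha = 0.05$ and 80% power, while a pilot estimate of 0.6 needs only 194.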