Is a spurious correlation more likely when there is a tighter range of data?


Suppose two people each give one rating to each of $n$ movies (suppose $n$ is small, $\approx 10$). Suppose the ground truth is that their ratings are sampled independently from a discrete uniform distribution. Are their ratings more likely to be spuriously correlated if the range of possible ratings is just $\{1,2\}$ (e.g., $1$ is bad, $2$ is good) than if the range is $\{1,2,3,4,5\}$ ($1$ is very bad, $5$ is very good)?

I ran simulations, but I would like to approach the answer mathematically.

Let $X_i$ denote rater $1$'s rating of movie $i$, with the $X_i$ sampled i.i.d. from $\text{DiscreteUnif}(1,c)$, where $c = 2$ or $5$. Similarly, let $Y_i$ denote rater $2$'s rating of movie $i$, with the $Y_i$ sampled i.i.d. from the same distribution and independently of the $X_i$. Our random variable of interest is the following:

$\Large R_c = \frac{\sum_{i=1}^n(X_i - E(X_i))(Y_i - E(Y_i))}{\sqrt{\sum_i(X_i - E(X_i))^2}\sqrt{\sum_i(Y_i - E(Y_i))^2}}$
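For reference, here is a minimal sketch of the kind of simulation I mean, assuming NumPy. The function names `simulate_r` and `tail_prob`, the seed, and the number of trials are illustrative choices, not anything canonical. It centers on the true mean $E(X_i) = (c+1)/2$, as in the formula above, rather than on the sample mean, and it drops the rare draws where the denominator is zero (possible for $c = 5$ when one rater gives every movie a $3$).

```python
import numpy as np


def simulate_r(c, n=10, trials=100_000, seed=0):
    """Draw `trials` realizations of R_c for n movies and ratings in {1, ..., c}.

    Ratings are centered on the true mean (c + 1) / 2, matching the definition
    of R_c in the question. Draws with a zero denominator are discarded.
    """
    rng = np.random.default_rng(seed)
    mu = (c + 1) / 2
    # rng.integers excludes the upper bound, so c + 1 gives ratings 1..c.
    x = rng.integers(1, c + 1, size=(trials, n)) - mu
    y = rng.integers(1, c + 1, size=(trials, n)) - mu
    num = (x * y).sum(axis=1)
    den = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum(axis=1))
    with np.errstate(divide="ignore", invalid="ignore"):
        r = num / den
    return r[~np.isnan(r)]


def tail_prob(r, a):
    """Empirical estimate of P(R > a)."""
    return float(np.mean(r > a))


if __name__ == "__main__":
    for a in (0.3, 0.5, 0.7):
        p2 = tail_prob(simulate_r(2), a)
        p5 = tail_prob(simulate_r(5), a)
        print(f"a={a}: P(R_2 > a) ~ {p2:.4f}, P(R_5 > a) ~ {p5:.4f}")
```

The threshold values $a$ in the loop are arbitrary; any grid on $(0,1)$ would do.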

Is $P(R_2 > a) > P(R_5 > a)$ for any $a \in (0,1)$, where $R_2$ and $R_5$ denote $R_c$ with $c = 2$ and $c = 5$? Or are those probabilities equal? How do I begin formulating a solution to this problem?
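One observation that may be a starting point, offered as a sketch rather than a full answer: for $c = 2$ we have $E(X_i) = E(Y_i) = 3/2$, so every deviation $X_i - E(X_i)$ and $Y_i - E(Y_i)$ equals $\pm 1/2$ and both sums of squares in the denominator equal $n/4$ exactly. Writing $S_i = 4\,(X_i - \tfrac{3}{2})(Y_i - \tfrac{3}{2}) \in \{-1, +1\}$ (the symbols $S_i$ and $K$ are introduced here only for illustration), the $S_i$ are i.i.d. with $P(S_i = 1) = 1/2$, and

$$R_2 = \frac{\tfrac{1}{4}\sum_{i=1}^n S_i}{\sqrt{n/4}\,\sqrt{n/4}} = \frac{1}{n}\sum_{i=1}^n S_i = \frac{2K - n}{n}, \qquad K \sim \text{Binomial}(n, \tfrac{1}{2}),$$

$$P(R_2 > a) = P\!\left(K > \frac{n(1+a)}{2}\right).$$

So the $c = 2$ case reduces to a binomial tail. I do not see an equally simple reduction for $c = 5$, because there the denominator is itself random.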