I am trying to understand covariance intuitively. Below is the standard formula:
$$ \mathrm{Cov}(X,Y) = \dfrac{\sum\limits_{i=1}^N(x_i - \overline{x})(y_i - \overline{y})}{N} \tag{1} $$
There is another formula, given in this paper, which I am not sure is widely known:
$$ \mathrm{Cov}(X,Y) = \dfrac{\sum\limits_{i=1}^N\sum\limits_{j=i+1}^N(x_i-x_j)(y_i - y_j)}{N^2} \tag{2} $$
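Before working with the formulas symbolically, it may help to confirm numerically that (1) and (2) really do agree. A minimal sketch (the data values below are made up purely for illustration):

```python
import numpy as np

# Toy data, invented for illustration only
x = np.array([2.0, 4.0, 7.0, 1.0, 5.0])
y = np.array([3.0, 6.0, 8.0, 2.0, 9.0])
N = len(x)

# Formula (1): population covariance via deviations from the means
cov1 = np.sum((x - x.mean()) * (y - y.mean())) / N

# Formula (2): sum over unordered pairs i < j, divided by N^2
cov2 = sum((x[i] - x[j]) * (y[i] - y[j])
           for i in range(N) for j in range(i + 1, N)) / N**2

print(cov1, cov2)  # the two values coincide
```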
Situation
I am starting from (2) and trying to arrive at (1). I have managed to visualize the numerator of (2), but I am stuck on why $N^2$ appears in the denominator.
Why do I start with (2)?
(2) is easier to understand intuitively via a wonderful analogy given in this Stack Exchange answer here.
The numerator in (2) gives the total set of rectangles, which we can color to see the underlying measure in a visual sense. What I am failing to understand is:
Question:
Why divide by $N^2$? What is its purpose? Is it to cancel out the units of X and Y and make the measure unitless? Or is it because it acts in some way as the probability function $p(x_i,x_j,y_i,y_j)$ with an assumed uniform distribution $\dfrac{1}{N^2}$?
Update - My hypothesis:
It still needs clarity and confirmation from experts.
Imagine the numerator of (2) as $h(x,y)$. What we are actually interested in is the expected value of $h(x,y)$. In that sense,
$$ E[h(x,y)] = \sum\limits_{i=1}^N\sum\limits_{j=i+1}^N(x_i-x_j)(y_i - y_j)p(x_i,y_i) \tag{3} $$
where $p(x_i,y_i)$ is the probability mass function for $h(x,y)$. Assuming equal probability for all $(x_i,y_i)$, then for the $N^2$ possible pairs of $(x_i,y_i)$,
$$ p(x_i,y_i) = \dfrac{1}{N^2} \tag{4} $$
This also seemingly leads to a trap. If $p(x_i,y_i) = \dfrac{1}{N^2}$, then it also means
$$ p(x_i,y_i) = \dfrac{1}{N}\dfrac{1}{N} = p(x_i)p(y_i) \tag{5} $$
which would seem to imply that X and Y are independent???
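One thing the uniform-weighting hypothesis can be checked against numerically: if every *ordered* pair $(i,j)$ of observations (all $N^2$ of them, including $i=j$) is weighted by $\dfrac{1}{N^2}$, the resulting sum comes out to exactly twice the covariance, since each rectangle is counted once as $(i,j)$ and once as $(j,i)$, while the $i=j$ terms vanish. A minimal sketch (toy data invented for illustration):

```python
import numpy as np

# Toy data, invented for illustration only
x = np.array([2.0, 4.0, 7.0, 1.0, 5.0])
y = np.array([3.0, 6.0, 8.0, 2.0, 9.0])
N = len(x)

# Formula (1): population covariance
cov = np.sum((x - x.mean()) * (y - y.mean())) / N

# Weight every ordered pair (i, j) equally by 1/N^2
full = sum((xi - xj) * (yi - yj)
           for xi, yi in zip(x, y)
           for xj, yj in zip(x, y)) / N**2

# Each rectangle is counted twice in the full double sum,
# so 'full' equals 2 * Cov(X, Y); restricting to j > i as in (2)
# removes the double counting.
print(full, 2 * cov)
```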