I've been looking around, but I'm having some difficulty figuring out why I keep seeing multiple formulas for computing the covariance.
From three different sources, the covariance formula appears in the following forms:
Wolfram MathWorld's definition: $$\text{Cov}(X,Y) =\sum_{i=1}^N \frac{(x_i-\bar x)(y_i - \bar y)}{N} \tag{1} $$
"Regression Analysis by Example": $$\text{Cov}(Y,X) =\frac{\sum_{i=1}^N (y_i - \bar y)(x_i-\bar x)}{N-1} \tag{2}$$
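To make the difference between Formulas $(1)$ and $(2)$ concrete, here is a small numeric sketch (the data are made up for illustration); NumPy's `np.cov` exposes both conventions through its `ddof` argument:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 9.0])
n = len(x)

dx = x - x.mean()
dy = y - y.mean()

cov_pop = (dx * dy).sum() / n         # Formula (1): divide by N
cov_samp = (dx * dy).sum() / (n - 1)  # Formula (2): divide by N - 1

# np.cov returns a 2x2 matrix; the off-diagonal entry is the covariance
assert np.isclose(cov_pop, np.cov(x, y, ddof=0)[0, 1])
assert np.isclose(cov_samp, np.cov(x, y, ddof=1)[0, 1])
```

The two results differ by the constant factor $N/(N-1)$, which shrinks toward $1$ as $N$ grows.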
"Probability and Statistics for Engineers and Scientists": this formula appears to be used for discrete joint probability distributions: $$\text{Cov}(X,Y) =\sum_x \sum_y (x-\mu_x)(y-\mu_y) f(x,y) \tag{3}$$
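For Formula $(3)$, a tiny hand-made joint pmf (the table below is hypothetical) shows how the double sum works out in practice:

```python
import numpy as np

# Hypothetical joint pmf f(x, y) with x in {0, 1} (rows) and y in {0, 1, 2} (columns)
xs = np.array([0.0, 1.0])
ys = np.array([0.0, 1.0, 2.0])
f = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
assert np.isclose(f.sum(), 1.0)  # a valid pmf sums to 1

# Marginal means mu_x = sum_x x * f_X(x) and mu_y = sum_y y * f_Y(y)
mu_x = (xs[:, None] * f).sum()
mu_y = (ys[None, :] * f).sum()

# Formula (3): sum over all (x, y) of (x - mu_x)(y - mu_y) f(x, y)
cov = ((xs[:, None] - mu_x) * (ys[None, :] - mu_y) * f).sum()
```

Here the role that $1/N$ plays in Formula $(1)$ is played by the probabilities $f(x,y)$ themselves.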
A common theme among the three formulas is that each one weights the products of deviations (this applies to Formula $(3)$ as well, since the weights $f(x,y)$ satisfy $0 \le f(x,y) \le 1$ and sum to $1$, playing the role that $1/N$ plays in Formula $(1)$).
However, it seems that the three formulas can end up with different values for $\text{Cov}(X,Y)$, which in turn might give different results if you go on to solve for the correlation coefficient.
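As a quick numeric check of that last point, one can compute the correlation coefficient both ways (a sketch, assuming the variances use the same denominator convention as the covariance):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 9.0])

def corr(x, y, ddof):
    # Correlation = covariance divided by the product of standard deviations,
    # with all three quantities computed using the same ddof convention.
    c = np.cov(x, y, ddof=ddof)
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

r_pop = corr(x, y, ddof=0)   # all denominators use N
r_samp = corr(x, y, ddof=1)  # all denominators use N - 1

# The N vs. N-1 factor appears in both numerator and denominator,
# so it cancels in the ratio.
assert np.isclose(r_pop, r_samp)
```

So at least in this toy example, the two denominator conventions agree once you form the correlation coefficient, as long as the same convention is used throughout.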
I'm aware that the $N-1$ has to do with bias/unbiasedness, but I thought that if that were the case, Wolfram MathWorld would have made some sort of point about this.
What I am asking is: are these formulas (mainly Formulas $1$ and $2$) close enough that they can be used interchangeably, or should a particular formula be used in particular situations? Is one formula better than another? And does covariance represent the same thing in probability as it does in regression analysis?