In many sources on data analysis, the authors talk about calculating the covariance of the data, and the formula is given as
$$ \Sigma = \operatorname{cov}(X) = E[(X-E[X])(X-E[X])^T] $$
This formulation is given without reference to any distribution. As an example, the author can go straight to a programming environment, say R, take the famous Iris data set, and do
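A minimal sketch of that step, written in Python with numpy rather than R, and with a small hand-entered matrix standing in for a data set such as Iris so the snippet stays self-contained:

```python
import numpy as np

# Rows are observations, columns are variables
# (made-up numbers standing in for a real data set like Iris).
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [4.7, 3.2],
              [4.6, 3.1],
              [5.0, 3.6]])

# np.cov treats rows as variables by default, so pass rowvar=False.
# Like R's cov(), it divides by n - 1.
S = np.cov(X, rowvar=False)
print(S)
```

Note that nowhere in this call is a distribution specified; the function only centres the data and averages the outer products.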

I am curious whether the disconnect between a specific distribution and the data is an omission, or whether the covariance calculation can be performed on any data regardless of its distribution. As it happens, the multivariate normal density is
$$ f(x;\mu,\Sigma) = \frac{ 1}{(2\pi)^{k/2} \det(\Sigma)^{1/2}} \exp \bigg\{ -\frac{ 1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \bigg\} $$
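The density formula above can be evaluated directly; this is a small numpy sketch (the particular $\mu$ and $\Sigma$ values are made up for illustration):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate f(x; mu, Sigma) straight from the formula."""
    k = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (k / 2) * np.linalg.det(Sigma) ** 0.5)
    # solve(Sigma, diff) computes Sigma^{-1} diff without forming the inverse.
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

# At x = mu the exponent vanishes, so the density equals the normalising constant.
print(mvn_pdf(mu, mu, Sigma))
```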
and I wonder if the $\Sigma$ of the distribution is related to the covariance calculated from data. For example, the maximum likelihood estimator for $\Sigma$ is
$$ \frac{1}{n} \sum_{j=1}^{n} (x_j-\bar{x})(x_j-\bar{x})^T $$
This expression is almost identical to the one using $E[\cdot]$, except that we sum and divide by $n$, which can be seen as the sample analogue of the expectation?
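That correspondence can be checked numerically: averaging the outer products of the centred observations reproduces exactly what a covariance routine computes when told to divide by $n$. A sketch with synthetic numpy data (the sample size and dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # n = 200 observations of a 3-vector

xbar = X.mean(axis=0)
# The MLE: average of outer products of centred observations (divide by n).
Sigma_mle = (X - xbar).T @ (X - xbar) / len(X)

# np.cov with ddof=0 also divides by n, matching the MLE; the default
# ddof=1 divides by n - 1, giving the usual "sample covariance".
print(np.allclose(Sigma_mle, np.cov(X, rowvar=False, ddof=0)))
```

So the "summing and dividing by $n$" is precisely the empirical estimate of the expectation $E[\cdot]$ in the definition.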
I have also seen the expectation and variance (covariance in the multidimensional case) discussed and derived using the method of moments, which makes no assumptions about the underlying distribution. I guess I am trying to understand whether all of the "summary statistics" have a theoretical basis (i.e. are tied to a probability distribution).
Any clarification or pointers are welcome.
The short answer is no: the covariance is a general property, just like the expected value, variance, etc. The formula for the expected value is not derived from any distribution; it is more of a definition, like the moments and central moments.
Regarding bias: the sample covariance matrix that divides by $n-1$ is unbiased for any sample size, while the maximum likelihood version that divides by $n$ is biased in finite samples. Both are consistent estimators, however, and hence will approach the true covariance matrix as you collect more data.
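Consistency is easy to see in simulation; here is a sketch that draws increasingly large samples from a known covariance (the particular matrix and seed are arbitrary) and watches the estimation error shrink:

```python
import numpy as np

rng = np.random.default_rng(42)
true_Sigma = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
L = np.linalg.cholesky(true_Sigma)

# Standard normal draws times L.T have covariance exactly true_Sigma.
for n in (10, 1_000, 100_000):
    X = rng.standard_normal((n, 2)) @ L.T
    err = np.abs(np.cov(X, rowvar=False) - true_Sigma).max()
    print(n, err)
```

The maximum elementwise error typically falls off like $1/\sqrt{n}$, as expected for a moment estimator.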