I have seen in websites that given two R.V. $X,Y$, if $$ \cos(\theta)=\frac{X\cdot Y}{\|X\|_2\|Y\|_2} $$ and $$ \rho=\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}} $$then
$$ \cos(\theta)=\rho $$
This identity implies $\text{Cov}(X,Y)=X\cdot Y$. Isn't $X\cdot Y$ just the maximum-likelihood estimate of the covariance, missing some normalizing factors? If so, then the equation above is not an equality but rather an approximation, $\approx$, that improves as the sample size grows.
Next, the denominator implies that $$ \text{Var}(X)= \| X\|_2 ^2. $$ Again, isn't the right-hand side an estimator (the MLE) of the variance of $X$ rather than an identity? So shouldn't it be $$ \rho \approx \cos(\theta)? $$
I have also seen the dot product (without the denominator in the first equation above, and more generally using inner products) used to measure correlation in papers such as Least Angle Regression. I am confused about the relationship between dot products and correlation, which leads me to a general question:
Is $$ \langle X,X\rangle = \text{Var}(X) $$ in Euclidean space?
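The identity in question can be checked numerically. A minimal sketch (using NumPy; the specific data-generating process is just an illustrative assumption): the cosine of the angle between the *mean-centered* sample vectors coincides exactly with the sample Pearson correlation, so the $\approx$ lives in estimating the population $\rho$, not in the cosine formula itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # correlated with x by construction

# Center the samples: the cosine/correlation identity holds for centered vectors.
xc = x - x.mean()
yc = y - y.mean()

# Cosine of the angle between the centered sample vectors ...
cos_theta = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
# ... equals the sample Pearson correlation coefficient exactly.
rho_hat = np.corrcoef(x, y)[0, 1]

print(cos_theta, rho_hat)
```

Note that the $1/n$ (or $1/(n-1)$) normalizing factors cancel between numerator and denominator, which is why centering, not normalization, is the essential step.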
Hint:
$X \cdot Y$ is a random variable.
$\text{Cov}(X,Y)$ is an expected value.
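The distinction in the hint can be made concrete with a small simulation (a sketch with arbitrary illustrative parameters): each fresh draw of the random vectors gives a different value of $\bf X \cdot \bf Y$, whereas $\text{Cov}(X,Y)$ is a single fixed number attached to the distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

# Three independent draws of the pair (X, Y): the dot product X·Y is itself
# a random variable, taking a different value on each realization.
dots = [rng.normal(size=m) @ rng.normal(size=m) for _ in range(3)]
print(dots)  # three distinct realizations of the random variable X·Y
```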
There is some confusion of terminology in your post.
If $\bf X , \bf Y$ are two random vectors (in $m$-space) then their dot product is a random variable $$ \begin{array}{l} {\bf X},{\bf Y} \in R^m \\ q = {\bf X} \cdot {\bf Y} = \left\| {\bf X} \right\|\,\left\| {\bf Y} \right\|\;\cos \alpha \quad \left| {\;q \in R} \right. \\ \end{array} $$ which is in fact a product of three random variables, one of which is $\cos \alpha$.
In this case the covariance is defined as a matrix of expected values, which is not what you are considering.
If instead $\bf X , \bf Y$ are two vectors corresponding to the joint sampling (of size $m$) of two random variables $X,Y$ and we are to estimate the correlation between them (which seems what you mean to do) then