I read here that for an $n \times d$ mean-centered data matrix $X$, $V = X^{T}X$ is its covariance matrix. Why is that?
As I understand it, the element $V_{i,j}$ of the covariance matrix is defined by $E[(X_i - \mu_i)(X_j-\mu_j)]$, and here, because of the mean-centering, we would have $V_{i,j} = E[X_i X_j]$ - but this is not equivalent to just multiplying $X$ by its transpose - or am I missing something?
Recall first that for a scalar zero-mean variable $Y$, the variance is $$\sigma^2=E(Y^2) \tag{1}$$ And if we have a sample of $n$ data values $Y_1, Y_2 \dots Y_n$, we can estimate this expectation as a sample average: $$s=\frac{\sum_{k=1}^n Y_k^2}{n} \tag{2}$$ Here $s$ is not the true variance but an estimator (there are others). $s$ is a random variable (it will vary among experiments) while $\sigma^2$ is a constant parameter. If $n$ is large, we expect that (in some sense and under some conditions) $s\to \sigma^2$.
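To make $(2)$ concrete, here is a quick numerical sketch (using NumPy, with simulated zero-mean Gaussian data whose true variance is chosen by me as $\sigma^2 = 4$) showing the sample average of $Y_k^2$ approaching $\sigma^2$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0  # true variance of the zero-mean variable Y (chosen for the demo)

for n in (100, 10_000, 1_000_000):
    y = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    s = np.sum(y**2) / n  # the estimator in (2)
    print(n, s)           # s wanders around 4.0, more tightly as n grows
```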
Now assume we have a random variable $X$ which is multivariate, $X=(X_1,X_2 \cdots X_d)$.
Then, using your notation, and given that the components are zero mean, we have $V_{i,j}=E(X_i X_j)$, which is the same as $$V=E(X^t X) \tag{3}$$
Here, $V$ is the "true" covariance ($d \times d$) matrix (analogous to $\sigma^2$), $X$ is a row ($1 \times d$) matrix, and its transpose $X^t$ is a column ($d \times 1$) matrix.
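As a shape check, here is a tiny NumPy sketch (with a made-up observation $x$ stored as a $1 \times d$ row matrix): the product $x^t x$ is the $d \times d$ matrix whose $(i,j)$ entry is $x_i x_j$, exactly the quantity whose expectation appears in $(3)$:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0]])  # one observation as a 1 x d row matrix
V_sample = x.T @ x               # (d x 1) times (1 x d) -> d x d outer product

print(V_sample.shape)   # (3, 3)
print(V_sample[0, 2])   # entry (1,3) is x_1 * x_3 = 3.0
```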
Now, analogously with the scalar case, assume you have $n$ data values $X^{(1)}, X^{(2)}, \cdots, X^{(n)}$. Here each data point is itself a row of size $d$. Again, we can estimate the covariance in $(3)$ as, say:
$$ S= \frac{\sum_{k=1}^n {X^{(k)}}^t X^{(k)}}{n} \tag{4}$$
A little reflection shows that the above can be written as
$$ S= \frac{D^t D}{n} \tag{5}$$
where $D$ is the "data matrix" (each $X^{(k)}$ is a row of $D$). Again, $S$ is not the "covariance matrix" but an estimator of the covariance matrix, which is itself (confusingly) sometimes also called the "covariance matrix". Often even the denominator $n$ is omitted, because it only represents a normalization that is irrelevant for some applications (e.g. PCA).
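To see $(4)$ and $(5)$ agree in practice, here is a short NumPy sketch (with made-up centered data; note that `np.cov` with `bias=True` uses the same $1/n$ normalization as above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 3
D = rng.normal(size=(n, d))
D = D - D.mean(axis=0)  # mean-center each column

# Estimator (4): average of the outer products of each row with itself
S_sum = sum(np.outer(row, row) for row in D) / n

# Estimator (5): the same thing as a single matrix product
S_mat = D.T @ D / n

print(np.allclose(S_sum, S_mat))                                # True
print(np.allclose(S_mat, np.cov(D, rowvar=False, bias=True)))   # True
```

The second check confirms that the one-line matrix product really is the (biased) sample covariance computed by the library routine.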