Suppose I have a data set of $m$ vectors in $\mathbb{R}^d$, $D = \{x_1,\ldots,x_m\}$.
Let $S = \sum_{i=1}^{m}x_ix_i^T$ be the scatter matrix.
My question is: do the eigenvectors of $S$ form a basis of $\mathbb{R}^d$?
I know that if I have $d$ eigenvectors of $S$ with distinct eigenvalues, then those vectors are orthogonal to each other because $S$ is symmetric. I also know that nonzero orthogonal vectors are linearly independent. Thus, if $S$ has $d$ eigenvectors with distinct eigenvalues, then those are $d$ (= the dimension of the space) linearly independent vectors, and so they form a basis.
But I am not sure that the eigenvectors of $S$ necessarily form a basis, because there may be eigenvectors with the same eigenvalue, which are not necessarily orthogonal to each other. Also, I am not sure whether $S$ necessarily has $d$ linearly independent eigenvectors; maybe it can have fewer.
Can $S$ have repeated (non-distinct) eigenvalues?
Can $S$ have fewer than $d$ linearly independent eigenvectors?
An explanation of when the eigenvectors of $S$ form a basis (always, or only under some conditions) would be helpful.
Note that $S = \sum_{i=1}^m x_i x_i^T$ is a real symmetric matrix. A fundamental property of real symmetric matrices is orthogonal diagonalizability (the spectral theorem): $\mathbb{R}^d$ has an orthonormal basis consisting of eigenvectors of $S$. In particular, the eigenvectors of $S$ span $\mathbb{R}^d$. For reference, see:
why symmetric matrices are diagonalizable?
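Assuming $m < d$, the rank of $S$ is at most $m$, so the eigenvalue $0$ must have multiplicity at least $d - m$. This means that $S$ can have non-distinct eigenvalues: whenever $d - m \geq 2$, the eigenvalue $0$ is already repeated, and $\lambda = 0$ is not the only eigenvalue that can be degenerate, since nonzero eigenvalues can repeat as well.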
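As a small numerical illustration of the rank argument, here is a sketch using NumPy. The data vectors and the helper name `scatter_matrix` are made up for this example (they are not from the question); the two vectors are chosen so that both the zero eigenvalue and a nonzero eigenvalue are repeated.

```python
import numpy as np

def scatter_matrix(X):
    """Return S = sum_i x_i x_i^T, where the x_i are the rows of X (shape m x d)."""
    X = np.asarray(X, dtype=float)
    return X.T @ X

# Hypothetical data: m = 2 vectors in R^4 (d = 4), picked by hand.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
m, d = X.shape
S = scatter_matrix(X)

# eigvalsh is intended for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues = np.linalg.eigvalsh(S)

# Count eigenvalues that are numerically zero.
num_zero = int(np.sum(np.isclose(eigenvalues, 0.0, atol=1e-12)))

print("eigenvalues:", eigenvalues)
print("zero eigenvalues:", num_zero, "  d - m =", d - m)
```

Here $S = \operatorname{diag}(1, 1, 0, 0)$, so the eigenvalue $0$ has multiplicity $2 = d - m$ and the nonzero eigenvalue $1$ is repeated too.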
You are right that eigenvectors with the same eigenvalue are not necessarily orthogonal to each other. However, every eigenspace has an orthogonal basis, which can be obtained, for instance, through the Gram-Schmidt process: any linear combination of eigenvectors for the same eigenvalue is again an eigenvector for that eigenvalue (or zero), so orthogonalizing within an eigenspace never leaves it. In other words, the linearly independent eigenvectors belonging to the same eigenvalue can always be chosen to be orthogonal.
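To make this concrete, here is a minimal sketch, assuming the hand-picked matrix $S = \operatorname{diag}(1, 1, 0)$ (the scatter matrix of $e_1, e_2 \in \mathbb{R}^3$). It starts from two non-orthogonal eigenvectors for the eigenvalue $1$ and orthonormalizes them. Note that `np.linalg.qr` is used here as a stand-in for Gram-Schmidt: it is not literally the Gram-Schmidt algorithm, but its `Q` factor spans the same space and matches the Gram-Schmidt result up to signs.

```python
import numpy as np

# Illustrative matrix: eigenvalue 1 has a two-dimensional eigenspace.
S = np.diag([1.0, 1.0, 0.0])

# Two eigenvectors for eigenvalue 1 (as columns) that are NOT orthogonal.
V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
inner_before = V[:, 0] @ V[:, 1]   # nonzero, so v1 and v2 are not orthogonal

# Orthonormalize the columns; Q spans the same subspace as V.
Q, _ = np.linalg.qr(V)

orthonormal = np.allclose(Q.T @ Q, np.eye(2))   # columns of Q are orthonormal
still_eigenvectors = np.allclose(S @ Q, Q)      # S q = 1 * q for each column q

print(inner_before, orthonormal, still_eigenvectors)
```

The orthonormalized columns stay inside the eigenspace, which is why they are still eigenvectors for the same eigenvalue.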
So, to summarize: $S$ is diagonalizable, so it has $d$ linearly independent eigenvectors, which do not have to belong to $d$ different eigenspaces (there can be non-distinct eigenvalues).
$S$ cannot have fewer than $d$ linearly independent eigenvectors.
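The conclusion can also be checked numerically. The following is only a sketch on randomly generated data (the sizes `m = 3`, `d = 6` and the seed are arbitrary choices for illustration): `np.linalg.eigh` returns the eigenvalues together with a matrix whose columns are eigenvectors, and for a symmetric input those columns form an orthonormal basis even when eigenvalues repeat.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility
m, d = 3, 6                         # fewer data vectors than dimensions
X = rng.standard_normal((m, d))     # rows are the data vectors x_i
S = X.T @ X                         # scatter matrix, symmetric d x d

# eigh: eigenvalues w (ascending) and eigenvectors as the columns of Q.
w, Q = np.linalg.eigh(S)

is_orthonormal = np.allclose(Q.T @ Q, np.eye(d))      # Q^T Q = I
reconstructs = np.allclose(Q @ np.diag(w) @ Q.T, S)   # S = Q diag(w) Q^T
basis_rank = np.linalg.matrix_rank(Q)                 # d independent eigenvectors

print(is_orthonormal, reconstructs, basis_rank)
```

Even though $S$ has rank at most $m = 3$ here, the eigenvectors for the zero eigenvalue fill in the remaining directions, so the $d$ columns of `Q` still form a basis of $\mathbb{R}^d$.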