Linear Discriminant Analysis: Meaning of Negative Eigenvalues?


In Linear Discriminant Analysis (LDA) we compute two matrices from the data: the between-class scatter matrix $\boldsymbol{S}_b$ and the within-class scatter matrix $\boldsymbol{S}_w$. A direction $\boldsymbol{w}$ is considered more discriminative the larger the Rayleigh quotient $(\boldsymbol{w}^T \boldsymbol{S}_b \boldsymbol{w})/(\boldsymbol{w}^T \boldsymbol{S}_w \boldsymbol{w})$ is. The $k$ most discriminative directions therefore correspond to the top $k$ eigenvectors of the matrix $\boldsymbol{S}_w^{-1} \boldsymbol{S}_b$.
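To make the setup concrete, here is a minimal NumPy sketch on synthetic three-class data (the data, dimensions, and variable names are all made up for illustration; the eigenvectors of $\boldsymbol{S}_w^{-1}\boldsymbol{S}_b$ solve the generalized problem $\boldsymbol{S}_b \boldsymbol{w} = \lambda \boldsymbol{S}_w \boldsymbol{w}$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-class data in 5 dimensions (synthetic, for illustration only)
X = rng.normal(size=(300, 5))
y = np.repeat([0, 1, 2], 100)
X[y == 1] += 2.0
X[y == 2] -= 2.0

d = X.shape[1]
mu = X.mean(axis=0)
Sw = np.zeros((d, d))  # within-class scatter
Sb = np.zeros((d, d))  # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += Xc.shape[0] * np.outer(mc - mu, mc - mu)

# Discriminative directions: eigenvectors of Sw^{-1} Sb,
# sorted by decreasing eigenvalue. (Sb Sw^{-1} has the same
# eigenvalues, since the two products are similar matrices.)
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(-eigvals.real)
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```

Note that with $C$ classes, $\boldsymbol{S}_b$ has rank at most $C-1$, so at most $C-1$ eigenvalues are nonzero; the remaining ones are zero up to round-off.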

While the two scatter matrices $\boldsymbol{S}_b$ and $\boldsymbol{S}_w$ are positive semi-definite, the product $\boldsymbol{S}_w^{-1} \boldsymbol{S}_b$ is not symmetric, so it is not obviously positive semi-definite. So I wonder: what is the meaning/intuition of negative eigenvalues in this case?
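One way to probe this numerically: writing $\boldsymbol{S}_w = \boldsymbol{L}\boldsymbol{L}^T$ (Cholesky), the product $\boldsymbol{S}_w^{-1}\boldsymbol{S}_b$ is similar to the symmetric matrix $\boldsymbol{L}^{-1}\boldsymbol{S}_b\boldsymbol{L}^{-T}$ (and the spectrum is the same for $\boldsymbol{S}_b\boldsymbol{S}_w^{-1}$, since the two orderings are similar to each other), so one can compare the spectrum a general eigensolver returns for the non-symmetric product against the spectrum of the symmetric form. A sketch with random stand-in matrices, not real scatter matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
# Random stand-ins: Sb low-rank PSD, Sw full-rank PD (illustrative only)
A = rng.normal(size=(5, 3)); Sb = A @ A.T   # rank-3 PSD "between" stand-in
B = rng.normal(size=(5, 8)); Sw = B @ B.T   # full-rank PD "within" stand-in

# Eigenvalues of the non-symmetric product Sw^{-1} Sb: a general
# eigensolver may return tiny imaginary or negative parts from round-off
vals_prod = np.linalg.eigvals(np.linalg.inv(Sw) @ Sb)

# Same spectrum via a symmetric matrix: with Sw = L L^T (Cholesky),
# Sw^{-1} Sb is similar to L^{-1} Sb L^{-T}, which is symmetric PSD
L = np.linalg.cholesky(Sw)
Linv = np.linalg.inv(L)
vals_sym = np.linalg.eigvalsh(Linv @ Sb @ Linv.T)
```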

This question arose when I wanted to choose the number $k$ based on the percentage of discriminative information preserved by the top $k$ eigenvectors. For example, in the case of PCA, I could pick the top $k$ eigenvectors that retain $90\%$ of the variance, i.e. $(\sum_{j=1}^k \lambda_j)/(\sum_{j=1}^n \lambda_j) \approx 0.9$ (with $n$ the total number of eigenvalues). However, if negative eigenvalues can occur in LDA, how should I modify this idea to achieve a similar goal? Should I use $(\sum_{j=1}^k \lambda_j)/(\sum_{j=1}^n \lambda_j) \approx 0.9$, or $(\sum_{j=1}^k \lambda_j)/(\sum_{j=1}^{n^+} \lambda_j) \approx 0.9$ (where the denominator sums over the nonnegative eigenvalues only), or $(\sum_{j=1}^k \lambda_j)/(\sum_{j=1}^{n} |\lambda_j|) \approx 0.9$, or none of these?
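The three candidate ratios are easy to write down side by side; here is a small sketch on a made-up spectrum containing a tiny negative eigenvalue of the kind that round-off can produce (the values are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical eigenvalue spectrum, sorted descending; the last entry
# is a tiny negative value such as an eigensolver might return
lam = np.array([5.0, 2.0, 0.5, 1e-14, -1e-15])
k = 2
top = lam[:k].sum()

ratio_all = top / lam.sum()              # denominator: all eigenvalues
ratio_pos = top / lam[lam >= 0].sum()    # denominator: nonnegative only
ratio_abs = top / np.abs(lam).sum()      # denominator: absolute values
```

When the negative eigenvalues are this small, all three ratios agree to many decimal places; they only diverge if substantially negative eigenvalues appear.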

Thanks

Golabi