Reason behind objective function in Linear Discriminant Analysis


I don't really understand the objective function to be optimized in Linear Discriminant Analysis (LDA). My question centers on the same concepts mentioned in this other one.

The analysis focuses on three concepts also used in ANOVA (Analysis of Variance):

  • the within-class scatter $s_w$
  • the between-class scatter $s_b$
  • the total scatter $s$

These quantities are real numbers but it is quite common to define three $d \times d $ matrices $S_w$, $S_b$, and $S$ such that for any vector $x \in \mathbb{R}^d$ the numbers $x^TS_wx$, $x^TS_bx$, and $x^TSx$ correspond, respectively, to the within-class scatter, the between-class scatter, and the total scatter of the original data projected onto the one-dimensional subspace generated by $x$.
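The decomposition above can be sketched numerically. The following is a minimal illustration, assuming two Gaussian classes in $d = 3$ (the data, class sizes, and direction $x$ are made-up examples, not from the question); it checks both $S = S_w + S_b$ and the claim that $x^T S x$ equals the total scatter of the data projected onto $x$:

```python
import numpy as np

# Hypothetical two-class toy data in d = 3 dimensions (illustrative values).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=0.0, size=(20, 3))    # class 1 samples
X2 = rng.normal(loc=2.0, size=(30, 3))    # class 2 samples
X = np.vstack([X1, X2])

mu = X.mean(axis=0)                       # overall mean
mus = [X1.mean(axis=0), X2.mean(axis=0)]  # class means
ns = [len(X1), len(X2)]

# Within-class scatter: summed squared deviations from each class mean.
S_w = sum((Xc - m).T @ (Xc - m) for Xc, m in zip([X1, X2], mus))
# Between-class scatter: class-size-weighted outer products of
# (class mean - overall mean).
S_b = sum(n * np.outer(m - mu, m - mu) for n, m in zip(ns, mus))
# Total scatter.
S = (X - mu).T @ (X - mu)

# The three matrices satisfy S = S_w + S_b.
assert np.allclose(S, S_w + S_b)

# For any direction x, x^T S x is the total scatter of the projected data.
x = np.array([1.0, -0.5, 2.0])
proj = X @ x
assert np.isclose(x @ S @ x, ((proj - proj.mean()) ** 2).sum())
```

The same projection identity holds for $S_w$ and $S_b$ with the within- and between-class scatters of the projected data.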

Each matrix represents a bilinear form w.r.t. the canonical basis. If $W$ is an orthonormal matrix, then $W^TS_bW$ represents the same bilinear form as $S_b$ but with respect to a rotated orthonormal basis and similarly for $S_w$ and $S$.
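The change-of-basis statement can be checked directly: if a vector has coordinates $c$ in the rotated basis, it is $Wc$ in the canonical basis, and the form evaluates identically. A short sketch (the matrices here are arbitrary stand-ins, not from the question):

```python
import numpy as np

# Illustrative symmetric PSD stand-in for S_b and a random orthonormal W.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
S_b = A @ A.T                                  # symmetric positive semidefinite
W, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthonormal (rotation) matrix

# Coordinates c in the rotated basis correspond to the vector W @ c in the
# canonical basis; the bilinear form gives the same number either way.
c = np.array([0.3, -1.2, 0.7])
assert np.isclose((W @ c) @ S_b @ (W @ c), c @ (W.T @ S_b @ W) @ c)
```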

The objective of LDA is to maximise the function

$$J(W) = \frac{\det(W^T S_b W)}{\det(W^T S_w W)}$$

What I don't understand is that $\det(W^T S_b W)$ is the product of the eigenvalues of $W^T S_b W$, whereas the between-class scatter in the rotated basis is the sum of the eigenvalues (the trace), and similarly for $S_w$. Moreover, $\det(W^T S_w W)$ may be $0$.
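Both observations are easy to reproduce numerically. The example below (the data are made up for illustration) uses $c = 2$ classes with only two samples each in $d = 3$ dimensions, so $S_w$ has rank at most $n - c = 2 < d$ and its determinant vanishes; it also confirms the product-vs-sum distinction between determinant and trace:

```python
import numpy as np

# Illustrative assumption: 2 classes, 2 samples each, in d = 3 dimensions,
# so rank(S_w) <= n - c = 2 < 3 and det(S_w) = 0.
X1 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
X2 = np.array([[2.0, 2.0, 1.0], [2.0, 3.0, 1.0]])

S_w = sum((Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0)) for Xc in (X1, X2))

eigvals = np.linalg.eigvalsh(S_w)
# Determinant = product of eigenvalues; trace = sum of eigenvalues,
# which is the total within-class scatter of the d-dimensional data.
assert np.isclose(np.linalg.det(S_w), np.prod(eigvals))
assert np.isclose(np.trace(S_w), eigvals.sum())
# With fewer samples than dimensions, the within-class scatter is singular:
assert np.isclose(np.linalg.det(S_w), 0.0)
```

This is why practical treatments either regularise $S_w$ or restrict attention to a subspace on which it is invertible.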
