How can I use Lagrange multipliers to maximize a generalized Rayleigh quotient for Linear Discriminant Analysis?


I'm reading up on Linear Discriminant Analysis and have hit the point where one wants to maximize:

$$ \frac{v^{T}S_{b}v}{v^{T}S_{w}v} $$

with respect to $v$, where $S_b$ is the between-class variance matrix, $S_w$ is the within-class variance matrix, and $v$ is an arbitrary unit vector. I want to find the $v$ that maximizes the above ratio. I know there are other ways to do this, but I want to do it using Lagrange multipliers.

So, I set

$$ L = \frac{v^{T}S_{b}v}{v^{T}S_{w}v} - \lambda (v^{T}v - 1) $$

Now, concentrating on the derivative with respect to $v$, I get

$$ \frac{[2S_{b}v (v^{T}S_{w}v) - (v^{T}S_{b}v) (2S_{w}v)]}{(v^{T}S_{w}v)^{2}} - 2\lambda v = 0 $$
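
As a sanity check on the derivative above, here is a minimal numerical sketch that compares the quotient-rule gradient with central finite differences. The matrices are random symmetric stand-ins rather than real scatter matrices, and the helper names (`rayleigh`, `rayleigh_grad`, `numeric_grad`) are made up for this illustration.

```python
import numpy as np

def rayleigh(v, S_b, S_w):
    """Generalized Rayleigh quotient v^T S_b v / v^T S_w v."""
    return (v @ S_b @ v) / (v @ S_w @ v)

def rayleigh_grad(v, S_b, S_w):
    """Quotient-rule gradient, valid for symmetric S_b and S_w."""
    num = v @ S_b @ v
    den = v @ S_w @ v
    return (2 * (S_b @ v) * den - num * 2 * (S_w @ v)) / den**2

def numeric_grad(f, v, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        g[i] = (f(v + e) - f(v - e)) / (2 * eps)
    return g

# Assumption: random symmetric matrices standing in for the scatter matrices.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
S_b = A @ A.T                   # symmetric positive semidefinite
S_w = B @ B.T + n * np.eye(n)   # symmetric positive definite
v = rng.standard_normal(n)
v /= np.linalg.norm(v)          # unit vector, as in the constraint

g_analytic = rayleigh_grad(v, S_b, S_w)
g_numeric = numeric_grad(lambda x: rayleigh(x, S_b, S_w), v)

grad_error = np.max(np.abs(g_analytic - g_numeric))
radial_component = v @ g_analytic  # v^T times the gradient of the quotient

print("max gradient mismatch:", grad_error)
print("v^T grad:", radial_component)
```

Both printed numbers should be tiny. The second one is $v^T$ times the gradient of the ratio, which vanishes up to rounding because the ratio does not change when $v$ is rescaled.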

At this point, I can divide by 2 and factor out $v$, giving me

$$ \left(\frac{S_{b} (v^{T}S_{w}v) - (v^{T}S_{b}v) S_{w}}{(v^{T}S_{w}v)^{2}} - \lambda I \right) v = 0 $$

But now I'm stuck. How do I get from here to

$$ S_{w}^{-1}S_{b}v = \lambda v \; ? $$

PS - I'm now also confused by the fact that when I factored out $v$, I ended up subtracting a scalar (i.e. $\lambda$) from a matrix. (Note: this is no longer an issue; a comment showed me my oversight there.)

Best answer

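First, massage the numerator of your formula by inserting the identity $I = S_wS_w^{-1}$ (this assumes $S_w$ is invertible) so that the term $S_w^{-1}S_bv$ appears:

$$ \Big(S_b(v^TS_wv) - S_w(v^TS_bv)\Big)v = S_w\Big(S_w^{-1}S_b(v^TS_wv)-I(v^TS_wS_w^{-1}S_bv)\Big)v = S_w\Big((S_w^{-1}S_bv)(v^TS_wv) - v\big(v^TS_w(S_w^{-1}S_bv)\big)\Big) $$

Then notice that this is equal to zero if $S_w^{-1}S_bv = \mu \, v$ for some scalar $\mu$: both terms inside the outer parentheses become $\mu \, (v^TS_wv) \, v$ and cancel.

I deliberately did not use $\lambda$ here. As others pointed out in the comments, the constraint and its Lagrange multiplier are "inoperative": the ratio does not change when $v$ is rescaled, so the constraint $v^Tv = 1$ only fixes the length of $v$. Concretely, multiplying your stationarity condition by $v^T$ from the left makes the fraction vanish and leaves only the multiplier term, $\lambda \, v^Tv = 0$, hence $\lambda = 0$.

The solution to your problem is the eigenvector of the matrix $S_w^{-1}S_b$ (scaled to unit length) that corresponds to the largest eigenvalue $\mu$. The maximum value is also $\mu$, which you can see by using $I = S_wS_w^{-1}$ again:

$$ \frac{v^TS_bv}{v^TS_wv} = \frac{v^TS_w(S_w^{-1}S_bv)}{v^TS_wv} = \frac{\mu \, v^TS_wv}{v^TS_wv} = \mu . $$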
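
If you want to convince yourself numerically, the following is a minimal sketch of the eigenvector claim using NumPy. The matrices are random symmetric stand-ins (an assumption for illustration, not real LDA scatter matrices), and names such as `v_star` and `best_random` are invented here.

```python
import numpy as np

# Assumption: random symmetric matrices standing in for S_b and S_w.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
S_b = A @ A.T                   # symmetric positive semidefinite
S_w = B @ B.T + n * np.eye(n)   # symmetric positive definite, so invertible

def rayleigh(v):
    """Generalized Rayleigh quotient v^T S_b v / v^T S_w v."""
    return (v @ S_b @ v) / (v @ S_w @ v)

# M = S_w^{-1} S_b, computed with solve instead of an explicit inverse.
M = np.linalg.solve(S_w, S_b)
eigvals, eigvecs = np.linalg.eig(M)

# M is similar to a symmetric matrix, so its eigenvalues are real in exact
# arithmetic; .real only drops rounding-level imaginary parts, if any.
eigvals, eigvecs = eigvals.real, eigvecs.real

k = np.argmax(eigvals)
mu = eigvals[k]
v_star = eigvecs[:, k] / np.linalg.norm(eigvecs[:, k])  # unit length

# Compare against the best quotient found among many random directions.
best_random = max(rayleigh(x) for x in rng.standard_normal((10000, n)))

print("largest eigenvalue mu:   ", mu)
print("quotient at eigenvector: ", rayleigh(v_star))
print("best of 10000 random v:  ", best_random)
```

The first two printed values should agree, and the third should not exceed them. For symmetric matrices like these, `scipy.linalg.eigh(S_b, S_w)` solves the same generalized problem $S_bv = \mu S_wv$ directly and is probably the more robust choice in practice.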