I'm trying to do face recognition. The OpenCV library offers three methods: Eigenfaces, Fisherfaces, and Local Binary Pattern Histograms (LBPH). All of them work, but in practice Fisherfaces is the best: Eigenfaces copes well with noisy pictures, and LBPH cannot handle noise but can handle linear illumination differences.
And Fisherfaces can handle both.
To do Fisherfaces we need a collection $X$ that contains $c$ classes of pictures.
This is our collection of classes: $$X = \{X_1, X_2, X_3, \ldots, X_c\}$$
Each class is a set of pictures stored as vectors: $$X_i = \{x_1, x_2, x_3, \ldots, x_n\}$$
Then we need to create scatter matrices.
Here $N_i$ is the number of samples in class $X_i$: $$S_B = \sum_{i=1}^c N_i(\mu_i - \mu)(\mu_i - \mu)^T$$
$$S_W = \sum_{i=1}^c\sum_{x_j \in X_i}(x_j - \mu_i)(x_j - \mu_i)^T$$
Here $\mu$ is the overall mean of all $N$ samples: $$\mu = \frac{1}{N} \sum_{i=1}^N x_i$$ And $\mu_i$ is the mean of a single class $X_i$: $$\mu_i = \frac{1}{N_i}\sum_{x_j \in X_i} x_j$$
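The means and scatter matrices above can be computed directly with NumPy. This is a minimal sketch, assuming each class is stored as an array with one image vector per row (the function and variable names are mine, not from the post):

```python
import numpy as np

def scatter_matrices(classes):
    """Compute the between-class scatter S_B and within-class scatter S_W.

    `classes` is a list of arrays; classes[i] has shape (N_i, n),
    one image vector per row.
    """
    all_samples = np.vstack(classes)
    mu = all_samples.mean(axis=0)          # overall mean of all N samples
    n = all_samples.shape[1]
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for X_i in classes:
        N_i = X_i.shape[0]                 # number of samples in class i
        mu_i = X_i.mean(axis=0)            # class mean
        d = (mu_i - mu).reshape(-1, 1)
        S_B += N_i * (d @ d.T)             # between-class term
        for x_j in X_i:
            e = (x_j - mu_i).reshape(-1, 1)
            S_W += e @ e.T                 # within-class term
    return S_B, S_W
```

Both matrices come out symmetric and $n \times n$, where $n$ is the number of pixels per image.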
Now to my question: how can I solve this optimization problem?
We want to find the projection matrix $W_{opt}$ that maximizes this criterion:
$$W_{opt} = \arg\max_W \frac{|W^TS_BW|}{|W^TS_WW|}$$
We can solve this optimization problem by solving the generalized eigenvalue problem $\det(S_B - \lambda S_W) = 0$, i.e.:
$$S_W^{-1}S_B V_i = \lambda_i V_i$$
Or, if $S_W$ is singular, we can add a small constant $\epsilon$ so that $S_W$ becomes invertible. That is the recommended fix. $$ (S_W +\epsilon I)^{-1}S_B V_i = \lambda_i V_i$$
Or we can skip the $\epsilon$ and compute a (pseudo-)inverse of $S_W$ by Householder QR factorization or Singular Value Decomposition, which can handle singular matrices.
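The $\epsilon$-regularized eigenvalue problem can be solved in a few lines. A minimal sketch (illustrative names; `np.linalg.solve` is used instead of forming the inverse explicitly):

```python
import numpy as np

def lda_eigvectors(S_B, S_W, eps=1e-6):
    """Solve (S_W + eps*I)^{-1} S_B v = lambda v.

    eps regularizes a possibly singular S_W; eigenpairs are returned
    sorted by decreasing eigenvalue.
    """
    n = S_W.shape[0]
    M = np.linalg.solve(S_W + eps * np.eye(n), S_B)  # (S_W + eps*I)^{-1} S_B
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)                # largest eigenvalue first
    return eigvals.real[order], eigvecs[:, order].real
```

For real scatter matrices the eigenvalues are real up to numerical noise, which is why the imaginary parts are dropped here.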
The optimization problem can also be split into a PCA step followed by an FLD step:
$$W_{PCA} = \arg\max_W |W^TS_TW|$$ $$W_{FLD} = \arg\max_W \frac{|W^TW^T_{PCA}S_BW_{PCA}W|}{|W^TW^T_{PCA}S_WW_{PCA}W|}$$ $$W_{opt}^T = W^T_{FLD} W^T_{PCA}$$
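The two-step version can be sketched as follows: PCA first reduces the data to $N - c$ dimensions so the within-class scatter becomes nonsingular in the reduced space, then FLD is solved there and both projections are combined. This is an illustrative sketch under those assumptions (all names are mine):

```python
import numpy as np

def fisherfaces(X, labels, eps=1e-6):
    """PCA-then-FLD projection sketch.

    X has one image vector per row; labels[i] is the class of row i.
    Returns the combined projection matrix W of shape (n, c - 1).
    """
    classes = np.unique(labels)
    N, c = X.shape[0], len(classes)
    mu = X.mean(axis=0)
    Xc = X - mu                                  # center the data

    # PCA step: keep the N - c leading principal components.
    k = N - c
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:k].T                             # shape (n, k)

    # FLD step: scatter matrices in the reduced space (Y has zero mean).
    Y = Xc @ W_pca
    S_B = np.zeros((k, k))
    S_W = np.zeros((k, k))
    for cl in classes:
        Yi = Y[labels == cl]
        mi = Yi.mean(axis=0).reshape(-1, 1)
        S_B += Yi.shape[0] * (mi @ mi.T)
        E = Yi - mi.ravel()
        S_W += E.T @ E
    vals, vecs = np.linalg.eig(np.linalg.solve(S_W + eps * np.eye(k), S_B))
    order = np.argsort(-vals.real)[: c - 1]      # at most c - 1 useful directions
    W_fld = vecs[:, order].real                  # shape (k, c - 1)

    return W_pca @ W_fld                         # combined projection W
```

Projecting a new image is then just `(x - mu) @ W`, and recognition is done by nearest neighbor in that projected space.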
These equations are copied from the lecture notes here, here, and here; the last one is the best set of notes.
Questions:
How can I use the generalized eigenvalue problem to solve this optimization problem?
Should I first set $W = I$ and compute $W_{PCA} = \arg\max_W |W^TS_TW|$? Then the following equation, $W_{FLD} = \arg\max_W \frac{|W^TW^T_{PCA}S_BW_{PCA}W|}{|W^TW^T_{PCA}S_WW_{PCA}W|}$, and last this equation: $W_{opt}^T = W^T_{FLD} W^T_{PCA}$?
Then compare it with $$W_{opt} = \arg\max_W \frac{|W^TS_BW|}{|W^TS_WW|}$$
And check how large the criterion value of $W_{opt}$ is? Is the $W$ with the largest criterion value the solution?
Edit:
Or does it mean that $W = [V_1, V_2, V_3, \ldots, V_k]$ is the collection of eigenvectors?
All I need to do is to:
- Find $\mu_i$ and $\mu$, and from them $S_W$ and $S_B$
- Solve the generalized eigenvalue problem
$$S_W^{-1}S_B V_i = \lambda_i V_i$$
- Check how large the criterion value for $W_{opt}$ is? If it is not the largest, try the next candidate.
$$W_{opt} = \arg\max_W \frac{|W^TS_BW|}{|W^TS_WW|}$$
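One way to check the last step numerically: evaluate the criterion $|W^TS_BW|/|W^TS_WW|$ for the $W$ built from the leading eigenvectors and compare it against other choices of $W$. A small sketch (my own helper names, assuming NumPy):

```python
import numpy as np

def fisher_criterion(W, S_B, S_W):
    """Value of |W^T S_B W| / |W^T S_W W| for a projection matrix W."""
    num = np.linalg.det(W.T @ S_B @ W)
    den = np.linalg.det(W.T @ S_W @ W)
    return num / den

# Illustrative check with diagonal scatter matrices: the W made of the
# leading generalized eigenvectors gives at least as large a criterion
# value as a random W, so no sample-by-sample search is needed.
rng = np.random.default_rng(0)
S_W = np.diag([1.0, 2.0, 4.0])
S_B = np.diag([8.0, 2.0, 1.0])
vals, vecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
order = np.argsort(-vals.real)
W_opt = vecs[:, order[:2]].real              # two leading eigenvectors
W_rand = rng.standard_normal((3, 2))
best = fisher_criterion(W_opt, S_B, S_W)
```

The eigenvectors with the largest eigenvalues maximize the criterion directly, so "trying the next sample" is unnecessary: you stack the leading eigenvectors into $W$ and you are done.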