How to interpret maximizing "separability and reciprocal of scattering" in Fisher's LDA?
That is, suppose $s_1^2$ is the scatter within (projected) class 1, which we want to minimize, and $s_2^2$ is the same for class 2.
Then, in the so-called "joint criterion", one wants to solve:
$$\max_w I(w)=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}$$
where the $m_i$ are the means of the projected classes.
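A minimal sketch of evaluating the joint criterion for a given direction $w$. It assumes $s_i^2$ is the sum of squared deviations of the projected class-$i$ samples from $m_i$ (a common convention; the question does not pin it down). The function name `fisher_criterion` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fisher_criterion(X1, X2, w):
    """Joint criterion I(w) = (m1 - m2)^2 / (s1^2 + s2^2) for a direction w.

    X1, X2: arrays of shape (n_i, d) with the samples of each class.
    w: array of shape (d,) giving the projection direction.
    Assumption: s_i^2 is the sum of squared deviations of the projected
    class-i samples from their projected mean m_i.
    """
    y1 = X1 @ w  # projected samples of class 1
    y2 = X2 @ w  # projected samples of class 2
    m1, m2 = y1.mean(), y2.mean()  # projected class means
    s1_sq = np.sum((y1 - m1) ** 2)  # scatter of projected class 1
    s2_sq = np.sum((y2 - m2) ** 2)  # scatter of projected class 2
    return (m1 - m2) ** 2 / (s1_sq + s2_sq)

# Hypothetical toy data: two classes in 2-D.
X1 = np.array([[0.0, 0.0], [2.0, 1.0]])
X2 = np.array([[0.0, 3.0], [2.0, 4.0]])
print(fisher_criterion(X1, X2, np.array([0.0, 1.0])))  # 9.0
print(fisher_criterion(X1, X2, np.array([1.0, 0.0])))  # 0.0
```

Projecting onto $w=(0,1)$ puts the projected means far apart relative to the scatter, while $w=(1,0)$ makes the projected means coincide, so $I(w)=0$. Rescaling $w$ leaves $I(w)$ unchanged, so only the direction of $w$ matters.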
In particular:
Is the reciprocal of the minimized scatter some kind of "deviation"?
And if the maximized difference of the projected means is divided by this "deviation", what does the ratio mean?
That one wants the means as far apart as possible and, at the same time, the scatter as small as possible?
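You can rewrite it in the more general form
$$ \max_{v}g(v) =\max_{v}\frac{v'S_bv}{v'S_wv}, $$
where, for two classes, $S_b=(\mu_1-\mu_2)(\mu_1-\mu_2)'$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix (the sum of the two per-class scatter matrices). With $m_i=v'\mu_i$, the numerator is exactly $(m_1-m_2)^2$ and the denominator is $s_1^2+s_2^2$, so this is the criterion from the question. Intuitively, finding the best projection is the same as maximizing a signal-to-noise ratio, or the ratio $\mathrm{MSB}/\mathrm{MSE}$ (the $F$ statistic) in a one-way ANOVA with Gaussian noise. Namely, $v$ maximizes the (squared) distance between $v'\mu_1$ and $v'\mu_2$, the so-called "between-class" distance, scaled by the "within-class" variance.

The scaling is what brings in $S_w^{-1}$: setting the gradient of $g$ to zero gives $S_bv=g(v)\,S_wv$, i.e., $S_w^{-1}S_bv=g(v)\,v$ when $S_w$ is invertible. So instead of projecting the data onto the eigenvectors of $S_b$, you project it onto the leading eigenvector of $S_w^{-1}S_b$, i.e., onto an axis that maximizes $\mathrm{MSB}/\mathrm{MSE}$ and not just the distance between $m_1$ and $m_2$.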
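A minimal NumPy sketch of the matrix form, assuming two classes, $S_b=(\mu_1-\mu_2)(\mu_1-\mu_2)'$ and $S_w$ the sum of the per-class scatter matrices. The helper names and the synthetic correlated Gaussian data are hypothetical. It takes the leading eigenvector of $S_w^{-1}S_b$ and compares its criterion value with the plain mean-difference direction. It also computes the ANOVA $F$ of the projected data; for two classes of sizes $n_1,n_2$ one can check that $F(v)=\frac{n_1n_2}{n_1+n_2}(n_1+n_2-2)\,g(v)$, so maximizing $g$ and maximizing $\mathrm{MSB}/\mathrm{MSE}$ pick the same direction.

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Between-class (S_b) and within-class (S_w) scatter matrices, two classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (mu1 - mu2)[:, None]
    S_b = d @ d.T  # rank one: (mu1 - mu2)(mu1 - mu2)'
    S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    return S_b, S_w

def rayleigh_quotient(v, S_b, S_w):
    """g(v) = v' S_b v / v' S_w v."""
    return (v @ S_b @ v) / (v @ S_w @ v)

def fisher_direction(S_b, S_w):
    """Leading eigenvector of S_w^{-1} S_b."""
    M = np.linalg.solve(S_w, S_b)  # S_w^{-1} S_b without forming the inverse
    eigvals, eigvecs = np.linalg.eig(M)  # M is not symmetric, so use eig
    return eigvecs[:, np.argmax(eigvals.real)].real

def anova_f(y1, y2):
    """One-way ANOVA F = MSB / MSE for two groups of projected values."""
    n1, n2 = len(y1), len(y2)
    grand = np.concatenate([y1, y2]).mean()
    ssb = n1 * (y1.mean() - grand) ** 2 + n2 * (y2.mean() - grand) ** 2
    sse = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
    return (ssb / 1) / (sse / (n1 + n2 - 2))

# Hypothetical data: strongly correlated classes whose means differ along x.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([1.0, 0.0], cov, size=200)

S_b, S_w = scatter_matrices(X1, X2)
v_fisher = fisher_direction(S_b, S_w)
v_naive = X1.mean(axis=0) - X2.mean(axis=0)  # mean-difference direction only
print(rayleigh_quotient(v_fisher, S_b, S_w), rayleigh_quotient(v_naive, S_b, S_w))
```

Because $S_b$ has rank one here, the only nonzero eigenvalue's eigenvector is proportional to $S_w^{-1}(\mu_1-\mu_2)$, so in the two-class case one could skip the eigendecomposition and use that closed form directly; the eigenvector route is shown because it is the one the answer describes.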