While following this book, I ran into trouble understanding section 4.1.4. The author obtains $$\mathbf w\propto \mathbf S_W^{-1}(\mathbf m_2 - \mathbf m_1)$$ and suggests finding a threshold $y_0$ for the model $y(\mathbf x) = \mathbf w^\top \mathbf x$ by modelling the class-conditional densities $p(y\mid \mathcal C_k)$ and then applying the result of section 1.5.1 (which I do not fully understand).
My idea is the following: $p(y\mid \mathcal C_k) = \mathcal N(y\mid \hat \mu _k, \hat \beta _k^{-1})$, where I use the ML estimates within the $k$-th class. Then $p(\mathcal C_k \mid y) \propto p(\mathcal C_k)p(y\mid \mathcal C_k)$, where $p(\mathcal C_k)$ is the ratio of the number of points in class $\mathcal C_k$ to the total number of points. I would assign a new vector $\mathbf x$ to the class for which $p(\mathcal C_k)p(\mathbf w^\top \mathbf x\mid \mathcal C_k)$ is largest. However, I am still confused about how to choose $\mathbf w$. Could someone explain what the author means?
$\mathbf w$ is given by your first equation, which is Fisher's linear discriminant. It is the vector ("direction") in input space that gives the largest separation between your two classes when you project the data onto the line through $\mathbf w$, in the sense of maximising the between-class separation relative to the within-class scatter. That is why you compute $\mathbf w^\top \mathbf x$: it measures how far along that line $\mathbf x$ lies. The Gaussian assumption enters only afterwards, when you model the projected densities $p(y\mid \mathcal C_k)$ to set the threshold; your proposed rule of assigning $\mathbf x$ to the class maximising $p(\mathcal C_k)p(\mathbf w^\top \mathbf x\mid \mathcal C_k)$ is exactly the minimum-misclassification-rate decision of section 1.5.1.
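To make this concrete, here is a minimal sketch in numpy of the whole pipeline you describe: compute $\mathbf S_W$, solve for $\mathbf w\propto \mathbf S_W^{-1}(\mathbf m_2-\mathbf m_1)$, fit 1-D Gaussians to the projections by ML, and classify by the larger $p(\mathcal C_k)p(y\mid\mathcal C_k)$. The toy data, sample sizes, and the `classify` helper are my own choices for illustration, not from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D toy data: two roughly Gaussian classes (not from the book)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
X2 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: S_W = sum_k sum_{n in C_k} (x_n - m_k)(x_n - m_k)^T
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction: w proportional to S_W^{-1} (m2 - m1)
w = np.linalg.solve(S_W, m2 - m1)

# Project each class onto the line through w
y1, y2 = X1 @ w, X2 @ w

# ML estimates of the 1-D class-conditional Gaussians p(y | C_k)
mu1, var1 = y1.mean(), y1.var()
mu2, var2 = y2.mean(), y2.var()
prior1 = len(X1) / (len(X1) + len(X2))  # p(C_1) from class frequencies
prior2 = 1.0 - prior1

def classify(x):
    """Assign x to the class maximising p(C_k) p(w^T x | C_k)."""
    y = x @ w
    def log_post(mu, var, prior):
        # log of prior * Gaussian density, dropping the constant -0.5*log(2*pi)
        return np.log(prior) - 0.5 * np.log(var) - 0.5 * (y - mu) ** 2 / var
    return 1 if log_post(mu1, var1, prior1) >= log_post(mu2, var2, prior2) else 2

print(classify(np.array([0.1, -0.2])))  # a point near class 1
print(classify(np.array([2.1, 1.9])))   # a point near class 2
```

With equal priors and equal projected variances, the decision boundary reduces to the midpoint threshold $y_0 = (\hat\mu_1+\hat\mu_2)/2$; the general rule above also handles unequal priors and variances.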