I am trying to solve Exercise 4.4 of the book Pattern Recognition and Machine Learning by Bishop. Implicitly, we need to show that if we:
Maximize: $$w^{T}(m_{2}-m_{1}) \tag{1}\label{1} $$
Subject to: $$w^{T}w = 1 \tag{2}\label{2}$$
Then the optimal value for $w$ we obtain is of the form $$w \propto (m_{2}-m_{1})$$
My approach:
I started with the Lagrangian, which is:
\begin{align} L=w^{T}(m_{2}-m_{1}) + \lambda(w^{T}w-1) \tag{3}\label{3} \end{align}
where $\lambda$ is the Lagrange multiplier.
From this, I get:
\begin{align}
\nabla_{w}L = (m_{2}-m_{1}) + 2\lambda w = 0 \\
\implies w = - \frac{m_{2}-m_{1}}{2 \lambda} \tag{4}\label{4}
\end{align}
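As a quick sanity check of Eq. (4), here is a minimal sketch in plain Python. The class means are made-up values and `stationary_w` and `cosine` are hypothetical helper names, not anything from the book. It leaves $\lambda$ free (the constraint is not imposed yet) and measures the angle between the resulting $w$ and $m_{2}-m_{1}$:

```python
import math

def stationary_w(m1, m2, lam):
    """Eq. (4): w = -(m2 - m1) / (2 * lam), for a nonzero multiplier lam."""
    return [-(b - a) / (2.0 * lam) for a, b in zip(m1, m2)]

def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical class means, chosen only for illustration.
m1 = [1.0, 2.0, 0.5]
m2 = [3.0, -1.0, 2.0]
d = [b - a for a, b in zip(m1, m2)]

# lam is left free here; the constraint (2) has not been imposed yet.
cosines = [cosine(stationary_w(m1, m2, lam), d) for lam in (-3.0, -0.5, 0.25, 2.0)]
print(cosines)
```

Under these assumptions the cosine has magnitude 1 for every nonzero $\lambda$ tried, so the stationary $w$ always lies along $m_{2}-m_{1}$, with the sign of $\lambda$ deciding which way it points.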
In the official solution, this is where the author concludes and writes, "it follows that $w \propto (m_{2}-m_{1})$". However, if I try to see what $\lambda$ looks like, I substitute this value of $w$ into the constraint, Eq. (2), to get:
\begin{align}
\frac{(m_{2}-m_{1})^{T} (m_{2}-m_{1})}{4 \lambda^{2}} = 1 \\
\implies \lambda = \frac{|m_{2}-m_{1}|}{2} \tag{5}\label{5}
\end{align}
Now if I substitute this $\lambda$ into Eq. (4), I get
\begin{align}
w = -\frac{m_{2}-m_{1}}{|m_{2}-m_{1}|} \\
\implies \frac{w}{m_{2}-m_{1}} = -\frac{1}{|m_{2}-m_{1}|}
\end{align}
which does not establish direct proportionality between $w$ and $m_{2}-m_{1}$, as $-\frac{1}{|m_{2}-m_{1}|}$ is not constant.
Edit 2: The reason why $|m_{2}-m_{1}|$ is not constant is that $m_{1}$ (and, for that matter, $m_{2}$) would change as new points are assigned to the classes $C_{1}$ or $C_{2}$.
So where is my interpretation or mathematics going wrong?
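To make the concern concrete, here is a minimal numerical sketch. It is restricted to 2D, uses a brute-force search over the unit circle rather than the Lagrangian, and the two pairs of means and the helper `best_unit_w` are made up for illustration. For each pair it prints the maximizing $w$, the norm $|m_{2}-m_{1}|$, and a cross product that is near zero when $w$ is parallel to $m_{2}-m_{1}$:

```python
import math

def best_unit_w(m1, m2, steps=100_000):
    """Brute-force argmax of w^T (m2 - m1) over unit vectors w in 2D."""
    dx, dy = m2[0] - m1[0], m2[1] - m1[1]
    best_w, best_val = None, -math.inf
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))  # satisfies w^T w = 1
        val = w[0] * dx + w[1] * dy
        if val > best_val:
            best_w, best_val = w, val
    return best_w, math.hypot(dx, dy)

# Two hypothetical pairs of class means (illustrative values only).
results = []
for m1, m2 in [((0.0, 0.0), (3.0, 4.0)), ((1.0, -1.0), (2.0, 1.0))]:
    w, norm = best_unit_w(m1, m2)
    d = (m2[0] - m1[0], m2[1] - m1[1])
    # A cross product near 0 means w is parallel to m2 - m1.
    cross = w[0] * d[1] - w[1] * d[0]
    results.append((w, d, norm, cross))
    print(w, norm, cross)
```

In this sketch the scaling factor $|m_{2}-m_{1}|$ differs between the two data sets, while the maximizing $w$ stays aligned with $m_{2}-m_{1}$ in both, up to the resolution of the angle grid.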