While studying the Hard-SVM topic in the Shalev-Shwartz book, I came across the following proof of the distance between a point and a hyperplane:
$$\min\{\|\pmb x-\pmb v\|: \langle\pmb w,\pmb v\rangle + b = 0\}$$
Taking $\pmb v = \pmb x - (\langle \pmb w, \pmb x \rangle + b)\pmb w$ we have that
$$\langle\pmb w,\pmb v\rangle + b = \langle\pmb w,\pmb x\rangle - (\langle \pmb w, \pmb x \rangle + b)\|\pmb w\|^2 + b = 0,$$
and
$$\|\pmb x-\pmb v\| = |\langle \pmb w,\pmb x \rangle + b|\,\|\pmb w\| = |\langle \pmb w,\pmb x\rangle + b|.$$
The above is a proof that the distance between the point $\pmb x$ and the hyperplane defined by $(\pmb w, b)$, where $\|\pmb w\|=1$, is $|\langle \pmb w, \pmb x \rangle+b|$.
I can derive the same result by taking a point $\pmb y$ on the plane and then taking the orthogonal projection of $\pmb x - \pmb y$ onto the normal vector of the plane, but I am not able to understand the proof provided in the book. I would greatly appreciate it if anyone could explain the above proof.
PS: I understand the first line in the proof points towards finding a point $\pmb v$ on the plane such that the distance between $\pmb x \ \text{and }\ \pmb v$ is minimized.
Thanks
The proof you provided is not complete. It's only the first part of it.
The distance between a point $\textbf{x}$ and a hyperplane $H$ defined by $(\textbf{w},b)$ is defined by:
$$ d(\textbf{x},H) = \underset{\textbf{v} \in H}{\text{min }} \|\textbf{x}-\textbf{v}\|\ $$
That is, one is trying to find the point $\textbf{v}$ in the hyperplane that minimises the distance to the point $\textbf{x}$. The proof is done by taking any point $\textbf{u} \in H$ and showing that:
$$ \| \textbf{x}-\textbf{u} \| \geq \|\textbf{x}-\textbf{v}\| = |\langle \textbf{w}, \textbf{x} \rangle +b | $$ where $\textbf{v} = \textbf{x} - (\langle \textbf{x}, \textbf{w} \rangle +b) \textbf{w}$.
That is, the point $\textbf{v}$ that we constructed is the one that minimises the distance to the point $\textbf{x}$, and hence $\|\textbf{x} - \textbf{v}\|$ is the distance between the hyperplane and $\textbf{x}$.
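If it helps to see the claim numerically, here is a minimal sanity check in plain Python. The helper names (`project_onto_hyperplane`, `random_point_on_hyperplane`) and the particular values of $\textbf{w}$, $b$ and $\textbf{x}$ are my own illustrative choices, not something from the book. It builds $\textbf{v}$ from the formula above and compares $\|\textbf{x}-\textbf{v}\|$ with the distance from $\textbf{x}$ to many other points of the hyperplane.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def norm(a):
    return math.sqrt(dot(a, a))

def project_onto_hyperplane(x, w, b):
    """Return v = x - (<w, x> + b) w. Assumes ||w|| = 1."""
    s = dot(w, x) + b  # signed distance from the hyperplane to x
    return [xi - s * wi for xi, wi in zip(x, w)]

def random_point_on_hyperplane(w, b, rng):
    """Draw a random point and project it, giving some u with <w, u> + b = 0."""
    z = [rng.uniform(-5.0, 5.0) for _ in w]
    return project_onto_hyperplane(z, w, b)

# Illustrative (hypothetical) data: unit normal w = (1/3, 2/3, 2/3), offset b = -1.
w_raw = [1.0, 2.0, 2.0]
w = [wi / norm(w_raw) for wi in w_raw]
b = -1.0
x = [3.0, 0.0, 4.0]

v = project_onto_hyperplane(x, w, b)
dist = norm(sub(x, v))  # analytically |<w, x> + b| = 8/3 for this data

# Compare against many other points u on the same hyperplane.
rng = random.Random(0)
others = [random_point_on_hyperplane(w, b, rng) for _ in range(1000)]
closest_other = min(norm(sub(x, u)) for u in others)

print("v lies on H:", abs(dot(w, v) + b) < 1e-12)
print("||x - v|| =", dist, " |<w,x> + b| =", abs(dot(w, x) + b))
print("no sampled u is closer:", closest_other >= dist - 1e-12)
```

Random sampling can only fail to find a counterexample, of course; it does not replace the inequality in the proof.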
Here I used the same notation as in the book and skipped the calculations since they are provided there. The construction of $\textbf{v}$ is based on vector addition. You can think of $\textbf{x}$ as a vector from the origin to the point $\textbf{x}$. Since $\|\textbf{w}\|=1$, the vector $(\langle \textbf{w}, \textbf{x} \rangle+b) \textbf{w}$ points from the plane to $\textbf{x}$ along the normal, so subtracting it from $\textbf{x}$ gives $\textbf{v}$, the orthogonal projection of $\textbf{x}$ on the plane. Hence, the distance we're looking for, i.e., the distance between $\textbf{x}$ and its orthogonal projection, is just the norm of the difference $\textbf{x}-\textbf{v}$ between these two vectors.
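For completeness, here is one way the inequality $\|\textbf{x}-\textbf{u}\| \geq \|\textbf{x}-\textbf{v}\|$ can be checked (a sketch; the book's presentation may differ in details). It uses only $\textbf{x}-\textbf{v} = (\langle \textbf{w}, \textbf{x} \rangle + b)\textbf{w}$ and the fact that both $\textbf{u}$ and $\textbf{v}$ lie in $H$:

$$\begin{aligned}
\|\textbf{x}-\textbf{u}\|^2 &= \|(\textbf{x}-\textbf{v}) + (\textbf{v}-\textbf{u})\|^2 \\
&= \|\textbf{x}-\textbf{v}\|^2 + \|\textbf{v}-\textbf{u}\|^2 + 2\langle \textbf{x}-\textbf{v}, \textbf{v}-\textbf{u} \rangle \\
&\geq \|\textbf{x}-\textbf{v}\|^2 + 2(\langle \textbf{w}, \textbf{x} \rangle + b)\,\langle \textbf{w}, \textbf{v}-\textbf{u} \rangle \\
&= \|\textbf{x}-\textbf{v}\|^2 + 2(\langle \textbf{w}, \textbf{x} \rangle + b)\big((\langle \textbf{w}, \textbf{v} \rangle + b) - (\langle \textbf{w}, \textbf{u} \rangle + b)\big) \\
&= \|\textbf{x}-\textbf{v}\|^2 .
\end{aligned}$$

The cross term vanishes because $\langle \textbf{w}, \textbf{v} \rangle + b = \langle \textbf{w}, \textbf{u} \rangle + b = 0$, which is just the statement that $\textbf{x}-\textbf{v}$ is orthogonal to every direction lying in the hyperplane.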