Linear discriminant analysis - Properties of weight vectors and decision surfaces

Question

1.4k Views Asked by Bumbble Comm At 05 Apr 2026 - 9:46

I'm learning about "linear discriminant analysis" from "Statistical Pattern Recognition" by A.R. Webb and K.D. Copsey (chapter 5 of the 3rd edition).

The general idea is introduced as follows: suppose we have a set of training patterns (vectors) $x_1, ..., x_n$, each of which is assigned to one of two classes, $\omega_1$ or $\omega_2$.
We seek a weight vector $w$ and a threshold $w_0$ such that:

$w^Tx + w_0 > 0 \Rightarrow x \in \omega_1$
$w^Tx + w_0 < 0 \Rightarrow x \in \omega_2$

The decision surface (the boundary separating the region of $\omega_1$ from the region of $\omega_2$) is the hyperplane represented by the equation

$g(x) = w^Tx + w_0 = 0$

So far the introduction is clear.

Next, the authors go on to say that this hyperplane

has unit normal in the direction of $w$, and a perpendicular distance $|w_0|/|w|$ from the origin.
The distance of a pattern $x$ to the decision hyperplane is given by $|r|$, where
$r = g(x)/|w| = (w^Tx+w_0)/|w|$
with the sign of $r$ indicating on which side of the decision hyperplane the pattern lies.

No explanation is given for these results, so I'm wondering why the hyperplane normal is parallel to $w$, why the distance from the origin is $|w_0|/|w|$, and why the distance of a pattern from the hyperplane is $|g(x)|/|w|$.

Could you please give me some insights on how to get to these results myself?

Thanks,
Domenico

There is 1 best solution below

**Bumbble Comm** · Accepted Answer

Do you know the scalar product? The scalar product of vectors $a=(a_1,\dots,a_n)^T$ and $b=(b_1,\dots,b_n)^T$ is defined by $$\langle a,b\rangle:=a_1b_1+\dots+a_nb_n$$ i.e., $\langle a,b\rangle=a^Tb$ when written as a matrix product.
A basic property of scalar product is that $\ \langle a,b\rangle=0 \iff a\perp b$.

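Let $H:=\{x\mid g(x)=0\}$. If $h,k\in H$ then we have $$w^Th+w_0=w^Tk+w_0=0 \ \implies\ w^T(h-k)=0$$ so that $w\perp(h-k)$. Since the differences $h-k$ cover every direction lying within $H$, this proves $w\perp H$: the vector $w$ is normal to $H$, and $w/|w|$ is a unit normal (assuming $w\neq 0$).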
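A quick numeric check of this orthogonality argument, as a minimal sketch. The values $w=(3,4)$, $w_0=-10$ and the two points on the hyperplane are illustrative choices, not taken from the book.

```python
def dot(a, b):
    """Scalar product <a, b> = a^T b."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Illustrative (assumed) weight vector and threshold.
w = [3.0, 4.0]
w0 = -10.0

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return dot(w, x) + w0

# Two points chosen by hand so that 3*x1 + 4*x2 = 10, i.e. both lie on H.
h = [2.0, 1.0]
k = [-2.0, 4.0]

# The difference h - k is a direction lying inside the hyperplane.
diff = [hi - ki for hi, ki in zip(h, k)]

print(g(h), g(k))    # both are 0.0, so h and k are on H
print(dot(w, diff))  # 0.0, so w is perpendicular to h - k
```

Any other pair of points satisfying $g(x)=0$ should give the same zero scalar product.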

Now measure the distance of $H$ from the origin. The line through the origin perpendicular to $H$ is $\{\lambda w\mid\lambda\in\Bbb R\}$; let's find its intersection with $H$. We have $\lambda w\in H$ iff $$\begin{aligned} w^T(\lambda w)+w_0 &=0 \\ \lambda|w|^2 &=-w_0\\ \lambda &=-\frac{w_0}{|w|^2} \end{aligned}$$ where we used $w^Tw=\langle w,w\rangle=|w|^2$. With this $\lambda$, the point $\lambda w$ lies in $H$ and is the point of $H$ closest to the origin; its length is $|\lambda|\cdot|w|=\displaystyle\frac{|w_0|}{|w|}$.
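The same computation can be sketched numerically. Again $w=(3,4)$ and $w_0=-10$ are assumed example values; the code finds the point where the perpendicular line through the origin meets $H$ and compares its length with $|w_0|/|w|$.

```python
import math

def dot(a, b):
    """Scalar product <a, b> = a^T b."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Illustrative (assumed) weight vector and threshold.
w = [3.0, 4.0]
w0 = -10.0

norm_w = math.sqrt(dot(w, w))  # |w|

# Scale factor at which the line {t * w} crosses the hyperplane.
t = -w0 / dot(w, w)
foot = [t * wi for wi in w]    # the point of H on that line

# g(foot) should vanish up to floating-point rounding.
g_foot = dot(w, foot) + w0

# Length of the foot point versus the closed-form distance.
dist_from_point = math.sqrt(dot(foot, foot))
dist_formula = abs(w0) / norm_w

print(foot, g_foot)
print(dist_from_point, dist_formula)  # these should agree up to rounding
```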

If we fix any point $h_0$ in $H$ (say, $h_0:=\lambda w$ with the previously found $\lambda$), then $w^Th_0=-w_0$ and the function $g(x)=w^Tx+w_0$ can be rewritten as $$g(x)=w^T(x-h_0)$$ which is exactly the scalar product $\langle w,\ (x-h_0)\rangle$. Decompose $x-h_0$ into parts orthogonal and parallel to $w$: $\ x-h_0=u+\mu w$ with $u\perp w$. Since $u$ is parallel to $H$, the length $|\mu w|=|\mu|\cdot|w|$ measures the distance of $x$ from $H$. Multiply by $w^T$ from the left to find $\mu$: $$g(x)=w^T(x-h_0)=w^Tu+\mu w^Tw=\mu|w|^2$$ so $\mu=\displaystyle\frac{g(x)}{|w|^2}$, and the signed distance is $r=\mu|w|=\displaystyle\frac{g(x)}{|w|}$, which is positive on the side of $H$ that $w$ points toward.
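To see the signed distance in action, here is a small sketch with the assumed example values $w=(3,4)$, $w_0=-10$. The helper names `signed_distance` and `project_onto_H` are hypothetical, introduced only for this illustration. Each test point is projected onto $H$ by removing its component along $w$, and the length of the removed part is compared with $|g(x)|/|w|$.

```python
import math

def dot(a, b):
    """Scalar product <a, b> = a^T b."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Illustrative (assumed) weight vector and threshold.
w = [3.0, 4.0]
w0 = -10.0
norm_w = math.sqrt(dot(w, w))  # |w|

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return dot(w, x) + w0

def signed_distance(x):
    """r = g(x) / |w|; the sign tells which side of H the point is on."""
    return g(x) / norm_w

def project_onto_H(x):
    """Remove the component of x along w that takes it off the hyperplane."""
    coeff = g(x) / dot(w, w)
    return [xi - coeff * wi for xi, wi in zip(x, w)]

for x in ([5.0, 5.0], [0.0, 0.0]):
    p = project_onto_H(x)
    gap = math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))
    # gap should equal |signed_distance(x)|, and g(p) should be ~0.
    print(x, signed_distance(x), gap, g(p))
```

With these values the point $(5,5)$ lands on the positive side (class $\omega_1$ under the rule in the question) and the origin on the negative side.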
