Vector/matrix representation of linear discriminant functions


I'm currently studying machine learning using the book Pattern Recognition and Machine Learning (Bishop, 2006) and faced some confusion regarding the vector/matrix representation of $K$ linear discriminant functions. More specifically, this is from Chapter 4.1.3: Least Squares for Classification.

The specific portion of the book that I'm referring to states that:

Each class $C_k$ is described by its own linear model so that:

$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$$

where $k = 1, \dots , K$. We can conveniently group these together using vector notation so that

$$\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T\tilde{\mathbf{x}}$$

where $\tilde{\mathbf{W}}$ is a matrix whose $k$th column comprises the $(D + 1)$-dimensional vector $\tilde{\mathbf{w}}_k = (w_{k0}, \mathbf{w}_k^T)^T$ and $\tilde{\mathbf{x}}$ is the corresponding augmented input vector $(1, \mathbf{x}^T)^T$ with a dummy input $x_0 = 1$.

I'm mainly having trouble understanding exactly how to understand the representations of $\tilde{\mathbf{w}}_k$ and $\tilde{\mathbf{x}}$. My interpretation of the above equation is that we're taking the operation:

$$ \begin{bmatrix} \mathbf{w}_k^T & w_{k0} \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \mathbf{w}_k^T\mathbf{x} + w_{k0} = y_k(\mathbf{x}) $$

a total of $K$ times, and therefore we can conveniently represent these linear models as a compact vector.

For $\tilde{\mathbf{w}}_k$ I can somewhat infer that, since $\tilde{\mathbf{w}}_k \in \Bbb{R}^{D + 1}$, its transpose (since we use $\tilde{\mathbf{W}}^T$ in the actual equation) lies in $\Bbb{R}^{1 \times (D + 1)}$, and multiplying this by the $(D + 1)$-dimensional vector $\tilde{\mathbf{x}}$ gives us a scalar value (i.e. the linear model equation).

The two questions that I have following these thoughts would be:

  1. How should I interpret the representation of $(w_{k0}, \mathbf{w}_k^T)^T$? Is it supposed to be:

$$\tilde{\mathbf{w}}_k = \begin{bmatrix}w_{k0} & \mathbf{w}_k^T\end{bmatrix}$$

since

$$(w_{k0}, \mathbf{w}_k^T) = \begin{bmatrix}w_{k0} \\ \mathbf{w}_k^T\end{bmatrix}$$

  2. Why does $\mathbf{x}$ have a transpose? My interpretation was that we're multiplying $\mathbf{w}_k^T$ and $\mathbf{x}$, not $\mathbf{x}^T$.

On BEST ANSWER

$(w_{k0}, w_{k}^T)$ is the horizontal concatenation of $w_{k0}$ and $w_k^T$, which gives a row vector. Transposing it makes it a column vector. That is, $\tilde{w}_k$ is the $k$-th column of the matrix $\tilde{W}$, and $\tilde{w}_k^T$ is the $k$-th row of the matrix $\tilde{W}^T$.

We have

$$\begin{bmatrix} w_{k0} & w_k^T\end{bmatrix} \begin{bmatrix} 1 \\ x\end{bmatrix}=y_k(x)$$

The same reasoning applies to the second question: $(1, x^T)$ is the horizontal concatenation of $1$ and $x^T$, resulting in a row vector, after which we further transpose it, $(1, x^T)^T$, to make it a column vector. The transpose on $x$ inside the tuple simply records that the entries of $x$ are laid out horizontally before the outer transpose is applied.
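As a quick numerical check, here is a NumPy sketch (the dimensions $D = 3$, $K = 4$ and the random weights are arbitrary choices, not from the book): stacking the biases $w_{k0}$ on top of the weight columns $\mathbf{w}_k$ builds $\tilde{\mathbf{W}}$, and the compact product $\tilde{\mathbf{W}}^T\tilde{\mathbf{x}}$ reproduces the $K$ separate models $y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 3, 4                      # input dimension and number of classes (arbitrary)
W = rng.standard_normal((D, K))  # column k is w_k
w0 = rng.standard_normal(K)      # biases w_{k0}
x = rng.standard_normal(D)

# Augmented quantities: the k-th column of W~ is (w_{k0}, w_k^T)^T,
# and x~ = (1, x^T)^T with the dummy input x_0 = 1.
W_tilde = np.vstack([w0, W])          # shape (D + 1, K)
x_tilde = np.concatenate([[1.0], x])  # shape (D + 1,)

# Compact form y(x) = W~^T x~ ...
y = W_tilde.T @ x_tilde

# ... agrees with the K separate models y_k(x) = w_k^T x + w_{k0}.
y_separate = W.T @ x + w0
assert np.allclose(y, y_separate)
```

Note that the $k$-th row of `W_tilde.T` is exactly $\tilde{\mathbf{w}}_k^T$, so each entry of `y` is the scalar $\tilde{\mathbf{w}}_k^T\tilde{\mathbf{x}}$.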