I'm currently studying machine learning using the book Pattern Recognition and Machine Learning (Bishop, 2006) and faced some confusion regarding the vector/matrix representation of $K$ linear discriminant functions. More specifically, this is from Chapter 4.1.3: Least Squares for Classification.
The specific portion of the book that I'm referring to states that:
Each class $C_k$ is described by its own linear model so that:
$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$$
where $k = 1, \dots , K$. We can conveniently group these together using vector notation so that
$$\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T\tilde{\mathbf{x}}$$
where $\tilde{\mathbf{W}}$ is a matrix whose $k$th column comprises the $(D + 1)$-dimensional vector $\tilde{\mathbf{w}}_k = (w_{k0}, \mathbf{w}_k^T)^T$ and $\tilde{\mathbf{x}}$ is the corresponding augmented input vector $(1, \mathbf{x}^T)^T$ with a dummy input $x_0 = 1$.
I'm mainly having trouble understanding how to interpret the representations of $\tilde{\mathbf{w}}_k$ and $\tilde{\mathbf{x}}$. My interpretation of the above equation is that we're performing the operation:
$$ \begin{bmatrix} \mathbf{w}_k^T & w_{k0} \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \mathbf{w}_k^T\mathbf{x} + w_{k0} = y_k(\mathbf{x}) $$
a total of $K$ times, and therefore we can conveniently represent these linear models as a compact vector.
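As a quick sanity check of this interpretation (a minimal NumPy sketch with made-up dimensions and random numbers), stacking the $K$ per-class computations does reproduce $\tilde{\mathbf{W}}^T\tilde{\mathbf{x}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 3, 4                      # input dimension and number of classes (arbitrary)

W = rng.standard_normal((D, K))  # column k is the weight vector w_k
w0 = rng.standard_normal(K)      # biases w_k0
x = rng.standard_normal(D)

# Per-class models: y_k(x) = w_k^T x + w_k0, computed K separate times.
y_separate = np.array([W[:, k] @ x + w0[k] for k in range(K)])

# Augmented form: W_tilde's k-th column is (w_k0, w_k^T)^T,
# and x_tilde = (1, x^T)^T with dummy input x_0 = 1.
W_tilde = np.vstack([w0, W])     # shape (D + 1, K)
x_tilde = np.concatenate([[1.0], x])

y_grouped = W_tilde.T @ x_tilde  # y(x) = W_tilde^T x_tilde

print(np.allclose(y_separate, y_grouped))  # True
```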
For $\tilde{\mathbf{w}}_k$, I can somewhat infer that since $\tilde{\mathbf{w}}_k \in \Bbb{R}^{D + 1}$, its transpose (since the actual equation uses $\tilde{\mathbf{W}}^T$) would be in $\Bbb{R}^{1 \times (D + 1)}$, and multiplying it by the $(D + 1)$-dimensional vector $\tilde{\mathbf{x}}$ gives a scalar value (i.e. the output of the linear model).
The two questions that I have following these thoughts would be:
- How should I interpret the representation of $(w_{k0}, \mathbf{w}_k^T)^T$? Is it supposed to be:
$$\tilde{\mathbf{w}}_k = \begin{bmatrix}w_{k0} & \mathbf{w}_k^T\end{bmatrix}$$
since
$$(w_{k0}, \mathbf{w}_k^T) = \begin{bmatrix}w_{k0} \\ \mathbf{w}_k^T\end{bmatrix}$$
- Why does $\mathbf{x}$ have a transpose? My interpretation was that we're multiplying $\mathbf{w}_k^T$ and $\mathbf{x}$, not $\mathbf{x}^T$?
$(w_{k0}, w_{k}^T)$ is the horizontal concatenation of $w_{k0}$ and $w_k^T$, i.e. a row vector. Transposing it makes it a column vector. That is, $\tilde{w}_k$ is the $k$-th column of the matrix $\tilde{W}$, and $\tilde{w}_k^T$ is the $k$-th row of the matrix $\tilde{W}^T$.
We have
$$\begin{bmatrix} w_{k0} & w_k^T\end{bmatrix} \begin{bmatrix} 1 \\ x\end{bmatrix}=y_k(x)$$
The same reasoning applies to the second question: $(1, x^T)$ is the horizontal concatenation of $1$ and $x^T$, resulting in a row vector, after which we transpose it, $(1, x^T)^T$, to make it a column vector.
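To make this concrete (a small NumPy illustration with made-up numbers): concatenating $w_{k0}$ and $w_k$ horizontally gives a row of length $D + 1$; transposing yields the column $\tilde{w}_k$, and the row $(w_{k0}, w_k^T)$ times the column $\tilde{x} = (1, x^T)^T$ is exactly $y_k(x)$.

```python
import numpy as np

w_k = np.array([2.0, -1.0, 0.5])   # weight vector w_k (made-up values)
w_k0 = 0.3                         # bias w_k0
x = np.array([1.0, 2.0, 3.0])      # input x

# (w_k0, w_k^T): horizontal concatenation -> a row of length D + 1.
row = np.concatenate([[w_k0], w_k])

# Its transpose is the column vector w_tilde_k, i.e. the k-th column of W_tilde.
w_tilde_k = row.reshape(-1, 1)     # shape (D + 1, 1)

# Augmented input x_tilde = (1, x^T)^T.
x_tilde = np.concatenate([[1.0], x])

# [w_k0  w_k^T] [1; x] = w_k^T x + w_k0 = y_k(x)
y_k = row @ x_tilde
print(y_k, w_k @ x + w_k0)         # both give 1.8
```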