In Blum, Hopcroft and Kannan's "Foundations of Data Science", the authors discuss the Perceptron Algorithm (p. 111). The proof says:
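We will keep track of two quantities, $\mathbf{w}^T\mathbf{w}^*$ and $|\mathbf{w}|^2$. Each update increases $\mathbf{w}^T\mathbf{w}^*$ by at least 1.
$$(\mathbf{w} + \mathbf{x}_i l_i)^T\mathbf{w}^* = \mathbf{w}^T\mathbf{w}^* + \mathbf{x}_i^T l_i \mathbf{w}^* \ge \mathbf{w}^T\mathbf{w}^* + 1$$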
The implication seems to be that both $\mathbf{w}^T\mathbf{w}^*$ and $|\mathbf{w}|^2$ are scalar values. I interpret $|\mathbf{w}|^2$ as the dot product of $\mathbf{w}$ with itself, as in this answer. However, $\mathbf{w}^T\mathbf{w}^*$ (and similar uses of the transpose throughout the proof) does not make sense to me, as I don't see how it can produce a scalar value.
My reasoning is as follows:
The vector of weights $\mathbf{w}$ is a row vector.
The vector of weights $\mathbf{w}^*$, whose existence is assumed by the "if" condition of the theorem, is also a row vector.
A transposed row vector multiplied by a row vector is a column vector multiplied by a row vector, which produces a matrix with as many rows as the column has entries and as many columns as the row has entries.
For example, let:
$$\mathbf{w} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$$ $$\mathbf{w}^* = \begin{bmatrix} 2 & 4 & 6 \end{bmatrix}$$
Then:
$$\mathbf{w}^T\mathbf{w}^* = \begin{bmatrix} 1\\2\\3 \end{bmatrix} \begin{bmatrix} 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6\\4 & 8 & 12\\6 & 12 & 18 \end{bmatrix}$$
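To rule out an arithmetic slip on my part, here is a minimal Python sketch of the same computation. It assumes, as in my reasoning above, that both vectors are stored as 1 x 3 rows; the helper names `transpose` and `matmul` are my own and not from the book.

```python
# Row vectors from the example, stored as 1 x 3 nested lists
# (this row-vector layout is my assumption, not the book's).
w = [[1, 2, 3]]
w_star = [[2, 4, 6]]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product; inner dimensions must agree."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# (3 x 1) times (1 x 3) gives a 3 x 3 matrix, not a scalar.
product = matmul(transpose(w), w_star)
for row in product:
    print(row)
```

Under this row-vector reading the result has three rows and three columns, which matches the matrix I worked out by hand.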
My assumption is that "$\ge \mathbf{w}^T\mathbf{w}^* + 1$" means "greater than or equal to the sum of two scalars, $\mathbf{w}^T\mathbf{w}^*$ and $1$". Hence $\mathbf{w}^T\mathbf{w}^*$ should be a scalar.
What am I missing?
Thanks in advance.