Recently I've been studying machine learning. I want to find out the growth function of the following hypothesis set:
$$ \mathcal{H}=\left\{h_\mathbf{w}\equiv\mbox{sign}\big(\langle\mathbf{w}, \cdot\rangle\big)\,\colon \mathbf{w}\in\mathbb{R}^2\right\}\,. $$
I've read Section 2.3 of this lecture, from the Learning Theory course at the University of Washington. On page 3 it claims that the growth function of $\mathcal{H}$ is
$$ g_\mathcal{H}(m)=\binom{m}{0} + \binom{m}{1} + \binom{m}{2} = \frac{m^2+m+2}{2}\,, $$
which is the maximum possible number of sections that $m$ lines can divide $\mathbb{R}^2$ into.
The reason is that each input vector $\mathbf{x}$ can be seen as a line in $\mathbf{w}$-space, namely $\{\mathbf{w}\colon\langle\mathbf{w},\mathbf{x}\rangle=0\}$, that divides $\mathbb{R}^2$ into two sections. Any $\mathbf{w}_1$ on the same side of the line as $\mathbf{x}$ classifies $\mathbf{x}$ as positive, and any $\mathbf{w}_2$ on the opposite side classifies $\mathbf{x}$ as negative. If $m$ input vectors $\mathbf{x}_1,\ldots,\mathbf{x}_m$ divide $\mathbb{R}^2$ into multiple sections in this way, then any two $\mathbf{w}$ from the same section label $\mathbf{x}_1,\ldots,\mathbf{x}_m$ the same way.
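For example (a toy check of my own, not from the lecture): with $\mathbf{x}=(1,0)$, the line $\{\mathbf{w}\colon\langle\mathbf{w},\mathbf{x}\rangle=0\}$ is the $w_2$-axis, and $\mathbf{x}$ lies on its positive side:

```python
import numpy as np

x = np.array([1.0, 0.0])
# the line { w : <w, x> = 0 } is the w2-axis; x lies on its positive side
w_same = np.array([0.5, 3.0])    # same side as x   (w1 > 0)
w_opp  = np.array([-0.5, 3.0])   # opposite side    (w1 < 0)
print(np.sign(w_same @ x), np.sign(w_opp @ x))   # 1.0 -1.0
```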
I understand the reasoning of the paper, but I have a different answer. I think the growth function is
$$ g_\mathcal{H}(m)=2m\,. $$
This is because all $m$ lines must pass through the origin: each line $\{\mathbf{w}\colon\langle\mathbf{w},\mathbf{x}_i\rangle=0\}$ contains $\mathbf{w}=\mathbf{0}$. Therefore these lines can divide $\mathbb{R}^2$ into at most $2m$ sections, not $(m^2+m+2)/2$ sections.
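As a numerical sanity check (a sketch of my own; the angular sweep and the particular point configuration are my choices, not from the lecture), one can count the distinct labelings $\big(\mbox{sign}\langle\mathbf{w},\mathbf{x}_i\rangle\big)_{i=1}^m$ as $\mathbf{w}$ sweeps the unit circle:

```python
import numpy as np

def empirical_growth(points, num_angles=10_000):
    """Count distinct labelings (<w, x_i> > 0)_i as w sweeps the unit circle.

    Ties <w, x_i> == 0 are lumped with the negative side, so the count
    never exceeds the number of sectors cut out by the lines.
    """
    thetas = (np.arange(num_angles) + 0.5) * 2 * np.pi / num_angles
    ws = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    labels = ws @ points.T > 0           # shape (num_angles, m), boolean
    return len({tuple(row) for row in labels})

# m points with pairwise non-parallel directions
for m in range(1, 6):
    angles = np.pi * np.arange(m) / (2 * m + 1)
    points = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    print(m, empirical_growth(points))   # prints m, 2*m on each line
```

Every $m$ gives $2m$ distinct labelings, far below $(m^2+m+2)/2$ for larger $m$.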
I don't see anything wrong with my answer. Which growth function is correct?
I agree with you.
Let $(x_1,\ldots, x_m) \in (\mathbb{R}^2\setminus \{0\})^m$. Consider the lines through the origin $d_1,\ldots, d_m$ such that for each $i$, $d_i$ is the orthogonal complement of $x_i$.
Then $\mathbb{R}^2 \setminus \bigcup_i d_i$ has (generically, and at most) $2m$ connected components, all of which are convex cones, and on each of which the labeling $\big(h_w(x_1),\ldots,h_w(x_m)\big)$ is constant as a function of $w$. Moreover, the labelings produced by distinct components are pairwise distinct: two distinct components are separated by at least one line $d_i$, and crossing $d_i$ flips the sign of $h_w(x_i)$. Therefore $x_1,\ldots,x_m$ admit (generically, and at most) $2m$ distinct labelings, and the growth function of $\mathcal{H}$ is $m \mapsto 2m$.
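The construction can be made concrete with a short sketch (the helper `sector_labelings` and the random test points are my own, assuming the $x_i$ are pairwise non-parallel): it builds the $2m$ boundary rays of the lines $d_i$, picks one $w$ strictly inside each component, and checks that the resulting labelings are pairwise distinct.

```python
import numpy as np

def sector_labelings(points):
    """For each connected component of R^2 minus the lines d_i (d_i
    orthogonal to x_i), pick one w inside it and return the labeling
    (sign<w, x_1>, ..., sign<w, x_m>)."""
    angles = np.arctan2(points[:, 1], points[:, 0])
    # each line d_i contributes the two boundary rays at angle(x_i) +- pi/2
    rays = np.sort(np.concatenate([angles + np.pi / 2,
                                   angles - np.pi / 2]) % (2 * np.pi))
    widths = np.diff(np.append(rays, rays[0] + 2 * np.pi))
    mids = rays + widths / 2             # strictly inside each sector
    ws = np.stack([np.cos(mids), np.sin(mids)], axis=1)
    return [tuple(np.sign(w @ points.T).astype(int)) for w in ws]

rng = np.random.default_rng(0)
points = rng.normal(size=(4, 2))         # m = 4 points in generic position
labs = sector_labelings(points)
print(len(labs), len(set(labs)))         # 8 8: 2m sectors, all labelings distinct
```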
I think that the author of the document confused linear classifiers with affine ones, because the argument looks valid if we consider $\mathcal{H}_{\mathrm{aff}} := \left\{\mbox{sign}\big(\langle w,\cdot\rangle + b\big) \,\colon w\in \mathbb{R}^2,\ b \in \mathbb{R}\right\}$. The document, however, does not discuss the growth function of this set of affine classifiers. The relevant idea in both the linear and the affine case is "count how many sectors you can obtain by cutting through $m$ hyperplanes"; perhaps the author was thinking one day about linear hyperplanes and another day about affine hyperplanes, and wrote down the correct proof for the wrong statement.