Perceptron Convergence: Monotonically Approach Solution?


I'm new to perceptrons, but I saw a proof (for the binary-classification perceptron, with the caveat that the separator is forced through the origin) that, under the following two assumptions (the first of which implies the data is linearly separable):

  1. There exists $\theta^*$ such that $y^{i} \left(\frac{\theta^* \cdot x^{i}}{\|\theta^*\|}\right) \geq \gamma \gt 0$ for all $i$, where $\theta^*$ is the normal vector to the linear separator, pointing toward the positively classified side; $y^i \in \{-1, +1\}$ is the correct label; and $x^i$ is the data point;
  2. $||x^i|| \leq R$, for some $R \gt 0$.

there is an upper bound on the number of "mistakes" made by the perceptron algorithm (that is, updates, or changes to the parameters defining the linear separator), namely at most $\left(\frac{R}{\gamma}\right)^2$.
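For concreteness, here is a minimal sketch (in Python/NumPy, on hypothetical synthetic data of my own choosing) of the origin-constrained perceptron described above, which counts its mistakes and compares the count against the $(R/\gamma)^2$ bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D data, separable through the origin by theta_star.
theta_star = np.array([1.0, 1.0])
X = rng.uniform(-1.0, 1.0, size=(200, 2))
X = X[np.abs(X @ theta_star) > 0.1]   # enforce a positive margin
y = np.sign(X @ theta_star)           # labels in {-1, +1}

def perceptron(X, y, max_epochs=1000):
    """Origin-constrained perceptron; returns (theta, mistake count)."""
    theta = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x_i, y_i in zip(X, y):
            if y_i * (theta @ x_i) <= 0:   # mistake: update the parameters
                theta += y_i * x_i
                mistakes += 1
                clean_pass = False
        if clean_pass:                     # a full pass with no mistakes
            break
    return theta, mistakes

theta, mistakes = perceptron(X, y)

# R and gamma as defined in the two assumptions above.
R = np.max(np.linalg.norm(X, axis=1))
gamma = np.min(y * (X @ theta_star) / np.linalg.norm(theta_star))
print(mistakes, (R / gamma) ** 2)
```

On any run with separable data, the printed mistake count should respect the $(R/\gamma)^2$ bound from the proof, though the bound is usually loose in practice.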

My question is whether, during the run of the perceptron algorithm (be it the perceptron forced through the origin, or the general version where a constant offset term is also permitted in the equation of the separator) on linearly separable data, the algorithm converges monotonically. In other words, for each mistake/update that the perceptron algorithm makes to the parameters describing the linear separator, does its predicted separator "get closer" to one that successfully separates the data?

As a concrete example: suppose we force the perceptron to draw its separator through the origin, restrict the data points to 2D, and already know of some linear separator (e.g. found visually) that separates the data. Would the angle between the perceptron's predicted separator line and this known successful one always decrease (or at least, not increase) with each update it makes to the parameters describing its separator?
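The 2D scenario above can be probed empirically. The following sketch (hypothetical data and a hypothetical known separator of my own choosing) records the angle between the perceptron's current normal vector $\theta$ and the known separator's normal after every update, and then checks whether that sequence is non-increasing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2D data, separable through the origin by theta_known.
theta_known = np.array([2.0, -1.0])
X = rng.uniform(-1.0, 1.0, size=(100, 2))
X = X[np.abs(X @ theta_known) > 0.2]   # enforce a positive margin
y = np.sign(X @ theta_known)

def angle_to(theta, ref):
    """Angle (radians) between theta and the reference normal vector."""
    cos = (theta @ ref) / (np.linalg.norm(theta) * np.linalg.norm(ref))
    return np.arccos(np.clip(cos, -1.0, 1.0))

theta = np.zeros(2)
angles = []
for _ in range(1000):
    updated = False
    for x_i, y_i in zip(X, y):
        if y_i * (theta @ x_i) <= 0:   # mistake: update and record the angle
            theta += y_i * x_i
            angles.append(angle_to(theta, theta_known))
            updated = True
    if not updated:
        break

# Monotone convergence in angle would make this sequence non-increasing.
monotone = all(nxt <= prev + 1e-12
               for prev, nxt in zip(angles, angles[1:]))
print(angles, monotone)
```

Running this over several random seeds would give empirical evidence one way or the other, though of course it cannot substitute for a proof or a counterexample argument.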