The proof of convergence of the perceptron learning algorithm (a simple machine learning algorithm) relies on linear separability of the underlying data. To fix ideas: for a set of data points $x_{(i)} \in \mathbb{R}^n$ with classes $y_i \in \{-1,1\}$, there must exist $w \in \mathbb{R}^n$ with $\|w\|=1$ and $d \in \mathbb{R}$ such that $$ y_i\left(\sum\limits_{k=1}^n w_k {x_{(i)}}_k+d\right)>\gamma>0 \quad \forall i $$
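To make the condition concrete, here is a quick numerical sketch (using NumPy; the data set and the candidate $(w,d)$ are hypothetical, chosen only for illustration) of checking the margin condition:

```python
import numpy as np

# Hypothetical 2-D data set: two classes separated by the line x1 + x2 = 0.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

# Candidate unit-norm w and offset d.
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
d = 0.0

# Margins y_i * (w . x_i + d); linear separability with margin gamma
# means all of these exceed some gamma > 0.
margins = y * (X @ w + d)
gamma = margins.min()
print(gamma > 0)  # True: this candidate separates the data with margin gamma
```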
Suppose that the set of points $x_{(i)}$ is linearly separable, so that the above relation holds. I have two questions (please excuse the form; this is my first post here):
1) The above expression is equivalent to the following with $\alpha = 1$: $$ y_i\left(\sum\limits_{k=1}^n w_k \operatorname{sign}({x_{(i)}}_k)\,\left|{x_{(i)}}_k\right|^\alpha+d\right)>\gamma>0 \quad \forall i $$
Does a similar expression hold if $\alpha \neq 1$? If not, are there any expressions which generalize the separability condition to an exponent $\alpha$ in the coordinates $x_{(i)}$?
2) I stumbled upon the above expression while trying to generalize the perceptron algorithm with a nonlinear weight correction. A simple version of the algorithm (written with an iteration counter $t$, so as not to clash with the coordinate index $k$ or the dimension $n$) is as follows: $$ t \leftarrow 0;\ w^{(0)}\leftarrow 0;\ d\leftarrow 0\\ \text{while} \ \exists \ i \ \text{such that}\ \ y_i\left(\sum\limits_{k=1}^n w^{(t)}_k {x_{(i)}}_k+d\right)\le 0: \\ \text{select one such}\ i \\ w^{(t+1)}\leftarrow w^{(t)} + y_i x_{(i)}\,; \ d \leftarrow d + y_i\\ t \leftarrow t+1 $$ Is there a suitable generalization of the $w$ update step to a nonlinear update depending on an exponent $\alpha$, e.g. $w^{(t+1)}_k = w^{(t)}_k + y_i \operatorname{sign}({x_{(i)}}_k)\,\left|{x_{(i)}}_k\right|^\alpha$, with nice convergence properties?
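For what it's worth, here is a sketch of the algorithm in NumPy. The `alpha` branch is the proposed modification from the question, not an established algorithm, and whether it converges in general is exactly what is being asked; `alpha=1` recovers the classic perceptron rule. The toy data set is hypothetical.

```python
import numpy as np

def perceptron(X, y, alpha=1.0, max_iter=1000):
    """Perceptron with the coordinate-wise update
    w_k += y_i * sign(x_ik) * |x_ik|**alpha  (alpha=1 is the classic rule)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    d = 0.0
    for _ in range(max_iter):
        # Margins y_i * (w . x_i + d) for all points.
        margins = y * (X @ w + d)
        misclassified = np.where(margins <= 0)[0]
        if misclassified.size == 0:
            return w, d  # all points classified with positive margin
        i = misclassified[0]  # select one offending point
        w += y[i] * np.sign(X[i]) * np.abs(X[i]) ** alpha
        d += y[i]
    return w, d

# Separable toy data (hypothetical).
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w1, d1 = perceptron(X, y, alpha=1.0)  # classic perceptron update
w2, d2 = perceptron(X, y, alpha=0.5)  # alpha-modified update
```

On this small example both variants happen to separate the data, but that of course says nothing about convergence guarantees for general separable data when $\alpha \neq 1$.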