SVM algorithm for machine learning: in which algebra was it constructed, and how is it described?


As I understand it, this is a simple but complete description of the SVM algorithm:

There is a set of elements (mathematically, points). These elements are described as ordered pairs from the Cartesian product of two sets X and Y. The approach is to draw a line in the "plane" X x Y such that:

1.1. The points from different classes lie on opposite sides of the line;

1.2. The parameters of the line are chosen so that the minimum distance from the points to the line is maximized.

If the line fails to separate the original points in the original coordinates, then:

2.1 Apply a bijective change of coordinates (a suitable nonlinear transformation) and try to find a separating line in the new coordinates.

2.2 If we find such a transformation, then applying the inverse transformation back to the original coordinates transforms our line (generally into a curve).

2.3 If we still cannot find such a line, we can modify the optimization criterion to allow some mistakes and weight those mistakes.
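To make steps 1 and 2 above concrete, here is a minimal sketch in plain Python. It is not a real SVM solver (the separating line is chosen by hand rather than by margin maximization, and the toy data are made up for illustration), but it shows the structure: a lift into new coordinates, a single straight line there, and what that line pulls back to.

```python
def phi(x):
    """Step 2.1: a nonlinear change of coordinates lifting
    1-D points into the 'plane' (x, x**2)."""
    return (x, x * x)

def separating_line(p):
    """Step 1.1: in the new coordinates the straight line
    x2 = 2 separates the classes; the sign gives the side."""
    x1, x2 = p
    return 1 if x2 - 2.0 < 0 else -1

# Hypothetical toy data: an inner cluster (+1) surrounded by an
# outer cluster (-1). No single threshold on the raw x-axis
# separates these labels.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]

# After the lift, every point is classified correctly by one line:
predictions = [separating_line(phi(x)) for x in points]
assert predictions == labels

# Step 2.2: pulling the line x2 = 2 back through phi gives the
# curve x**2 = 2, i.e. two thresholds at +/- sqrt(2) in the
# original coordinates -- a straight line only in the lifted space.
```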


QUESTIONS:

I have a bunch of questions. I'd be glad to get an answer to any of them:

q1. Why do (some) machine learning people use such complicated terminology, as in the Russian Wikipedia article about SVM?

I assume they are familiar with concepts such as the Cartesian product of two arbitrary sets, and with the concept of a 'line'. I really don't understand the reason for the extra-sophisticated explanation.

q2. I used the term 'line'. I do not understand why they use the term 'hyperplane'. This word is 10/4 times longer, and the term frightens me.

q3. Why do machine learning people create the extra definition of the 'kernel trick'? They call step 2.1 by this name. In numerical methods, a 'nonlinear change of coordinates' is called a 'nonlinear change of coordinates'. Why introduce a new term, which besides gives a wrong hint about convolution kernels?
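One concrete difference between the 'kernel trick' and a plain nonlinear change of coordinates can be shown in a few lines (a standard textbook example with the degree-2 polynomial kernel, not anything specific to this question): the kernel evaluates the inner product in the lifted space without ever constructing the new coordinates.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for 2-D input:
    phi(x) = (x1**2, sqrt(2)*x1*x2, x2**2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kernel(x, y):
    """Degree-2 polynomial kernel: computes the SAME inner product
    as dot(phi(x), phi(y)) without ever building phi(x) or phi(y)."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
# The 'trick': identical values, but kernel() never touches the
# higher-dimensional coordinates. That is what distinguishes it
# from an explicit nonlinear change of coordinates.
assert math.isclose(kernel(x, y), dot(phi(x), phi(y)))
```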

q4. I don't see any information in the English Wikipedia article about SVM regarding the algebra in which it is defined. It is step number "0" in any mathematical subfield to define the objects with which you "natively" work.

Why do machine learning people skip this step?

ANSWER:

Q2. The term hyperplane is preferred in this sort of classification problem because one frequently wants to work with high-dimensional data. When working in a $2$-dimensional space, "dividing the space in two" means finding a line ($1$-dimensional affine subspace) that separates your two groups of points. When working in a $d$-dimensional space, we look for a $(d-1)$-dimensional affine subspace, i.e. a plane when $d = 3$ and a hyperplane when $d \geq 4$. "Hyperplane" is the most general way to refer to it for arbitrary $d$.
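The point about arbitrary $d$ can be seen in code: a hyperplane in $\mathbb{R}^d$ is the set $\{x : w \cdot x + b = 0\}$, and classifying by which side a point falls on is the same one-line formula in every dimension. (A generic illustration with made-up weights, not taken from the answer above.)

```python
def side(w, b, x):
    """Which side of the hyperplane {x : w.x + b = 0} the point x
    lies on. The same formula works for every dimension d."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# d = 2: the 'hyperplane' is an ordinary line, x1 + x2 - 1 = 0.
assert side((1.0, 1.0), -1.0, (2.0, 2.0)) == 1

# d = 5: identical code, now a 4-dimensional affine subspace.
w5 = (1.0, 0.0, 0.0, 0.0, -1.0)
assert side(w5, 0.0, (3.0, 0.0, 0.0, 0.0, 1.0)) == 1
assert side(w5, 0.0, (0.0, 0.0, 0.0, 0.0, 5.0)) == -1
```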

Q1, Q3. Historically, machine learning developed out of computer science (artificial intelligence, neural networks, etc.). It's only within the last 20 or so years that it has become heavily intertwined with statistics. Hence, a lot of ML terminology differs even from that of other fields of statistics. It's less a matter of "introducing unnecessary new terms" and more that terms in ML developed on their own; by the time the field reached its current level of mathematical and statistical sophistication, people in the field were uninterested in changing their terminology to match that of mathematicians.

Q4. Once again, I think it helps to remember that this is a field that was started by engineers and computer scientists. Historically, they worked with different standards of mathematical rigor: the first instinct of an engineer or a statistician is not to specify the algebra in which he or she is working.

Generally, I think it would be a mistake to consider ML a mathematical subfield. There may be mathematical ways of formulating and understanding algorithms from ML, but the people working on it are statisticians and computer scientists, so many of the assumptions and defaults of mathematics don't carry over.