Discriminant Analysis: the polynomial kernel of Support Vector Machine

81 Views Asked by At

The following 【Quiz】 and 【Official Answer】are the rough translation (with minor modification) of Quiz No.03-1 of the exam of the "2018's semi-first grade of Japan Statistical Society Certificate (JSSC)" You can find the original of this quiz from this URL(But written in Japanese).


【My Question】
How can we derive Equation 2 from Equation 1 ? What is the relationship between $(1 + \textbf{x}^{T} \textbf{x}')^{4}$ and $(x-3)(x-1)(x+1)(x+3)$.

As a characteristic of data on Fig.1, I recognize the following.

  • All Data near x = -4 are positive cases
  • All Data near x = -2 are negative case
  • All Data near x = 0 are positive cases
  • All Data near x = 2 are negative case

Therefore, I understand that, if there is a function such that

  • positive near x = -4,
  • negative near x = -2 ,
  • positive near x = 0, and
  • negative near x = 2 ,

it can be correctly identified. And I can understand that the following Equation 2 satisfies the above requirement.

However, I cannot imagine the relationship between $(1 + \textbf{x}^{T} \textbf{x}')^{4}$ and $(x-3)(x-1)(x+1)(x+3)$.
I feel, a formula like $ (1+(x,y)\binom{x'}{y'})^4$ will never be a formula like $(x-a)(x-b)(x-c)(x-d)$


【Quiz】
Two types of data, positive-case (+1) and negative-case (-1), are shown in Fig.1.
Fig.1
Fig.1

In order to determine whether each data in FIG.1 are a positive case or a negative case, Support Vector Machines using the polynomial kernel of Equation 1 is performed.

Equation 1: $\ \ k(\textbf{x}, \textbf{x}') = (1 + \textbf{x}^{T} \textbf{x}')^{p} $

Here, $\textbf{x},\textbf{x}' \in {R}^2$ , ${\textbf{x}}^{T} $ is the transpose vector of ${\textbf{x}} $ , and p shall be a positive integer.

At this time, what is the smallest p that can correctly distinguish all data? Explain why.


【Official Answer】
By using a fourth-order polynomial kernel, the following function of Equation 2 is configured.

Equation 2: $\ \ f(x) = sign\{ (x-3)(x-1)(x+1)(x+3)\} $

This function completely discriminates the data. On the other hand, in the polynomial kernel of the third-order or less, since the change of the sign are three times or less, the data in FIG.1 cannot be discriminated.

P.S.
I'm not very good at English, so I'm sorry if I have some impolite or unclear expressions.