Check My Solution -> Problem: Deriving Bias term in the context of Support Vector Machines given the Weight Vector.


Please check my solution/understanding of deriving the bias term in the context of Support Vector Machines.

Information given,

(1.) $t_i(\mathbf{w}^T\mathbf{x}_i + b)=1$

(2.) $\mathbf{w} = \sum_{i=1}^{N} a_i t_i \mathbf{x}_i$

(3.) Kernel Function: $k(\boldsymbol{x}_i, \boldsymbol{x}_j) = \boldsymbol{x}_i^{T}\boldsymbol{x}_j$

(4.) $t_i^2=1$

My solution:

We know $t_i^2=1$ so multiply both sides of (1.) by $t_i$ to get,

$$t_i^2(\mathbf{w}^T\mathbf{x}_i + b)=t_i$$

$$\mathbf{w}^T\mathbf{x}_i + b = t_i$$

Isolating $b$ we get,

$$b_i = t_i -\boldsymbol{w}^T\boldsymbol{x}_i$$
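This step can be checked numerically. Below is a minimal sketch on an assumed toy dataset (not from the question): two support vectors $x_1 = (2, 2)$ with $t_1 = +1$ and $x_2 = (0, 0)$ with $t_2 = -1$, for which the max-margin solution can be found by hand to be $\mathbf{w} = (1/2, 1/2)$, $b = -1$. Each support vector should then recover the same bias via $b_i = t_i - \mathbf{w}^T\mathbf{x}_i$:

```python
# Numerical check of b_i = t_i - w^T x_i on an assumed hand-solved toy example.
# Support vectors: x1 = (2, 2) with t1 = +1, x2 = (0, 0) with t2 = -1.
# The max-margin hyperplane for this pair is w = (1/2, 1/2), b = -1.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w = (0.5, 0.5)
support_vectors = [((2.0, 2.0), 1.0), ((0.0, 0.0), -1.0)]

# For an exact solution, every support vector yields the same bias.
biases = [t - dot(w, x) for x, t in support_vectors]
```

Both entries of `biases` come out to $-1$, matching the known $b$ for this toy problem.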

Note that the above holds for a single support vector $i \in S$, where $S$ is the set of all support vectors. Substituting (2.) into our expression for $b_i$, where $i$ indexes a support vector and $j$ runs over all of the training samples, gives

$$b_i = t_i - (\sum_{j=1}^{N} a_j t_j \boldsymbol{x}_j^T)\boldsymbol{x}_i$$

Using the kernel function defined in (3.) and the fact that the dot product is commutative we have,

$$b_i = t_i - \sum_{j=1}^{N} a_jt_j k(\boldsymbol{x}_i, \boldsymbol{x}_j)$$

The above is the bias obtained from a single support vector $i \in S$. A more numerically stable estimate takes $b$ as the average across all support vectors. Letting $N_S$ denote the total number of support vectors, the bias is

$$b = \frac{1}{N_S}\sum_{i \in S}(t_i - \sum_{j=1}^{N} a_jt_j k(\boldsymbol{x}_i, \boldsymbol{x}_j))$$
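The averaged formula can also be verified numerically. Below is a sketch on the same assumed toy dataset described above (two points, $x_1 = (2,2)$ with $t_1 = +1$ and $x_2 = (0,0)$ with $t_2 = -1$); for it, the dual solution can be worked out by hand from $\mathbf{w} = \sum_i a_i t_i \mathbf{x}_i$ and $\sum_i a_i t_i = 0$ as $a_1 = a_2 = 1/4$, and both points are support vectors:

```python
# Sketch of the averaged bias b = (1/N_S) * sum_{i in S} (t_i - sum_j a_j t_j k(x_i, x_j)),
# using an assumed hand-solved toy problem: X, t, and the dual coefficients a
# below are not from the question, and a1 = a2 = 1/4 was derived by hand.

def k(xi, xj):
    # Linear kernel from (3.): k(x_i, x_j) = x_i^T x_j
    return sum(u * v for u, v in zip(xi, xj))

X = [(2.0, 2.0), (0.0, 0.0)]   # training samples (both are support vectors here)
t = [1.0, -1.0]                # targets
a = [0.25, 0.25]               # dual coefficients (hand-derived for this toy case)
S = [0, 1]                     # indices of the support vectors
N_S = len(S)

b = sum(
    t[i] - sum(a[j] * t[j] * k(X[i], X[j]) for j in range(len(X)))
    for i in S
) / N_S
```

Here `b` evaluates to $-1$, agreeing with the primal solution $\mathbf{w} = (1/2, 1/2)$, $b = -1$ for this dataset, which supports the derivation.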