Please check my solution/understanding of deriving the bias term in the context of Support Vector Machines.
Information given:
(1.) $t_i(\mathbf{w}^T\mathbf{x}_i + b)=1$ (this holds with equality for the support vectors)
(2.) $\mathbf{w} = \sum_{i=1}^{N} a_i t_i \mathbf{x}_i$
(3.) Kernel Function: $k(\boldsymbol{x}_i, \boldsymbol{x}_j) = \boldsymbol{x}_i^{T}\boldsymbol{x}_j$
(4.) $t_i^2=1$
My solution:
We know $t_i^2=1$ so multiply both sides of (1.) by $t_i$ to get,
$$t_i^2(\mathbf{w}^T\mathbf{x}_i + b)=t_i$$
$$\mathbf{w}^T\mathbf{x}_i + b = t_i$$
Isolating for $b$ we get,
$$b_i = t_i -\boldsymbol{w}^T\boldsymbol{x}_i$$
Note that the above holds for a single support vector $i \in S$, where $S$ is the set of all support vectors. Now we substitute (2.) into our expression for $b_i$, where $i$ indexes the support vector and $j$ runs over all $N$ training samples,
$$b_i = t_i - (\sum_{j=1}^{N} a_j t_j \boldsymbol{x}_j^T)\boldsymbol{x}_i$$
Using the kernel function defined in (3.) and the fact that the dot product is commutative we have,
$$b_i = t_i - \sum_{j=1}^{N} a_jt_j k(\boldsymbol{x}_i, \boldsymbol{x}_j)$$
The above is the bias computed from one support vector $i \in S$. Since $a_j = 0$ for any $\boldsymbol{x}_j$ that is not a support vector, the inner sum effectively runs only over $j \in S$. For numerical stability we can take $b$ as the average of $b_i$ across all support vectors. Letting $N_S$ denote the total number of support vectors, the bias is
$$b = \frac{1}{N_S}\sum_{i \in S}(t_i - \sum_{j=1}^{N} a_jt_j k(\boldsymbol{x}_i, \boldsymbol{x}_j))$$
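As a sanity check on the final formula, here is a minimal NumPy sketch that computes $b$ by averaging $t_i - \sum_j a_j t_j k(\boldsymbol{x}_i, \boldsymbol{x}_j)$ over the support vectors. The function name `bias` and the tolerance used to detect support vectors ($a_i > 10^{-8}$) are my own assumptions, not part of the original derivation:

```python
import numpy as np

def bias(a, t, X, tol=1e-8):
    """Average bias over support vectors, assuming the linear kernel (3.).

    a : (N,) dual coefficients a_i
    t : (N,) targets t_i in {-1, +1}
    X : (N, d) training inputs x_i
    """
    S = np.where(a > tol)[0]   # support vectors are those with a_i > 0
    K = X @ X.T                # Gram matrix: K[i, j] = k(x_i, x_j) = x_i^T x_j
    # b_i = t_i - sum_j a_j t_j k(x_i, x_j), averaged over i in S
    return np.mean([t[i] - np.sum(a * t * K[i]) for i in S])
```

For instance, on the symmetric 1-D problem $X = \{+1, -1\}$ with $t = (+1, -1)$ and dual coefficients $a = (0.5, 0.5)$, the constraints force $w = 1$ and $b = 0$, which the function reproduces.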