My question is: how do we derive $ b^* $ (the optimal $ b $) when every $ \alpha_{i} $ is either $ 0 $ or $ C $?
For a given SVM primal problem:
$$ \text{minimize } \frac{1}{2}w^{T}w + C\sum_{i=1}^{l}\xi_{i}$$ $$ \text{subject to } y_{i}(w^{T}\phi(x_{i}) + b) \ge 1 - \xi_{i} $$ $$ \xi_{i} \ge 0 $$
I already know how to derive the dual problem:
$$ \text{minimize } \frac{1}{2}\alpha^{T}Q\alpha - \mathbf{1}^{T}\alpha$$ $$ \text{subject to } 0 \le \alpha_{i} \le C$$ $$ \mathbf{y}^{T}\alpha = 0 $$ $$ \text{where } Q_{ij} = y_{i} y_{j} \phi(x_{i})^{T} \phi(x_{j}) $$

And I know that, by the KKT conditions, we have:

$$ \alpha_{i}(1 - \xi_{i} - y_{i}(w^{T}\phi(x_i) + b)) = 0 $$ $$ \beta_{i}(-\xi_i) = 0 $$ $$ \nabla_{w} L(w, b, \xi, \alpha, \beta) = 0 \Rightarrow w = \sum_{i=1}^{l} \alpha_{i} y_{i} \phi(x_{i}) $$ $$ \nabla_{b} L(w, b, \xi, \alpha, \beta) = 0 \Rightarrow \sum_{i=1}^{l}\alpha_{i} y_{i} = 0 $$ $$ \nabla_{\xi} L(w, b, \xi, \alpha, \beta) = 0 \Rightarrow C = \alpha_{i} + \beta_{i}$$
I just saw pages 13 and 14 of this paper, but it only says to pick an index $ i $ with $ 0 \lt \alpha_{i} \lt C $, so that we can find $ b $ from $$ \alpha_{i}(1 - \xi_{i} - y_{i}(w^{T}\phi(x_i) + b)) = 0 $$ $$ \beta_{i}(-\xi_i) = 0 $$ because then $ \beta_i = C - \alpha_i > 0 $ forces $ \xi_i = 0 $, and $ \alpha_i > 0 $ forces $ 1 - \xi_{i} - y_{i}(w^{T}\phi(x_i) + b) = 0 $.
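Explicitly, since $ \xi_i = 0 $ and $ y_i \in \{-1, +1\} $ (so $ 1/y_i = y_i $), substituting $ w = \sum_{m=1}^{l} \alpha_{m} y_{m} \phi(x_{m}) $ into $ y_{i}(w^{T}\phi(x_i) + b) = 1 $ gives

$$ b = y_i - \sum_{m=1}^{l} \alpha_{m} y_{m} \phi(x_{m})^{T}\phi(x_i). $$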
But what if every $ \alpha_{i} $ is either $ 0 $ or $ C $?
Note: I am using the notation from this slide.
In order to determine the bias $b$, you use
$$Y(\boldsymbol{x}) = \boldsymbol{w}^T\boldsymbol{\phi}(\boldsymbol{x})+b$$
for the support vectors for which $0< \alpha_i<C$ and $\xi_i=0$, because they lie on the boundary of the margin.
Then you will have to solve
$$y_i\left[\sum_{m\in S}\alpha_my_m\boldsymbol{\phi}^T(\boldsymbol{x}_i)\boldsymbol{\phi}(\boldsymbol{x}_m)+b\right]=1,$$
in which $S$ is the set of indices of all the support vectors, and you solve this equation for the bias $b$.
In Pattern Recognition and Machine Learning by Bishop, the author also argues that it is numerically more stable to average this equation over all of the margin support vectors rather than rely on a single one.
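Averaging over the set $M$ of margin support vectors (those with $0 < \alpha_n < C$, for which $\xi_n = 0$) yields

$$ b = \frac{1}{|M|}\sum_{n\in M}\left( y_{n} - \sum_{m\in S}\alpha_{m} y_{m}\, \boldsymbol{\phi}^{T}(\boldsymbol{x}_{n})\boldsymbol{\phi}(\boldsymbol{x}_{m}) \right). $$

As a minimal sketch of the bookkeeping in Python/NumPy (the data and the dual variables `alpha` here are hand-picked toy values, not the output of a real QP solver, and a linear kernel $\phi(x)=x$ is assumed):

```python
import numpy as np

# Toy 2D data with labels in {-1, +1}; alpha is a hand-picked
# illustrative dual solution (not produced by a real QP solver).
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])
C = 1.0

K = X @ X.T  # linear kernel: k(x_i, x_j) = x_i^T x_j

sv = alpha > 1e-8                # support vectors: alpha_n > 0
free = sv & (alpha < C - 1e-8)   # margin SVs: 0 < alpha_n < C

# b = mean over margin SVs n of ( y_n - sum_{m in S} alpha_m y_m k(x_n, x_m) )
b = np.mean(y[free] - K[np.ix_(free, sv)] @ (alpha[sv] * y[sv]))
print(b)  # 0.0 for this toy configuration
```

Note that this recipe only works when `free` is non-empty; if every $\alpha_n$ sits at a bound $0$ or $C$ (the case asked about in the question), the mean above is over an empty set and some other rule for choosing $b$ is needed.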