In the paper "SphereFace: Deep hypersphere embedding for face recognition", Section 3.2 "Introducing Angular Margin to Softmax Loss":
It states that the decision boundaries produce an angular margin of $\frac{m-1}{m+1} \theta_2^1$, where $\theta_2^1$ is the angle between $W_1$ and $W_2$.
My question is: how should I understand this decision boundary, and what are the steps to derive this margin? The paper doesn't show how the angular margin is obtained.
Here is the background about the question:
They propose a loss that minimizes the angular distance between $x$ and $W$, instead of the usual softmax loss (I have skipped some details):
$L = \sum - log(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}})$ where $f_j$ is the score for class $j$ (the $j$-th element of the class score vector) and $f_{y_i}$ is the score of the target class.
Here $f_j$ is computed as $W_j x$ (bias skipped for simplicity), which equals $||W_j||$ $||x|| cos (\theta_j)$, where $\theta_j$ is the angle between $W_j$ and $x$.
If we normalize the weight $W$ to have length of 1, $||W||=1$, then $f_j = ||x|| cos (\theta_j)$.
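To make this step concrete, here is a minimal NumPy sketch (the shapes, random values and variable names are my own choices for illustration, not from the paper) checking that with unit-norm weights the dot product $W_j x$ matches $||x|| cos(\theta_j)$:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # 3 classes, 4-dim features (arbitrary sizes)
x = rng.normal(size=4)

# Normalize each class weight W_j to unit length, so ||W_j|| = 1
W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)

# Angle theta_j between each W_j and x (normalizing W does not change it)
cos_theta = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

logits_dot = W_unit @ x                           # f_j computed as W_j x
logits_angle = np.linalg.norm(x) * np.cos(theta)  # f_j as ||x|| cos(theta_j)

print(np.allclose(logits_dot, logits_angle))
```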
Finally they modified the loss function to $L = \sum - log(\frac{e^{||x||cos(\theta_{y_i})}}{\sum_j e^{||x||cos(\theta_{j})}})$
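As I understand it, this loss can be sketched as follows. This is only my own illustration of the formula above, not the authors' code; the function name `modified_softmax_loss` and the toy inputs are hypothetical.

```python
import numpy as np

def modified_softmax_loss(W, X, y):
    """Sum over samples of -log softmax(||x|| cos(theta_j)) with unit-norm W_j.

    W: (classes, dim) weights, X: (n, dim) features, y: (n,) integer labels.
    """
    # Normalize each class weight so that ||W_j|| = 1
    W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    # logits[i, j] = ||x_i|| cos(theta_{i,j})
    logits = X @ W_unit.T
    # Subtract the row max for numerical stability (softmax is unchanged)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].sum()

# Toy 2-class example: x points exactly along W_1, with ||x|| = 2
W = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.array([[2.0, 0.0]])
y = np.array([0])
loss = modified_softmax_loss(W, X, y)
```

Because the weights are normalized inside the function, rescaling $W$ leaves the loss unchanged; only the angles and $||x||$ matter.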
When this loss is completely minimized (all training data are correctly classified), the decision boundary for class 1 is $cos(m\theta_1)=cos(\theta_2)$, and the decision boundaries produce an angular margin of $\frac{m-1}{m+1} \theta_2^1$. The paper mentions the keyword "angular bisector", but I still cannot understand how the margin follows from it.