The primal optimization problem for finding the optimal margin classifier is:
\begin{align} \arg\min_{\mathbf{w},b} \quad & \frac{1}{2}\|\mathbf{w}\|^2_2 \\ \text{subject to} \quad & y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)}+b) \ge 1, \quad i=1,\dots,n \end{align}
The Lagrangian for this optimization problem is:
$\mathcal{L}(\mathbf{w},b,\boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2_2 - \sum_{i}\alpha_i[y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b)-1]$. Setting $\nabla_{\mathbf{w}}\mathcal{L}(\mathbf{w},b,\boldsymbol{\alpha})=0$, we have $\mathbf{w} = \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)}$.
Based on that we have: $\mathbf{w}^T\mathbf{w} = \big[ \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)} \big]^T \big[ \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)} \big]$.
How do we arrive at this form: $\mathbf{w}^T\mathbf{w} = \sum_{i,j} y^{(i)}y^{(j)} \alpha_{i}\alpha_{j} (\mathbf{x}^{(i)})^T \mathbf{x}^{(j)}$? In other words, which algebraic rules were applied to obtain this result?
Thank you!
First,
$\big( \sum_j \alpha_j y^{(j)}\mathbf{x}^{(j)} \big)^T \big( \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)} \big) = \big( \sum_j (\alpha_j y^{(j)}\mathbf{x}^{(j)})^T \big) \big( \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)} \big)$,
because $(a+b)^T = a^T+b^T$. Then, since $\alpha_j$ and $y^{(j)}$ are scalars, $(\alpha_j y^{(j)}\mathbf{x}^{(j)})^T = \alpha_j y^{(j)}(\mathbf{x}^{(j)})^T$, and thus
$\big( \sum_j \alpha_j y^{(j)}(\mathbf{x}^{(j)})^T \big) \big( \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)} \big) = \sum_j \alpha_j y^{(j)}(\mathbf{x}^{(j)})^T \big( \sum_i \alpha_i y^{(i)}\mathbf{x}^{(i)} \big) = \sum_j \sum_i \alpha_j y^{(j)}(\mathbf{x}^{(j)})^T \alpha_i y^{(i)}\mathbf{x}^{(i)} = \sum_{i,j} \alpha_i\alpha_j y^{(i)} y^{(j)} (\mathbf{x}^{(j)})^T \mathbf{x}^{(i)}$.
So the only rules used are the linearity of the transpose, the fact that scalars commute with vectors, and the distributivity of the product over sums, which turns the product of two sums into a double sum.
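If it helps to see the identity hold on concrete numbers, here is a small NumPy sketch (the data, labels, and dual variables are random placeholders, not from any actual SVM fit):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                       # toy sizes: n points in d dimensions
X = rng.normal(size=(n, d))       # row i is x^{(i)}
y = rng.choice([-1.0, 1.0], n)    # labels y^{(i)}
alpha = rng.uniform(size=n)       # hypothetical dual variables alpha_i

# w = sum_i alpha_i y^{(i)} x^{(i)}
w = (alpha * y) @ X
lhs = w @ w                       # w^T w

# double sum: sum_{i,j} alpha_i alpha_j y^{(i)} y^{(j)} (x^{(i)})^T x^{(j)}
rhs = sum(
    alpha[i] * alpha[j] * y[i] * y[j] * (X[i] @ X[j])
    for i in range(n) for j in range(n)
)

print(np.isclose(lhs, rhs))       # the two forms agree numerically
```

The double loop mirrors the $\sum_{i,j}$ in the derivation term by term; in practice one would instead write it compactly as `(alpha * y) @ (X @ X.T) @ (alpha * y)` using the Gram matrix.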