Let $\Sigma$ be the covariance matrix. The first principal axis of PCA, $u_1$, can be found by solving the optimization problem $$\max_{u_1}\; u_1^T\Sigma u_1$$ subject to the constraint $u_1^Tu_1=1$. Setting the derivative of the Lagrangian with respect to $u_1$ to zero yields $\Sigma u_1=\lambda_1 u_1$, so $u_1$ must be an eigenvector of $\Sigma$; since the objective value at a unit eigenvector equals its eigenvalue, the maximizer is the eigenvector associated with the largest eigenvalue of the covariance matrix.
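As a quick numerical sanity check of this claim (a sketch assuming NumPy; the synthetic data and seed are just illustrative), the top eigenvector of $\Sigma$ captures at least as much variance as any other unit vector:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with one dominant direction of variance
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.5])
Sigma = np.cov(X, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
u1 = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue

# The variance captured by u1 equals the largest eigenvalue...
var_u1 = u1 @ Sigma @ u1
assert np.isclose(var_u1, eigvals[-1])

# ...and no randomly drawn unit vector captures more
for _ in range(1000):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert v @ Sigma @ v <= var_u1 + 1e-12
```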
The optimization problem for the $j$-th principal axis contains an additional family of constraints, $u_j^Tu_i=0$ for $i=1,\dots,j-1$, which enforce orthogonality to the previously found axes. Differentiating the Lagrangian with respect to $u_j$, we get: $$\mathcal{L}(u_j, \lambda_j, \gamma_1, \dots, \gamma_{j-1})=u_j^T\Sigma u_j-\lambda_j(u_j^T u_j - 1)-\sum_{i=1}^{j-1} \gamma_i u_j^T u_i$$ $$\frac{\partial\mathcal{L}}{\partial u_j}=2\Sigma u_j-2\lambda_j u_j-\sum_{i=1}^{j-1} \gamma_i u_i$$ $$\frac{\partial\mathcal{L}}{\partial u_j}=0\iff 2\Sigma u_j =2\lambda_j u_j+\sum_{i=1}^{j-1} \gamma_i u_i$$
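To check this stationarity condition concretely (a small NumPy sketch; the matrix is a random symmetric PSD stand-in for $\Sigma$, and the choice $j=2$ is arbitrary), we can plug the eigenvectors of $\Sigma$ into the last equation and confirm it is satisfied with every $\gamma_i$ equal to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T                      # symmetric PSD matrix playing the role of the covariance

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]    # decreasing eigenvalues, as in the PCA derivation
lams, U = eigvals[order], eigvecs[:, order]

j = 2                                # third principal axis (0-based index)
u_j, lam_j = U[:, j], lams[j]

# Stationarity: 2 Sigma u_j = 2 lambda_j u_j + sum_i gamma_i u_i.
# With gamma_i = 0 for all previous axes, the residual vanishes:
residual = 2 * Sigma @ u_j - 2 * lam_j * u_j
assert np.allclose(residual, 0)

# Multiplying the residual by u_i^T isolates gamma_i; each comes out zero
for i in range(j):
    gamma_i = U[:, i] @ residual
    assert np.isclose(gamma_i, 0)
```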
What can we conclude from this? I know that if we multiply both sides of the last equation on the left by the transpose of one of the previous principal axes, $u_i^T$, the corresponding multiplier $\gamma_i$ vanishes. However, I don't know how to interpret this.