Consider symmetric matrices $A$ and $E$ (both with dimensions $n \times n$), and let $\hat{A} = A + E$. If $\mathbf{u_1}$ is the eigenvector of $A$ corresponding to the largest eigenvalue of $A$ (denoted as $\lambda_1$), then we can approximate $\mathbf{\hat{u}_1}$ (the eigenvector of $\hat{A}$ corresponding to the largest eigenvalue of $\hat{A}$) as: $$ \mathbf{\hat{u}_1} \approx \mathbf{u_1} + \sum_{i = 2}^n \frac{\mathbf{u_i}^t E \ \mathbf{u_1}}{\lambda_1 - \lambda_i}\mathbf{u_i} $$
In my coursework we were simply told this is a first order approximation, but I would like to know how this equation is derived. Additionally, I am aware that the equation holds for small $\|E\|$ but isn't this relative to the size of $n$? It would seem that the accuracy of the approximation is a function of how large $\|E\|$ is relative to $n$ and not simply how large $\|E\|$ is (considering that the elements of the normalized eigenvectors $\mathbf{u_i}$ would get increasingly smaller).
It is not specifically about QM, it is just perturbation of matrices. In your notation, consider $A+\epsilon E$ and assume the eigenvalues and eigenvectors have Taylor expansions in $\epsilon$. Plug the expansions into the eigenvector equation $(A+\epsilon E)u(\epsilon)=\lambda(\epsilon)u(\epsilon)$, take the indicated inner product, and equate coefficients of $\epsilon$.