Assume that we have a real symmetric matrix $\mathbf{A}\in\mathbb{R}^{n\times n}$ obtained as following : $$\mathbf{A}=\mathbf{N}-\mathbf{P},$$ with $\mathbf{N}\in\mathbb{R}^{n\times n}$ and $\mathbf{P}\in\mathbb{R}^{n\times n}$ two real symmetric positive-definite matrices.
What is the best algorithm (in terms of computation time) to compute the eigenvector corresponding to the largest (algebraic) eigenvalues of $\mathbf{A}$ matrix ?
For the requested information, the dimension of the matrices is typically in the following order of magnitude: $10^{2}\leq n \leq 10^{5}$.
The matrices are dense and as regards their the spectrum of eigenvalue, I do not know how it is distributed.
However, I can describe the computation method of $\mathbf{N}$ and $\mathbf{P}$ : $$ \mathbf{N} = \sum_{i=1}^{d_n} v_i \mathbf{n}_i\mathbf{n}_i^\top $$ $$ \mathbf{P} = \sum_{j=1}^{d_p} u_j \mathbf{p}_j\mathbf{p}_j^\top $$ with $d_n \geq n$, $d_p \geq n$, $\mathbf{n}_i \in \mathbb{R}^{n}$, $\mathbf{p}_j \in \mathbb{R}^{n}$, $||\mathbf{n}_i||_2 \leq 2$, $||\mathbf{p}_j||_2 \leq 2$, $v_i \in \mathbb{R}^+$, $u_j \in \mathbb{R}^+$, $\sum_i v_i = 1$ and $\sum_j u_j = 1$.
I have access to a lot of computational power and I can use a parallel algorithm on CPU or GPU.
The added information presents us with a wealth of possibilities. We are given matrices with dimensions between 100 and 10,000 and a structure which facilitates parallelism at a scale which is not normally seen.
We have essentially two choices, either use a subspace iteration which is really a glorified version of the power method or use a "direct" method which computes all eigenvalues using $O(n^3)$ arithmetic operations.
The "standard" direct method consists of a two sided reduction to tridiagonal form using Householder reflectors followed by an application of the QR algorithm to determine the eigenvalues. This algorithm is implemented in both LAPACK (serial) and SCALAPACK (parallel). The cost of this approach is prohibitive towards the large end of the scale. A few years back, $n=100,000$ could be done in 6 hours time using a few hundred cores. However, for small matrices and especially if $d_n$and $p_n$ (the number of terms) are very large, I recommend assembling the matrix and computing all eigenvalues.
Now because of matrices are written as sums of known terms there are excellent opportunities for parallel computation of the action of $A$, i.e the map $V \rightarrow AV$, where $V$ is a tall matrix. As a result it becomes attractive to apply the subspace iteration method to compute the dominant eigenspace for $A+\alpha I$. Here $\alpha > 0$ is constant large enough to ensure that the new matrix is positive definite. You will also want to apply Rayleigh-Ritz acceleration to improve the rate of convergence. Moreover, you will have to experimentally determine the number of $r$ columns which are appropriate for $V$ as the rate of convergence depends upon the distribution of the eigenvalues, but start with $r \ll n$.
Unless the expansion coefficients $v_i$ and $u_j$ decay rapidly and a low rank approximation becomes viable, I do not see any way to exploit the fact that they sum to $1$.
In short, I cannot give you a definitive answer, but you have a place to start. Otherwise, I recommend inquiring at the Computational Science forum for StackExchange.
You will need LAPACKs routines DSTEQR and DSYTRD for the small problems and in order to do Rayleigh-Ritz acceleration in your own subspace iteration method. The names of the ScaLAPACK routines escape me at the moment.