Let $P(X)$ be a probability density function over $\mathbb{R}^{n}$, and assume the random variable $X$ has zero mean and covariance matrix
$\Sigma = \operatorname{cov}(X, X) = \mathbb{E}[XX^T]$
In the case of kernel PCA, $X$ is the feature-space image of the original data point, and $n$ may be finite or infinite depending on the associated kernel.
Let $\{ X_i \}_{i = 1}^{N}$ be i.i.d. samples from $P(X)$, and define the estimator of $\Sigma$ as:
$\hat{\Sigma} = \frac{1}{N} \sum_{i = 1}^{N} X_i X_i^T$
Let $\{ \lambda_j, v_j \}$ and $\{ \hat{\lambda}_j, \hat{v}_j \}$ be eigenpairs of $\Sigma$ and $\hat{\Sigma}$ respectively, where eigenvalues are sorted in decreasing order.
How close are the $n \times n$ matrices $\Sigma$ and $\hat{\Sigma}$? How close are their eigenvalues $\{ \lambda_j \}$ and $\{ \hat{\lambda}_j \}$, and their eigenvectors $\{ v_j \}$ and $\{ \hat{v}_j \}$? Specifically, can we bound the squared inner product $\langle v_k, \hat{v}_m \rangle^2$ from below and above more tightly than the trivial $[0, 1]$ interval?
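For concreteness, all of these quantities can be computed in a small numerical experiment. The sketch below assumes a hypothetical Gaussian $P(X)$ with a diagonal $\Sigma$ and well-separated eigenvalues; it is only an illustration, not part of any of the bounds discussed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 10_000

# Hypothetical ground-truth covariance with well-separated eigenvalues.
Sigma = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])

# i.i.d. zero-mean samples X_i ~ N(0, Sigma).
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)

# Empirical covariance, matching the estimator above (no mean subtraction).
Sigma_hat = X.T @ X / N

# Eigenpairs, re-sorted into decreasing order of eigenvalue.
lam, V = np.linalg.eigh(Sigma)
lam_hat, V_hat = np.linalg.eigh(Sigma_hat)
lam, V = lam[::-1], V[:, ::-1]
lam_hat, V_hat = lam_hat[::-1], V_hat[:, ::-1]

# G[k, m] = <v_k, v_hat_m>^2.
G = (V.T @ V_hat) ** 2
print(np.round(G, 3))
```

Note that since both eigenbases are orthonormal, every row and column of $G$ sums to $1$; any non-trivial bound on one entry therefore constrains the others.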
Below I summarize the papers I have found relevant to these questions. If anyone has more suggestions, please share them below.
Denote by $V_k$ and $\hat{V}_k$ the subspaces spanned by $\{ v_j \}_{j = 1}^{k}$ and $\{ \hat{v}_j \}_{j = 1}^{k}$ respectively, and by $P_k$ and $\hat{P}_k$ the orthogonal projectors onto these subspaces. Finally, denote by $V_k^{\perp}$ and $\hat{V}_k^{\perp}$ the orthogonal complements of $V_k$ and $\hat{V}_k$.
John Shawe-Taylor, 2005 gives probabilistic bounds on partial eigenvalue sums $\sum_{j = 1}^{k} \lambda_j$, as well as bounds on the projection of new data onto $\hat{V}_k$ and onto $\hat{V}_k^{\perp}$. Finally, it gives bounds on $\hat{v}_j^T \Sigma \hat{v}_j$.
Laurent Zwald, 2006 bounds the Hilbert-Schmidt norms of the differences $\Sigma - \hat{\Sigma}$ and $P_k - \hat{P}_k$. It also gives a bound on $\langle v_k, \hat{v}_k \rangle^2$.
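For intuition, the projector distance appearing in this line of work is easy to compute directly. A minimal sketch, again assuming a hypothetical Gaussian $P(X)$ with diagonal $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, k = 5, 10_000, 2

# Hypothetical ground truth and its empirical estimate.
Sigma = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
Sigma_hat = X.T @ X / N

# Top-k eigenvectors (eigh returns ascending order, so take the last k columns).
V_k = np.linalg.eigh(Sigma)[1][:, -k:]
V_hat_k = np.linalg.eigh(Sigma_hat)[1][:, -k:]

# Orthogonal projectors onto V_k and V_hat_k; the complements use I - P.
P_k = V_k @ V_k.T
P_hat_k = V_hat_k @ V_hat_k.T

# Hilbert-Schmidt (Frobenius) distance between the projectors.
print(np.linalg.norm(P_k - P_hat_k, 'fro'))
```

The distance shrinks as $N$ grows, and its size is governed by the eigengap $\lambda_k - \lambda_{k+1}$ rather than by the individual eigenvectors.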
Gilles Blanchard, 2007 gives bounds on the projection of new data onto $V_k^{\perp}$ and onto $\hat{V}_k^{\perp}$, and also relates the tail sums $\sum_{j = k}^{n} \lambda_j$ and $\sum_{j = k}^{n} \hat{\lambda}_j$. This work refines some of the results of Shawe-Taylor, 2005.
Roman Vershynin, 2010 bounds the operator norm of $\Sigma - \hat{\Sigma}$, showing that this norm is small with high probability when $N = O(n)$ and the distribution has finite fourth moments.
Lorenzo Rosasco, 2010 bounds $\sum_{j \geq 1} (\lambda_j - \hat{\lambda}_j)^2$, $\sup_{j \geq 1} (\lambda_j - \hat{\lambda}_j)^2$, and $|\operatorname{tr}(\Sigma) - \operatorname{tr}(\hat{\Sigma})|$.
Andreas Louka, 2017 bounds $|\langle v_k, \hat{v}_m \rangle|$ for any $k$ and $m$, showing the bound to be inversely proportional to $|\lambda_k - \lambda_m|$. Further, a probabilistic version of this bound is proposed, relating $|\langle v_k, \hat{v}_m \rangle|$ and $|\lambda_k - \hat{\lambda}_k|$ to a measure of the concentration of the distribution in the direction of $v_k$.
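The inverse dependence on the eigengap $|\lambda_k - \lambda_m|$ is easy to see empirically. The sketch below (hypothetical spectra, Gaussian samples, averaged over repeated draws) compares $|\langle v_1, \hat{v}_2 \rangle|$ for a well-separated spectrum against one where $\lambda_1 \approx \lambda_2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 500

def cross_overlap(spectrum, k=0, m=1):
    """|<v_k, v_hat_m>| for one draw, eigenvalues sorted in decreasing order."""
    Sigma = np.diag(spectrum)
    X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
    Sigma_hat = X.T @ X / N
    V = np.linalg.eigh(Sigma)[1][:, ::-1]
    V_hat = np.linalg.eigh(Sigma_hat)[1][:, ::-1]
    return abs(V[:, k] @ V_hat[:, m])

# A small gap |lambda_1 - lambda_2| lets sampling noise mix the two eigenvectors.
big_gap   = np.mean([cross_overlap([4.0, 1.0, 0.5, 0.1]) for _ in range(50)])
small_gap = np.mean([cross_overlap([4.0, 3.9, 0.5, 0.1]) for _ in range(50)])
print(big_gap, small_gap)  # typically small_gap > big_gap
```

This is the same mechanism as in the classical Davis-Kahan theorem: when the population eigenvalues nearly coincide, the corresponding empirical eigenvectors are essentially arbitrary rotations within the near-degenerate subspace.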
Martin J. Wainwright's book, 2019 bounds the 2-norm of $\Sigma - \hat{\Sigma}$ (Section 6) and $\| v_1 - \hat{v}_1 \|_2$ (Section 8). Some theorems are also provided specifically for sparse covariance matrices.