In general, when considering an RKHS $\mathcal{H}$ over a set $X$, one has a kernel function $k:X\times X\rightarrow\mathbb{C}$. It is then not too hard to show that the reproducing property implies the decomposition $$k(x, y) = \sum_{e\in\mathcal{B}}e(x)\overline{e(y)}$$ for any orthonormal basis $\mathcal{B}$ of $\mathcal{H}$.
An important theorem by Mercer states that a function $k\in L^2(X\times X)$ over a measure space $X$ that satisfies an extended positive-definiteness condition (Mercer's condition) is in fact a genuine kernel and admits a decomposition of the form $$k(x, y) = \sum_{i=1}^\infty \lambda_i e_i(x)\overline{e_i(y)},$$ where the $e_i$ form a countable orthonormal basis of eigenvectors of (the integral operator associated to) $k$ and the $\lambda_i$ are the corresponding eigenvalues.
My question is about the (vague) similarity between these two equations. Since the eigenvectors in the second equation form an orthonormal basis, we should have been able to write it as in the first equation, i.e. without eigenvalues, if the function space $L^2$ had been an RKHS. However, these $L^2$-spaces aren't RKHSs, which would explain why the two equations do not have the same form.
So where does the RKHS enter the story in Mercer's theorem? Which RKHS are we considering here? Since the usual way of constructing the RKHS from the kernel is to take the linear span of $\{k(\cdot, x):x\in X\}$ and complete it (as in the Moore-Aronszajn theorem), I would expect it to be something close to an $L^2$-space. But as said before, it can't be exactly $L^2$.
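In its classical form, Mercer's theorem applies when the kernel is defined on a finite measure space $(\mathcal{X},\mu)$, i.e. $\mu(\mathcal{X}) < \infty$. The standard example is a compact $\mathcal{X}$ with a finite Borel measure and a continuous kernel. In this case, you get the "eigendecomposition formula": $$ k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\sum_{i=1}^{\infty} \lambda_{i} \phi_{i}(\mathbf{x}) \overline{\phi_{i}(\mathbf{x}^{\prime})} ,$$ where the $\phi_i$ are eigenfunctions of the integral operator of $k$, orthonormal in $L^2(\mathcal{X},\mu)$, with eigenvalues $\lambda_i \geq 0$. The rescaled functions $\sqrt{\lambda_i}\,\phi_i$ (for $\lambda_i > 0$) are orthonormal in the RKHS of $k$, so the formula is the basis expansion from the question with the eigenvalues absorbed into the basis functions.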
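The eigendecomposition can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes a Gaussian kernel on $[0,1]$ with the uniform probability measure and approximates the integral operator by a midpoint quadrature rule (a Nyström-style discretization). The names `k`, `ell`, `n`, `m` and the truncation threshold are illustrative choices, not part of the theorem. It also forms the rescaled functions $e_i = \sqrt{\lambda_i}\,\phi_i$ and checks that they reproduce the kernel without explicit eigenvalues.

```python
import numpy as np

# Assumption: Gaussian kernel on X = [0, 1] with the uniform probability
# measure mu (a finite measure, mu(X) = 1). `ell` is an illustrative length scale.
def k(x, y, ell=0.2):
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

n = 200
x = (np.arange(n) + 0.5) / n       # midpoint quadrature nodes in [0, 1]
w = 1.0 / n                        # equal quadrature weights, summing to 1
K = k(x[:, None], x[None, :])      # Gram matrix K[a, b] = k(x_a, x_b)

# Discretized integral operator: (T f)(x) = int k(x, y) f(y) dmu(y) ~ (w K) f.
lam, U = np.linalg.eigh(w * K)     # eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]     # reorder to descending
phi = U / np.sqrt(w)               # rescale so that sum_a w * phi_i(x_a)^2 = 1

# Keep only eigenpairs that are clearly above rounding noise (illustrative cutoff).
m = int(np.sum(lam > 1e-10 * lam[0]))

# Mercer form: sum_i lam_i * phi_i(x) * phi_i(x').
K_mercer = (phi[:, :m] * lam[:m]) @ phi[:, :m].T

# Basis form without eigenvalues: e_i = sqrt(lam_i) * phi_i.
e = phi[:, :m] * np.sqrt(lam[:m])
K_basis = e @ e.T

gram_phi = w * phi[:, :m].T @ phi[:, :m]   # L^2(mu) Gram matrix of the phi_i
err_mercer = np.max(np.abs(K - K_mercer))
err_basis = np.max(np.abs(K - K_basis))
print(m, err_mercer, err_basis)
```

Both reconstructions agree with the Gram matrix up to the truncation error, while the $\phi_i$ stay orthonormal with respect to the discretized $L^2(\mu)$ inner product.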
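Mercer's theorem does not apply directly in the general case, for example with the Lebesgue measure $\lambda$ on $\mathbb{R}^d$, which is only locally finite. There the integral operator need not have a discrete set of eigenfunctions, so the sum is replaced by an integral. For a continuous translation-invariant kernel, Bochner's theorem gives:
$$ k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)= \int_{\mathbb{R}^d} \psi_{\omega}(\mathbf{x}) \overline{\psi_{\omega}(\mathbf{x}^\prime)} \, d\mu(\omega), \qquad \psi_{\omega}(\mathbf{x}) = e^{i\omega^{\top}\mathbf{x}} ,$$
where $\mu$ is a finite nonnegative measure on $\mathbb{R}^d$. If $\mu$ has a density $p$ with respect to $\lambda$, the factor $\sqrt{p(\omega)}$ can be absorbed into $\psi_\omega$ and the integral is taken against $d\lambda(\omega)$.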
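As an illustration of the integral representation, here is a small sketch under stated assumptions: the kernel is the Gaussian kernel $k(x,x') = e^{-(x-x')^2/2}$ on $\mathbb{R}$ (so $d=1$), whose spectral measure has the standard normal density, and the integral is approximated by a plain Riemann sum on a truncated frequency grid. The function names and grid parameters are hypothetical choices made for this example.

```python
import numpy as np

# Assumption: Gaussian kernel on R with unit length scale.
def gaussian_kernel(x, xp):
    return np.exp(-(x - xp) ** 2 / 2)

# Its spectral density with respect to Lebesgue measure: the standard normal pdf.
def spectral_density(omega):
    return np.exp(-omega ** 2 / 2) / np.sqrt(2 * np.pi)

# Complex exponential feature psi_omega(x) = exp(i * omega * x).
def psi(omega, x):
    return np.exp(1j * omega * x)

# Truncated, symmetric frequency grid; the density is negligible beyond |omega| = 12.
omega = np.linspace(-12.0, 12.0, 4801)
d_omega = omega[1] - omega[0]

def bochner_kernel(x, xp):
    # Riemann-sum approximation of int psi_omega(x) * conj(psi_omega(x')) * p(omega) d omega.
    integrand = psi(omega, x) * np.conj(psi(omega, xp)) * spectral_density(omega)
    return np.sum(integrand) * d_omega

x, xp = 0.3, -1.1
approx = bochner_kernel(x, xp)
exact = gaussian_kernel(x, xp)
print(approx, exact)
```

The imaginary part of the approximation vanishes up to rounding because the grid and the density are symmetric in $\omega$.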
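Finally, if you want to define a kernel using an infinite-dimensional $\ell^2$ feature map, i.e. a sequence of functions $f_i$ with $(f_i(\mathbf{x}))_{i\geq 1}\in\ell^2$ for every $\mathbf{x}$, you may write it as: $$ k(\mathbf{x}, \mathbf{x}^{\prime}):=\sum_{i=1}^{\infty} f_{i}(\mathbf{x}) \overline{f_{i}(\mathbf{x}^{\prime})} ,$$ where the conjugate can be dropped for real-valued $f_i$.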