Mercer's theorem states that a continuous positive semi-definite kernel $k$ can be decomposed as $k(x,y) = \sum_{m=1}^M \lambda_m \phi_m(x) \phi_m(y)$, where the $\lambda_m$ are the eigenvalues and the $\phi_m$ the eigenfunctions of the kernel integral operator $$T_k: L_2(\mu) \rightarrow L_2(\mu), \qquad T_k f(x) = \int k(x,x') f(x') \, d\mu(x').$$ The eigenfunctions are orthonormal in $L_2(\mu)$. For data $x_1, \dots, x_n$ drawn i.i.d. from $\mu$, write $\phi(x) = (\phi_1(x), \dots, \phi_M(x))^T$ and define the $M \times n$ feature matrix $$\Phi = [\phi(x_1) \dots \phi(x_n)].$$ Then $\mathbb{E}[\Phi \Phi^T] = n I_M$, where $I_M$ is the $M \times M$ identity matrix (we assume $T_k$ has only $M$ eigenvalues). This holds because $\mathbb{E}[(\Phi \Phi^T)_{ml}] = \sum_{i=1}^n \mathbb{E}[\phi_m(x_i) \phi_l(x_i)] = n \int \phi_m \phi_l \, d\mu = n \delta_{ml}$ by orthonormality of the eigenfunctions.
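To make this concrete, here is a small Monte Carlo sanity check of $\mathbb{E}[\Phi \Phi^T] = n I_M$. The basis below is a hypothetical choice of mine, not from any particular kernel: $\mu = \mathrm{Uniform}[0,1]$ with $M = 3$ functions ($1$, $\sqrt{2}\cos 2\pi x$, $\sqrt{2}\sin 2\pi x$) that are orthonormal in $L_2(\mu)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example (not from the paper): mu = Uniform[0, 1], M = 3
# eigenfunctions orthonormal in L_2(mu) -- a truncated Fourier basis.
def phi(x):
    # Returns an (M, len(x)) array: row m holds phi_m evaluated at x.
    return np.stack([np.ones_like(x),
                     np.sqrt(2) * np.cos(2 * np.pi * x),
                     np.sqrt(2) * np.sin(2 * np.pi * x)])

n, trials = 5, 20000
S = np.zeros((3, 3))
for _ in range(trials):
    x = rng.uniform(size=n)   # i.i.d. draws from mu
    Phi = phi(x)              # M x n feature matrix
    S += Phi @ Phi.T          # accumulate the M x M outer-product matrix

print(S / trials)             # concentrates around n * I_M by orthonormality
```

The Monte Carlo average of $\Phi \Phi^T$ comes out close to $5 I_3$, matching the orthonormality argument above.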
Now, I came across a paper in which it is claimed that $\Phi^T \Phi = M I_n$. It is unclear to me why that would hold. Note that $(\Phi^T \Phi)_{ij} = \sum_{m=1}^M \phi_m(x_i) \phi_m(x_j)$.
Even if this identity is meant to hold only in expectation, I don't see it. We do have $\mathbb{E}[(\Phi^T \Phi)_{ii}] = \sum_{m=1}^M \mathbb{E}[\phi_m(x_i)^2] = M$. But for $i \neq j$, we get $\mathbb{E}[(\Phi^T \Phi)_{ij}] = \sum_{m=1}^M \mathbb{E}[\phi_m(x_i)]\,\mathbb{E}[\phi_m(x_j)] = \sum_{m=1}^M \mathbb{E}[\phi_m(x_1)]^2$, where we use that $x_i, x_j$ are independent and identically distributed. Why would that expression be zero? Orthonormality only gives $\mathbb{E}[\phi_m(x_1)] = \int \phi_m \, d\mu = \langle \phi_m, 1 \rangle_{L_2(\mu)}$, which need not vanish.
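In fact, a quick numerical experiment with a hypothetical orthonormal basis (my own choice, not the paper's setting: $\mu = \mathrm{Uniform}[0,1]$, $\phi_1 = 1$, $\phi_2 = \sqrt{2}\cos 2\pi x$, $\phi_3 = \sqrt{2}\sin 2\pi x$) suggests the off-diagonal expectation is not zero whenever some $\phi_m$ has nonzero mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical basis, orthonormal in L_2(Uniform[0, 1]); note phi_1 = 1
# has mean 1 under mu, while phi_2 and phi_3 have mean 0.
def phi(x):
    return np.stack([np.ones_like(x),
                     np.sqrt(2) * np.cos(2 * np.pi * x),
                     np.sqrt(2) * np.sin(2 * np.pi * x)])

n, trials = 4, 20000
G = np.zeros((n, n))
for _ in range(trials):
    x = rng.uniform(size=n)
    G += phi(x).T @ phi(x)    # accumulate the n x n matrix Phi^T Phi
G /= trials

print(np.round(G, 2))
# Diagonal concentrates around M = 3, but the off-diagonal entries
# concentrate around sum_m E[phi_m(x_1)]^2 = 1 (only phi_1 has nonzero
# mean) -- not around 0, so E[Phi^T Phi] != M * I_n here.
```

So for this basis $\mathbb{E}[\Phi^T \Phi]$ has constant off-diagonal $1$, not $0$, which is exactly the obstruction described above.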