Mercer's theorem states that a symmetric function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$ is a kernel if and only if the matrix $K$ with entries $K_{ij} = k(x_i, x_j)$ is PSD for every finite set $X = \{x_1, \dots, x_n \} \subset \mathbb{R}^d$.
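To make the finite-dimensional condition concrete, here is a small numerical sanity check (not a proof, of course): for the Gaussian kernel, the Gram matrix of any finite sample should be PSD up to floating-point error. The kernel choice and bandwidth here are just illustrative assumptions.

```python
import numpy as np

# Sanity check of the PSD condition for the Gaussian (RBF) kernel
# k(x, y) = exp(-||x - y||^2 / 2) on a random finite sample.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))          # 50 points in R^3

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2)                 # Gram matrix K_ij = k(x_i, x_j)

# K is symmetric, so its eigenvalues are real; PSD means all are >= 0
# (allow a small tolerance for floating-point round-off).
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)            # expect True
```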
Given some function $k$, I have to prove that it is a valid kernel. I am having trouble analyzing the Gram matrices from Mercer's theorem directly, so my approach is different.
I construct a feature map $\phi: X \rightarrow \mathbb{R}^n$ for any given data set $X = \{x_1, \dots, x_n\}$, such that $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ for all $x_i, x_j \in X$. I can then write $$K = \Phi \Phi^T,$$ where $\Phi \in \mathbb{R}^{n \times n}$ has entries $\Phi_{ij} = \phi_j(x_i)$, i.e. its $i$-th row is $\phi(x_i)^T$. Then for any $z \in \mathbb{R}^n$ we have $$z^T K z = z^T \Phi \Phi^T z = \| \Phi^T z \|^2 \ge 0.$$ Hence $K$ is PSD and I am done.
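The factorization argument above can be checked numerically. As an illustration (using a data-independent feature map rather than the data-dependent one in the question), take the polynomial kernel $k(x, y) = (x^T y)^2$ on $\mathbb{R}^2$, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

```python
import numpy as np

# Verify K = Phi Phi^T and z^T K z = ||Phi^T z||^2 for the polynomial
# kernel k(x, y) = (x^T y)^2 with feature map
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2))

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

Phi = np.stack([phi(x) for x in X])       # Phi[i, j] = phi_j(x_i)
K = (X @ X.T) ** 2                        # Gram matrix from the kernel

print(np.allclose(K, Phi @ Phi.T))        # expect True

z = rng.standard_normal(20)
print(np.isclose(z @ K @ z, np.linalg.norm(Phi.T @ z) ** 2))  # expect True
```

The same check works for any candidate kernel for which you can write down the corresponding $\Phi$.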
Is this argument valid? I think it should be, but my $\phi$ depends on $X$, and since I am new to this topic I want to make sure I have understood Mercer's theorem correctly. As a bonus question: is there a way to use this data-dependent $\phi$ that I construct to recover the general feature map associated with the RKHS of $k$?