I'm watching a YouTube video on the math behind adversarial and moment matching networks. It covers the idea of maximum mean discrepancy (MMD) between two probability distributions. It is called "moment matching" because the idea is to define a transform $\Phi$ that maps a sample into a feature space that is descriptive of the underlying distribution's moments. This allows two distributions to be compared through their moments.
Then a statement is made that you don't need to know this $\Phi$ directly, because as it turns out, if you expand the squared distance between the two sample means in feature space, $\left\|\frac{1}{n}\sum_i \Phi(x_i) - \frac{1}{m}\sum_j \Phi(y_j)\right\|^2$, everything reduces to a bunch of inner-product terms of the form $\Phi(x_i)^T\Phi(x_{i'})$, $\Phi(y_j)^T\Phi(y_{j'})$, and $\Phi(x_i)^T\Phi(y_j)$.
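For concreteness, here is what I believe the expansion looks like, assuming samples $x_1,\dots,x_n$ and $y_1,\dots,y_m$ (my notation, not necessarily the video's):

$$
\left\|\frac{1}{n}\sum_{i=1}^{n}\Phi(x_i) - \frac{1}{m}\sum_{j=1}^{m}\Phi(y_j)\right\|^2
= \frac{1}{n^2}\sum_{i,i'}\Phi(x_i)^T\Phi(x_{i'})
- \frac{2}{nm}\sum_{i,j}\Phi(x_i)^T\Phi(y_j)
+ \frac{1}{m^2}\sum_{j,j'}\Phi(y_j)^T\Phi(y_{j'})
$$

So every occurrence of $\Phi$ appears only inside an inner product, never on its own.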
So this is where things get fuzzy for me. The professor says that each of these inner-product terms can be replaced with a "kernel" $k(\cdot,\cdot)$ of your choosing.
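If I've understood correctly, the resulting computation would be something like the sketch below, where I've picked a Gaussian (RBF) kernel purely as an example to stand in for $\Phi(\cdot)^T\Phi(\cdot)$ (the kernel choice and function names are my own, not from the video):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Gaussian/RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    # This plays the role of the inner product Phi(a)^T Phi(b)
    # without ever computing Phi explicitly.
    diff = a[:, None, :] - b[None, :, :]          # all pairwise differences
    sq_dists = np.sum(diff ** 2, axis=-1)         # pairwise squared distances
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    # Biased estimate of squared MMD: replace each inner-product term
    # in the expansion with a kernel evaluation and average.
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

rng = np.random.default_rng(0)
# Same distribution: MMD^2 should be close to 0.
same = mmd_squared(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
# Shifted distribution: MMD^2 should be clearly larger.
diff = mmd_squared(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
```

At no point does the code construct $\Phi$; only pairwise kernel evaluations are needed, which I gather is the whole point of the statement.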
In my experience, the word "kernel" is synonymous with "null space". Can someone fill in the blanks on this statement?