Assume we have two random variables $f_i$ and $f_j$, both of which follow a Gaussian distribution, with corresponding means $\mu_i$ and $\mu_j$.
Can we conclude that
$$\|\mu_i - \mu_j\|_2^2 \;\leq\; \frac{1}{N_i N_j}\sum_{a=1}^{N_i}\sum_{b=1}^{N_j} \big\|f_i^{(a)} - f_j^{(b)}\big\|_2^2 \;?$$
Here $N_i$ is the total number of samples from the first Gaussian, $N_j$ is the total number of samples from the second, and $f_i^{(a)}$, $f_j^{(b)}$ denote the individual samples (I am guessing the sum runs over all pairs of samples, since the normalization is $\frac{1}{N_i N_j}$).
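For what it's worth, I tried a quick numerical check of the inequality as I wrote it above (assuming the sum is over all $N_i \times N_j$ sample pairs; the means, variances, and sample sizes below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
Ni, Nj, d = 50, 80, 3  # arbitrary sample counts and dimension

# Draw samples from two Gaussians with different means.
fi = rng.normal(loc=0.0, scale=1.0, size=(Ni, d))
fj = rng.normal(loc=2.0, scale=1.5, size=(Nj, d))

# Left-hand side: squared distance between the empirical means.
mu_i, mu_j = fi.mean(axis=0), fj.mean(axis=0)
lhs = np.sum((mu_i - mu_j) ** 2)

# Right-hand side: average squared distance over all Ni * Nj sample pairs.
diffs = fi[:, None, :] - fj[None, :, :]  # shape (Ni, Nj, d) via broadcasting
rhs = np.sum(diffs ** 2) / (Ni * Nj)

print(lhs <= rhs)
```

In every run I tried (different seeds, dimensions, and sample sizes) this printed `True`, but of course that is not a proof.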
Sorry, I don't have a mathematical background and I might have lost some coefficients or terms in the formula above. It seems a lot of ML papers use conclusions like this to derive upper bounds for the KL divergence. From some searching online, it may also be related to kernel density estimation; is that right?
Could anybody provide any guidance on this? Thanks in advance.