In chapter two of the book Hands-On Machine Learning by Aurélien Géron, the author states:
Another approach to transforming multimodal distributions is to add a feature for each of the modes (at least the main ones), representing the similarity between the housing median age and that particular mode. The similarity measure is typically computed using a radial basis function (RBF)—any function that depends only on the distance between the input value and a fixed point. The most commonly used RBF is the Gaussian RBF, whose output value decays exponentially as the input value moves away from the fixed point. For example, the Gaussian RBF similarity between the housing age x and 35 is given by the equation exp(–γ(x – 35)²). The hyperparameter γ (gamma) determines how quickly the similarity measure decays as x moves away from 35. Using Scikit-Learn’s rbf_kernel() function, you can create a new Gaussian RBF feature measuring the similarity between the housing median age and 35:
from sklearn.metrics.pairwise import rbf_kernel
age_simil_35 = rbf_kernel(housing[["housing_median_age"]], [[35]], gamma=0.1)
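Since the housing DataFrame is only available inside the book's notebook, here is a minimal self-contained sketch (using a small synthetic ages array as a stand-in) that checks that rbf_kernel reproduces the book's formula exp(–γ(x – 35)²):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Synthetic stand-in for housing[["housing_median_age"]].
ages = np.array([[10.0], [25.0], [35.0], [50.0]])

# rbf_kernel computes exp(-gamma * ||x - y||^2) for every pair (x, y).
simil = rbf_kernel(ages, [[35.0]], gamma=0.1)

# The same values straight from the formula in the quoted passage.
manual = np.exp(-0.1 * (ages - 35.0) ** 2)

print(np.allclose(simil, manual))  # True: the kernel matches the formula
print(simil.ravel())               # similarity peaks at 1.0 where age == 35

The similarity equals 1.0 exactly at the fixed point and decays toward 0 on both sides, with gamma controlling how fast.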
A question arose here among the students: why does this solve bimodality? We need to handle bimodality to train a good, representative machine learning model, whatever that is, but how does the RBF transform a multimodal distribution into a normal one?
My hypothesis is that the RBF computes the similarity between every point and a fixed point in the data (here, 35), effectively assigning a normal distribution to each point. That would be something like convolving $f(x)$ with a normal distribution, and by the Central Limit Theorem the result should converge to a normal distribution. So, does that make sense?
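For concreteness, one way to probe this is to apply the transformation to a synthetic bimodal sample and inspect the distribution of the resulting feature. This is only a sketch; the two modes, their spread, and gamma=0.1 are assumptions, not the actual housing data:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(42)

# Hypothetical bimodal "ages": two Gaussian modes, around 20 and 35.
ages = np.concatenate([
    rng.normal(20, 3, 5000),
    rng.normal(35, 3, 5000),
]).reshape(-1, 1)

# RBF similarity to the mode at 35, as in the book's example.
simil_35 = rbf_kernel(ages, [[35.0]], gamma=0.1)

# Crude text histogram of the new feature, to see its shape directly.
counts, edges = np.histogram(simil_35, bins=10)
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"[{lo:.2f}, {hi:.2f}): {'#' * (c // 200)}")

Printing or plotting the histogram shows the actual shape of the transformed feature, which gives a direct empirical check on whether it ends up looking normal.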