Transforming a random variable to have the distribution of another


How can I transform a random variable so that its distribution matches that of a reference variable?

I have two vectors of randomly sampled values, each from a separate distribution. I don't know which family either belongs to.

The chart below shows their respective distributions:

[Chart: distributions of the two samples]


There are 3 best solutions below


Here's something you can try as a first approximation: apply a linear transformation so that the two variables have the same mean and variance. Call the two random variables $X, Y$, with means $\mu_X, \mu_Y$ and standard deviations $\sigma_X, \sigma_Y$.

We want to find coefficients $a, b$ such that $a X + b$ has the same mean and standard deviation as $Y$. (A linear map generally cannot make the two distributions equal; it only matches these two moments.) The standard deviation of the variable $a X + b$ is $|a| \sigma_X$; taking $a > 0$ and setting this equal to $\sigma_Y$ gives $a \sigma_X = \sigma_Y \implies a = \frac{\sigma_Y}{\sigma_X}$. Similarly, the mean of $a X + b$ is $a \mu_X + b$; we want this to be equal to $\mu_Y$, and we already know $a$, so $b = \mu_Y - a \mu_X = \mu_Y - \frac{\sigma_Y}{\sigma_X} \mu_X$.

Since you don't have access to the true parameters $\mu$ or $\sigma$ for either variable, you can use the sample estimates $\overline x, \overline y, s_X, s_Y$ in their place.
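As a minimal sketch of the moment-matching idea above, assuming the two samples are available as NumPy arrays: the function name `match_mean_std` and the simulated gamma and normal samples are made up for illustration and are not from the question.

```python
import numpy as np

def match_mean_std(x, y):
    """Linearly rescale x so its sample mean and std match those of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # a = s_Y / s_X, using the sample standard deviations (ddof=1)
    a = y.std(ddof=1) / x.std(ddof=1)
    # b = ybar - a * xbar
    b = y.mean() - a * x.mean()
    return a * x + b

# Hypothetical data standing in for the two sampled vectors
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=5000)
y = rng.normal(loc=10.0, scale=2.0, size=4000)

x_new = match_mean_std(x, y)
```

The result shares the first two sample moments with `y`, but its shape (skewness, tails, any jaggedness) is still that of `x`.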

Note that this transformation won't be perfect. No easy transformation will smooth out those jagged edges on the blue density to make them smoothly match the red one. But, it's a start. Does this help?


You've been very contradictory as to whether you want to transform the samples or the distributions. If it's the latter, all you have to do is convert between percentiles. Given an $x$ value from one distribution, find what percentile that point is, and then find what $x$ value corresponds to that percentile in the other distribution. If we call the cumulative distribution function for teal $T$, and the one for red $R$, then given a point $x$ on the teal curve, you can take $R^{-1}(T(x))$, where $R^{-1}$ is the inverse CDF (quantile function) of red, and that will give you the corresponding $x$ value for $R$.
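When only samples are available, one way to approximate $R^{-1}(T(x))$ is to use empirical percentiles in place of the true CDFs. The sketch below rests on that assumption; the name `quantile_map`, the mid-rank percentile convention, and the simulated data are illustrative choices, not something stated in the answer.

```python
import numpy as np

def quantile_map(x, ref):
    """Map each x to the reference value at the same empirical percentile."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Empirical T(x): rank of each point within its own sample, scaled to (0, 1).
    # Ties get distinct ranks, which is fine for continuous data.
    ranks = np.argsort(np.argsort(x))
    p = (ranks + 0.5) / x.size
    # Empirical R^{-1}(p): quantile of the reference sample (linear interpolation)
    return np.quantile(ref, p)

# Hypothetical data standing in for the teal and red samples
rng = np.random.default_rng(0)
teal = rng.gamma(shape=2.0, scale=3.0, size=5000)
red = rng.normal(loc=10.0, scale=2.0, size=4000)

teal_mapped = quantile_map(teal, red)
```

The mapping is monotone, so the ordering of the original values is preserved while their distribution follows the reference sample.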


Many thanks to Aaron for pointing me in the right direction. I used his answer as a starting point and added a few bells and whistles.

I took the ECDF of the variable I want to match to (the testing dataset in the chart below) and fit a polynomial regression with the variable as the response and ECDF value as the predictor.

Using this regression, I then predicted new values for the variable I want to transform (the training dataset), using its ECDF values as the input. Below are the resulting distributions.
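A rough sketch of how this could look in NumPy follows. The answer does not give the polynomial degree or the ECDF convention, so `degree=5`, the rank-based ECDF, the function names, and the simulated data are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

def ecdf_values(v):
    """ECDF evaluated at each sample point: rank / n, in (0, 1]."""
    v = np.asarray(v, dtype=float)
    ranks = np.argsort(np.argsort(v))
    return (ranks + 1) / v.size

def poly_quantile_transform(train, test, degree=5):
    """Fit test values as a polynomial in their ECDF, then predict for train."""
    test = np.asarray(test, dtype=float)
    # Regression: variable as response, ECDF value as predictor
    quantile_fit = Polynomial.fit(ecdf_values(test), test, deg=degree)
    # Predict transformed values from the training ECDF values
    return quantile_fit(ecdf_values(train))

# Hypothetical data standing in for the training and testing datasets
rng = np.random.default_rng(0)
train = rng.gamma(shape=2.0, scale=3.0, size=5000)
test = rng.normal(loc=10.0, scale=2.0, size=4000)

train_transformed = poly_quantile_transform(train, test)
```

The polynomial acts as a smoothed approximation of the reference quantile function. It is not guaranteed to be monotone, and a low degree can underfit the tails, so it is worth comparing the resulting density against the reference before relying on it.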

[Chart: resulting distributions of the training and testing datasets]