Data/Feature normalization

Let's say I have a set of random elements of the interval $(-1,1)$.

$$S=\{0.03,-0.1,0.5,-0.45,...\}$$

I'm looking for a bijective function $f(x)$ which normalizes the elements of $S$ such that the mean is close to 0 and the variance is close to 1.

Since I need those elements as input for a neural network with a tanh activation function, it's very important that the normalizing function is bijective.

I thought about something like the normal distribution, which gives control over the mean and standard deviation.

Maybe someone knows a specific function with the required attributes or a procedure to solve this task.

Best answer

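Find the sample mean $\bar X$ and the sample SD $S_X$ of your observations $X_1, \dots, X_n$ in $(-1,1).$ Then make the transformation $Y_i = (X_i - \bar X)/S_X.$ The result is that $\bar Y = 0$ and $S_Y = 1.$ As long as the observations are not all equal, $S_X > 0,$ so this is a strictly increasing linear (affine) map and therefore a bijection. Keep track of the original values $\bar X$ and $S_X,$ and you can reclaim the original $X_i$s from the $Y_i$s via $X_i = S_X Y_i + \bar X.$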

This transformation is essentially the 'standardization' often used to make standard normal tables applicable. You can call it 'standardization' if you want, but its use does not depend on having normal data, and it does not produce normal data: it only shifts and rescales the values, so the shape of their distribution is unchanged.
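To illustrate the point that standardizing does not make data normal, here is a small R sketch. The exponential sample, the seed, and the helper `skew()` (a moment-based skewness, one of several definitions in use) are illustrative choices, not part of the question.

```r
# Illustrative skewed data (not from the question): 1000 draws from Exp(1)
set.seed(1)
x <- rexp(1000)

# Moment-based sample skewness; an assumed helper, one of several common definitions
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

# Standardize using the sample mean and sample SD
y <- (x - mean(x)) / sd(x)

c(mean = mean(y), sd = sd(y))          # essentially 0 and 1
c(skew_x = skew(x), skew_y = skew(y))  # the same value: the shape is unchanged
```

Skewness is unaffected by shifting and positive rescaling, so the standardized values are exactly as skewed as the originals. Only the location and spread change.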

Here is a quick demonstration in R, using 10 values drawn uniformly from $(0,1)$ (the procedure is identical for values in $(-1,1)$):

 x = runif(10)
 x
 [1] 0.18141713 0.01858419 0.02015122 0.16797647 0.34340317
 [6] 0.70625278 0.10702176 0.87925563 0.39144687 0.06993951
 a = mean(x); s = sd(x)
 y = (x-a)/s
 y
 [1] -0.3618861 -0.9119488 -0.9066552 -0.4072897  0.1853157
 [6]  1.4110507 -0.6131996  1.9954682  0.3476112 -0.7384664
 mean(y);  sd(y)
 [1] -1.387779e-17  # essentially 0
 [1] 1
 x1 = s*y + a  # reclaimed x's
 x1
 [1] 0.18141713 0.01858419 0.02015122 0.16797647 0.34340317
 [6] 0.70625278 0.10702176 0.87925563 0.39144687 0.06993951
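For the neural-network use case in the question, you would typically compute $\bar X$ and $S_X$ once and reuse them for any later inputs, then invert when you need the original scale. A minimal sketch, assuming hypothetical helper names `fit_standardizer()`, `standardize()` and `unstandardize()` (they are not from any package):

```r
# Hypothetical helpers: store the mean and SD, then apply or invert the affine map
fit_standardizer <- function(x) {
  s <- sd(x)
  # sd() is NA for a single value and 0 if all values are equal; the map is not invertible then
  if (!is.finite(s) || s == 0) stop("need at least two distinct values")
  list(center = mean(x), scale = s)
}

standardize   <- function(x, p) (x - p$center) / p$scale
unstandardize <- function(y, p) p$scale * y + p$center

set.seed(2)
x_train <- runif(10, min = -1, max = 1)  # values in (-1, 1), as in the question
x_new   <- runif(5,  min = -1, max = 1)  # later inputs, e.g. at prediction time

p <- fit_standardizer(x_train)
y_train <- standardize(x_train, p)       # mean essentially 0, SD 1
y_new   <- standardize(x_new, p)         # reuses the training mean and SD

all.equal(unstandardize(y_new, p), x_new)  # TRUE: the original values are recovered
```

`y_new` will generally not have mean exactly 0 and SD exactly 1, because it uses the parameters estimated from `x_train`. This is usually what you want when the network was trained on the standardized `x_train`. Base R's `scale()` performs the same forward step and stores the mean and SD in the attributes `"scaled:center"` and `"scaled:scale"`.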