Normalize a list of standard deviations

I have several columns of features that I need to normalize to values between 0 and 1 (or -1 and 1, etc.--just something standard) in order to do regressions using ML algorithms (such as SVR, KNN, etc.).

The problem is that many of the features are either actual standard deviations, or permutations of standard deviations (basically meaning that the magnitude of their values carries the same 'information' as standard deviations).

I was trying to think of a way to normalize the standard deviations without losing the ratios-- if I do a regular min-max normalization (subtract the min, then divide by the range, max - min), then all of my standard deviations become equidistant.
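To make the concern concrete, here is a minimal sketch of min-max scaling. The function name `min_max_scale` and the sample values are made up for illustration; they are not taken from any real data set.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are identical: there is no spread to rescale, return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical standard-deviation features with ratios 1 : 2 : 4.
stds = [0.5, 1.0, 2.0]
scaled = min_max_scale(stds)
print(scaled)
```

The smallest value lands on 0 and the largest on 1, so the original 1 : 2 : 4 ratios are no longer visible in the scaled values.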

Then I realized I could calculate the percentages enclosed by the standard deviation.

What is the formula for turning a standard deviation into its percentage (I am going to assume a normal distribution)?

I found some Z-score stuff...but what I really need is: f(1.0) = 0.68/2 = 0.34 (I think 0.68 is enclosed within one std on either side of the mean, right?) and f(-1.0) = -0.34. Then I would add 0.5 (putting the values between 0 and 1).

My best guess right now is the error function of the standard deviations, ranging from values of 0-1, with its point of inflection centered on 0... Is that correct?

There are 1 best solutions below

Nvm. Got it. Not sure whether I should delete now...but I will leave it up until/unless I get negative feedback...

In order to convert your standard deviation information to percentages--f(std) -> prob(std)--just run the 'erf' (error) function on your std.

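The normal cumulative distribution function, written in terms of the error function, is:

$$\Phi(x) = \frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$$

Use $\mu = 0$ and $\sigma = 1$, with $x$ being your value measured in standard deviations.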
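As a quick sanity check, assuming a standard normal distribution ($\mu = 0$, $\sigma = 1$) and rounding to four decimals, plugging in one standard deviation on either side of the mean gives:

$$\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{1}{\sqrt{2}}\right)\right) \approx \frac{1}{2}\left(1+0.6827\right) \approx 0.8413$$

$$\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{-1}{\sqrt{2}}\right)\right) \approx \frac{1}{2}\left(1-0.6827\right) \approx 0.1587$$

The two results differ by about 0.68, which matches the fraction of a normal distribution enclosed within one standard deviation of the mean, and an input of 0 maps to exactly 0.5.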

The error function must be approximated on the computer. I just copied my approximation from some code I found somewhere:

import math

def erf(x):
    # Save the sign of x; erf is odd, so erf(-x) = -erf(x)
    sign = 1 if x >= 0 else -1
    x = abs(x)

    # Constants for the approximation
    a1 =  0.254829592
    a2 = -0.284496736
    a3 =  1.421413741
    a4 = -1.453152027
    a5 =  1.061405429
    p  =  0.3275911

    # Abramowitz & Stegun formula 7.1.26 (maximum absolute error about 1.5e-7)
    t = 1.0 / (1.0 + p * x)
    y = 1.0 - (((((a5*t + a4)*t) + a3)*t + a2)*t + a1)*t*math.exp(-x*x)
    return sign * y
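
For what it's worth, Python 3.2 and later ship `math.erf` in the standard library, so the hand-rolled approximation is optional there. Below is a minimal sketch of the whole mapping, assuming $\mu = 0$ and $\sigma = 1$; the function name `std_to_unit_interval` and the sample feature values are made up for illustration.

```python
import math


def std_to_unit_interval(z):
    """Map a value in standard-deviation units to (0, 1) via the normal CDF.

    Assumes a standard normal distribution (mu = 0, sigma = 1).
    """
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical feature column, already expressed in standard deviations.
features = [-2.0, -1.0, 0.0, 1.0, 2.0]
normalized = [std_to_unit_interval(z) for z in features]

for z, p in zip(features, normalized):
    print(f"{z:+.1f} -> {p:.4f}")
# -2.0 -> 0.0228
# -1.0 -> 0.1587
# +0.0 -> 0.5000
# +1.0 -> 0.8413
# +2.0 -> 0.9772
```

The mapping is monotonic, so the ordering of the features is preserved, and it saturates toward 0 and 1 for large magnitudes instead of being set by the minimum and maximum of the column.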