I'm doing some exploratory data analysis over a large set of classes/proteins, with a few hundred different features (i.e. continuous variables) extracted from the data. The features are calculated by different criteria (letter frequency, letter-group frequencies, physicochemical parameters, protein length, etc.), and there's no reason to assume that any feature is normally distributed (I don't know what sort of distribution each one might have).
My goal is to normalize the features so that I can use them to discriminate between the different classes with machine learning, most likely in Python or MATLAB.
So, what's the best way to normalize the features for the different groups? Standard normalization, i.e. (sample - mean) / standard deviation, doesn't seem appropriate, since the underlying distribution(s) may be non-normal, and dividing into percentiles loses a lot of information.
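To make the two options above concrete, here is a rough sketch of what I mean by each, using NumPy only. The helper names (`zscore`, `percentile_bins`), the choice of 10 bins, and the toy data are all just assumptions for illustration, not my real pipeline or features.

```python
import numpy as np

def zscore(X):
    """Standard normalization per column: (x - mean) / standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mu) / sigma

def percentile_bins(X, n_bins=10):
    """Replace each value by the index (0 .. n_bins-1) of its quantile bin, per column."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape, dtype=int)
    inner = np.linspace(0, 100, n_bins + 1)[1:-1]  # interior cut points, e.g. 10, 20, ..., 90
    for j in range(X.shape[1]):
        edges = np.percentile(X[:, j], inner)
        out[:, j] = np.searchsorted(edges, X[:, j], side="right")
    return out

# Toy data (made up): one heavily skewed feature and one uniform feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.lognormal(size=500), rng.uniform(0, 1, size=500)])

Z = zscore(X)           # keeps the shape of each distribution, so the skewed column stays skewed
B = percentile_bins(X)  # removes skew, but values within the same bin become indistinguishable
```

The first keeps all the information but is sensitive to skew and outliers; the second is insensitive to the distribution but collapses every value in a bin to a single number, which is the information loss I'm worried about.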
Thank you very much, and I apologize if this is trivial.