I am interested in functions $f: \mathbb{R}^+ \to \mathbb{R}$, for the purpose of mapping non-negative statistical features of objects (such as lengths) to the whole real line. I then intend to use statistical methods to classify objects of different classes based on these mapped features; in particular, machine learning methods such as support vector machines (SVMs) and neural networks.
The problem with using the unmapped features to do classification in $\mathbb{R}^+$ is that a feature of great magnitude is weighted much more heavily than a feature close to zero, even though these two scenarios represent opposite extremes and should be of equal importance.
So, $\log$ does the trick, but it has problems too. There will inevitably be noise in the data, so a feature may land extremely close to zero; after the log is taken, that feature becomes negative with extreme magnitude. Noise in features close to zero is heavily amplified by the log.
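To make the amplification concrete, here is a small illustration (the specific magnitudes are just hypothetical examples): the same absolute noise barely moves a large feature after the log, but swings a near-zero feature by several units.

```python
import numpy as np

noise = 1e-4   # same additive noise applied to both features
large = 1e3    # feature of great magnitude
small = 1e-6   # feature extremely close to zero

# Shift in log-space caused by the noise, for each feature.
large_shift = np.log(large + noise) - np.log(large)  # negligible
small_shift = np.log(small + noise) - np.log(small)  # several units

print(large_shift, small_shift)
```

The large feature moves by roughly `noise / large` in log-space, while the small feature moves by about $\log(1 + \text{noise}/\text{small})$, which is enormous when the feature is much smaller than the noise.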
One fix is to add a small $\varepsilon$ to the features before taking the log, but that seems rather uninspired. Any ideas for what other mappings I could use, or other remedies? Or am I doomed to such negative effects by the nature of functions $f: \mathbb{R}^+ \to \mathbb{R}$?
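For reference, the $\varepsilon$-shift fix I mentioned looks like this (a minimal sketch; the function name and the choice $\varepsilon = 10^{-3}$ are mine):

```python
import numpy as np

def log_shift(x, eps=1e-3):
    """Map non-negative features to the real line via log(x + eps).

    The output is bounded below by log(eps), so near-zero features
    can no longer blow up to negative values of extreme magnitude.
    """
    return np.log(np.asarray(x, dtype=float) + eps)

features = np.array([0.0, 1e-6, 1.0, 1e3])
mapped = log_shift(features)
```

It works, but the clamp at $\log \varepsilon$ is exactly the arbitrariness I would like to avoid.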