I am currently writing about a dataset of collected handwriting samples. I want to show some characteristics of the dataset. For example, I think it is interesting to show how long users took to create each recording.
So I extract the time for each recording and thus get a list of non-negative real numbers. A few instances have values > 30,000 and some < 5, but most instances are in [30, 60], so I want to cut off those outliers and visualize only the rest in a plot.
So I remove the top 0.5% and the bottom 0.5% before visualizing it (where the x-axis is the time $t$ and the y-axis is the number of recordings with recording time lower than $t$).
Is there a name for removing the lowest 0.5% and the highest 0.5% of all data points? (Here 0.5% refers to the total number of data points, not to the range of values.)
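The count-based cut described above can be sketched as follows. This is only a minimal illustration, not the code actually used: the function name `trim_by_count`, the `fraction` parameter, and the sample `times` list are all made up for the example.

```python
def trim_by_count(values, fraction=0.005):
    """Drop the lowest and highest `fraction` of the points, counted by number of points."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # how many points to drop at each end
    return ordered[k:len(ordered) - k]


# Hypothetical recording times: 998 typical values plus one extreme at each end.
times = [2.0] + [30.0 + 0.03 * i for i in range(998)] + [30000.0]
trimmed = trim_by_count(times, fraction=0.005)
```

Because the result is sorted, plotting each remaining value against its position in `trimmed` gives roughly the cumulative curve described in the question.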
When I have done this in the past, this was called "trimming" (not my term).
I used this to make graphics easier to read, and I typically trimmed the top and bottom 5% of the value range, not 5% of the number of points.
More specifically, if the values ranged from 0 to 100, I removed all the points with values that were > 95 or < 5, and then rescaled the display so the remaining points were displayed from the min to the max (usually 0 to 255).
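A rough sketch of that value-based trimming and rescaling, assuming plain Python lists; the function name `trim_and_rescale` and its parameters are invented for illustration and are not from any library.

```python
def trim_and_rescale(values, fraction=0.05, out_min=0.0, out_max=255.0):
    """Drop points in the top/bottom `fraction` of the value range, then rescale the rest."""
    lo, hi = min(values), max(values)
    span = hi - lo
    low_cut = lo + fraction * span   # e.g. 5 when the values range over 0..100
    high_cut = hi - fraction * span  # e.g. 95 when the values range over 0..100
    kept = [v for v in values if low_cut <= v <= high_cut]
    if not kept:
        return []
    k_lo, k_hi = min(kept), max(kept)
    if k_hi == k_lo:
        # All remaining points are equal, so there is nothing to stretch.
        return [out_min for _ in kept]
    # Stretch the remaining points linearly onto [out_min, out_max].
    return [out_min + (v - k_lo) / (k_hi - k_lo) * (out_max - out_min) for v in kept]


values = list(range(0, 101))  # hypothetical data ranging from 0 to 100
display = trim_and_rescale(values)
```

Note that, unlike a count-based cut, the number of points removed here depends on how the values are spread over the range.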
I found that this made details in the data much easier to see.
Another method is to generate a histogram of the values and adjust the displayed values such that the modified display has a uniform histogram. This essentially maps each value through the cumulative distribution of the data.
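This second approach is commonly known as histogram equalization. One simple way to approximate it, sketched here under the assumption that the data is a plain list of numbers, is to replace each value by its empirical cumulative fraction; the function name `equalize` and the sample data are hypothetical.

```python
from bisect import bisect_right


def equalize(values, out_max=255.0):
    """Map each value to the fraction of points <= it, scaled to [0, out_max]."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many points are <= v, i.e. the empirical CDF times n.
    return [bisect_right(ordered, v) / n * out_max for v in values]


skewed = [1, 2, 3, 4, 1000]  # hypothetical data with one extreme value
flat = equalize(skewed)
```

With distinct input values the outputs are evenly spaced, so the extreme value no longer compresses the rest of the display. The trade-off is that the original distances between values are lost; only their order survives.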