Find the pollution percentage of a dataset

I think this is a simple question, but I am unsure which quantity to use. For example, in a signal with 20000 samples, if I know that 4000 of them are spikes, how should I compute the pollution percentage? Something like:

x = 4000 / 20000 * 100
y = 4000 / (20000 - 4000) * 100

or something else? Is the true pollution percentage noise / all data, or noise / valid data?

Best answer

If half of the data were noise, I would certainly expect $50$% pollution (i.e. $x$) rather than $100$% (i.e. $y$).

Note also that $y$ is unbounded. Do you really want the pollution to go to infinity when all of the data is corrupted?

Since your wording is "pollution percentage", only $x$ fits, because it always stays within the range $0..100$.
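
To make the comparison concrete, here is a minimal Python sketch of both candidates. The function names `pollution_percentage` and `noise_to_valid_percentage` are made up for illustration and do not come from any library; the numbers are the ones from the question.

```python
def pollution_percentage(n_noise: int, n_total: int) -> float:
    """Noise as a share of ALL samples (the x formula); always in 0..100."""
    if n_total <= 0 or not 0 <= n_noise <= n_total:
        raise ValueError("need 0 <= n_noise <= n_total and n_total > 0")
    return 100 * n_noise / n_total


def noise_to_valid_percentage(n_noise: int, n_total: int) -> float:
    """Noise relative to VALID samples only (the y formula); unbounded."""
    if n_total <= 0 or not 0 <= n_noise <= n_total:
        raise ValueError("need 0 <= n_noise <= n_total and n_total > 0")
    n_valid = n_total - n_noise
    if n_valid == 0:
        # Every sample is corrupted: the ratio diverges.
        return float("inf")
    return 100 * n_noise / n_valid


# Values from the question: 4000 spikes in 20000 samples.
print(pollution_percentage(4000, 20000))        # 20.0
print(noise_to_valid_percentage(4000, 20000))   # 25.0

# Half of the data is noise: x gives 50, y already gives 100.
print(pollution_percentage(10000, 20000))       # 50.0
print(noise_to_valid_percentage(10000, 20000))  # 100.0
```

With all 20000 samples corrupted, the first function returns 100.0 while the second returns infinity, which is the unboundedness mentioned above.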