Finding an average when there is fake data


A reader $i$ reads $N_i$ articles per day (typically 1 to 20). There are many readers (e.g. a million). A small number of them (e.g. 100 or 1000) are fake. A fake reader reads many articles per day (e.g. 1000 or 10000).

We pay readers.

So the obvious problem is to filter out the fake readers. How can this be done?

I would compute the average number $M$ of articles read per reader per day and reduce $N_i$ to $3M$ whenever $N_i > 3M$, so that we pay for only $3M$ reads rather than for the fake $N_i$ reads.
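To make the capping rule concrete, here is a minimal sketch in Python. It assumes $M$ is already known; the function name `capped_reads`, the parameter `factor`, and the sample numbers are made up for illustration.

```python
def capped_reads(reads, m, factor=3):
    """Clamp each reader's daily count N_i to factor * m."""
    cap = factor * m
    return [min(n, cap) for n in reads]

# Hypothetical daily counts: four ordinary readers and one suspicious one.
reads = [4, 12, 1, 20, 5000]
paid = capped_reads(reads, m=8)
print(paid)  # [4, 12, 1, 20, 24]
```

With $M = 8$ the cap is $3M = 24$, so the reader reporting 5000 articles is paid for only 24.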

The main question: how can I calculate the average so that the fake readers do not distort it much?
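To show how large the distortion can be, here is a small Python sketch with synthetic data at the same fake-to-honest ratio as above (scaled down to 100,000 honest readers and 100 fake ones). The data is invented, not real measurements. The median is included only as one possible robust candidate for comparison; I am not claiming it is the right answer.

```python
import statistics

# Synthetic honest readers: counts cycle evenly through 1..20.
honest = [1 + (i % 20) for i in range(100_000)]
# Synthetic fake readers: 100 accounts reading 10,000 articles each.
fake = [10_000] * 100
everyone = honest + fake

mean_honest = sum(honest) / len(honest)   # 10.5 by construction
mean_all = sum(everyone) / len(everyone)  # dragged upward by the fakes
median_all = statistics.median(everyone)  # barely affected by the fakes

print(mean_honest, mean_all, median_all)
```

In this toy setup the 0.1% of fake readers contribute about as many reads as all honest readers together, so the plain mean roughly doubles (from 10.5 to about 20.5), while the median only moves from 10.5 to 11.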