I have prices from wholesalers for a product. But sometimes the wholesalers doing mistakes and list the product to a unrealistic price. I want to filter the unrealistic prices out. This is an example:
[0] => Array
(
[articlenumber] => 32158021n
[wholesaler] => wholesaler_0
[price] => 37.00
)
[1] => Array
(
[articlenumber] => 32158021n
[wholesaler] => wholesaler_1
[price] => 204.33
)
[2] => Array
(
[articlenumber] => 32158021n
[wholesaler] => wholesaler_2
[price] => 209.32
)
[3] => Array
(
[articlenumber] => 32158021n
[wholesaler] => wholesaler_3
[price] => 211.12
)
[4] => Array
(
[articlenumber] => 32158021n
[wholesaler] => wholesaler_4
[price] => 243.89
)
As you can see wholesaler_0 is offering the product to an unrealistic price (37 compared to >200). Which is the best way to identify the faulty price?
What I have thought:
To check a price I would calculate the average of all other prices. Which would be for the first price:
37 compared to 217.17, but what now?
I could say everything that is 50% of the average price is invalid, so (217.17/2) = 108.59 > 37
and that would make the price invalid, as every price below 108.59$ would be invalid.
BUT
Lets say we have a 1 dollar price compared to 3 dollar average price, which would mean that the price is one third of the average price. But this is a realistic price, because there is only a difference of 2 dollar. It would mean that the percentage range of accepted prices should increase if the price gets smaller.
What is the correct way to calculate this? Could someone give me an advice?
Thanks in advance!
EDIT: Here is my solution: I have created a exponential formula which gives me the right percentage which a value can differ to be a faulty price. To build this I have used the help of a coordinate system:
80 * 0.96x +2.0
There's no one "correct" way to detect outliers, and heuristics that incorporate your domain-specific knowledge could work better than a general-purpose approach. But I'd consider calculating a z-score for each entry based on the mean and standard deviation of the item prices.