Justifying the use of an unconventional metric to rate accuracy of predictions


I was having a discussion with friends and at some point we decided to make predictions on a quantity (the number of daily new COVID cases in a specific area). We all made our predictions and then we looked at the real value. Let's say the real value was $15$, Alice predicted $21$, Bob $11$, and the other friends predicted above $21$. We said Bob "won", and then I jokingly said that if you take the relative error then Alice won, because $$\frac{|21-15|}{21} \approx 0.28 < \frac{|11-15|}{11} \approx 0.36$$ In other words, you take the absolute difference and divide it by the prediction (not the true value).


Edit: As an answer pointed out, and I confirmed, relative error is defined as the absolute error divided by the true value, not the prediction. In this case I do not want to take the relative error, because it produces exactly the same verdicts as the absolute error. Relative error is useful for comparing predictions of different targets (i.e., different real values), but here we have a single real value, so in essence it is no different from the absolute error. Let's call my metric (where I divide by the prediction value) Thanassis's Metric (TM). Trademarking it would be TM™ :) Smaller TM means the prediction is better (so it's another error metric).
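To make the definitions concrete, here is a small Python sketch (the function names are my own) comparing the three metrics on the example above:

```python
def abs_err(target, pred):
    """Absolute error."""
    return abs(pred - target)

def rel_err(target, pred):
    """Standard relative error: absolute error divided by the true value."""
    return abs(pred - target) / target

def tm(target, pred):
    """TM: absolute error divided by the prediction instead."""
    return abs(pred - target) / pred

target = 15
# Absolute error and relative error both favour the prediction 11 ...
assert abs_err(target, 11) < abs_err(target, 21)
assert rel_err(target, 11) < rel_err(target, 21)
# ... but TM favours 21 (0.286 vs 0.364).
assert tm(target, 21) < tm(target, 11)
```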


My friends protested: "You can't do that! This doesn't make any sense!" Even though I made the argument in jest, I was surprised by the claim that it makes no sense. I tried to argue that when we are making predictions it's fine to take TM. At least, I do it all the time; it seems intuitive to me. I tried to give some examples and after a few attempts we settled on this: Suppose you see an aerial photo of a crowd of $2000$ people and you are asked to predict how many people are in the photo. A prediction of $100$ is far, far worse to me than a prediction of $4000$, even though the absolute error (and the relative error) is smaller in the first case.

When I try to explain the rationale behind this, I end up with the following: when we are making predictions that span several orders of magnitude (and this is often the case with predictions), we are concerned with getting the order of magnitude right. Think about it this way: the person who guessed $100$ in my example could have guessed $100\,000$ in another case (when the target is again $2000$), and we do not capture this kind of error if we just take the absolute difference.

I guess instead of taking the TM we could have taken the absolute error of the logs $$|\log(\text{target}) - \log(\text{prediction})|$$

The log-difference metric is a direct "translation" of my rationale (we are interested in the orders of magnitude). Interestingly, I see that the log method does not yield the same verdict on my initial example (target $15$, predictions $11$ and $21$): $11$ is the better prediction. But it does yield the same verdict in the more extreme example. Maybe TM is indeed a bad metric, and the difference of logs is the right metric for what I want to achieve.
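A quick check of the log-difference metric on both examples (a sketch; the function name is mine):

```python
import math

def log_diff(target, pred):
    """Absolute difference of the logs: |log(target) - log(prediction)|."""
    return abs(math.log(target) - math.log(pred))

# Initial example: unlike TM, the log difference favours 11 over 21,
# since 15/11 ~ 1.36 is a smaller factor than 21/15 = 1.4.
assert log_diff(15, 11) < log_diff(15, 21)

# Crowd example: like TM, it favours 4000 over 100 when the target is 2000,
# since 4000 is off by a factor of 2 while 100 is off by a factor of 20.
assert log_diff(2000, 4000) < log_diff(2000, 100)
```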

In any case, these are my questions (all falling under a general question on rating the accuracy of predictions):

  • How would you justify/refute the use of TM for rating predictions in the way I described above?
  • How would you justify/refute using the difference of the logs for the same purpose?
  • Do you know of any real world examples that are using either metric?

Edit 2: I partly answered my own question below, by refuting the TM metric and providing some graphs of the different errors to support taking the "relative difference" as a metric. I would love to see more thoughts on the matter or examples when different metrics are used.


There are 2 best solutions below


I have never seen relative error divided by the prediction rather than the target. The standard definition of relative error divides by the target value. If you divide the difference by the prediction, you create a bias toward larger predictions, which means it's not quite "accuracy" that you measure (at least not in the traditional sense). For example, if the target is $10$ and the predictions are $8$ and $12$, then $12$ counts as the more accurate prediction under your definition, even though both are equally far from the target. That doesn't make sense.

Also, in your example, Bob predicted $11$ and Alice predicted $21$, so Bob won in the traditional sense. And note that your definition rewards wild over-prediction: if I predict $100\,000\,000$, my TM is still below $1$, so I beat anyone whose prediction is less than half the true value.


TM is a bad metric indeed. A simple example shows its undesirable properties.

Assume that the target (true value) is $2000$ and that the two predictions are $1200$ and $4000$ respectively. $1200$ is a better prediction both in absolute difference and in "relative difference". Yet, TM yields $\frac23$ for the $1200$ prediction while it yields $\frac12$ for the $4000$ prediction. So it's not fit for purpose.

More generally, the difficulty in comparing predictions arises when we have two predictions on either side of the target. How do we judge which one is "closer"? The TM metric gives a boost to predictions above the target, because their TM value can never exceed $1$; the TM value of a prediction below the target, on the other hand, is unbounded.
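These numbers, and the bounded/unbounded behaviour, can be checked directly (a quick sketch):

```python
def tm(target, pred):
    """TM: absolute error divided by the prediction."""
    return abs(pred - target) / pred

# 1200 is closer to the target 2000 than 4000 is, yet TM ranks 4000 better:
assert abs(tm(2000, 1200) - 2/3) < 1e-12
assert abs(tm(2000, 4000) - 1/2) < 1e-12

# TM of any over-prediction stays below 1 ...
assert tm(2000, 10**9) < 1
# ... while TM of an under-prediction grows without bound as it approaches 0:
assert tm(2000, 1) == 1999
```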

The graphs below show the different errors when the target value is $2000$ and the predictions span a wide range of values, from $20$ to $200\,000$. The first graph includes the absolute difference of the logs, while the second graph includes the relative difference, which is just the exponential of the log difference.

[Graph 1: error metrics vs. prediction, including the absolute log difference]
[Graph 2: error metrics vs. prediction, including the relative difference]

The graphs are plotted on a log-log scale to better show the wide range of input values, and also the wide range of output values. We can see that the relative difference (and the log difference) is symmetric in the multiplicative sense, unlike the absolute difference or TM.
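The symmetry is easy to verify: the relative difference is $\exp(|\log t - \log p|) = \max(t/p,\; p/t)$, so under- and over-predicting by the same factor score the same (a sketch, with my own function name):

```python
import math

def rel_diff(target, pred):
    """Exp of the absolute log difference, i.e. max(target/pred, pred/target)."""
    return math.exp(abs(math.log(target) - math.log(pred)))

# Predicting half the target or double the target scores the same (2.0):
assert math.isclose(rel_diff(2000, 1000), rel_diff(2000, 4000))
assert math.isclose(rel_diff(2000, 1000), 2.0)
```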

I realised later that what I had been doing intuitively was much closer to the relative difference than to the TM metric. The TM was just born out of a poor effort to formalise what I was doing intuitively.

I still think that the relative difference is a better metric when we are dealing with predictions that span multiple orders of magnitude, and I'd love to see examples where this is used, or further justification for using it.