Incorporating the absolute difference and the relative difference in a single metric

  1. I have a series of 12 connected 'zones' that each have a specific numerical value, either negative or positive, at any given point in time. This amounts to time series data.

  2. I have some forecasted values for each zone and some realised values. The data are discrete, one value per hour, and run over a two-year period.

  3. The forecasted values of the zones can change depending on two tunable parameters, but the realised results always remain the same because I'm working with historical data.

  4. The error is defined as the difference between the forecasted and the realised result.
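Concretely, the setup above might be sketched as follows. The array shapes, the use of NumPy, and the synthetic values are my own illustrative assumptions; only the definition of the error (forecast minus realised) comes from the problem statement:

```python
import numpy as np

# Hypothetical dimensions: 12 zones, hourly values over two years.
n_zones = 12
n_hours = 2 * 365 * 24

rng = np.random.default_rng(0)
# Realised values are fixed historical data; forecasts come from one run
# of the algorithm. Both are placeholders here.
realised = rng.normal(0, 100, size=(n_zones, n_hours))
forecast = realised + rng.normal(0, 10, size=(n_zones, n_hours))

# Error as defined in point 4: forecast minus realised, per zone per hour.
error = forecast - realised
print(error.shape)  # (12, 17520)
```

Each new parameter setting produces a new `forecast` array, and hence a new `error` array of the same shape, to be compared against the others.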

I wish to compare the errors robustly in order to evaluate results from different methods, a method being some tweaking of one of the parameters that gives a new forecast.

For example, let's say I modify one of the parameters of the algorithm and get some newly forecasted values, which effectively gives me a different error (difference between the forecasted and the realised values). What would be the best way to compare the new result to the original forecast for such a large dataset?

As I see it there are two features I need to consider:

Feature one: the absolute size of the error.

Feature two: the relative size of the error compared to the value of the zone at a given time.

Incorporating both of these without setting thresholds for what a significant error is, though, is tricky.

Example:

Scenario 1: let's say for a given zone at a given time we have a value of 1000 and an error of 1. This error is easy to assess: it is small in absolute terms, it is a small percentage of the zone's value at that point in time, and it is not very significant.

Scenario 2: a zone has a small value, say 5, and a small error, say 4. This is a very large percentage of the zone's value, even though it is what could be considered a small error.

Scenario 3: a zone has a large value, 12000, but also a large error of 200. Conversely, as a percentage of the zone's value this is small, even though the error is rather large in absolute terms.
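To make the tension concrete, here is a minimal sketch (the choice of Python is mine) computing the absolute and relative errors for the three scenarios above:

```python
# (zone value, error) pairs taken from the three scenarios above.
scenarios = [(1000, 1), (5, 4), (12000, 200)]

for value, error in scenarios:
    # Relative error: size of the error compared to the zone's value.
    relative = abs(error) / abs(value)
    print(f"value={value:>6}  abs error={error:>4}  relative={relative:.2%}")

# Scenario 1: relative = 0.10%   (small absolute, small relative)
# Scenario 2: relative = 80.00%  (small absolute, huge relative)
# Scenario 3: relative = 1.67%   (large absolute, small relative)
```

No single column of this output ranks the three errors the same way, which is exactly the difficulty: any one metric orders them differently from the other.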

I also thought about scaling the datasets, but then the errors would scale too, which would not solve the problem of their relative significance.

I also thought about setting thresholds for what a significant error is and using that as a flag in the results, but this brings in a degree of human influence I would rather avoid.

I hope the problem is clear. Does anyone have any good suggestions?

This post has a good idea, which works quite well. I'd love to know where it came from and to explore alternative, similar ideas.