I have a series of 12 connected 'zones' that have a specific numerical value at any given point in time, either negative or positive. This amounts to time series data.
I have forecasted values and realised values for each zone. The data are discrete (one value per hour) and run over a two-year period.
The forecasted values depend on two tunable parameters, but the realised values always remain the same since I'm working with historical data.
The error is defined as the difference between the forecasted and the realised result.
I wish to robustly compare the errors to evaluate results from different methods, where a "method" is a particular setting of the parameters that produces a new forecast.
For example, say I modify one of the parameters of the algorithm and get newly forecasted values, and therefore a different error (the difference between the forecasted and the realised values). What would be the best way to compare the new result to the original forecast across such a large dataset?
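To make the setup concrete, here is a minimal sketch with synthetic stand-in data. The array shapes, noise levels, and variable names are all placeholders for illustration, not my actual data or algorithm:

```python
import numpy as np

# Toy stand-ins: realised values plus two forecasts (original vs tweaked
# parameters). Shape is (hours, zones); the real data would be ~17520 x 12.
rng = np.random.default_rng(0)
realised = rng.normal(0, 100, size=(24, 12))
forecast_a = realised + rng.normal(0, 5, size=realised.shape)  # method A
forecast_b = realised + rng.normal(0, 4, size=realised.shape)  # method B

# Error is defined as forecast minus realised, per zone and per hour.
err_a = forecast_a - realised
err_b = forecast_b - realised

# One naive summary per method: mean absolute error over all hours and zones.
# This captures "feature one" below but ignores the relative size of errors.
mae_a = np.abs(err_a).mean()
mae_b = np.abs(err_b).mean()
print(f"MAE method A: {mae_a:.2f}, MAE method B: {mae_b:.2f}")
```

The question is essentially what summary to use in place of the naive MAE at the end.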
As I see it, there are two features I need to consider:
Feature one: the absolute size of the error.
Feature two: the size of the error relative to the value of the zone at a given time.
Incorporating both of these without setting thresholds for what counts as a significant error is tricky, though.
Example:
Scenario 1: for a given zone at a given time we have a value of 1000 and an error of 1. This error is easy to assess: it is small in absolute terms and a small percentage of the zone's value at that point in time, so it is not very significant.
Scenario 2: a zone has a small value, say 5, and a small error, say 4. This is a very large percentage of the zone's value, even though the error itself could be considered small.
Scenario 3: a zone has a large value, 12000, but also a large error of 200. Conversely, as a percentage of the zone's value this is small, even though the error is rather large.
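The tension between the two features can be seen directly by computing both measures for the three scenarios (the numbers are the ones above):

```python
# (zone value, error) pairs from the three scenarios above.
scenarios = [(1000, 1), (5, 4), (12000, 200)]

abs_errors = [abs(e) for _, e in scenarios]
rel_errors = [abs(e) / abs(v) for v, e in scenarios]

for (v, e), rel in zip(scenarios, rel_errors):
    print(f"value={v:>6}  absolute error={abs(e):>4}  relative error={rel:.2%}")
```

Ranking the scenarios by absolute error gives 3 > 2 > 1, while ranking by relative error gives 2 > 3 > 1, which is exactly the conflict a single summary metric would need to resolve.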
I also thought about scaling the datasets, but then the errors would scale too, which does not solve the problem of their relative significance.
I also thought about setting thresholds for what a significant error is and using that as a flag in the results, but this brings in a degree of human judgement I would rather avoid.
I hope the problem is clear. Does anyone have any good suggestions?
This post has a good idea, which works quite well. I'd love to know where it came from and to explore alternative, similar ideas.