I have some data on location in 3D space, and am analyzing a couple of models that are supposed to predict that location. The data I have is a collection of distances, as a function of time, between the observed and modeled locations. If I want to aggregate the error distance data within some time range, what is the most appropriate way to do it? Should I present the mean? The RMS? Should I weight it somehow, since there is "so much more space" at a distance of 2 meters, say, than there is at a distance of half a meter?
Is there a fairly definite answer to this question, or is it going to be a "Well, it all depends on what you want to do ...." kind of thing? Thanks.
If you are analyzing two models and want to identify which one predicts the location better, you should first assess whether the error distances are normally distributed. To do this, you could run an appropriate test (e.g., the Shapiro-Wilk test). If the distributions are normal, you can simply describe the error distance data as mean ± standard deviation. If they are not, you can check whether a transformation (e.g., a logarithmic transformation) "normalizes" them or, more simply, you can report the median [interquartile range], where the interquartile range spans the 25th to the 75th percentile of the distribution. I would not recommend using the RMS in this context, since that measure is more suitable for other purposes.
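As a rough sketch of that decision in Python with NumPy and SciPy: the function name `summarize_errors`, the 0.05 threshold, and the synthetic data are assumptions made for illustration, not part of the original question. It runs the Shapiro-Wilk test, reports mean and standard deviation if normality is not rejected, and otherwise reports the median and quartiles, plus a Shapiro-Wilk p-value for the log-transformed data when all distances are positive.

```python
import numpy as np
from scipy import stats


def summarize_errors(errors, alpha=0.05):
    """Describe error distances as mean/SD or median/IQR, based on Shapiro-Wilk."""
    errors = np.asarray(errors, dtype=float)
    _, p_raw = stats.shapiro(errors)
    if p_raw > alpha:
        # Normality not rejected: report mean and sample standard deviation.
        return {"normal": True, "center": errors.mean(),
                "spread": errors.std(ddof=1), "shapiro_p": p_raw}
    # Normality rejected: report median and the 25th/75th percentiles.
    q1, median, q3 = np.percentile(errors, [25, 50, 75])
    summary = {"normal": False, "center": median,
               "spread": (q1, q3), "shapiro_p": p_raw}
    if np.all(errors > 0):
        # Check whether a log transformation would "normalize" the data.
        _, p_log = stats.shapiro(np.log(errors))
        summary["log_shapiro_p"] = p_log
    return summary


# Synthetic, right-skewed placeholder data (not real measurements).
rng = np.random.default_rng(0)
example_errors = rng.lognormal(mean=0.0, sigma=0.5, size=200)
print(summarize_errors(example_errors))
```

You would call this once per model, on the error distances that fall within the time range of interest.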
After this, you can compare the two sets of error distances to assess whether the discrepancy between the predicted and observed locations differs between the two models. Because both models are evaluated against the same observations, the samples are paired: a Student's t-test for paired samples is appropriate for normally distributed data, whereas a Wilcoxon signed-rank test for paired samples is a good choice for non-normal distributions.
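The comparison could be sketched as follows, assuming the two error arrays are aligned so that element i of each refers to the same time point. The function name `compare_models` and the `assume_normal` flag are hypothetical; the flag stands for the outcome of the normality check described earlier.

```python
import numpy as np
from scipy import stats


def compare_models(err_a, err_b, assume_normal):
    """Paired comparison of two models' error distances at the same time points."""
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    if err_a.shape != err_b.shape:
        raise ValueError("paired tests need one error per model per time point")
    if assume_normal:
        # Student's t-test for paired samples.
        statistic, p_value = stats.ttest_rel(err_a, err_b)
        return "paired t-test", statistic, p_value
    # Wilcoxon signed-rank test for paired samples.
    statistic, p_value = stats.wilcoxon(err_a, err_b)
    return "Wilcoxon signed-rank", statistic, p_value


# Synthetic placeholder errors (not real measurements): model B is worse by design.
rng = np.random.default_rng(1)
errors_a = rng.lognormal(mean=0.0, sigma=0.5, size=100)
errors_b = errors_a + rng.uniform(0.1, 0.5, size=100)
print(compare_models(errors_a, errors_b, assume_normal=False))
```

A small p-value suggests that the two models differ in typical error distance; comparing the medians or means then tells you which one is closer to the observations.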
Weighting the data according to the absolute distance may be appropriate, as the observed-modeled discrepancies are expected to increase with increasing distance. Any such additional analysis on weighted data should be performed by repeating all of the steps above.
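The weighting scheme itself is not specified here, so the following is only a sketch of how a weighted summary could be computed once you have chosen weights. The function name `weighted_mean_sd` and the example weights are hypothetical, and the standard deviation is the plain weighted (population-style) version with no small-sample correction.

```python
import numpy as np


def weighted_mean_sd(errors, weights):
    """Weighted mean and weighted (population-style) standard deviation."""
    errors = np.asarray(errors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(errors, weights=weights)
    variance = np.average((errors - mean) ** 2, weights=weights)
    return mean, np.sqrt(variance)


# Placeholder values only: the weights below are an arbitrary illustration,
# not a recommended scheme.
example_errors = np.array([0.5, 1.0, 2.0, 0.8])
example_weights = np.array([1.0, 2.0, 4.0, 1.5])
print(weighted_mean_sd(example_errors, example_weights))
```

With equal weights this reduces to the ordinary mean and the standard deviation computed with `ddof=0`, which is a quick sanity check before trying a real weighting scheme.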