Finding a relative error measure on a data set proportional to another


I have a set of exact data points $\mathcal{X}=\{X_i\}$ and another, approximate one $\mathcal{Y}=\{Y_i\}$, with a correspondence between $X_i$ and $Y_i$ for all $i$. If $\mathcal{Y}$ were exact, the two sets would be proportional: $Y_i=CX_i$. Below is a plot together with a linear fit:

[Plot of $Y_i$ against $X_i$ with a linear fit]

What I need is a measure of the error on $\mathcal{Y}$, i.e. something that tells me how close it is to being proportional to $\mathcal{X}$. I want to use this to check how good my calculation of $\mathcal{Y}$ is (it's computed using Metropolis Monte Carlo but the standard bootstrap error estimates from that are too low, as is apparent when putting in error bars in the above plot).

Since I don't know $C$ my best idea is to use the difference between the fit and the $Y_i$. The first thing I thought of was to simply use the RMS difference; i.e. if the linear least squares fit is $Y_i=AX_i+B$, I'd use

$\sigma=\sqrt{\sum_{i=1}^n(Y_i-AX_i-B)^2/n}$.
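As a minimal sketch, this can be computed with NumPy (the array names `X` and `Y` are my own, not from the question):

```python
import numpy as np

def rms_residual(X, Y):
    """Fit Y = A*X + B by least squares, then return the RMS of the residuals."""
    A, B = np.polyfit(X, Y, 1)            # slope A and intercept B of the linear fit
    residuals = Y - (A * X + B)
    return np.sqrt(np.mean(residuals ** 2))
```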

This works pretty well, but it would be nice to have something like a relative error. One reason is that I have similar data sets for other parameters, where $C$ and the values $X_i$ and $Y_i$ themselves have very different magnitudes, and I would like to be able to compare errors across these different parameter data sets.

So then I thought I might use something like

$\sigma_{rel}=\sqrt{\sum_{i=1}^n\Big(\frac{Y_i-AX_i-B}{Y_i}\Big)^2/n}$,

but this seems to give errors that are far too big. For the plot above, for example, I get $\sigma_{rel}=1.733$, which would suggest (if $\sigma_{rel}$ worked the way I thought) that on average the points are off by almost twice the magnitude of the value itself (rendering every digit false). Obviously there is something wrong with my error measure.
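One way to see why this measure inflates: dividing each residual by its own $Y_i$ gives enormous weight to points where $Y_i$ is close to zero, even when the absolute residuals are all comparable. A small sketch of the effect (array names and the toy data are hypothetical, not from the question):

```python
import numpy as np

def sigma_rel(X, Y):
    """Relative RMS residual: each residual is divided by its own Y_i."""
    A, B = np.polyfit(X, Y, 1)
    return np.sqrt(np.mean(((Y - (A * X + B)) / Y) ** 2))

# Same absolute noise in both cases; only the offset of the data differs.
noise = np.array([0.1, -0.1, 0.1, -0.1, 0.1])
X_small = np.array([0.1, 1.0, 2.0, 3.0, 4.0])  # first point puts Y_i near zero
X_big = X_small + 10.0                          # same spread, Y_i far from zero
```

Here `sigma_rel(X_small, 2*X_small + noise)` comes out much larger than `sigma_rel(X_big, 2*X_big + noise)`, purely because of the one near-zero $Y_i$.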

My final thought was to try the linear correlation coefficient:

$\sigma_{corr}=\frac{\sum_i(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_i(X_i-\bar{X})^2\sum_i(Y_i-\bar{Y})^2}}$,

where the bar denotes an average. This is supposed to say how closely the data fit a linear model, with $\sigma_{corr}=1$ for a perfect fit (with positive $A$).

The problem with this one is that it gives numbers that are too high; for example, the data plotted above give $\sigma_{corr}=0.982$. That makes sense, I suppose, since the data are clearly close to linear, but for my purpose (a relative error estimate on $\mathcal{Y}$) it doesn't work.
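For reference, here is a sketch of that coefficient in the standard Pearson form (squared deviations in the denominator), which agrees with NumPy's built-in `np.corrcoef`; the array names `X` and `Y` are assumptions:

```python
import numpy as np

def pearson_r(X, Y):
    """Pearson correlation coefficient of the paired samples X and Y."""
    dX = X - X.mean()
    dY = Y - Y.mean()
    return np.sum(dX * dY) / np.sqrt(np.sum(dX ** 2) * np.sum(dY ** 2))
```

For exactly proportional data, `pearson_r(X, C * X)` is exactly 1 (for positive $C$), which is why even noticeably noisy near-linear data still score very close to 1.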

Does anyone have a good idea for a relative error measure on $\mathcal{Y}$?