$[1]$ Does a good $R^2$ necessarily tell us that the experiment went right? $[2]$ Is extrapolation not dangerous when used for a far point?


In an experiment, the relation between $x$ and $y$ is linear.

$x_\text{actual} = \alpha_x x_\text{observed} + \beta_x + \gamma_x$. [Here, $\alpha$ is the slope, $\beta$ is the intercept, and $\gamma$ is a random error].

(In most practical cases, $\alpha_x$ is very close to $1$, and $\beta_x$ and $\gamma_x$ are both close to $0$.)

Similarly,

$y_\text{actual} = \alpha_y y_\text{observed} + \beta_y + \gamma_y$. [Here, $\alpha$ is the slope, $\beta$ is the intercept, and $\gamma$ is a random error].

For illustration, see the following plot (same idea applied for both variables) (there is a slope $\alpha$, and an intercept $\beta$, and a random error for each measurement $\gamma$).

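[Plot not shown: actual value versus observed value, with slope $\alpha$, intercept $\beta$, and a random error $\gamma$ for each measurement.]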


Now, after running the experiment, we have $n$ observed points:

$$(x_1,y_1),(x_2,y_2),(x_3,y_3),\dots,(x_n,y_n)$$


These $n$ data points relate observed $x$ to observed $y$. (They are not the plot above, which shows the relation between observed and actual values of a single variable.) Regarding these data points, I have two things in mind:

FIRST: A bad coefficient of determination, $R^2$, tells us that something went wrong in the experiment. However, a good coefficient of determination does not necessarily guarantee that everything in the experiment went right. My coworker disagrees with me and says "a good $R^2$ means everything went right".
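
To make FIRST concrete, here is a minimal sketch of the kind of situation I have in mind. All the numbers are made up for illustration: a hypothetical true relation $y = 2x + 1$, and a miscalibrated $y$ instrument with $\alpha_y = 1.25$, $\beta_y = -3$ (I set $\gamma_y = 0$ in the calibration to keep it simple). Since $R^2$ of a straight-line fit is unchanged by a linear rescaling of $y$, the observed data should fit just as well as the actual data, even though the fitted slope is wrong.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    # For a one-predictor linear fit, R^2 equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

rng = random.Random(0)  # fixed seed so the run is repeatable

xs = [float(i) for i in range(1, 21)]

# Assumption: hypothetical true relation y = 2x + 1 with small random noise.
y_actual = [2.0 * x + 1.0 + rng.gauss(0.0, 0.2) for x in xs]

# Assumption: made-up calibration error, y_actual = alpha_y * y_observed + beta_y.
alpha_y, beta_y = 1.25, -3.0
y_observed = [(y - beta_y) / alpha_y for y in y_actual]

slope_a, intercept_a, r2_actual = fit_line(xs, y_actual)
slope_o, intercept_o, r2_observed = fit_line(xs, y_observed)

print("actual:   slope=%.3f intercept=%.3f R^2=%.5f" % (slope_a, intercept_a, r2_actual))
print("observed: slope=%.3f intercept=%.3f R^2=%.5f" % (slope_o, intercept_o, r2_observed))
```

The two $R^2$ values agree (up to rounding), while the slope fitted to the observed data is off by the factor $\alpha_y$. So a good $R^2$ alone cannot reveal this kind of systematic error.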

SECOND: Imagine that the points are very close to each other. Then small errors can swing the fitted straight line a lot, and we cannot rely on these points to extrapolate (except possibly to a nearby point). I think we should not extrapolate to a far point using a line obtained from points that are very close to each other. Again, he disagrees with me.
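
To illustrate SECOND, here is a rough sketch based on the textbook standard error of a predicted value in simple linear regression, which is the residual standard deviation times $\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $S_{xx} = \sum_i (x_i - \bar{x})^2$. This assumes the usual regression model (errors only in $y$, constant variance), which may not fully match my setup with errors in both variables. The two designs below are made up: both have $11$ points centered at $x = 5$, one tightly clustered and one spread out.

```python
import math

def prediction_se_factor(xs, x0):
    """Multiplier on the residual standard deviation for predicting a new y at x0.

    Textbook simple-linear-regression formula:
    sqrt(1 + 1/n + (x0 - mean(x))^2 / Sxx).
    """
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return math.sqrt(1.0 + 1.0 / n + (x0 - mx) ** 2 / sxx)

# Assumption: two hypothetical designs with the same n and the same mean.
clustered = [4.9 + 0.02 * i for i in range(11)]  # x between 4.9 and 5.1
spread = [float(i) for i in range(11)]           # x between 0 and 10

near_point = 5.2   # just outside the clustered data
far_point = 50.0   # far outside both data sets

clustered_near = prediction_se_factor(clustered, near_point)
clustered_far = prediction_se_factor(clustered, far_point)
spread_far = prediction_se_factor(spread, far_point)

print("clustered design, near point:", round(clustered_near, 2))
print("clustered design, far point: ", round(clustered_far, 2))
print("spread design, far point:    ", round(spread_far, 2))
```

With the clustered design, the multiplier stays modest for the nearby point but becomes very large for the far point, much larger than with the spread-out design at the same far point. That growth with $(x_0 - \bar{x})^2 / S_{xx}$ is what worries me about far extrapolation.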


For FIRST, am I right, or is my coworker? If I am right, can you provide a reference for that?

For SECOND, am I right, or is my coworker? If I am right, how can we use the observed data points together with $\alpha_x, \beta_x, \gamma_x, \alpha_y, \beta_y, \gamma_y$ to determine the $x$-interval in which extrapolation is valid, such that the $y$ calculated from the line is guaranteed to be within $\pm P\%$ of the actual value? ($P$ is given.)

Maybe a statistical test, such as a $t$-test, could be used here, but I am not sure how.


Your help would be appreciated. THANKS!