Suppose I have two sets of numbers. To help frame my question, suppose these numbers come from two different temperature sensors. In this first example, both sensors are placed in the same environment, so they should read the same temperature:
Col 1 Col 2
10 10
20 19
30 29
20 20
20 19
30 30
20 19
10 9
20 20
30 28
Since the sensors are in the same environment they should read the same, but they don't, so I need to correct for their offset. To calculate a correction factor between these two sets of numbers, so that Column 2 matches Column 1 as closely as possible, I run a regression analysis. A linear regression gives the equation:
y = 0.8041x + 3.7143
or
Col 2 = 0.8041 * Col 1 + 3.7143
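A fit like this can be reproduced with NumPy (a minimal sketch; the variable names are mine, and the coefficients you get from exactly these ten points will not necessarily reproduce the quoted 0.8041 and 3.7143, which may have come from a different or larger data set):

```python
import numpy as np

# Column 1 and Column 2 from the first table (same environment).
col1 = np.array([10, 20, 30, 20, 20, 30, 20, 10, 20, 30])
col2 = np.array([10, 19, 29, 20, 19, 30, 19, 9, 20, 28])

# Ordinary least-squares fit: col2 ≈ slope * col1 + intercept
slope, intercept = np.polyfit(col1, col2, 1)
print(f"Col 2 = {slope:.4f} * Col 1 + {intercept:.4f}")
```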
Now suppose I have a second set of numbers. In this second example the numbers come from the same sensors, but this time they are placed in different environments. So I expect them to read differently, but I also expect them to retain the same error I calculated above:
Col 3 Col 4
11 10
21 19
30 27
20 20
21 19
30 25
20 18
11 15
20 20
30 25
My question is: is there a way to apply the same correction factor, calculated from the first set of numbers, to the second set? To be more specific, I am not looking to do this:
Col 4 = 0.8041 * Col 3 + 3.7143
and get this:
Col 3 Col 4 (new based on regression)
11 12.5
21 20.6
30 27.8
20 19.7
21 20.6
30 27.8
20 19.7
11 12.5
20 19.7
30 27.8
as that loses all information about the original Column 4. I am hoping to use the correction factor from Columns 1 and 2 as a "calibration", and apply it to Column 4 in a way that retains the original information in that column while adjusting it to reflect the calibration equation.
If I assume Col 3 is correct and Col 4 is off, I was thinking the equation would look something like this:
Corrected Col 4 = Col 4 * (??correction factor??)
To answer your question, let's look at what "error" means and what types of error you could have.
In your first problem you have an overdetermined system: two measurements of one quantity at each time, so a linear regression amounts to solving the linear least-squares problem $A^TAv=A^Tb$, with $y = v_1x+v_2$.
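As a concrete sketch of that normal-equations formulation (my variable names, using the question's first two columns):

```python
import numpy as np

x = np.array([10, 20, 30, 20, 20, 30, 20, 10, 20, 30])  # sensor 1 (Col 1)
b = np.array([10, 19, 29, 20, 19, 30, 19, 9, 20, 28])   # sensor 2 (Col 2)

# Design matrix A: one column for the slope v1, one for the intercept v2.
A = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations A^T A v = A^T b directly.
v = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq solves the same least-squares problem more stably.
v_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both routes give the same $v = (v_1, v_2)$; `lstsq` is preferable in practice because it avoids forming $A^TA$, which squares the condition number.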
What this results in is a model in which the second sensor $y$ returns a scaled version of the first sensor $x$ plus an offset. It is not correct to call the offset $v_2$ the "error" unless $v_1$ is close to $1$: the scaling factor $v_1$ is a slope, and the offset is chosen to minimize the squared error over the entire range of values.
Error can be random (uncertain or unknowable fluctuations in the process being observed) or systematic (a mean shift in the observed value due to bias in the observation process). What you are looking to compute is the systematic error of sensor 2 with respect to sensor 1.
In this case, I would compute the average difference between the measurements rather than use the offset of the linear regression. This gives you an estimate of the amount by which sensor 2 differs from sensor 1; only then can you quantify the relative drift in a different environment.
So, $$\epsilon = \frac{1}{n}\displaystyle\sum_{i=1}^n (x_i-y_i),$$ and the corrected sensor-2 reading is $$y_{\text{corrected}} = y+\epsilon.$$
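A minimal sketch of this mean-difference correction, using the question's columns (variable names are mine):

```python
import numpy as np

# Calibration data: both sensors in the same environment.
col1 = np.array([10, 20, 30, 20, 20, 30, 20, 10, 20, 30])  # sensor 1 (x)
col2 = np.array([10, 19, 29, 20, 19, 30, 19, 9, 20, 28])   # sensor 2 (y)

# Systematic offset of sensor 2 with respect to sensor 1.
epsilon = np.mean(col1 - col2)  # 0.7 for this data

# New environment: add the same offset to sensor 2's readings.
# This preserves the structure of the original Column 4, unlike
# replacing it with regression predictions from Column 3.
col4 = np.array([10, 19, 27, 20, 19, 25, 18, 15, 20, 25])
col4_corrected = col4 + epsilon
```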