Scaling data sets to match each other with least error


I have two data sets, A and B, with values $a_1,a_2,\dots,a_n$ and $b_1,b_2,\dots,b_n$ that represent measurements of the same elements $x_1,x_2,\dots,x_n$. For instance, element $x_1$ has value $a_1$ in the first data set and $b_1$ in the second.

These data sets have very different magnitudes, but their relative values should be the same ($\frac{a_i}{a_j}=\frac{b_i}{b_j}$). In practice they are not, because the data come from experiments. I would like to obtain a scaling constant to multiply data set B by so that it matches data set A with the least error.

What is the best method to do this?

Edit: Also, each value in B has a measurement uncertainty. How can I take this into account? I should give more weight to matching the values that have the least uncertainty.

BEST ANSWER

Here is an example using data similar to yours. At a hospital, blood tests are routinely performed on newborn babies to determine whether too many red cells are present in the blood. Two methods of assaying blood cells are in common use: hematocrit (which determines the percent by volume of red cells) and hemoglobin (which is found by making a chemical determination of the amount of hemoglobin in the blood, expressed as grams per deciliter).

We have laboratory measurements of both, called LabCrit and LabHgb for 43 newborn babies. A regression 'through the origin' ($0$ y-intercept), as suggested by @AdrianKeister (+1), gives the following result:

Regression Equation

LabHgb = 0.340060 LabCrit

R-sq = 99.97%

[Scatter plot of LabHgb versus LabCrit with the fitted line through the origin]

Notes:

(1) One reason for monitoring newborns in this way is that some babies are born with too many red cells, a potentially life-threatening condition that is easily remedied if detected immediately.

(2) It is well known that hemoglobin (in g/dl) is about $1/3$ of hematocrit (in %), so our findings match what has been observed before.

(3) The reason for this particular study was to determine the feasibility of using a new optical method to assay red blood cells.

(4) Data from Herzog and Felton, "Hemoglobin screening for normal newborns," J. Perinatology, XIV, 4, July 1994.
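The regression through the origin above can be sketched in a few lines of NumPy. The data below are synthetic stand-ins (the study's 43 actual measurement pairs are not reproduced here), simulated around the reported slope of about $0.34$; the closed-form slope for a zero-intercept fit is $\hat m = \sum x_i y_i / \sum x_i^2$.

```python
import numpy as np

# Synthetic stand-in for the 43 (LabCrit, LabHgb) pairs from the study.
# The true slope was reported as about 0.34, so we simulate around it.
rng = np.random.default_rng(0)
lab_crit = rng.uniform(40, 70, size=43)              # hematocrit, percent
lab_hgb = 0.34 * lab_crit + rng.normal(0, 0.1, 43)   # hemoglobin, g/dl

# Regression through the origin: slope = sum(x*y) / sum(x^2)
slope = np.sum(lab_crit * lab_hgb) / np.sum(lab_crit ** 2)
print(f"LabHgb = {slope:.4f} LabCrit")
```

With real laboratory data this recovers the reported coefficient of roughly $0.34$; here it recovers the slope used in the simulation.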

ANSWER

This looks like a linear least-squares fit problem. If you imagine $y=\{a\}$ and $x=\{b\}$ (in a shameless abuse of notation), you're interested in a relationship $y=mx+c$, most likely with $c=0$. Most solvers let you force the line through the origin and thus eliminate $c$. The $m$ found by the solver is the solution to your problem. Taking the measurement uncertainty into account is more difficult. Thinking...
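For the zero-intercept case the solver is not even necessary: minimizing $\sum_i (a_i - m b_i)^2$ gives the closed form $\hat m = \sum_i a_i b_i / \sum_i b_i^2$. A minimal sketch with made-up data (the arrays below are hypothetical, chosen so B is roughly A scaled by $1/10$):

```python
import numpy as np

# Hypothetical measurements of the same elements on two instruments.
a = np.array([10.2, 20.1, 29.8, 40.5, 50.1])   # data set A (target scale)
b = np.array([1.01, 2.03, 2.99, 4.02, 5.00])   # data set B (to be scaled)

# Least-squares slope through the origin:
# m minimizes sum((a - m*b)^2), giving m = sum(a*b) / sum(b*b).
m = np.dot(a, b) / np.dot(b, b)
scaled_b = m * b   # B rescaled to match A with least squared error
print(m)           # close to 10 for this data
```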

[EDIT]: I am imagining that you could use the uncertainty to weight the points: more uncertainty means less weight, and less uncertainty means more weight. This would have to be a reversible process, however. It sounds a bit like feature scaling, except that we're not normalizing every data point the same way. You could probably get this to work, I think.
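One standard way to realize this weighting idea is inverse-variance weighting: give each point weight $w_i = 1/\sigma_i^2$ and minimize $\sum_i w_i (a_i - m b_i)^2$, which yields $\hat m = \sum_i w_i a_i b_i / \sum_i w_i b_i^2$. A sketch with hypothetical data and uncertainties (note this treats the $\sigma_i$ as simple weights; a rigorous treatment of uncertainty in the regressor itself would need an errors-in-variables model):

```python
import numpy as np

# Hypothetical data: each value in B comes with a one-sigma uncertainty.
a = np.array([10.2, 20.1, 29.8, 40.5, 50.1])
b = np.array([1.01, 2.03, 2.99, 4.02, 5.00])
sigma = np.array([0.05, 0.02, 0.10, 0.01, 0.20])   # uncertainty of each b_i

# Inverse-variance weights: precise points count more.
w = 1.0 / sigma**2

# Weighted least squares through the origin:
# m minimizes sum(w * (a - m*b)^2)  =>  m = sum(w*a*b) / sum(w*b*b)
m = np.sum(w * a * b) / np.sum(w * b * b)
print(m)   # close to 10, dominated by the low-uncertainty points
```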