Scenario:
A measuring tool consists of a round measuring bung (red circle) equipped with three probes (labeled $a$, $b$, and $c$). These probes are evenly spaced at angles of $120^\circ$ from each other ($\theta = 120^\circ$).
The master part (blue circle), in which the measuring bung is seated, has a known inner bore diameter of $D$. Refer to the diagram below (figure 1):
However, it is important to note that the provided diagram does NOT accurately represent the real-world situation. In practical terms, it is nearly impossible to mount each probe perfectly tangential to the measuring bung. Consequently, each probe is mounted at a unique and unknown starting distance from the center of the measuring bung. These starting distances are denoted as $r_a$, $r_b$, and $r_c$ for probes $a$, $b$, and $c$, respectively. For a clearer representation, please refer to the revised diagram below (figure 2):
The following variables are known:
- $a$, $b$, $c$ - the distances measured by the 3 probes
- $\theta$ - the probe mounting angle around the center of the measuring bung
- $D$ - the bore diameter of the master part
The following variables are unknown:
- $r_a$, $r_b$, $r_c$ - the probe mounting distances from the center of the measuring bung
Objective:
The objective is to find $r_a$, $r_b$, and $r_c$. These distances will be determined during the calibration process (using the master part). Once these distances are determined, the master part can be replaced with a new part having an unknown diameter, such that the diameter of any part can be accurately measured.
Measurement Process:
The calibration process involves taking any number of measurements $(K)$ with the tooling seated in various configurations; each time, the tooling is seated in a new position. Using CAD software, I can determine the probe measurements ($a_k, b_k, c_k$) for each measurement $k$.
$D = 25.0;$
$R = 12.5;$
$\theta = 120^\circ;$
$K=01 :\qquad a_{01} = 5.2491254 ;\quad b_{01} = 3.6421203 ;\quad c_{01} = 5.5479800 ;$
$K=02 :\qquad a_{02} = 3.5261623 ;\quad b_{02} = 4.0069957 ;\quad c_{02} = 6.9097135 ;$
$K=03 :\qquad a_{03} = 5.2425714 ;\quad b_{03} = 2.2366199 ;\quad c_{03} = 6.9231890 ;$
$K=04 :\qquad a_{04} = 5.6053228 ;\quad b_{04} = 4.0762025 ;\quad c_{04} = 4.5958384 ;$
$K=05 :\qquad a_{05} = 2.6211544 ;\quad b_{05} = 4.3330685 ;\quad c_{05} = 7.3330685 ;$
$K=06 :\qquad a_{06} = 5.6305936 ;\quad b_{06} = 1.7881182 ;\quad c_{06} = 6.8966301 ;$
$K=07 :\qquad a_{07} = 3.1218340 ;\quad b_{07} = 5.3989306 ;\quad c_{07} = 5.7511286 ;$
$K=08 :\qquad a_{08} = 2.8136425 ;\quad b_{08} = 3.7336225 ;\quad c_{08} = 7.7652347 ;$
$K=09 :\qquad a_{09} = 6.7312779 ;\quad b_{09} = 1.6301678 ;\quad c_{09} = 5.7985227 ;$
$K=10 :\qquad a_{10} = 3.7744161 ;\quad b_{10} = 5.6724511 ;\quad c_{10} = 4.7375570 ;$
Find $r_a; r_b;r_c;$ (Prove that $r_a = 8;r_b = 9;r_c = 6;$ for this example)
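The claimed answer can be sanity-checked before diving into the solution: with $r_a=8$, $r_b=9$, $r_c=6$, the circumradius of the three probe contact points must come out as $R=12.5$ for every one of the ten seatings. A quick Python sketch of this check (the formula is the one quoted later in the post; `RAD` is my own name for it):

```python
import math

# Measurement data from the problem statement
a = [5.2491254, 3.5261623, 5.2425714, 5.6053228, 2.6211544,
     5.6305936, 3.1218340, 2.8136425, 6.7312779, 3.7744161]
b = [3.6421203, 4.0069957, 2.2366199, 4.0762025, 4.3330685,
     1.7881182, 5.3989306, 3.7336225, 1.6301678, 5.6724511]
c = [5.5479800, 6.9097135, 6.9231890, 4.5958384, 7.3330685,
     6.8966301, 5.7511286, 7.7652347, 5.7985227, 4.7375570]

def RAD(x, y, z):
    """Circumradius of the three probe contact points (Joshua Wang's
    formula, quoted later in the post), with x = a + r_a etc."""
    num = (x*x + y*y + x*y) * (x*x + z*z + x*z) * (y*y + z*z + y*z)
    den = (3*x*x*y*y + 3*x*x*z*z + 3*y*y*z*z
           + 6*x*x*y*z + 6*x*y*y*z + 6*x*y*z*z)
    return math.sqrt(num / den)

ra, rb, rc = 8.0, 9.0, 6.0
for ak, bk, ck in zip(a, b, c):
    # every seating position should reproduce the master radius 12.5
    assert abs(RAD(ra + ak, rb + bk, rc + ck) - 12.5) < 1e-3
```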
Solution based on answers from Fedja
This is my understanding of the solution provided by Fedja, from the perspective of a learner. If you are reading this and you are not Fedja, I recommend you read his answer instead of this.
We will deploy the least-squares method to find numerical approximations of $r_a$, $r_b$, and $r_c$.
Let's establish a few important functions:
A mathematical expression for the radius, in terms of the variables $a$, $b$, $c$, $r_a$, $r_b$, and $r_c$. I recommend referring to the answer provided by Joshua Wang for a detailed derivation.
$$R=\sqrt{\frac{\left(x^2+y^2+xy\right)\left(x^2+z^2+xz\right)\left(y^2+z^2+yz\right)}{3x^2y^2+3x^2z^2+3y^2z^2+6x^2yz+6xy^2z+6xyz^2}}$$
The mean quadratic error $G$.
This function returns the square root of the mean of the squared errors over the measurement points $(k)$. It provides a measure of how well a given guess ($r_a$, $r_b$, $r_c$) fits the measurements. This is the function that is being minimized in order to find the values of $r_a, r_b$ and $r_c$ that provide the best fit to the given measurements.
$$G=\sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(\mathrm{Error}_k\right)^2}$$
where $K$ is the number of measurements and
$$\mathrm{Error}_k=\sqrt{\frac{\left(x_k^2+y_k^2+x_k y_k\right)\left(x_k^2+z_k^2+x_k z_k\right)\left(y_k^2+z_k^2+y_k z_k\right)}{3x_k^2y_k^2+3x_k^2z_k^2+3y_k^2z_k^2+6x_k^2y_k z_k+6x_k y_k^2 z_k+6x_k y_k z_k^2}}-R$$
Where
$x_k=a_k+r_a$
$y_k=b_k+r_b$
$z_k=c_k+r_c$

$F(x,y,z)$ – This function is used as a constraint in the algorithm to ensure that the calculated values of $r_a$, $r_b$, and $r_c$ satisfy the given measurements.
$$F\left(x,y,z\right)=3x^2y^2+3x^2z^2+3y^2z^2+6x^2yz+6xy^2z+6xyz^2-\frac{\left(x^2+y^2+xy\right)\left(x^2+z^2+xz\right)\left(y^2+z^2+yz\right)}{R^2}$$
The partial derivatives of $F\left(x,y,z\right)$ – $F_x\left(x,y,z\right)$, $F_y\left(x,y,z\right)$, $F_z\left(x,y,z\right)$ – with respect to $x, y, z$. The partial derivatives are essential to the optimization process, as they directly determine the magnitude and direction by which $r_a$, $r_b$, and $r_c$ are incremented on each iteration $(dr_a, dr_b, dr_c)$.
$F_x(x,y,z) = 6xy^2+ 6xz^2+ 12xyz + 6y^2 z + 6yz^2 - \frac{(2x + y)(x^2+ z^2+ xz)(y^2+ z^2+ yz)}{R^2} -\frac{(x^2+ y^2+ xy)(2x + z)(y^2+ z^2+ yz)}{R^2} $
$F_y\left(x,y,z\right)=F_x\left(y,x,z\right)$
$F_z\left(x,y,z\right)=F_x\left(z,x,y\right)$
(due to the symmetry of $F(x,y,z)$ under permutations of its arguments)

A system of linear equations for the increments $(dr_a, dr_b, dr_c)$. These equations represent the conditions the increments $(dr_a, dr_b, dr_c)$ need to satisfy so that the first-order change in $F\left(x,y,z\right)$ cancels its value at each point $(x_k, y_k, z_k)$.
$F_x\left(x_k,y_k,z_k\right)dr_a{+F}_y\left(x_k,y_k,z_k\right)dr_b+F_z\left(x_k,y_k,z_k\right)dr_c=-F(x_k,y_k,z_k)$
The left-hand side of this equation represents the first-order change in $F$ at the point $(x_k,y_k,z_k)$ caused by the increments $(dr_a,dr_b,dr_c)$, while the right-hand side is the negative of the value of $F$ at that point. By setting them equal to each other, you are essentially trying to make the net change in $F$ cancel its current value. This equation ensures that the increments $(dr_a,dr_b,dr_c)$ are chosen in a way that counteracts the current residual of $F$ and moves the solution towards the desired root of $F$ with each iteration.
$\frac{F(x_k,y_k,z_k)}{R}\ dr_a=0$
$\frac{F(x_k,y_k,z_k)}{R}\ dr_b=0$
$\frac{F(x_k,y_k,z_k)}{R}\ dr_c=0$
These equations impose additional constraints on the increments $(dr_a,dr_b,dr_c)$: the larger the residual $F(x_k,y_k,z_k)$ is, the more heavily the increments are damped. This keeps the step sizes under control while the guess is still far from a solution and helps the algorithm converge towards the desired result.
With the required functions established, let's define the steps of the iterative method.
The following steps will be executed:
Initialize the values of $r_a$ , $r_b$, and $r_c$ to some initial guess, such as $R/2$.
Construct the matrices $U$ and $V$ by evaluating the partial derivatives and function values for each measurement point $k$.
Solve the system of linear equations to obtain the increments:
$$V=\left(U^TU\right)^{-1}U^TV$$
Update $r_a$, $r_b$, and $r_c$ by adding the corresponding elements of $V$ to the previous values.
Check if any of the updated values of $r_a$, $r_b$, or $r_c$ are negative. If so, revert them back to their previous values.
Calculate the mean quadratic error $G$ based on the updated values of $r_a$, $r_b$, and $r_c$.
Compare the new mean quadratic error $G$ with the previous minimum error $e$. If it is lower, update $e$ and store the values of $r_a$, $r_b$, and $r_c$ as the new suspected minimizer.
Repeat steps 2-7 until the termination condition is met (e.g., when $e$ is within the desired tolerance or after a maximum number of iterations).
We can now implement these steps (implementation in Asymptote):
//Measurements from CAD
real[] a = {5.2491254, 3.5261623, 5.2425714, 5.6053228, 2.6211544, 5.6305936, 3.1218340, 2.8136425, 6.7312779, 3.7744161};
real[] b = {3.6421203, 4.0069957, 2.2366199, 4.0762025, 4.3330685, 1.7881182, 5.3989306, 3.7336225, 1.6301678, 5.6724511};
real[] c = {5.5479800, 6.9097135, 6.9231890, 4.5958384, 7.3330685, 6.8966301, 5.7511286, 7.7652347, 5.7985227, 4.7375570};
//Masterpart radius
real R = 12.5;
//Number of measurements
int K = a.length;
//Define the mean quadratic error G
real G(real ra, real rb, real rc, real[] a, real[] b, real[] c)
{
real s = 0;
for (int k = 0; k < K; ++k)
{
real x = ra + a[k];
real y = rb + b[k];
real z = rc + c[k];
real error = sqrt(((x^2 + y^2 + x * y) * (x^2 + z^2 + x * z) * (y^2 + z^2 + y * z)) / (3 * x^2 * y^2 + 3 * x^2 * z^2 + 3 * y^2 * z^2 + 6 * x^2 * y * z + 6 * x * y^2 * z + 6 * x * y * z^2)) - R;
s += abs(error)^2;
}
return sqrt(s / K);
}
//Define the main function F
real f(real x, real y, real z)
{
return 3 * x^2 * y^2 + 3 * x^2 * z^2 + 3 * y^2 * z^2 + 6 * x^2 * y * z + 6 * x * y^2 * z + 6 * x * y * z^2 - (x^2 + y^2 + x * y) * (x^2 + z^2 + x * z) * (y^2 + z^2 + y * z) / R^2;
}
//Define the partial derivatives
real Fx(real x, real y, real z)
{
return 6 * x * y^2 + 6 * x * z^2 + 12 * x * y * z + 6 * y^2 * z + 6 * y * z^2 - (2 * x + y) * (x^2 + z^2 + x * z) * (y^2 + z^2 + y * z) / R^2 - (x^2 + y^2 + x * y) * (2 * x + z) * (y^2 + z^2 + y * z) / R^2;
}
real Fy(real x, real y, real z)
{
return Fx(y, x, z);
}
real Fz(real x, real y, real z)
{
return Fx(z, x, y);
}
// Initialize variables
real e = G(R / 2, R / 2, R / 2, a, b, c);
real ra = R / 2;
real rb = R / 2;
real rc = R / 2;
int maxIterations = 50;
real tolerance = 0.0001; // Tolerance for termination condition
int iteration = 0;
// Perform the iterations
while (iteration < maxIterations && e > tolerance)
{
iteration = iteration + 1;
// Construct the matrices U and V
real[][] U, V;
for (int k = 0; k < K; ++k)
{
real x = ra + a[k];
real y = rb + b[k];
real z = rc + c[k];
U[k] = new real[] {Fx(x, y, z), Fy(x, y, z), Fz(x, y, z)};
V[k] = new real[] {-f(x, y, z)};
U[k + K] = new real[] {abs(f(x, y, z)) / R, 0, 0};
V[k + K] = new real[] {0};
U[k + 2 * K] = new real[] {0, abs(f(x, y, z)) / R, 0};
V[k + 2 * K] = new real[] {0};
U[k + 3 * K] = new real[] {0, 0, abs(f(x, y, z)) / R};
V[k + 3 * K] = new real[] {0};
}
// Calculate the least squares solution
V = inverse(transpose(U) * U) * transpose(U) * V;
real rra = ra;
real rrb = rb;
real rrc = rc;
// Update ra, rb, rc
ra += V[0][0];
rb += V[1][0];
rc += V[2][0];
// Check for negative values and update if necessary
if (ra < 0) ra = rra;
if (rb < 0) rb = rrb;
if (rc < 0) rc = rrc;
// Update the minimum value if the new error is lower
if (G(ra, rb, rc, a, b, c) < e && ra > 0 && rb > 0 && rc > 0)
{
e = G(ra, rb, rc, a, b, c);
}
}
write("iterations completed= ",iteration);
write("ra = ", ra);
write("rb = ", rb);
write("rc = ", rc);
write("error = ", e);
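For readers who would rather follow along without Asymptote, here is my own rough NumPy translation of the same iteration (`numpy.linalg.lstsq` replaces the explicit normal-equations formula $(U^TU)^{-1}U^TV$; both compute the same least-squares solution):

```python
import numpy as np

# Measurements (a_k, b_k, c_k) from the calibration data above
a = np.array([5.2491254, 3.5261623, 5.2425714, 5.6053228, 2.6211544,
              5.6305936, 3.1218340, 2.8136425, 6.7312779, 3.7744161])
b = np.array([3.6421203, 4.0069957, 2.2366199, 4.0762025, 4.3330685,
              1.7881182, 5.3989306, 3.7336225, 1.6301678, 5.6724511])
c = np.array([5.5479800, 6.9097135, 6.9231890, 4.5958384, 7.3330685,
              6.8966301, 5.7511286, 7.7652347, 5.7985227, 4.7375570])
R = 12.5
K = len(a)

def F(x, y, z):
    num = (x*x + y*y + x*y) * (x*x + z*z + x*z) * (y*y + z*z + y*z)
    den = (3*x*x*y*y + 3*x*x*z*z + 3*y*y*z*z
           + 6*x*x*y*z + 6*x*y*y*z + 6*x*y*z*z)
    return den - num / R**2

def Fx(x, y, z):
    return (6*x*y*y + 6*x*z*z + 12*x*y*z + 6*y*y*z + 6*y*z*z
            - (2*x + y) * (x*x + z*z + x*z) * (y*y + z*z + y*z) / R**2
            - (x*x + y*y + x*y) * (2*x + z) * (y*y + z*z + y*z) / R**2)

r = np.full(3, R / 2)                    # initial guess (R/2, R/2, R/2)
for _ in range(50):
    U, V = [], []
    for k in range(K):
        x, y, z = r[0] + a[k], r[1] + b[k], r[2] + c[k]
        f = F(x, y, z)
        # Newton row: Fx*dra + Fy*drb + Fz*drc = -F  (Fy, Fz by symmetry)
        U.append([Fx(x, y, z), Fx(y, x, z), Fx(z, x, y)])
        V.append(-f)
        # three damping rows: (|F|/R) dr = 0
        w = abs(f) / R
        U.append([w, 0, 0]); V.append(0.0)
        U.append([0, w, 0]); V.append(0.0)
        U.append([0, 0, w]); V.append(0.0)
    dr = np.linalg.lstsq(np.array(U), np.array(V), rcond=None)[0]
    r = np.where(r + dr > 0, r + dr, r)  # skip updates that would go negative
print(r)  # should come out close to (8, 9, 6)
```

On this data the iteration settles on values close to $(8, 9, 6)$, in line with the results reported below.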
Results:
After executing the script we get the following results:
We found the solution in only 6 iterations and the mean error of the calculated radius is $1.647\times10^{-8}$ mm.



OK, time to start typing. As I said, it is a long story, so I'll have to post in parts. I'll start by clearly stating what (IMHO) are the main objectives here. One has to design an algorithm/theory that, given the (possibly somewhat erratic) set of measurements $a_k,b_k,c_k$ for $k=1,\dots,K$ of the master part of the known radius $R$ (I prefer to deal with the radius rather than with the diameter),
a) produces the values of $r_a, r_b, r_c$ that allow one to explain all the given data within the measurement error, whatever that error is (it may be unknown)
b) allows one to predict with reasonable certainty what the device calibration error will be compared to the standard measurement error in the data, so one could tell just by looking at the data whether the calibration with these data will be successful or whether you'd better take new (additional) measurements before computing.
c) is easy to communicate as a set of simple formulas, recipes, and code pieces even if the derivations are omitted.
So, what I did was to write a test program that generates random data and couple it with a universal algorithm that would try to find the solutions for that data and then compares those solutions with the truth. It took me about 5 days to find the universal algorithm that works on every reasonable (again, IMHO) data set, so I designed something that satisfied me completely only this morning. Please, accept my apologies for the delay.
You are, probably, most eager to see the algorithm first. Warning: you'll need to code yourself to implement it and to play with it, so forget all about CAS and "solve()" or "findroot()" commands. We'll do things from scratch in the old-fashioned way.
The algorithm
Following Joshua's formula, define the functions
(the language is Asymptote; if you know C or something similar, it should be reasonably easy to follow). num(), den(), F() should be clear, RAD() is the formula for R in Joshua's variables and G() computes the mean quadratic error of the equations for $R$ given the guess $r_a,r_b,r_c$.
We shall also need the partial derivatives
(I used the symmetry of $F$ to avoid retyping the long expressions). Now our task is just to minimize $G$. We will run 50 tries and choose the best result.
Let us store the best achieved guess and the corresponding value somewhere and initialize it to something:
On each try, we may take the initial guess at random in the cube $[0,2R]^3$, i.e., write
There is a better choice but to discuss it now would be quite a digression, so I'll postpone it until later.
Now we do the following iteration 50 times.
We set up an overdetermined system of linear equations for the increments $dr_a, dr_b, dr_c$ with 4 equations for each $k$ (so we'll have $4K$ equations with $3$ unknowns).
For each $k$, we compute the corresponding $x_k=r_a+a_k, y_k=r_b+b_k, z_k=r_c+c_k$ and set the 4 equations $$ F_x(x_k,y_k,z_k) dr_a+F_y(x_k,y_k,z_k) dr_b+ F_z(x_k,y_k,z_k) dr_c=-F(x_k,y_k,z_k) \\ (F(x_k,y_k,z_k)/R) dr_a=0 \\ (F(x_k,y_k,z_k)/R) dr_b=0 \\ (F(x_k,y_k,z_k)/R) dr_c=0 $$ Then we find the least square solution $dr_a, dr_b, dr_c$ of this overdetermined system and replace $r_a,r_b,r_c$ by $r_a+dr_a, r_b+dr_b, r_c+dr_c$, except that if some replacement results in a negative value, we just don't do it. Now compute $G$ of the new set and, if it is better than the minimum observed so far, update the suspected minimizer and the minimal value.
The corresponding piece of code is below (Asymptote allows easy handling of matrix algebra, so I chose it for this reason, plus we'll need its graphing capabilities to discuss analytics).
Remember that one needs to run this piece 50 times (in my simulations the largest necessary number to reach the precision of the true data was 16, but let's give it the standard engineering 3-fold margin).
That's the full "universal algorithm". What I claim is that for reasonably diverse measurements (we'll discuss what that means below), it will achieve a full explanation of the given measurements. However, a full explanation is not yet the truth, and it is its predictive value we are after. But that calls for a separate discussion that I'll post later. For now, I'll just say that my simulations show that just $K=3$ is seldom sufficient (to the extent that multiple exact solutions are possible), so one needs at least 4 calibration measurements and, based on the error analysis, I would really strongly recommend 8 to 10, though one can often get away with 5 to 7.
The stability analysis
Now it is time to discuss the second question. Before I go into it, I'd like to make two important general points about experimental science in general:
There is no way to derive the truth from observations. All we can possibly hope for is an explanation of the observed data/phenomena that agrees with all observations within an observation error.
We do not need the truth. All that we really need is an explanation with high predictive power.
Note that there may be an explanation that agrees with all available observations but is still very poor when it comes to making predictions. A lot of such explanations were developed using very sophisticated mathematical models for explaining the COVID dynamics, and most modern and ancient ideological ideas seem to fall into the same category. I'll abstain from going into the general discussion of what it might mean for the world we live in here, but we need to understand clearly what it means for our problem nevertheless.
The truth for us is the actual triple $(R_a, R_b, R_c)$. Suppose that the typical measurement error of $(a,b,c)$ in your device is of size $\varepsilon$. Then the first principle tells you that when testing a solver, you cannot and shouldn't judge the performance of any solver by the actual difference between the values $r_a,r_b,r_c$ it outputs and the corresponding $R$'s, i.e. by the quantity $|R_a-r_a|+|R_b-r_b|+|R_c-r_c|$ or something similar. All you can and should judge it by is the set of the differences $|RAD(r_a+a_k,r_b+b_k,r_c+c_k)-R|$ on your data $(a_k,b_k,c_k)$, and if all these values (almost) always do not exceed $\varepsilon$, you will have to declare that the solver did a good job no matter how far the actual values it outputs are from the truth. Indeed, all the measurements have been explained within the measurement error, so there is no way one can disprove from these observations alone that the output is the truth. In this respect the solver I proposed is nearly perfect and becomes even more perfect with a few extra twists.
The second principle means that we do not actually care how much we deviate from the truth $(R_a,R_b,R_c)$. All we care about is the predictive power of our explanation, i.e., the maximum of possible differences $|RAD(r_a+a,r_b+b,r_c+c)-RAD(R_a+a,R_b+b,R_c+c)|$ over all plausible future measurements $(a,b,c)$. If that is of order $\varepsilon$ too, then we should be completely happy even if our deviation from the truth is much larger than $\varepsilon$ itself.
One should also understand that some sets of observed measurements are better than others. For instance, if we have a set in which all measurements were taken from essentially the same position, they would be easy to explain together, but the predictive power of such an explanation will most likely be pretty low for other positions. In particular, if you do the measurements from the concentric positions of the measuring tool and the master ring, you'll get just one set of $a,b,c$ no matter how you rotate, and it can be easily explained by many off-center positions as well. So one obvious practical advice is that if you have any control over placing your measuring tool when calibrating at all, you should aim for placing it as far from the center of the master ring as you possibly can. We'll see below that when using the calibrated tool for measurement of unknown radii, you should do exactly the opposite and place it as close to the concentric position as you can, if you can control it at all.
Now let us do the formal analysis keeping these points in mind.
The true Joshua's system of equations is not algebraically solvable, so we will have to approximate the equations to carry out our analysis. Assuming that we are not too far from the concentric position, we can just try to approximate the equation $F(x,y,z)=0$ by its second order Taylor expansion near the point $x=y=z=R$. Note that normally the stability analysis is carried out by just using the linear approximation, but our issue is that the function is symmetric, so its linear part at the diagonal must be symmetric as well and, thereby, is just proportional to $(x-R)+(y-R)+(z-R)$; so the differential is very degenerate and the stability of the solution, if it is there at all, must come from the quadratic or higher order terms.
The second order approximation of the equation $F(x,y,z)=0$ is $$ \frac{(x-R)+(y-R)+(z-R)}3+\frac{(x-y)^2+(y-z)^2+(z-x)^2}{18R}=0; $$ The second term actually has a geometric meaning: it is approximately $\frac{\lambda^2}{4R}$ where $\lambda$ is the distance between the centers of the measuring tool and the master ring. I'll skip the derivations for now (but can provide them upon request).
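The skipped derivation is easy to check numerically. In this sketch (my own check, not part of the original answer) the tool center is offset by an assumed distance $\lambda$ from the ring center, the three exact probe-to-ring distances are computed, and both claims are verified: the second-order expansion nearly vanishes, and the quadratic term is close to $\lambda^2/(4R)$:

```python
import math

R = 12.5    # master ring radius
lam = 1.0   # assumed distance between the two centers
phi = 0.7   # arbitrary direction of the offset (radians)

def probe(theta):
    """Exact distance from the bung center to the ring along direction
    theta: the positive root t of |offset + t*u| = R."""
    cu = lam * math.cos(phi - theta)
    return -cu + math.sqrt(R*R - lam*lam + cu*cu)

x, y, z = (probe(math.radians(t)) for t in (0, 120, 240))

second = ((x - y)**2 + (y - z)**2 + (z - x)**2) / (18 * R)
approx = (x + y + z) / 3 + second

print(approx - R)               # tiny: residual of the expansion, O(lam^3/R^2)
print(second - lam**2 / (4*R))  # tiny: the quadratic term is ~ lambda^2/(4R)
```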
Plugging in $x=r_a+a, y=r_b+b, z=r_c+c$ and putting all the known quantities on the right, we get the equation $$ \left[\frac{r_a+r_b+r_c}3+\frac{(r_a-r_b)^2+(r_b-r_c)^2+(r_c-r_a)^2}{18R}\right] \\ +\frac{a-b}{9R}(r_a-r_b)+\frac{b-c}{9R}(r_b-r_c)+\frac{c-a}{9R}(r_c-r_a)= \\ R-\frac{a+b+c}3-\frac{(a-b)^2+(b-c)^2+(c-a)^2}{18R} $$ We have $K$ such equations for $a=a_k, b=b_k, c=c_k$. Note that unlike the original system coming from $F(x,y,z)=0$, this new system is explicitly solvable: if we denote the quantity in the brackets by $U$ and introduce $r_{ab}=r_a-r_b$, etc., the LHS will be just $$ U+\frac{a-b}{9R}r_{ab}+\frac{b-c}{9R} r_{bc}+\frac{c-a}{9R}r_{ca}\,. $$ Adding to this system the obvious equation $r_{ab}+r_{bc}+r_{ca}=0$, we get a linear system with four unknowns $U,r_{ab}, r_{bc}, r_{ca}$, which can be readily solved by the least square method even if it is overdetermined. Once we know these 4 variables, it is easy to recover the sum $r_a+r_b+r_c$ from $U$ and then $r_a,r_b,r_c$ themselves.
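As a concrete check (my own sketch, assuming NumPy), this explicitly solvable system can be set up and solved for the ten calibration measurements given earlier. Since the underlying equation is only a second-order approximation, the recovered offsets come out close to, but not exactly at, $(8, 9, 6)$:

```python
import numpy as np

# Same calibration data as in the problem statement
a = np.array([5.2491254, 3.5261623, 5.2425714, 5.6053228, 2.6211544,
              5.6305936, 3.1218340, 2.8136425, 6.7312779, 3.7744161])
b = np.array([3.6421203, 4.0069957, 2.2366199, 4.0762025, 4.3330685,
              1.7881182, 5.3989306, 3.7336225, 1.6301678, 5.6724511])
c = np.array([5.5479800, 6.9097135, 6.9231890, 4.5958384, 7.3330685,
              6.8966301, 5.7511286, 7.7652347, 5.7985227, 4.7375570])
R = 12.5
K = len(a)

# K rows: U + ((a-b)/9R) r_ab + ((b-c)/9R) r_bc + ((c-a)/9R) r_ca = RHS
A = np.column_stack([np.ones(K), (a - b) / (9*R),
                     (b - c) / (9*R), (c - a) / (9*R)])
B = R - (a + b + c) / 3 - ((a - b)**2 + (b - c)**2 + (c - a)**2) / (18*R)

# The exact constraint r_ab + r_bc + r_ca = 0, with a big weight as suggested
A = np.vstack([A, [0.0, 1e6, 1e6, 1e6]])
B = np.append(B, 0.0)

Uq, rab, rbc, rca = np.linalg.lstsq(A, B, rcond=None)[0]

# Recover the sum r_a + r_b + r_c from the bracketed quantity U ...
s = 3 * (Uq - (rab**2 + rbc**2 + rca**2) / (18*R))
# ... and then the individual offsets
ra = (s + rab - rca) / 3
rb = (s + rbc - rab) / 3
rc = (s + rca - rbc) / 3
print(ra, rb, rc)  # approximately (8, 9, 6), up to the approximation error
```

This is essentially what makes the approximate solution useful as an initial guess for the full iterative solver.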
The stability of the solution to this system with respect to the perturbations of $a_k$, $b_k$, $c_k$ can be readily analyzed, and one can hope that the stability properties of the actual system near the true solution are not too different. Let $A$ be the matrix of the system with $K$ rows $1, \frac{a_k-b_k}{9R},\frac{b_k-c_k}{9R},\frac{c_k-a_k}{9R}$ and one row $0,1,1,1$ for the last trivial equation (actually, since it should hold exactly, for the least square method one should rather put some big constant instead of $1$ into the last row). Let also $B$ be the column given by the RHS of the above system augmented by $0$ in the bottom position. The whole system then just becomes $AS=B$ where $S$ is the column with our $4$ unknowns; the least square solution becomes $S=(A^TA)^{-1}A^TB$ and it minimizes the norm $\|AS-B\|$. Note that from the latter fact it is easy to see that the $r_{ab},r_{bc},r_{ca}$ part of the solution remains the same if we subtract from $B$ any multiple of the column with $K$ ones and $1$ zero, because this subtraction can be perfectly compensated by changing the value of $U$. Thus, to analyze the change in the pairwise differences, we can use the RHS $B'$ instead of $B$, where $B'$ is obtained from $B$ by subtracting the average value of its first $K$ entries from each of them, which makes the norm of $B'$ as small as it can be.
Note that introducing the errors into $a_k, b_k, c_k$ changes both $A$ and $B$. Let $E$ be the change in the matrix $A$ and $e$ be the change in the vector $B$. These changes have entries of order $\varepsilon/R$ and $\varepsilon$ respectively, where $\varepsilon$ is the measurement error of the device.
Now, linearizing, we see that the change in the pairwise difference part of $S$ is about $$ -(A^TA)^{-1}(E^TA+A^TE)(A^TA)^{-1}B'+(A^TA)^{-1}A^Te $$ Thus the relevant quantities are $$ C_1=\|(A^TA)^{-1}A^T\| \text{ and }C_2=C_1^2\|B'\|/R $$ giving the typical error in the pairwise differences of order $\varepsilon \sqrt{C_1^2+C_2^2}=C\varepsilon$ (ignoring the absolute numeric coefficients that can be easily derived analytically or just discerned from the simulations).
The quantity $C_1$ can be also expressed as $\sqrt{\|(A^TA)^{-1}\|}$, which has the advantage of being a norm of a $4\times 4$ symmetric matrix, so its rough computation is trivial. Thus, the constant $C$ can be easily determined and, most importantly, its determination can be made from the observed data alone. It essentially measures the deviation from the truth. The error in the sum $r_a+r_b+r_c$ is smaller and, unless we have a really pathological scenario, is determined by the linear approximation, which is robust and doesn't depend on the possible singularity of $A$ too much (I'll produce some graphs and diagrams later to show what I mean). But we remember that we should never be interested in the truth itself (unless we want to get engaged in meaningless arguments and cruel pointless fights, which seem to go all over the planet all the time, presumably in the name of that very truth), only in the predictive power of our explanations. So our next task will be to convert the estimate of the deviation from the truth we derived into the estimate of the predictive power of our calibration. We will see something interesting there too, but not today.
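That identity holds for the spectral norm: both $\|(A^TA)^{-1}A^T\|$ and $\sqrt{\|(A^TA)^{-1}\|}$ equal $1/\sigma_{\min}(A)$, the reciprocal of the smallest singular value of $A$. A quick NumPy check of this on a random matrix of the same shape (my own verification sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(11, 4))   # K + 1 rows, 4 unknowns, as in the system above

M = np.linalg.inv(A.T @ A) @ A.T                             # least-squares operator
C1 = np.linalg.norm(M, 2)                                    # spectral norm
C1_alt = np.sqrt(np.linalg.norm(np.linalg.inv(A.T @ A), 2))  # via the 4x4 matrix
C1_svd = 1.0 / np.linalg.svd(A, compute_uv=False)[-1]        # both = 1/sigma_min
```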
The predictive power (the evaluation of the quality of the calibration from the data)
Apologies for the philosophical digression, but I found it necessary to explain what we should be after here. I noticed that you were amazed that Mathematica could find a solution from just 3 observations (and, most likely, with machine precision regardless of the measurement errors). That is not really something to be excited about. First, my homemade contraption is perfectly capable of this too. And second, more importantly, what you really want it to do based on the data you feed in is to give you an explicit bound on the predictive power of its explanation, i.e., to tell you what error you should expect in the real future measurements after your calibration is done and what is the actual (not declared) accuracy of the measuring tool. Note that 3 observations will never allow you to answer even the latter question because the system of 3 equations with 3 unknowns has an exact solution more often than not. So you would like to increase the number of observations just for this purpose, if nothing else. Now we are going to address the more interesting first question.
The underlying idea is simple. We will deal with the same approximation as above $$ R\approx \frac{x+y+z}3+\frac{(x-y)^2+(y-z)^2+(z-x)^2}{18R} $$ Ignoring for the moment that $R$ on the RHS is also unknown, we can say that the calibration error will influence this formula, first, through the error in the average, which is the error in $r_a+r_b+r_c$, and, second, through the error in the quadratic term, which is the scalar product of the vector $(x-y,y-z,z-x)$ and the deviation from the truth in the vector $(r_{ab},r_{bc},r_{ca})$. We have already estimated the norm of the latter vector by $C\varepsilon$ where $\varepsilon$ is the actual accuracy of the machine (I do not presume it known) and $C$ is the above constant that we can compute from the data alone. The norm of the first vector is essentially the deviation $\lambda$ of the center of the measuring tool from the center of the master ring, so, bringing all that stuff together, we see that the expected error in the second term is proportional with some numeric coefficient to $C\Lambda \varepsilon$ where $\Lambda=\lambda/R$ is the relative (to the radius of the master ring) deviation from the center. For the first term, we may expect that its error is about the same as if the approximation were just linear, in which case we can invoke the independence of errors and additivity of variance to conclude that it should be of order $\frac 1{\sqrt K}$. There is also a question of numeric coefficients, but they depend on what exactly we are going to estimate, so let us discuss that now. Note at any rate that the big error in the pairwise difference vector (compared to $\varepsilon$) can be compensated by small $\Lambda$. So, if you intend to measure just the rings slightly exceeding the measuring tool itself, you shouldn't be afraid of it too much. The measurement will come out nearly correct even though $r_a, r_b, r_c$ will be rather wrong individually. However, nothing can compensate for the error in the sum.
What I did for testing was to take the (simulated) calibrated tool, place it at given $\Lambda$ into a random larger ring of comparable radius, and take the maximal error in the determination of that radius over all rotations in that fixed position (i.e., for every rotation I computed the true $a,b,c$ from $R_a,R_b,R_c$ and then used these values to evaluate the radius from them and the calibration values $r_a,r_b,r_c$). Let's call this maximal error $E(\Lambda)$ (technically it also depends on the radius, but funnily enough, this dependence is nearly zero). My predictor $P(\Lambda)$ for this error will be just a weighted sum of $\frac 1{\sqrt K}$ and $C\Lambda$ times $\varepsilon$. The sweet coefficients (assuming I want $P$ to be an almost guaranteed upper bound) are $$ P(\Lambda)=\left[\sqrt{\frac 4{3K}}+\frac C4\Lambda\right]\varepsilon\,. $$
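The predictor is a one-liner; here is a Python sketch of it (the numbers in the example call are placeholders of my own, since $C$ comes out of the calibration data and $\varepsilon$ out of the device):

```python
import math

def P(Lam, K, C, eps):
    """Almost-guaranteed upper bound on the radius measurement error at
    relative center offset Lam, given K calibration measurements, the
    constant C computed from the calibration data, and device accuracy eps."""
    return (math.sqrt(4.0 / (3.0 * K)) + C / 4.0 * Lam) * eps

# Placeholder numbers: K = 10 measurements, C = 20, eps = 0.001 mm
bound = P(0.1, 10, 20.0, 0.001)
```

At $\Lambda=0$ only the $1/\sqrt K$ term survives, and the bound grows linearly with the offset, which matches the practical advice above to measure unknown parts as close to the concentric position as possible.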
It remains to run a few thousand simulations varying the calibration setups to see if the predictor is efficient. What I did was to create a few dot diagrams plotting the points $(E(\Lambda)/P(\Lambda),P(\Lambda)/\varepsilon)$, choosing the same amount of points (500) for each interval between two integers for the value of $P$ (technically I just ran completely random calibrations and placed the results on the dot diagram until the intervals got saturated; note that in the dot diagram it is important to have the same number of points for each interval for the value of the predictor to make the distributions easy to compare). Below are such diagrams for several values of $K$ (the value of $P$ is vertical in the range from $0$ to $50$ and $E$ is horizontal). The green line on the left shows which intervals are, indeed, saturated after 20000 runs, so you can compare the corresponding boxes visually. The grid is $1$ by $1$. In addition to each dot diagram, I drew the histogram of $E(\Lambda)/P(\Lambda)$ for $5<P(\Lambda)<6$. Here is a typical pair of diagrams for $K=4$:
You can see that, indeed, there are a lot of points to the left of $1$ ($E=P$) but not too many to the right of it for every value of $P$ for which we got the required number of points in the corresponding horizontal box at all, which means that our predictor has some reasonable predicting power.
Of course, a good calibration is a calibration with reasonably low $C$, so that the calibration error does not exceed the true measuring tool error, i.e., the machine is calibrated to its capability. Note that agreement with the observed values within the required precision is not enough. If $C$ is large, the calibration will still most likely be erratic and you'll not be able to use the tool unless $\Lambda$ is guaranteed to be small. Note also that $C$ is computable from the data alone, so you can decide whether your results are satisfactory or not. For a decision covering the whole range of $\Lambda$ down to $0$, we will also need a predictor of the error in the sum, but let's play with this simple one-parameter predictor first.
The value of $C$ is determined by luck and by the typical relative distance to the center $\Lambda_c$ during the calibration. The general rule is that $C$ is inversely proportional to $\Lambda_c$ and also diminishes with $K$ like $1/\sqrt K$; moreover, its distribution (assuming $K$ random measurements with typical relative deviation from the center $\Lambda_c$) gets more concentrated with $K$.
Here are the distributions of $C\Lambda_c\sqrt{K}/4$ for $K=4$ and $K=12$ and $\Lambda_c=0.05$ and $\Lambda_c=0.5$ (truncated at 6). Can you guess which is which?
An intermission
Before going into the next topic, I'd like to do two things. The first one is to give you one more piece of code. You can just copy the whole module to an asy file (my file name for it is aboreappsolve.asy, so if you want us to easily share some code modules without renaming in the Asymptote import commands, I suggest you use the same name when saving the file).
This module just solves the approximate system we have discussed above and produces a few parameters that will be needed for constructing the final error predictor. The approximate solution is the .sol field of the AS (Almost Solution) structure. You can (and, IMHO, should) also try to use it as the initial approximation in the main solver algorithm as the first try. Just write
triple T=APPSOLVE(a,b,c,R).sol; ra=T.x; rb=T.y; rc=T.z;
in the main code. (and import aboreappsolve; in the beginning of it, of course).
In the vast majority of cases, it is good enough to finish the story off in 4-10 iterations on the very first try after that, but not always, so do not discard the random tries entirely yet. BTW, for testing I found it more convenient, and more sensible, to stop the iteration loop in a single try not when the required accuracy is reached but just when the iteration stops moving, i.e., when the increments are comparable to the machine precision. But we'll discuss the testing techniques later.
The $C$ we were talking about is the .sig field of the AS structure. The other two fields real[] Aa and real[][] A are going to be used to get a refined predictor for the error in the sum $r_a+r_b+r_c$. That is really interesting only if we want to deal with really small $\Lambda$ after the calibration with decent value of $C$. Otherwise the $C$-term normally dominates that extra correction anyway.
The second thing I want is to get some feedback from you on what I have said already. If everything is clear, just say "All clear". But if you feel lost or overwhelmed already, trying to push more mathematics and programming down your throat before the previous parts are fully digested will be just a waste of time for both of us. So, please react! I hate talking into the empty space. I wish we were sitting in the same room for this discussion, but, unless you would be excited to make a trip to the middle of nowhere in rural Ohio, this is, probably, not an option.
Now, the code module, as promised:
aboreappsolve.asy