How to find $y(x,z)$ from the given set of data?


I have the following set of data:

[Image: table of $z$ values for each pair $(x_i, y_j)$]

$x_1,x_2,x_3,\dots,x_m$ are increasing in arithmetical progression.

$y_1,y_2,y_3,\dots,y_n$ are increasing in arithmetical progression.

$z(x_i,y_1),z(x_i,y_2),z(x_i,y_3),\dots,z(x_i,y_n)$ are exponentially increasing for $i=1,2,3,\dots,m$.

$z(x_1,y_i),z(x_2,y_i),z(x_3,y_i),\dots,z(x_m,y_i)$ are exponentially decreasing for $i=1,2,3,\dots,n$.

All values in the table are positive.

The goal is to express $y$ in terms of $x$ and $z$, i.e. to find $y(x,z)$.

How to find $y(x,z)$ from the given set of data?


Consider the following example:

The data set is:

[Image: example data table]

Say we need to find the value of $y$ when $x=172$ and $z=3.1527$. Clearly, from the table, we can find that $y(172,3.1527)=25.50$.

What if we want to find $y(150,3.1729)$, which is not in the table?

The answer is $y(x,z)=x(-1+\ln(z))$, so $y(150,3.1729)=150(-1+\ln(3.1729))=23.1969$
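As a quick numerical check of the two values quoted above, here is a minimal Python sketch of the example relation. The function names `y_from_xz` and `z_from_xy` are illustrative only and do not come from the question.

```python
import math

def y_from_xz(x, z):
    """Example relation from the question: y = x * (-1 + ln z)."""
    return x * (-1.0 + math.log(z))

def z_from_xy(x, y):
    """The same relation solved for z: z = exp(1 + y / x)."""
    return math.exp(1.0 + y / x)

y_table = y_from_xz(172, 3.1527)   # tabulated point, the question quotes 25.50
y_new = y_from_xz(150, 3.1729)     # off-grid point, the question quotes 23.1969
print(round(y_table, 2), round(y_new, 4))
```

The tabulated $z$ is rounded to four decimals, so the recovered $y$ agrees with the table only to about two decimals.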

I actually know the expression $\boxed{y(x,z)=x(-1+\ln(z))}$ because I made this example to demonstrate my problem.

What if we do not know that relation, how should we obtain it?


This problem comes from my research in a chemical laboratory, where the $x$ values are centrifuge speeds (rpm), the $y$ values are sample volumes (mL), and the $z$ values are the lengths (mm) of the extracted liquid, measured from pictures taken by a camera.


Regardless of my research, and regardless of the units (rpm, mL, mm), can we generalise a method?


Any help would be really appreciated. THANKS!

There are 6 best solutions below

2
On

An exponential function has the form $f(x)=z=a+b\cdot e^{c\cdot x}$

First compute all the regressions, one for each row.
For example, the row $y=21.00$ results in $z=2.89657+1.73346\cdot e^{(-0.0134048\cdot x)}$
And the row $y=23.25$ results in $z=2.91707+1.96536\cdot e^{(-0.0135084\cdot x)}$

Next, calculate the $z$ values at the given $x=150$ for all rows. In the example, $z(150, 21.00) = 3.12866683809$ and $z(150, 23.25) = 3.17615876206$

Do another exponential regression $f(y_{150})=z_{150}=A+B\cdot e^{C\cdot y}$, using the $y$ values of the table and the $z$ values just calculated for the $x=150$ column.

Finally, solve for $y$: $y= \frac{1}{C} \ln\frac{z-A}{B}$
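The evaluation and inversion steps above can be sketched in Python as follows. This is only an illustration: the first-stage coefficients are the ones quoted for the row $y=21.00$, while the second-stage values `A`, `B`, `C` are hypothetical placeholders (the answer does not give them) used to exercise the inversion formula. The regressions themselves are not reproduced here.

```python
import math

def exp_model(t, a, b, c):
    """Offset exponential a + b * exp(c * t), the form fitted in each stage."""
    return a + b * math.exp(c * t)

def invert_exp_model(z, a, b, c):
    """Solve z = a + b * exp(c * y) for y; needs (z - a) / b > 0."""
    ratio = (z - a) / b
    if ratio <= 0:
        raise ValueError("z is outside the range of the fitted curve")
    return math.log(ratio) / c

# First stage: row fit quoted in the answer for y = 21.00, evaluated at x = 150.
z_row_21 = exp_model(150.0, 2.89657, 1.73346, -0.0134048)

# Second stage: HYPOTHETICAL coefficients, chosen only to test the inversion.
A, B, C = 2.5, 0.2, 0.05
y_back = invert_exp_model(exp_model(23.25, A, B, C), A, B, C)
print(z_row_21, y_back)
```

The guard in `invert_exp_model` matters in practice: a measured $z$ at or below the fitted asymptote $A$ (for $B>0$) has no real solution.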

6
On

HINT.

Formally, the described model is $$z(x,y) = e^{a+bx+cy},$$ or $$\ln z = a+bx+cy,\tag1$$ where $b<0$, $c>0$, and $$x_i=100+24i,\quad y_j= 12 + 2.25j.$$

The table data, however, follow the model $$z_{i,j} = e^{\large \frac{y_j}{x_i}+1}\tag2$$

[Image: table data]

Taking the discrepancy in the form $$d(a,b,c) = \sum w_{i,j}(\ln z_{i,j} - a - bx_i - cy_j)^2,\tag3$$ where $w_{i,j}$ is an arbitrary matrix of weights,

one can find the point $(A,B,C)$ that minimizes $d(a,b,c)$ for the described model.

This point is a stationary point of $d(a,b,c).$

So $\operatorname{grad} d(A,B,C) = 0,$ or $$\begin{cases} \sum w_{i,j}\ (\ln z_{i,j} - A - Bx_i - Cy_j)=0\\ \sum w_{i,j}\ x_i\ (\ln z_{i,j} - A - Bx_i - Cy_j)=0\\ \sum w_{i,j}\ y_j\ (\ln z_{i,j} - A - Bx_i - Cy_j) = 0. \end{cases}\tag4$$

This leads to the linear system $$\begin{cases} S_{00}A + S_{10} B + S_{01}C = R_{00}\\ S_{10}A + S_{20} B + S_{11}C = R_{10}\\ S_{01}A + S_{11} B + S_{02}C = R_{01}, \end{cases}\tag5$$

where $$S_{kl} = \sum w_{i,j}x_i^k y_j^l,\quad R_{kl} = \sum w_{i,j} x_i^k y_j^l \ln z_{i,j}.\tag6$$

Using the weight array $w=1$ gives

[Image: result for $w=1$]

which looks unusable. This happens because the table data do not follow the model.

However, applying a weight array of the form $$w_{ij}=e^{-\left(\Large\frac{5(x_i-150)}{24}\right)^2-\left(\Large\frac{2(y_j-23.25)}{2.25}\right)^2}$$

localizes the model and gives

[Image: solution for $w\neq 1$]

So the estimation is $$Y = \dfrac{\ln z -A-Bx}C \approx \dfrac{\ln z - 1.13517\ 52307 + 0.00091\ 33448x}{0.00675\ 67568}, \tag7$$

and the result

[Image: computed value of $y(150,3.1729)$]

looks suitable.

Note that the constants "2" and "5" in the $w$ formula were obtained empirically, with the goal of approximating the table data well near the point of interest. If the table data follow the given model more closely, these constants can be decreased, or $w=1$ can be used.
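The weighted fit described in this answer can be sketched in plain Python as below. Several things here are assumptions, not part of the answer: the grid sizes (6 values of $x$, 9 values of $y$), the synthetic data generated from the question's example relation $\ln z = 1 + y/x$, and the centring of the regressors at $(150, 23.25)$, which only improves the numerical conditioning and does not change the fitted plane.

```python
import math

def solve_linear(M, r):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(r)
    aug = [list(row) + [rhs] for row, rhs in zip(M, r)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol

# ASSUMED grid and synthetic data (ln z = 1 + y / x from the question's example).
xs = [100.0 + 24.0 * i for i in range(6)]
ys = [12.0 + 2.25 * j for j in range(9)]
x0, y0 = 150.0, 23.25  # point of interest, also used to centre the regressors

def weight(x, y):
    """Gaussian weight from the answer, with the empirical constants 5 and 2."""
    return math.exp(-(5.0 * (x - x0) / 24.0) ** 2 - (2.0 * (y - y0) / 2.25) ** 2)

# Accumulate the weighted normal equations for ln z = a + b (x - x0) + c (y - y0).
S = [[0.0] * 3 for _ in range(3)]
R = [0.0] * 3
for x in xs:
    for y in ys:
        log_z = 1.0 + y / x
        w = weight(x, y)
        phi = (1.0, x - x0, y - y0)
        for k in range(3):
            R[k] += w * phi[k] * log_z
            for l in range(3):
                S[k][l] += w * phi[k] * phi[l]

a, b, c = solve_linear(S, R)

def y_estimate(x, z):
    """Invert the local plane for y."""
    return y0 + (math.log(z) - a - b * (x - x0)) / c

y_est = y_estimate(150.0, 3.1729)
print(a, b, c, y_est)
```

With these weights, nearly all of the mass falls on the grid column closest to $x=150$, so the estimate is essentially a local linearization. It lands within about 0.1 of the exact value 23.1969 rather than reproducing it.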

0
On

If I had to work on such a problem, I would use a bilinear model based on the $p > 4$ data points in the table closest to the point of interest.

This means that $$\log(z)=a+ b x+c y+d x y$$ This is a simple task to achieve in the least-square sense and, when done, extract $$y=\frac{\log (z)-a-b x}{c+d x}$$

I did not use (on purpose) the fact that the $x$'s and the $y$'s are in arithmetic progressions. But this will help a lot to find the surrounding points in the table.
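A minimal sketch of this local bilinear fit in plain Python is given below. It assumes, hypothetically, that the grid columns around $x=150$ are $x=148$ and $x=172$ and that the rows around the target are $y=21.00, 23.25, 25.50$. The $z$ values are generated from the question's example relation rather than read from the table. The regressors are centred (which leaves the bilinear model class unchanged) to keep the normal equations well conditioned.

```python
import math

def solve_linear(M, r):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(r)
    aug = [list(row) + [rhs] for row, rhs in zip(M, r)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol

# ASSUMED neighbouring nodes; log z generated from ln z = 1 + y / x.
x_nodes = [148.0, 172.0]
y_nodes = [21.00, 23.25, 25.50]
xc, yc = 160.0, 23.25  # centring offsets
points = [(x, y, 1.0 + y / x) for x in x_nodes for y in y_nodes]

def features(x, y):
    """Regressors of log z = a + b u + c v + d u v with u = x - xc, v = y - yc."""
    u, v = x - xc, y - yc
    return (1.0, u, v, u * v)

# Normal equations of the least-squares problem (p = 6 points, 4 unknowns).
N = [[0.0] * 4 for _ in range(4)]
rhs = [0.0] * 4
for x, y, log_z in points:
    phi = features(x, y)
    for k in range(4):
        rhs[k] += phi[k] * log_z
        for l in range(4):
            N[k][l] += phi[k] * phi[l]

a, b, c, d = solve_linear(N, rhs)

def y_bilinear(x, z):
    """Invert the fitted bilinear model for y."""
    u = x - xc
    return yc + (math.log(z) - a - b * u) / (c + d * u)

y_est = y_bilinear(150.0, 3.1729)
max_residual = max(
    abs(log_z - sum(p * q for p, q in zip((a, b, c, d), features(x, y))))
    for x, y, log_z in points
)
print(y_est, max_residual)
```

For this particular synthetic data the bilinear model fits the six nodes exactly, and the remaining error at $x=150$ comes from interpolating $1/x$ linearly between the two columns.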

0
On

When the dynamics of the process that generated the data are known, a model with a specific structure can be proposed. This is the case for many processes in mechanics, chemical kinetics, etc. When those dynamics are unknown, the identification process turns into guesswork. There are also the so-called black-box procedures involving neural nets, etc. In this case we adopted a smooth model inspired by the quality of the data.

After trying different algebraically simple models, the best one according to our settings is

$$ z(x,y) =\frac{a_1 x}{x-a_2}+\frac{b_1 y}{x-b_2}+c_1 $$

For the given data, the fitted parameter values are

$$ \cases{ c_1=2.78608\\ a_1=-0.0596403\\ a_2=58.6578\\ b_1=2.7319\\ b_2=20.0174} $$

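The following MATHEMATICA script gives those results

(* Assumed to be defined beforehand: X, Y = lists of tabulated x and y values; *)
(* wrds2 = matrix of z values, with wrds2[[j, i]] the entry for X[[i]], Y[[j]]. *)
(* Candidate models that were tried; only the last definition of f is in effect. *)
(* f[x_, y_] := Exp[a1 x + b1 y + c1] *)
(* f[x_, y_] := Exp[a1/(x - a2) + b1/(y - b2) + c1] *)
(* f[x_, y_] := Exp[a1/(x - a2) + b1 y + c1] *)
f[x_, y_] := a1 x/(x - a2) + b1 y/(x - b2) + c1
(* Sum of squared residuals over the whole table *)
For[i = 1; error2 = 0, i <= Length[X], i++,
 For[j = 1, j <= Length[Y], j++,
  error2 = error2 + (wrds2[[j, i]] - f[X[[i]], Y[[j]]])^2
  ]
]
(* Global minimization over the five model parameters *)
sol = NMinimize[error2, {c1, a1, a2, b1, b2}, Method -> "DifferentialEvolution"]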

The plot below shows the level surfaces of the fitted model with the data points in red, together with cuts of $z(x,y)$ along the $y$ axis

[Image: level-surface plot with data points and cuts along the $y$ axis]

Solving the model for $y$ gives the adjusted formula

$$ y = \frac{(x-b_2) (x (z-a_1-c_1)-a_2 (z-c_1))}{b_1(x-a_2)} $$

[Image]
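As a sanity check, the fitted rational model and its algebraic inverse can be evaluated with the parameter values quoted above. This is only a sketch: the function names are illustrative, and the inverse is obtained by solving $z = \frac{a_1 x}{x-a_2}+\frac{b_1 y}{x-b_2}+c_1$ for $y$ directly.

```python
# Parameter values quoted in the answer.
A1, A2 = -0.0596403, 58.6578
B1, B2 = 2.7319, 20.0174
C1 = 2.78608

def z_model(x, y):
    """Fitted model z(x, y) = a1 x / (x - a2) + b1 y / (x - b2) + c1."""
    return A1 * x / (x - A2) + B1 * y / (x - B2) + C1

def y_model(x, z):
    """Solve z_model(x, y) = z for y; valid for x != A2 and x != B2."""
    return (x - B2) * (x * (z - A1 - C1) - A2 * (z - C1)) / (B1 * (x - A2))

z_fit = z_model(150.0, 23.25)           # model value at a point near the target
y_round_trip = y_model(150.0, z_fit)    # should give back 23.25
y_est = y_model(150.0, 3.1729)          # estimate for the question's test point
print(z_fit, y_round_trip, y_est)
```

With the rounded parameters the estimate at $(150, 3.1729)$ differs from the exact 23.1969 by roughly 0.1, which gives a feel for the accuracy of this global fit.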

0
On

General Premise

A proper regression analysis of a set of data points (2D, 3D, ...) should take into account the underlying physical phenomena and the statistical considerations that generated the data.

The physical model is the only hint that may suggest a proper relation (mathematical function) among the data.
The statistical and physical considerations determine what "deviations" the data may have from the assumed model.
This fundamentally means assessing which variables are (relatively) "exact" and which are prone to error, and whether the errors may be assumed to be independent, non-systematic, and of constant or variable variance (homoscedastic or heteroscedastic).

Omitting this step makes the regression arbitrary.
Take for instance a linear 2D regression: the $x_k$ may be "precise" while the $y_k$ are prone to error, or vice versa.
You may also have errors in both, in which case you should apply total least squares regression, for instance.
The results in the three cases are usually different.
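To illustrate that the three error assumptions lead to different lines, here is a small Python sketch on made-up data (the five points below are purely illustrative). It compares the slope from regressing $y$ on $x$, the slope implied by regressing $x$ on $y$, and the total least squares (orthogonal) slope, using the standard closed-form expressions in terms of the centred sums $S_{xx}$, $S_{yy}$, $S_{xy}$.

```python
import math

# Illustrative data only, not taken from the question.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Errors only in y: ordinary least squares of y on x.
slope_y_on_x = sxy / sxx

# Errors only in x: least squares of x on y, rewritten as dy/dx.
slope_x_on_y = syy / sxy

# Errors in both with equal variance: total (orthogonal) least squares.
slope_tls = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)

print(slope_y_on_x, slope_tls, slope_x_on_y)
```

For positively correlated data the total least squares slope lies between the other two, so the choice of error model changes the fitted line even on a clean-looking data set.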

Your case

Coming to your case, and dealing with it very concisely, you first need to answer the following questions:
a) which of the $3$ variables are "exact" and which are "erroneous"?
b) may the errors reasonably be assumed to be non-systematic, uncorrelated and not cross-correlated?
c) are the variables heteroscedastic or homoscedastic?

Now, if the answer to a) is that $x,y$ are exact and $z$ is erroneous, then you should apply a 2D regression to your data, get $z(x,y)$ according to your presumed model, and solve it to extract $y(z,x)$.

Which regression analysis to apply depends on the answer to b). If it is a full yes, then you can apply simple least squares.

The answer to c) is very important in your case, where $z(x,y)$ is assumed to be an exponential function.
In fact, if the relative errors in $z$ may be assumed to be i.i.d., then $\log(z)$ is homoscedastic and you can apply linear regression to it. Otherwise you cannot reliably do that.

Finally, if instead the answer to a) is that $y$ is erroneous while $x,z$ are relatively exact, then in practice you just have to transform your table into a $y(x,z)$ table, by listing all the available $z_k$ in the top row and filling in the corresponding $y(x,z)$.
After that the process is the same as above.

A final note about your closing question

What if we do not know that relation, how should we obtain it?

As said above, only the underlying physical process can suggest a model to adopt, which means a mathematical relation among the observable data (linear, polynomial, exponential, ...) and which (of the most important) parameters (the unknowns in the regression) are to be included in it. A typical example is the intercept, or some other point that the model must obviously include on physical grounds.
And what about the range of validity of the regression? Only a careful consideration of the process, together with the intended use of the model, can assess that. A scatterplot is a precious help in deciding on the model, but it should remain just an aid.

0
On

When the two claims, "exponentially increasing" and "exponentially decreasing", are meant to be exact, the whole problem reduces to the simple algebraic problem of determining a small number of parameter values.

The two claims can be written out in the following form: There are functions $a(x)$, $b(y)$, $\lambda(x)$, $\mu(y)$ such that we have identically in $x$ and $y$ the relations $$z(x,y)=e^{a(x)}\>e^{\lambda(x) y},\qquad z(x,y)=e^{b(y)}\>e^{-\mu(y)x}\ ,$$ which implies $$a(x)+\lambda(x) y=b(y)-\mu(y) x\ .\tag{1}$$

Differentiating $(1)$ with respect to $x$ and $y$ gives $$a'(x)+\lambda'(x)y=-\mu(y),\qquad \lambda(x)=b'(y)-\mu'(y) x\ .\tag{2}$$ Plugging $-\mu(y)$ and $\lambda(x)$ from $(2)$ into $(1)$ leads to $$\bigl(a(x)-xa'(x)\bigr)-\bigl(b(y)-y b'(y)\bigr)\equiv\bigl(\lambda'(x)+\mu'(y)\bigr)xy\ .$$

Differentiating the first equation in $(2)$ with respect to $y$ gives $\lambda'(x)=-\mu'(y)$. The left side depends only on $x$ and the right side only on $y$, so both are equal to a constant $\tau$. This implies $\lambda'(x)+\mu'(y)\equiv0$, or $$\lambda(x)=\lambda_0+\tau x,\qquad \mu(y)=\mu_0-\tau y$$ for constants $\lambda_0$, $\mu_0$, $\tau$.

Furthermore we have $$a(x)-xa'(x)=b(y)-y b'(y)=c$$ for a certain constant $c$, and this implies $$a(x)=\alpha x+c,\qquad b(y)=\beta y+c$$ for certain $\alpha$ and $\beta$.

It remains to determine the constants $\lambda_0$, $\mu_0$, $\tau$, $c$, $\alpha$, $\beta$ from the given data.
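As a worked continuation (a sketch that follows from the formulas above, not something stated in the answer), substituting the linear forms of $a$, $b$, $\lambda$, $\mu$ back into $(1)$ links the constants and gives an explicit model that can be inverted for $y$:

```latex
\begin{align*}
a(x)+\lambda(x)\,y &= c+\alpha x+\lambda_0 y+\tau xy,\\
b(y)-\mu(y)\,x &= c+\beta y-\mu_0 x+\tau xy,\\
\text{so (1) forces}\quad \alpha &= -\mu_0,\qquad \beta=\lambda_0,\\
\ln z(x,y) &= c+\alpha x+\beta y+\tau xy,\\
y(x,z) &= \frac{\ln z-c-\alpha x}{\beta+\tau x}\qquad(\beta+\tau x\neq 0).
\end{align*}
```

Only four independent constants $c$, $\alpha$, $\beta$, $\tau$ remain, so in the exact case four table entries (not all in one row or one column) determine them through a linear system in $\ln z$.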