What considerations should I take into account when linearizing a non-linear model for linear regression?


I'm looking for bibliography on what I should take into account when I have a model and experimental data that can be expressed in a form amenable to linear regression, so that I can fit it and determine experimental parameters. That is, what conditions should I check between the variables, or anything else, before performing the regression?

My case, for context: I have this model for the magnification $m$, using the convention of the Newtonian form of the thin-lens equation,

$$m=\frac{f}{x_o}$$

($f$ is the focal length and $x_o$ is the distance from the primary focal point of the lens to the object.) Using the Gaussian convention, this can be expressed as

$$m=\frac{f}{s_o-f}$$

($s_o$ is the distance from the lens to the object, the object distance.) This can be rearranged into the linearizable form

$$s_o m = f + fm$$

I have data for $s_o$ and $m$, and I want to determine the focal length $f$. A linear regression model $Y = aX + b$, with $Y = s_o m$ and $X = m$, could be used to achieve that; am I right? But here is my question: what should I check before doing so? Since this is experimental data, I have uncertainty, and the relative uncertainty in the size of the image increases as $m$ decreases (that is, as $s_o$ increases beyond $2f$, which is the case in my experiment), because the image on the observation screen gets smaller while the measurement scale stays the same. Also, $f$ has to be reported with its uncertainty, using error-propagation formulas, and I don't know whether this particular behavior of the uncertainty affects how I should apply the linear regression model. This is the kind of thing I want to be clear about.

Additionally, is it equivalent to use $ms_o = f + fm$ (with $X = m$) or $ms_o = f(1+m)$ (with $X = 1+m$)? Thanks for your help.
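As a concrete illustration of the proposed regression, here is a minimal sketch in Python. The focal length, object distances, and noise level are all hypothetical values chosen for illustration; with real data, the measured $(s_o, m)$ pairs would replace the synthetic ones. Since the linearized model $s_o m = f + f m$ predicts that both the slope and the intercept equal $f$, comparing the two fitted values is a useful consistency check.

```python
import numpy as np

# Hypothetical true focal length, for generating synthetic data
f_true = 10.0  # cm

# Object distances beyond 2f, as in the experiment described
s_o = np.linspace(25.0, 60.0, 20)
m = f_true / (s_o - f_true)  # magnification from the Gaussian form

# Small multiplicative noise on m (assumed noise model, for illustration)
rng = np.random.default_rng(0)
m_obs = m * (1 + rng.normal(0, 0.01, m.size))

# Linearized form: s_o * m = f + f * m, i.e. Y = b + a*X with X = m, Y = s_o * m
X = m_obs
Y = s_o * m_obs

# Ordinary least squares; cov=True gives the parameter covariance matrix,
# whose diagonal yields standard errors for slope and intercept
coeffs, cov = np.polyfit(X, Y, 1, cov=True)
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))

# Both the slope and the intercept estimate f; they should agree within errors
print(f"slope     (f): {slope:.3f} +/- {slope_err:.3f}")
print(f"intercept (f): {intercept:.3f} +/- {intercept_err:.3f}")
```

Note that ordinary least squares assumes all the error lives in $Y$ with constant variance; since here $X = m$ is itself noisy and the relative uncertainty varies with $m$, a weighted or total-least-squares fit may be more appropriate, which is exactly the kind of issue the question raises.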



Heuristically, there is a simpler approach. We have

$$f = {s_o m \over m+1}.$$

Since you say you have data for $s_o,m$, for each observed value of $s_o,m$, compute the inferred value of $f$, and then aggregate those values in some way (e.g., compute the average or median). This is not necessarily optimal, in terms of obtaining the most information possible from limited observations, but it might be good enough for your setting, if you don't have a lot of noise.
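The heuristic above can be sketched in a few lines of Python. The data values are hypothetical, standing in for the real measurements; the spread of the pointwise estimates also gives a rough uncertainty via the standard error of the mean.

```python
import numpy as np

# Hypothetical measured pairs (s_o in cm, magnification m); replace with real data
s_o = np.array([25.0, 30.0, 40.0, 50.0])
m = np.array([0.68, 0.51, 0.34, 0.25])

# Invert the model pointwise: f = s_o * m / (m + 1)
f_est = s_o * m / (m + 1)

# Aggregate; the median is more robust to outliers than the mean
f_mean = f_est.mean()
f_median = np.median(f_est)

# Standard error of the mean as a rough uncertainty estimate
f_sem = f_est.std(ddof=1) / np.sqrt(f_est.size)

print(f"f (mean)   = {f_mean:.3f} +/- {f_sem:.3f} cm")
print(f"f (median) = {f_median:.3f} cm")
```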

If you want the optimal method, the exact optimal inference procedure will depend on your noise model. You haven't specified a noise model, but it sounds like you are inching towards one. Perhaps your noise model is

$$\begin{align*} \overline{s}_o &= s_o + \mathcal{N}(0,\sigma_s^2)\\ \overline{m} &= m(1 + \mathcal{N}(0,\sigma_m^2)) \end{align*}$$

where $\overline{m}$ is the observation, $m$ is the true value of $m$, and $\mathcal{N}(\cdot,\cdot)$ represents a Gaussian distribution with the listed mean and variance. Of course, $f$ is related to $s_o,m$ by the equations given above.

If you have a specific noise model, such as the one above, and priors for $m,s_o$, then you can infer $f$ by maximizing the posterior (maximum a posteriori estimation). Specifically, define the objective $L(f)$ by

$$L(f) = p(f \mid \overline{m},\overline{s}_o),$$

i.e., the probability of a possible value of $f$ given the observations $\overline{m},\overline{s}_o$. Your noise model and priors should enable you to write an explicit expression for $L$. Then use an off-the-shelf optimization tool to find the $f$ that maximizes $L(f)$. In some cases the expression for $L(f)$ turns out to be so simple that the inference reduces to computing an average of some values, but this depends on the specific noise model appropriate to your setting. It is not obvious to me that linear regression will be optimal.
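To make this concrete, here is a minimal sketch under simplifying assumptions: $s_o$ is treated as exact, only $m$ carries multiplicative Gaussian noise with an assumed relative level, and a flat prior on $f$ is used (so the maximization reduces to minimizing a negative log-likelihood). The data and noise level are hypothetical; a simple grid search stands in for an off-the-shelf optimizer.

```python
import numpy as np

# Hypothetical data; assumes s_o is exact and m has multiplicative Gaussian noise
s_o = np.array([25.0, 30.0, 40.0, 50.0])
m_obs = np.array([0.66, 0.52, 0.335, 0.248])
sigma_m = 0.01  # assumed relative noise level on m

def neg_log_likelihood(f):
    """Negative log-likelihood of f under m_obs = m_true * (1 + N(0, sigma_m^2))."""
    m_pred = f / (s_o - f)           # model: m = f / (s_o - f)
    resid = m_obs / m_pred - 1.0     # relative residuals
    return np.sum(resid**2) / (2 * sigma_m**2)

# Grid search over plausible focal lengths (f must stay below min(s_o))
f_grid = np.linspace(1.0, 20.0, 19001)
nll = np.array([neg_log_likelihood(f) for f in f_grid])
f_hat = f_grid[np.argmin(nll)]

print(f"maximum-likelihood estimate: f = {f_hat:.3f} cm")
```

With an informative prior on $f$, one would add its negative log-density to `neg_log_likelihood`; with noise on $s_o$ as well, the likelihood involves latent true values and becomes a harder marginalization problem, which is why the optimal procedure depends so strongly on the noise model.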