Approximate a function so that it becomes linear in its parameters (without Taylor)


TL;DR (original question):

I am looking for a function that has roughly the form of

$$f(x) = \exp\left(-(x/a)^2\right) - \log\left((x/a)^2+1\right)$$

but is linear in its model parameter (here $a$) so that I can use linear least squares to determine the parameter. Is there such a function?


EDIT (more explanations):

It seems there are a couple of misunderstandings:

Background

I am not asking for "linear regression" (fitting a line through points). Instead I want to do a non-linear regression with the function $f$ (fitting $f$ through points). In order to do a regression, one needs to perform a mathematical optimization, either with a numerical iterative method (e.g. Newton's method) or, if available, with an analytical closed-form formula.

Most of the time, regression can be done by formulating a least-squares optimization cost function

$$ J(\Theta) = \sum_i \left( y_i - f(x_i | \Theta) \right)^2 $$ where $\Theta$ is the collection of all optimization variables. Here I only have $a$. The optimization solves the problem $$ \min_{\Theta} J(\Theta) $$

If the model is linear in the optimization variables, this cost function is quadratic in them and can be minimized with a closed-form formula. If not, as far as I know, numerical iterative optimization is required. That is what I want to avoid because it is slow.
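To make the distinction concrete, here is a minimal sketch (in NumPy, with made-up data and the illustrative one-parameter model $f(x | \Theta) = \Theta x$) of the closed form that becomes available when the model is linear in its parameter:

```python
import numpy as np

# Illustrative model that is linear in its parameter theta: f(x | theta) = theta * x.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 50)
theta_true = 1.7                                   # made-up value for the demo
y = theta_true * x + 0.05 * rng.standard_normal(x.size)

# Closed-form least squares: minimize J(theta) = sum_i (y_i - theta * x_i)^2.
# Setting dJ/dtheta = 0 gives theta = (x . y) / (x . x) -- no iterations needed.
theta_hat = np.dot(x, y) / np.dot(x, x)
print(theta_hat)  # close to 1.7
```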

For that I see two options of approximation:

  1. reformulate the problem
  2. reformulate $f$

reformulate the problem

Let's see an example for the first case and choose Logistic Regression, which has the original (not approximated) problem formulation

\begin{align} f(x) &= \frac{1}{1 + \exp(-\Theta\cdot x)} \\ J(\Theta) &= \sum_i \left( y_i - f(x_i | \Theta) \right)^2 = \sum_i \left( y_i - \frac{1}{1 + \exp(-\Theta\cdot x_i)} \right)^2\\ \min_{\Theta} & \Big( J(\Theta) \Big) \end{align}

This can only be solved using iterative optimization, but it can be solved in closed form after reformulating it to

\begin{align} \log\left( \frac{1}{f(x)} - 1 \right) &= -\Theta\cdot x \\ J(\Theta) &= \sum_i \left( \log\left( \frac{1}{y_i} - 1 \right) + \Theta\cdot x_i \right)^2\\ \min_{\Theta} & \Big( J(\Theta) \Big) \end{align}

So in other words we have a linear system of equations, one row of which is

$$ \log\left( \frac{1}{y_i} - 1 \right) = -\Theta\cdot x_i $$

This system can be solved in closed form, which approximately solves the original problem. (I am not being super precise here, but I hope it is good enough.)
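As a sanity check, the reformulated logistic problem can indeed be solved without iterations. A minimal sketch with synthetic data (all concrete values are illustrative):

```python
import numpy as np

# Sketch of the "reformulate the problem" trick for logistic regression.
# Transform y = 1/(1 + exp(-theta*x)) into log(1/y - 1) = -theta*x,
# which is linear in theta and solvable in closed form.
rng = np.random.default_rng(1)
theta_true = 0.8                                   # made-up value for the demo
x = np.linspace(-4.0, 4.0, 80)
y = 1.0 / (1.0 + np.exp(-theta_true * x))
y = np.clip(y + 0.01 * rng.standard_normal(x.size), 1e-6, 1 - 1e-6)  # keep the log defined

z = np.log(1.0 / y - 1.0)                  # transformed targets; ideally z = -theta * x
theta_hat = -np.dot(x, z) / np.dot(x, x)   # closed-form least squares for z = -theta * x
print(theta_hat)  # close to 0.8
```

Note that this minimizes the error in the transformed variable $z$, not in $y$ itself, which is exactly why the result is only an approximate solution of the original problem.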

reformulate $f$

An obvious solution is to approximate $f$ using Taylor, but I am hoping for something better. One reason is that Taylor for multivariate functions is a real pain.

Thus I am asking for an alternative function $g(x)\approx f(x)$ which is linear in the function parameters. Alternatively, if there is a way to write $f$ itself such that linear least squares can be performed (and thus the closed-form solution used), that would also work.

Back to the question and why I asked it.

Honestly, I have my doubts that the approach I called "reformulate the problem" will be possible here, but it would be preferred. If someone has an idea, I would be very happy.

That is why I was asking for an approach of the type "reformulate $f$".

The answer that Claude Leibovici provides is neither of those approaches exactly, and it does not eliminate the need for iterative optimization. That being said, it is close in spirit to "reformulate the problem", since it does reformulate the problem, albeit not so that it becomes linear in the optimization variables. It is still an improvement speed-wise, because the reformulated function is nearly linear: we still need iterative optimization, but it converges much faster than on the original problem. After optimizing the reformulated problem, one might use the result as the starting point for optimizing the original problem, thus fine-tuning the result.

What do I still want to know?

Well, the original question is still completely open...

There are 2 answers below.

Answer by JJacquelin:

$$y(x)=\exp\left(-(x/a)^2\right) - \log\left((x/a)^2+1\right)$$

Let $t=\frac{x}{a}$:

$$y(t)=\exp\left(-t^2\right) - \log\left(t^2+1\right)$$

Consider the inverse function $t(y)$:

$$t(y)=\frac{x(y)}{a}\quad\implies\quad x(y)=a\:t(y)$$

This is a linear function with respect to $a$. You can use linear least squares to estimate the parameter $a$.

More concretely :

With the data $(x_1,y_1)\:,\:(x_2,y_2)\:,\:...\:,\:(x_k,y_k)\:,\:...\:,\:(x_n,y_n)$

For each $(x_k,y_k)$ compute the root $t_k$ of the equation :

$$\exp\left(-(t_k)^2\right) - \log\left((t_k)^2+1\right)-y_k=0$$

You obtain a new data set: $(t_1,x_1)\:,\:(t_2,x_2)\:,\:...\:,\:(t_k,x_k)\:,\:...\:,\:(t_n,x_n)$

Proceed to the linear regression for $a$: $$x_k=t_k\:a+\epsilon_k$$

NUMERICAL EXAMPLE: [figure with the numerical example not reproduced]
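A minimal end-to-end sketch of this procedure, using bisection as the root finder and synthetic data (the parameter value and data range are made up for the demo):

```python
import numpy as np

# JJacquelin's procedure:
# 1) for each data point, numerically invert y = exp(-t^2) - log(t^2 + 1) for t,
# 2) fit x_k = a * t_k by closed-form linear least squares.
def g(t):
    return np.exp(-t**2) - np.log(t**2 + 1.0)

def invert_g(y, lo=0.0, hi=10.0, iters=80):
    """Bisection for the positive root t of g(t) = y (g is strictly decreasing for t >= 0)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic, noise-free data with a known parameter (illustrative values).
a_true = 2.5
x = np.linspace(0.5, 5.0, 20)
y = g(x / a_true)

t = np.array([invert_g(yk) for yk in y])
a_hat = np.dot(t, x) / np.dot(t, t)   # closed-form least squares for x = a * t
print(a_hat)  # close to 2.5
```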

Answer by Claude Leibovici:

To continue from @JJacquelin's answer, you need to solve for $t$ the equation $$y=\exp\left(-t^2\right) - \log\left(t^2+1\right)$$ Let $t^2=z$; the problem is then to solve for $z$ $$y=\exp\left(-z\right) - \log\left(z+1\right)$$ Now, assuming $y>0$, consider that you look for the zero of the function $$f(z)=z+\log (y+\log (z+1))$$ which is close to linear (this is very good for Newton's method). When you have solved the equation for $y_n$, use the result as the starting guess for the next point (assuming that you sorted the points by $y$).

The iterates will be given by $$z_{n+1}=z_n-\frac{\log (y+\log (z_n+1))+z_n}{1+\frac{1}{(z_n+1) (y+\log (z_n+1))}}$$
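A small sketch of this iteration with warm starts (the test values $z = 0.1, 0.3, 0.5$ are illustrative):

```python
import math

# Newton iteration for f(z) = z + log(y + log(z + 1)),
# warm-starting each solve from the previous root (points sorted by y).
def solve_z(y, z0, tol=1e-12, max_iter=50):
    z = z0
    for _ in range(max_iter):
        inner = y + math.log(z + 1.0)
        f = z + math.log(inner)
        fp = 1.0 + 1.0 / ((z + 1.0) * inner)
        step = f / fp
        z -= step
        if abs(step) < tol:
            break
    return z

# Example: recover z from y = exp(-z) - log(z + 1) for a few points.
def g(z):
    return math.exp(-z) - math.log(z + 1.0)

ys = [g(0.1), g(0.3), g(0.5)]   # decreasing y corresponds to increasing z
z = 0.0
roots = []
for y in ys:
    z = solve_z(y, z)           # previous root is the initial guess for the next point
    roots.append(z)
print(roots)  # approximately [0.1, 0.3, 0.5]
```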

If I may suggest: this procedure gives $z_n$, then $t_n$, and then the linear regression gives $a$. I recommend that you polish the solution using nonlinear regression, since what is measured is $y$ and not any transform of it.