Question
Assume we have a function $f(x, \Theta) \in \mathbb R$, where $x\in\mathbb R^n$ is the function input, $\Theta\in\mathbb R^p$ is the collection of the function parameters $\theta_i \in \mathbb R$, and $p,n\in\mathbb N$. I want an approximation of this function that is linear in its parameters $\theta_i$. How can I do this?
I was thinking of using an expansion, but this fails for me since I do not have an expansion point available (see section Motivation), so I am happy with any other method.
Motivation
I want to be able to fit a function based on a linear least-squares cost function using the closed form $$ \hat {\Theta}=(X^{T} \cdot X)^{-1} X^{T} \cdot y $$ where $X$ is the design matrix (not necessarily $x$ from $f(x)$ directly, but $x$ transformed without using function parameters, e.g. $\exp(x)$ or $x^2$). Please assume that using an iterative optimization method (e.g. Newton's method) is not an option; otherwise this would change the question entirely.
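As a minimal sketch of that closed form (toy data and all variable names are mine; `numpy` assumed): solving the normal equations directly works, but a least-squares solver is the numerically safer equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2*exp(x) + 3*x**2 + noise. The design matrix X holds the
# transformed inputs exp(x) and x**2, so the model is linear in Theta.
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.exp(x), x**2])
y = 2.0 * np.exp(x) + 3.0 * x**2 + 0.01 * rng.normal(size=x.size)

# Closed form via the normal equations (fine while X is well conditioned)
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically safer equivalent of the same closed-form solution
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

For large batches of fits with a shared design matrix, factoring $X$ once (e.g. via QR) and reusing the factorization across all right-hand sides is usually cheaper than re-solving from scratch.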
For the curious regarding the computer-science aspects: for the linear least squares I have between 500 and 60,000 multivariate data points with dimensionality between 2 and 64 - it is very case-dependent. I have to run between 50,000 and 1 million of these linear least-squares fits (maybe even more later).
What I have tried so far (an example): Polynomial approximation
(I am not interested in the Gaussian, this is just an example.)
For example, the Gaussian $$ a_g \cdot \exp(-0.5 \cdot ((x-\mu)/\sigma)^2) + b_g $$ is linear in $a_g,b_g$, but not in $\mu,\sigma$, where $\Theta = (a_g,\mu,\sigma,b_g)$. The quadratic function $$ -a_q \cdot (x-\mu)^2 + b_q $$ is not linear in $\mu$ as written, but expanded as a polynomial $c_2 x^2 + c_1 x + c_0$ it is linear in the coefficients, and $\Theta = (a_q,\mu,b_q)$ can be recovered from them via $a_q = -c_2$, $\mu = -c_1/(2c_2)$, $b_q = c_0 - c_1^2/(4c_2)$. It somewhat approximates the Gaussian close to $\mu$, but not at all otherwise.
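A sketch of that polynomial fit with back-substitution of the original parameters (toy data and all variable names are mine, assuming `numpy`):

```python
import numpy as np

rng = np.random.default_rng(1)

# True parameters of the model -a*(x - mu)**2 + b
a, mu, b = 1.5, 0.3, 2.0
x = rng.uniform(-2.0, 2.0, size=300)
y = -a * (x - mu) ** 2 + b + 0.01 * rng.normal(size=x.size)

# Linear least squares on the monomial basis [x**2, x, 1]
X = np.column_stack([x**2, x, np.ones_like(x)])
c2, c1, c0 = np.linalg.lstsq(X, y, rcond=None)[0]

# Recover the original parameters from the polynomial coefficients
a_hat = -c2
mu_hat = -c1 / (2.0 * c2)
b_hat = c0 - c1**2 / (4.0 * c2)
```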
This gave me the idea to use a Taylor series expansion (omitting subscripts) \begin{align} &a \cdot \exp(-0.5 \cdot (x-\mu)^2/\sigma^2) + b \\ =\ & a \cdot \exp(-0.5 \cdot (x_0-\mu)^2/\sigma^2) + b \\ -\ & a \cdot \exp(-0.5 \cdot (x_0-\mu)^2/\sigma^2) \cdot (x_0-\mu)/\sigma^2 \cdot (x-x_0) \end{align} which obviously does not produce a function that is linear in the parameters.
So I had the idea to use a Taylor expansion in the parameters $\mu,\sigma$ instead, but I would not know at which point $\Theta_0$ to expand:
\begin{align} &a \cdot \Big( \exp(-0.5 \cdot (x-\mu)^2/\sigma^2) \Big) + b \\ =\ a \cdot \Big( & \exp(-0.5 \cdot (x-\mu_0)^2/\sigma_0^2) \\ & + \exp(-0.5 \cdot (x-\mu_0)^2/\sigma_0^2) \cdot (x-\mu_0)/\sigma_0^2 \cdot (\mu-\mu_0) \\ & + \exp(-0.5 \cdot (x-\mu_0)^2/\sigma_0^2) \cdot (x-\mu_0)^2/\sigma_0^3 \cdot (\sigma-\sigma_0) \Big) + b \end{align}
However, I need to have a good guess for $\Theta_0$.
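Note that this expansion is linear only after reparametrizing: with $c_1 = a$, $c_2 = a(\mu-\mu_0)$, $c_3 = a(\sigma-\sigma_0)$, $c_4 = b$, the model becomes linear in $(c_1,\dots,c_4)$. A sketch of how such a fit could look, assuming a guess $(\mu_0,\sigma_0)$ is available (toy data and all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)

# True Gaussian-plus-offset parameters, close to the expansion point
a, mu, sigma, b = 2.0, 0.05, 1.05, 0.5
x = rng.uniform(-3.0, 3.0, size=1000)
y = a * np.exp(-0.5 * ((x - mu) / sigma) ** 2) + b + 0.01 * rng.normal(size=x.size)

# Expansion point (the "good guess" for Theta_0)
mu0, sigma0 = 0.0, 1.0

# Basis functions of the first-order expansion in (mu, sigma)
phi1 = np.exp(-0.5 * ((x - mu0) / sigma0) ** 2)
phi2 = phi1 * (x - mu0) / sigma0**2
phi3 = phi1 * (x - mu0) ** 2 / sigma0**3
X = np.column_stack([phi1, phi2, phi3, np.ones_like(x)])

# Coefficients c = (a, a*(mu - mu0), a*(sigma - sigma0), b): linear in c
c1, c2, c3, c4 = np.linalg.lstsq(X, y, rcond=None)[0]

a_hat = c1
mu_hat = mu0 + c2 / c1
sigma_hat = sigma0 + c3 / c1
b_hat = c4
```

The recovered parameters are only as good as the linearization, i.e. the error grows with the distance of the true $(\mu,\sigma)$ from $(\mu_0,\sigma_0)$.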
What I have tried so far (an example): Non-polynomial approximation
Another option to approximate $$ a_g \cdot \exp(-0.5 \cdot ((x-\mu)/\sigma)^2) + b_g $$ would be to use another Gaussian $$ a_g \cdot \exp(-0.5 \cdot x^2) + b_g, $$ which would approximate the original Gaussian well if the original values of $\mu,\sigma$ are close to $0,1$, respectively.
That gave me the idea to use an approximation that, unlike a Taylor expansion, is not restricted to polynomial terms (thus relaxing constraints), but which must be linear in its model parameters everywhere (thus adding constraints), whereas a Taylor expansion needs to be cut off after the first-derivative terms.
Maybe such an approximation might not require a $\Theta_0$?
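One concrete instance of this idea is a fixed dictionary of basis functions whose centers and widths are chosen up front, so no $\Theta_0$ enters the fit itself; the model $\sum_j w_j \phi_j(x)$ is then linear in the weights $w_j$. A sketch under those assumptions (all names and the choice of dictionary are mine):

```python
import numpy as np

rng = np.random.default_rng(3)

# Target: a Gaussian whose (mu, sigma) are unknown to the fitted model
a, mu, sigma, b = 2.0, 0.4, 0.8, 0.5
x = rng.uniform(-3.0, 3.0, size=500)
y = a * np.exp(-0.5 * ((x - mu) / sigma) ** 2) + b + 0.01 * rng.normal(size=x.size)

# Fixed dictionary: Gaussians at preset centers/widths plus a constant.
centers = np.linspace(-2.0, 2.0, 9)
width = 0.8
Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
X = np.column_stack([Phi, np.ones_like(x)])

# Linear least squares in the weights w
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# In-sample RMS error of the linear-in-weights surrogate
rms = np.sqrt(np.mean((X @ w - y) ** 2))
```

The trade-off is that the original parameters $(a,\mu,\sigma,b)$ are no longer directly identifiable from the weights; the surrogate only reproduces the function's values.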
In case the question cannot be answered in general...
... (although answering it in general would bring the greatest advantage to the community):
I am currently mostly interested in the function $$ f(x, \Theta) = a \cdot \exp(-0.5 \cdot ((x-\mu)/\sigma)^2) - \log\left(((x-\mu)/\sigma)^2 + 1\right) + b $$ where $\Theta = (a,\mu,\sigma,b)$.