Question on optimization algorithm to train peculiar regression


I've been taking an operations research course, and we have been working on optimization problems arising in regression. We hypothesize that the variables $h,s,d,t$ are related by $$h = \beta_s e^{s^2} + \beta_d d\sin(-\beta_t t) + \beta_0 + \epsilon,$$ where $\epsilon \sim \mathcal{N}(0,\sigma^2)$. I am supposed to find a reasonable way to estimate the $\beta$'s from a training sample $\{(h_n,s_n,d_n,t_n)\}$. My professor has recommended the squared-error loss function $$L(\mathbf{h},\mathbf{\hat{h}}) = \sum_{i=1}^n (h_i - \hat{h}_i)^2.$$

I would have liked a closed form for directly calculating the $\beta$'s (as there is in standard linear regression), but given the nonlinearities in this model, I have not been able to find one. I have therefore decided to look for an algorithm instead, but I have been told that standard gradient descent on the loss function will not help in this nonlinear situation because of the high likelihood of converging to a local minimum (rather than a global one). Is there another method within operations research that is useful for training an extremely nonlinear regression like this?

For reference, the gradient of the loss is: $$\frac{\partial L}{\partial \beta_0} = \sum_{i=1}^n 2(h_i - \hat{h}_i) \times (-1)$$ $$\frac{\partial L}{\partial \beta_s}= \sum_{i=1}^n 2(h_i-\hat{h}_i) \times (-e^{s_i^2})$$ $$\frac{\partial L}{\partial \beta_d}=\sum_{i=1}^n 2(h_i-\hat{h}_i) \times \left(-d_i \sin(-\beta_t t_i)\right)$$ $$\frac{\partial L}{\partial \beta_t} =\sum_{i=1}^n 2(h_i - \hat{h}_i) \times \left(-\beta_d d_i \cos(-\beta_t t_i)\right) \times (-t_i).$$
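For concreteness, the model and the gradient components above can be sketched numerically as follows. This is a hedged NumPy sketch: the data arrays, the sample size, and all parameter values are synthetic placeholders, not part of the original problem.

```python
import numpy as np

# Synthetic placeholder data; in practice (s, d, t, h) come from the training set.
rng = np.random.default_rng(0)
n = 50
s, d, t = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
h = 0.5 * np.exp(s**2) + 1.2 * d * np.sin(-0.8 * t) - 0.3 + 0.01 * rng.normal(size=n)

def predict(bs, bd, bt, b0):
    """Model: h_hat = bs*exp(s^2) + bd*d*sin(-bt*t) + b0."""
    return bs * np.exp(s**2) + bd * d * np.sin(-bt * t) + b0

def loss(bs, bd, bt, b0):
    """Squared-error loss L = sum_i (h_i - h_hat_i)^2."""
    return np.sum((h - predict(bs, bd, bt, b0))**2)

def gradient(bs, bd, bt, b0):
    """The four partial derivatives written out above."""
    r = h - predict(bs, bd, bt, b0)                          # h_i - h_hat_i
    g0 = np.sum(2 * r * (-1.0))                              # dL/d(beta_0)
    gs = np.sum(2 * r * (-np.exp(s**2)))                     # dL/d(beta_s)
    gd = np.sum(2 * r * (-d * np.sin(-bt * t)))              # dL/d(beta_d)
    gt = np.sum(2 * r * (-bd * d * np.cos(-bt * t)) * (-t))  # dL/d(beta_t)
    return np.array([gs, gd, gd * 0 + gt, g0])[[0, 1, 2, 3]]
```

A quick finite-difference comparison against `loss` is an easy way to check that these expressions are coded correctly.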

1 Answer

Assuming that you have $n$ data points $(s_i,d_i,t_i,h_i)$ and the model $$h = \beta_s e^{s^2} + \beta_d d\sin(-\beta_t t) + \beta_0,$$ you have already noticed that the model is nonlinear: no closed-form formulae can be obtained, and only numerical methods can be used. Since nonlinear regression is required, you need to start from "reasonable" initial estimates, and that is the real problem.

However, you can make the problem simpler by noticing that the model is nonlinear only because of the parameter $\beta_t$. If you fix it at some arbitrary value, the parameters $\beta_s,\beta_d,\beta_0$ are obtained immediately by multilinear regression (the regressors being $e^{s_i^2}$, $d_i\sin(-\beta_t t_i)$, and a constant).

The values of these three parameters are implicit functions of $\beta_t$.
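This inner linear step can be sketched with NumPy's least-squares solver. A minimal sketch, assuming the data are held in arrays; the function name and arguments are placeholders, not from the answer itself:

```python
import numpy as np

def linear_fit(beta_t, s, d, t, h):
    """With beta_t held fixed, the model is linear in (beta_s, beta_d, beta_0):
    solve that subproblem by ordinary least squares."""
    # Regressors: exp(s_i^2), d_i*sin(-beta_t*t_i), and a constant column.
    X = np.column_stack([np.exp(s**2), d * np.sin(-beta_t * t), np.ones_like(s)])
    coef, *_ = np.linalg.lstsq(X, h, rcond=None)
    rss = float(np.sum((h - X @ coef)**2))  # residual sum of squares at this beta_t
    return coef, rss
```

The returned coefficients are exactly the implicit functions of $\beta_t$ mentioned above.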

Doing this, you have

$$L(\mathbf{h},\mathbf{\hat{h}}) = \sum_{i=1}^n (h_i - \hat{h}_i)^2=\Phi(\beta_t).$$ Try a series of values of $\beta_t$ until you see $\Phi(\beta_t)$ pass through a minimum. You then have estimates with which you can start the nonlinear regression.
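A sketch of this scan, again with placeholder names; the grid bounds are arbitrary and would be chosen from what is plausible for $\beta_t$ in your application:

```python
import numpy as np

def profile(beta_t, s, d, t, h):
    """Phi(beta_t): residual sum of squares after solving the linear
    subproblem in (beta_s, beta_d, beta_0) with beta_t held fixed."""
    X = np.column_stack([np.exp(s**2), d * np.sin(-beta_t * t), np.ones_like(s)])
    coef, *_ = np.linalg.lstsq(X, h, rcond=None)
    return float(np.sum((h - X @ coef)**2))

def scan(grid, s, d, t, h):
    """Evaluate Phi over a grid of beta_t values; return the grid minimiser."""
    phis = np.array([profile(bt, s, d, t, h) for bt in grid])
    return grid[int(np.argmin(phis))], phis
```

Because of the sine, $\Phi$ typically has several local minima in $\beta_t$, so the grid should cover the whole plausible range rather than a narrow window.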

You could even skip the nonlinear regression by solving $\Phi'(\beta_t)=0$ numerically, that is to say, by searching (graphically, say) for the solution of $$\Phi(1.001\,\beta_t)-\Phi(0.999\,\beta_t)=0.$$
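That last remark can be sketched as a bisection on the sign of the symmetric difference (a proxy for $\Phi'$). The bracket `[lo, hi]` is assumed to straddle the minimum located by the grid scan; all names here are illustrative:

```python
import numpy as np

def phi(beta_t, s, d, t, h):
    """Phi(beta_t) via the linear subproblem, as above."""
    X = np.column_stack([np.exp(s**2), d * np.sin(-beta_t * t), np.ones_like(s)])
    coef, *_ = np.linalg.lstsq(X, h, rcond=None)
    return float(np.sum((h - X @ coef)**2))

def refine(lo, hi, s, d, t, h, iters=40):
    """Bisect on the sign of Phi(1.001*bt) - Phi(0.999*bt) over [lo, hi]."""
    f = lambda bt: phi(1.001 * bt, s, d, t, h) - phi(0.999 * bt, s, d, t, h)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Whether refined this way or not, the resulting $\beta_t$ (with its associated linear-subproblem coefficients) is a sound starting point for a full nonlinear fit.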