Nonlinear optimization using gradient descent method


I have been trying to solve a nonlinear optimization problem of the following kind:

$x$ and $A$ are two matrices of size say 320 x 220. A function of $x$ and $A$ is defined as:

$$M(i,j)=f_1\sum_{m=1}^{400} G_m A(i,j)\exp\!\left(\frac{K1_m}{x(i,j)}\right)+ f_2\sum_{m=1}^{400} G_m A(i-10,j)\exp\!\left(\frac{K2_m}{x(i-10,j)}\right)+f_3\sum_{m=1}^{400} G_m A(i-20,j)\exp\!\left(\frac{K3_m}{x(i-20,j)}\right)$$

Here, $G$, $K1$, $K2$, $K3$ are constant vectors with 400 elements each, and $f_1, f_2, f_3$ are scalar constants.
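In code, the model I am evaluating looks roughly like this (NumPy; the zero-padding at the boundary rows $i<10$ and $i<20$ is my own choice, not part of the problem statement):

```python
import numpy as np

def shift_rows(Z, s, fill):
    """Shift rows down by s so that row i sees Z[i-s].
    The top s rows are padded with `fill` (an assumed boundary treatment)."""
    out = np.full_like(Z, fill, dtype=float)
    out[s:, :] = Z[:Z.shape[0] - s, :]
    return out

def forward_model(x, A, G, K1, K2, K3, f1, f2, f3):
    """Evaluate M(i,j) = sum of the three weighted terms defined above."""
    def term(Ash, xsh, K):
        # sum over m of G_m * A(i,j) * exp(K_m / x(i,j)), via broadcasting:
        # K[:,None,None]/xsh has shape (400, rows, cols)
        return np.einsum('m,ij,mij->ij', G, Ash, np.exp(K[:, None, None] / xsh))

    return (f1 * term(A, x, K1)
            # A padded with 0 so boundary rows contribute nothing;
            # x padded with 1 only to keep exp(K/x) finite there
            + f2 * term(shift_rows(A, 10, 0.0), shift_rows(x, 10, 1.0), K2)
            + f3 * term(shift_rows(A, 20, 0.0), shift_rows(x, 20, 1.0), K3))
```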

The problem is to evaluate the optimal values of $x$ and $A$ that would minimize the following $l_2$ norm:

$$O = \|S(i,j)-M(i,j)\|_2$$

Here, $S(i,j)$ is observed data (hence constant) and is of the same size as $M(i,j)$.
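Interpreting the norm as taken over all entries $(i,j)$ (i.e. the Frobenius norm of the residual matrix, which is my reading of the formula), the objective is simply:

```python
import numpy as np

def objective(S, M):
    """l2 norm of the residual S - M over all (i,j), i.e. the Frobenius norm."""
    return np.linalg.norm(S - M)
```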

I tried gradient descent, but the reduction in the objective stagnates after a few iterations. If I keep only $A$ as the decision variable (using synthetic data for this trial) and supply the true values of $x$, gradient descent proceeds well and converges to a meaningful solution for $A$. Similarly, if I keep only $x$ as the decision variable and supply the correct $A$, I recover the correct optimal values of $x$. But when both are kept variable, the algorithm stagnates, and I cannot figure out why. Any suggestions on how to overcome this?

I am following the general steps of gradient descent, where the update steps for the $i$th elements of $x$ and $A$ are:

$$x_i^{\text{new}} = x_i - t\,\frac{\partial O}{\partial x_i}$$

$$A_i^{\text{new}} = A_i - t\,\frac{\partial O}{\partial A_i}$$

$t$ is the step size. My question is whether it is okay to do it this way, or whether there is some obvious mathematical flaw that I don't see. Should there be a different step size for the $x$ update and the $A$ update?
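For reference, the update loop with the separate step sizes I am asking about would look roughly like this (a sketch; `grad_x` and `grad_A` stand for whatever routines compute $\partial O/\partial x$ and $\partial O/\partial A$, which are not shown here):

```python
import numpy as np

def gradient_descent(x0, A0, grad_x, grad_A, objective, t_x, t_A, iters=100):
    """Plain simultaneous gradient descent on (x, A) with a separate
    step size for each block of variables (t_x for x, t_A for A)."""
    x, A = x0.copy(), A0.copy()
    history = [objective(x, A)]
    for _ in range(iters):
        gx = grad_x(x, A)      # dO/dx at the current iterate
        gA = grad_A(x, A)      # dO/dA at the current iterate
        x -= t_x * gx          # x-update with its own step size
        A -= t_A * gA          # A-update with its own step size
        history.append(objective(x, A))
    return x, A, history
```

Since $A$ enters the model linearly while $x$ enters through $\exp(K_m/x)$, the two gradient blocks can have very different scales, which is why I suspect a single $t$ may be inappropriate.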