I have asked this question before, but no one could answer it. I want to know how to minimize a quadratic expression over two parameter spaces. I have developed a rudimentary method and want to check whether it is the right way to proceed; I would welcome any improvements or suggestions. Let's say the quadratic to be minimized is \begin{equation} e=I+w_1^TAw_1+w_2^Tb \end{equation} where $w_1,w_2,b \in \mathbb{R}^n$, $A \in \mathbb{R}^{n\times n}$ and $I \in \mathbb{R}$. I want to minimize $e$ over both $w_1$ and $w_2$. To do that, I construct $J=\frac{1}{2}e^2$. I feel there are two ways of approaching this problem,
Method 1: Apply gradient descent (GD) to $w_1$ and $w_2$ separately, i.e., \begin{equation} w_{n+1}^1=w_n^1-\eta_1\frac{\partial J}{\partial w_n^1}\\ w_{n+1}^2=w_n^2-\eta_2\frac{\partial J}{\partial w_n^2} \end{equation} where the superscript denotes the parameter and the subscript denotes the iteration.
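A minimal NumPy sketch of Method 1 on a made-up small instance (the particular `A`, `b`, and constant `I_const`, as well as the step sizes, are assumptions for illustration, not part of the question):

```python
import numpy as np

# Hypothetical small instance for illustration.
n = 2
A = np.array([[2.0, 0.0], [0.0, 1.0]])   # chosen positive definite for a well-behaved example
b = np.array([1.0, -1.0])
I_const = 1.0                            # the scalar I from the question

def e_val(w1, w2):
    """e = I + w1^T A w1 + w2^T b."""
    return I_const + w1 @ A @ w1 + w2 @ b

w1 = np.ones(n)
w2 = np.ones(n)
eta1, eta2 = 0.05, 0.05

for _ in range(2000):
    e = e_val(w1, w2)
    # J = e^2/2, so dJ/dw1 = e * (A + A^T) w1 and dJ/dw2 = e * b
    w1 -= eta1 * e * (A + A.T) @ w1
    w2 -= eta2 * e * b

print(abs(e_val(w1, w2)))  # near 0: GD on J drives e toward 0
```

Note that because $J=\frac12 e^2$, this iteration drives $e$ toward $0$ rather than toward the minimum of $e$ itself, which is the issue raised in the answer below.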
Method 2: I rewrite $e$ in terms of a single composite parameter and then minimize over that composite parameter, for instance, \begin{equation} e=w^T\alpha w+w^T\beta+c \end{equation} where \begin{equation} w=\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}, \alpha=\begin{pmatrix} A & 0\\ 0 &0 \end{pmatrix}, \beta=\begin{pmatrix} 0 \\ b \end{pmatrix} \end{equation} and now minimize this expression over the composite parameter space $w$: \begin{equation} w_{n+1}=w_n-\eta\frac{\partial J}{\partial w}=w_n-\eta\frac{\partial J}{\partial e}\frac{\partial e}{\partial w}=w_n-\eta\, e\left((\alpha+\alpha^T) w+\beta\right) \end{equation} I want to know if my reasoning is correct, or if someone can improve upon this. Also, my major problem is that gradient descent over the composite parameter space works as long as $\alpha,\beta$ and $c$ remain constant; but if, say, I change $\alpha,\beta$ and $c$, then it is not able to reach the new minimum.
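The composite update can be sketched the same way. This reuses the same made-up instance as above (the specific `A`, `b`, `c`, and step size are assumptions for illustration); `alpha` and `beta` are built exactly as in Method 2:

```python
import numpy as np

# Same hypothetical instance as before.
n = 2
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
c = 1.0  # plays the role of I

# Composite parameter: alpha = [[A, 0], [0, 0]], beta = [0; b]
alpha = np.block([[A, np.zeros((n, n))],
                  [np.zeros((n, n)), np.zeros((n, n))]])
beta = np.concatenate([np.zeros(n), b])

def e_val(w):
    """e = w^T alpha w + w^T beta + c."""
    return w @ alpha @ w + w @ beta + c

w = np.ones(2 * n)
eta = 0.05

for _ in range(2000):
    e = e_val(w)
    # J = e^2/2, so dJ/dw = e * ((alpha + alpha^T) w + beta)
    w -= eta * e * ((alpha + alpha.T) @ w + beta)

print(abs(e_val(w)))  # near 0, matching the separate-update method
```

With equal step sizes the two methods produce identical iterates, since the composite gradient simply stacks the two block gradients.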
Minimizing $e$ and minimizing $J=\frac12 e^2$ are not equivalent, since $e$ can take negative values: gradient descent on $J$ drives $e$ toward $0$, whereas the minimum of $e$ itself may be negative (or may not exist at all).
We have to be careful about the existence of a minimum. Since $w_2$ is unconstrained, if any component $b_i$ is nonzero we can send the corresponding component of $w_2$ to $\pm\infty$ (with sign opposite to that of $b_i$) while fixing the other variables, making $e$ arbitrarily negative. Hence the minimum does not exist unless $b=0$, in which case the term $w_2^Tb$ vanishes and $w_2$ plays no role; we may simply set it to $0$.
Hence minimizing $e$ reduces to minimizing $I+w_1^TAw_1$. Again, if $A$ is not positive semidefinite, there is no minimum: if $w_1^TAw_1<0$ for some $w_1$, then scaling $w_1$ to have large magnitude makes $e$ arbitrarily negative.
Hence the only case left to consider is when $A$ is positive semidefinite.
In that case the optimal choice is $w_1=0$.
Hence the optimal objective value is $I$.
In general if you have a function $f_1(w_1)+f_2(w_2)$ where $w_1$ and $w_2$ do not influence each other, you can optimize them separately.
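A tiny sketch of this separability, using made-up terms $f_1(w_1)=(w_1-3)^2$ and $f_2(w_2)=(w_2+1)^2$ (these particular functions are assumptions for the demonstration):

```python
# Separable objective f1(w1) + f2(w2): the gradient of each term
# involves only its own variable, so the two GD loops are independent.
w1, w2, eta = 0.0, 0.0, 0.1
for _ in range(500):
    w1 -= eta * 2.0 * (w1 - 3.0)   # gradient of f1 depends only on w1
    w2 -= eta * 2.0 * (w2 + 1.0)   # gradient of f2 depends only on w2

print(round(w1, 4), round(w2, 4))  # → 3.0 -1.0, the separate minimizers
```

Each loop finds its own minimizer regardless of what the other is doing, which is exactly why the $w_2$ term can be analyzed independently of the $w_1$ term above.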