How to minimize a quadratic expression over two parameter spaces?


I have asked this question before, but no one answered it. I want to know how to minimize a quadratic expression over two parameter spaces. I have developed a rudimentary method and want to check whether it is the right way to proceed; I would welcome any improvements or suggestions. Let's say the quadratic to be minimized is \begin{equation} e=I+w_1^TAw_1+w_2^Tb \end{equation} where $w_1,w_2,b \in \mathbb{R}^n$, $A \in \mathbb{R}^{n\times n}$, and $I \in \mathbb{R}$. I want to minimize $e$ over both $w_1$ and $w_2$. To do that, I construct $J=\frac{1}{2}e^2$. I see two ways of approaching this problem,

Method 1: Apply gradient descent (GD) to $w_1$ and $w_2$ separately, i.e., \begin{equation} w_{n+1}^1=w_n^1-\eta_1\frac{\partial J}{\partial w_n^1},\qquad w_{n+1}^2=w_n^2-\eta_2\frac{\partial J}{\partial w_n^2} \end{equation} where the superscript denotes the parameter and the subscript denotes the iteration.
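Method 1 can be sketched as follows. The values of $n$, $A$, $b$, $I$, the learning rates, and the iteration count are all hypothetical, chosen only to make the sketch run; note that descending on $J=\frac12 e^2$ drives $e$ toward $0$, not toward the minimum of $e$:

```python
import numpy as np

# Hypothetical small instance: n = 2, A positive definite, fixed b and I.
n = 2
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
I_const = 0.5

rng = np.random.default_rng(0)
w1 = rng.standard_normal(n)
w2 = rng.standard_normal(n)
eta1, eta2 = 0.05, 0.05

for _ in range(2000):
    e = I_const + w1 @ A @ w1 + w2 @ b   # scalar residual
    # Chain rule on J = e^2 / 2: dJ/dw_i = e * de/dw_i
    w1 = w1 - eta1 * e * (A + A.T) @ w1  # de/dw1 = (A + A^T) w1
    w2 = w2 - eta2 * e * b               # de/dw2 = b

print(abs(I_const + w1 @ A @ w1 + w2 @ b))  # |e| driven toward 0
```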

Method 2: Rewrite $e$ in terms of a single composite parameter and then minimize over that composite parameter. For instance, \begin{equation} e=w^T\alpha w+w^T\beta+c \end{equation} where \begin{equation} w=\begin{pmatrix} w_1 \\ w_2 \end{pmatrix},\quad \alpha=\begin{pmatrix} A & 0\\ 0 & 0 \end{pmatrix},\quad \beta=\begin{pmatrix} 0 \\ b \end{pmatrix},\quad c=I, \end{equation} and then minimize over the composite parameter space $w$: \begin{equation} w_{n+1}=w_n-\eta\frac{\partial J}{\partial w}=w_n-\eta\frac{\partial J}{\partial e}\frac{\partial e}{\partial w}=w_n-\eta\, e\left((\alpha+\alpha^T) w+\beta\right). \end{equation} I want to know whether my reasoning is correct, or whether someone can improve upon it. My main problem is that gradient descent over the composite parameter space works as long as $\alpha$, $\beta$, and $c$ remain constant; if I change $\alpha$, $\beta$, and $c$, it fails to reach the new minimum.
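Method 2 can be sketched the same way, with the same hypothetical values. The only change is bookkeeping: stack $w_1,w_2$ into one vector and build the block matrix $\alpha$ and the vector $\beta$. Note that the gradient of $w^T\alpha w$ is $(\alpha+\alpha^T)w$, which reduces to $2\alpha w$ only when $\alpha$ is symmetric:

```python
import numpy as np

# Composite-parameter form: w = [w1; w2], alpha = [[A, 0], [0, 0]],
# beta = [0; b], c = I (hypothetical values as before).
n = 2
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
c = 0.5

alpha = np.zeros((2 * n, 2 * n))
alpha[:n, :n] = A
beta = np.concatenate([np.zeros(n), b])

rng = np.random.default_rng(0)
w = rng.standard_normal(2 * n)
eta = 0.05

for _ in range(2000):
    e = c + w @ alpha @ w + w @ beta
    # de/dw = (alpha + alpha^T) w + beta
    w = w - eta * e * ((alpha + alpha.T) @ w + beta)

print(abs(c + w @ alpha @ w + w @ beta))  # |e| driven toward 0
```

Both sketches perform the same updates; the composite form just packages them into one vector, which is why the two methods behave identically here.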




Minimizing $e$ and minimizing $\frac12 e^2$ are not equivalent, since $e$ can take negative values: descending on $\frac12 e^2$ drives $e$ toward $0$, not toward its minimum.

We have to be careful about the existence of a minimum. If any component $b_i$ of $b$ is nonzero, we can let the corresponding component of $w_2$ grow arbitrarily large with sign opposite to that of $b_i$. Fixing the other variables, $e$ becomes arbitrarily negative, hence the minimum doesn't exist.

Hence we require $b = 0$, in which case $w_2$ drops out of $e$ entirely and any choice, say $w_2=0$, is optimal.

Then minimizing $e$ is equivalent to minimizing $I+w_1^TAw_1$. Again, if $A$ is not positive semidefinite, there is no minimum: if $w_1^TAw_1<0$ for some $w_1$, scaling $w_1$ to large magnitude makes $e$ arbitrarily negative.

Hence the only case left to consider is when $A$ is positive semidefinite.

The optimum is then attained at $w_1=0$.

Hence the optimal objective value is $I$.

In general if you have a function $f_1(w_1)+f_2(w_2)$ where $w_1$ and $w_2$ do not influence each other, you can optimize them separately.
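The separability claim can be checked numerically. The quadratics $f_1$ and $f_2$ below are arbitrary examples: the minimum of the joint objective over a grid equals the sum of the separate minima.

```python
import itertools

# Two decoupled quadratics (hypothetical examples):
def f1(x):
    return (x - 1.0) ** 2  # minimized at x = 1

def f2(y):
    return (y + 2.0) ** 2  # minimized at y = -2

# Grid containing both minimizers exactly.
xs = [i / 10 for i in range(-40, 41)]

joint = min(f1(x) + f2(y) for x, y in itertools.product(xs, xs))
separate = min(f1(x) for x in xs) + min(f2(y) for y in xs)

print(joint == separate)  # True: joint and separate optimization agree
```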


As I understand your question, $I$, $A$, and $b$ are given and you wish to find $w_1,w_2$ to minimise $e$. Since matrices are involved, we'll rename $I$ as $r$ to avoid any possible confusion with the identity matrix.

If $b$ is not the zero vector, it has at least one non-zero component, say $b_i$. Let $w_1$ be the zero vector and $w_2$ the vector whose $i$-th entry is $K$ and whose other entries are all $0$. Then $$e=r+Kb_i$$ and by choosing $K$ appropriately we can make $e$ take any value whatsoever, so there is no min or max.

If $b$ is the zero vector, let $$C=\frac{1}{2}(A+A^T).$$ Then $$e=r+w_1^TCw_1$$ and $C$ is symmetric. If $C$ is positive semi-definite, the minimum value of $e$ is $r$, which occurs when $w_1$ is the zero vector.

Otherwise, let $P$ be a non-singular matrix such that $P^TCP=D$, where $D=\text{diag}(d_{11},\dots,d_{nn})$ is a diagonal matrix. Then $$e=r+(P^{-1}w_1)^TD(P^{-1}w_1).$$ Since $C$ is not positive semi-definite, $D$ has at least one negative entry, say $d_{ii}$. Let $v$ be the vector whose $i$-th entry is $K$ and whose other entries are all $0$, and let $w_1=Pv$. Then $$e=r+K^2d_{ii},$$ which shows that $e$ can be made unboundedly negative.
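The indefinite case can be demonstrated numerically. Here $r$ and $C$ are hypothetical values, and an eigendecomposition plays the role of the congruence $P^TCP=D$: moving along the eigenvector of a negative eigenvalue sends $e$ to $-\infty$.

```python
import numpy as np

# Hypothetical indefinite example: C symmetric with a negative eigenvalue,
# so e = r + w1^T C w1 is unbounded below along that eigenvector.
r = 0.5
C = np.array([[1.0, 0.0], [0.0, -1.0]])

eigvals, eigvecs = np.linalg.eigh(C)
i = int(np.argmin(eigvals))    # index of the most negative eigenvalue
v = eigvecs[:, i]              # unit eigenvector, here the d_ii < 0 direction

values = []
for K in [1.0, 10.0, 100.0]:
    w1 = K * v
    values.append(r + w1 @ C @ w1)  # = r + K^2 * eigvals[i]

print(values)  # decreases without bound as K grows
```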