Gradient Descent with spherical and simplex constraint


I have been working on a problem with a spherical constraint and another normalization constraint.

To be precise, I have a function $\mathrm{H}(X_{i})$, where the $X_{i}$ are the variables I wish to optimise. The constraints are

\begin{equation} \sum_{i=1}^{N} X_{i}^2 = 1 \qquad \sum_{i=1}^{N} X_{i} = m \end{equation}

I tried the method of Lagrange multipliers, constructing the function $\mathrm{H}'$ in the following form:

\begin{equation} \mathrm{H}' = \mathrm{H} + \nu_{1}\Big(\sum_{i=1}^{N} X_{i}^2 - 1\Big) + \nu_{2} \Big(\sum_{i=1}^{N} X_{i} - m\Big). \end{equation}

However, for reasons I haven't been able to figure out, this doesn't work numerically. (I can compute $\nu_{1}$ and $\nu_{2}$ analytically; these are what I injected into the final gradient-descent routine.)

Following the response here, I was wondering whether a similar thing could be done for the normalization constraint above. I tried the following procedure:

  1. Compute the gradient of the function $\mathrm{H}$ i.e. without the constraints. Then follow the link above. This fixes the spherical constraint.

  2. Then "reproject" the new variable $X_{i}(t+\Delta t)$ onto the plane defined by $\sum_{i=1}^{N} X_{i} = m$.

For the last step, I chose a vector $((1-m)/N, \ldots, (1-m)/N)$ and then performed standard linear algebra operations for projecting $X_{i}(t+\Delta t)$.
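For reference, the Euclidean projection onto the plane $\sum_{i} X_{i} = m$ has a simple closed form: subtract the average constraint violation from every coordinate. A minimal NumPy sketch (the helper name `project_to_plane` is my own, hypothetical):

```python
import numpy as np

def project_to_plane(x, m):
    """Euclidean projection of x onto the affine plane sum(x) = m."""
    n = x.size
    return x - (x.sum() - m) / n * np.ones(n)
```

After this step `x.sum()` equals `m` up to rounding, and the displacement is orthogonal to the plane.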

This, however, doesn't work well in practice: the gradient decreases and so does the function $\mathrm{H}$, and the spherical constraint is satisfied, but the normalization constraint isn't.

Any suggestions/ideas/references for such a problem? I have scoured the web for problems of such type. The spherical constraint seems a pretty standard one but the other one doesn't seem to occur in many places. I haven't seen any references that treat the two together. Thanks!

Best Answer

First of all, the general "brute force" solution of doing projected gradient descent should work:

  1. Compute the unconstrained gradient $\nabla H$;
  2. Project the gradient onto the tangent space of the constraints (optional, but it can reduce the numerical difficulty of the next step). In other words, solve the subproblem $$\min_{v} \|v-\nabla H\|^2 \quad \mathrm{s.t.}\quad 2X \cdot v = 0, \quad \mathbf{1}\cdot v = 0.$$
  3. Take a descent step $X \leftarrow X - \eta v$ for some step size $\eta > 0$.
  4. Project back onto the constraint surface: solve $$\min_{\tilde X} \|X-\tilde X\|^2 \quad \mathrm{s.t.} \quad \|\tilde X\|^2=1; \tilde X\cdot \mathbf{1} = m$$ using e.g. Newton's method.
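The four steps above can be sketched as follows. This is a hedged sketch in NumPy with hypothetical helper names, not a definitive implementation. One simplification: step 4 actually has a closed form here, because writing $X = (s/N)\mathbf{1} + w$ with $w \perp \mathbf{1}$, the nearest feasible point is $(m/N)\mathbf{1} + r\,w/\|w\|$ with $r = \sqrt{1 - m^2/N}$, so Newton's method is not strictly required:

```python
import numpy as np

def project_tangent(X, g):
    """Project g onto {v : X @ v = 0, sum(v) = 0} by solving the
    least-squares subproblem (assumes X is not parallel to the ones vector)."""
    A = np.vstack([X, np.ones_like(X)])   # rows span the normal space
    lam = np.linalg.solve(A @ A.T, A @ g)
    return g - A.T @ lam

def retract(Y, m):
    """Closed-form projection of Y onto {||X|| = 1, sum(X) = m}.
    Assumes m**2 < N so the constraint set is nonempty."""
    N = Y.size
    w = Y - Y.mean()                      # component orthogonal to the ones vector
    r = np.sqrt(1.0 - m**2 / N)
    return (m / N) * np.ones(N) + r * w / np.linalg.norm(w)

def projected_gd(grad_H, X0, m, eta=0.1, iters=1000):
    """Projected gradient descent on H subject to both constraints."""
    X = retract(X0, m)
    for _ in range(iters):
        v = project_tangent(X, grad_H(X))  # step 1 + 2
        X = retract(X - eta * v, m)        # step 3 + 4
    return X
```

Both constraints hold at every iterate by construction, so only convergence of $\mathrm{H}$ needs monitoring.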

In the question that you linked, they have used the fact that the sphere has a closed-form exponential map to substantially simplify step 4: they compute a position $\tilde X$ directly on the sphere without needing to step + project.
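For concreteness, the closed-form exponential map on the unit sphere that the linked post exploits moves along a great circle from $X$ in the direction of a tangent vector $v$ (with $X \cdot v = 0$). A small sketch, with a hypothetical helper name:

```python
import numpy as np

def sphere_exp(X, v, t=1.0):
    """Geodesic step on the unit sphere from X along tangent v (X @ v = 0)."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return X
    return np.cos(t * nv) * X + np.sin(t * nv) * (v / nv)
```

The result lies exactly on the sphere for any step size, which is why no separate projection is needed.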

Generally speaking, the constraint manifold is too complex to allow such tricks. In your case, however, the constraint manifold is the intersection of a sphere and a hyperplane, i.e. it is itself a sphere of dimension $N-2$, and therefore it is possible to write down a reduced representation of the set of all feasible points. In particular, let $v_1,\ldots,v_{N-1}$ complete $\mathbf{1}/\sqrt{N}$ to an orthonormal basis of $\mathbb{R}^N$. Then all points satisfying your constraints are of the form $$X = \frac{m}{N}\mathbf{1} + \sum_{i=1}^{N-1} \alpha_iv_i$$ with $\|\alpha\|^2 = 1-\frac{m^2}{N}$ (feasible whenever $m^2 \le N$).

You can thus do gradient descent directly on $h(\alpha) = H(X(\alpha))$, whose only constraint is that $\alpha$ lies on a sphere of radius $\sqrt{1-m^2/N}$, using the technique from the linked post, and avoid dealing with the two original constraints at all.
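A sketch of this reduced parameterization, assuming a NumPy setting (the helper name is hypothetical; the orthonormal basis of $\mathbf{1}^\perp$ is obtained from a QR factorization whose first column is pinned to the ones vector):

```python
import numpy as np

def reduced_parameterization(N, m, seed=0):
    """Build X(alpha) = (m/N)*1 + V @ alpha, where V has orthonormal
    columns spanning the orthogonal complement of the ones vector.
    Any alpha with ||alpha|| = r then yields a feasible X."""
    rng = np.random.default_rng(seed)
    M = np.column_stack([np.ones(N), rng.standard_normal((N, N - 1))])
    Q, _ = np.linalg.qr(M)          # first column of Q is +/- 1/sqrt(N)
    V = Q[:, 1:]                    # N x (N-1), columns orthogonal to 1
    r = np.sqrt(1.0 - m**2 / N)     # radius of the alpha-sphere
    def X(alpha):
        return (m / N) * np.ones(N) + V @ alpha
    return X, V, r
```

By the chain rule the reduced gradient is $\nabla h(\alpha) = V^\top \nabla H(X(\alpha))$, so sphere-constrained descent on $\alpha$ (radius $r$) handles both original constraints automatically.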