What does it mean that a Jacobian matrix has full rank in an optimisation problem?


I learned that when we want to find candidate points for our optimisation problem of

$$\max_{\mathbf{X}} \quad f(\mathbf{X}) \quad \text{s.t.} \quad g_j(\mathbf{X}) = c_j, \quad j = 1, \ldots, m$$

where $\mathbf{X}$ is a point in multidimensional space, i.e. a vector, we find them by solving $\frac{\partial L}{\partial x_i} = 0$ for all $i$, where $L$ is our Lagrangian,

$$L(\mathbf{X}, \Lambda) = f(\mathbf{X}) - \sum_{j=1}^{m}\lambda_j \,\bigl(g_j(\mathbf{X}) - c_j\bigr)$$

and verifying that the rank of the Jacobian matrix $J_g(\mathbf{X})$ is $m$. What does it mean when the Jacobian matrix does not have full rank $m$?

I know from linear algebra that it means the rows (or columns) are linearly dependent. However, I fail to deduce from that what this actually means for the optimisation problem.

I partially understand that less-than-full rank means the constraints have linearly dependent gradients, but what does that imply for the optimisation problem?

Slide on which I base my question: Slide

BEST ANSWER

Let's assume the constraint functions $g_1, \ldots, g_m$ are smooth ($C^\infty$), though this can be relaxed.

The rows of the Jacobian $J_g(X)$ are the gradients of the constraints at $X$; let me write them as $\nabla g_1(X), \ldots, \nabla g_m(X)$.

If these are linearly independent (that is, if the Jacobian has full row rank), then they remain linearly independent in a neighborhood of $X$. You can see this by considering the determinant of $J_g(X) J_g(X)^\top$: it is nonzero at $X$, and it is continuous as a function of $X$; therefore, it remains nonzero in a neighborhood of $X$.
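This determinant test is easy to check numerically. Here is a minimal sketch with a pair of constraints of my own choosing (a sphere $g_1 = x^2 + y^2 + z^2$ and a plane $g_2 = z$, not taken from the slide):

```python
import numpy as np

# Two illustrative constraints in R^3 (my own choice):
#   g1(x, y, z) = x^2 + y^2 + z^2   (a sphere when set equal to a constant)
#   g2(x, y, z) = z                 (a plane)
def jacobian(p):
    x, y, z = p
    return np.array([
        [2 * x, 2 * y, 2 * z],  # gradient of g1
        [0.0,   0.0,   1.0],    # gradient of g2
    ])

X = np.array([1.0, 0.0, 0.0])
J = jacobian(X)

# Full row rank <=> det(J J^T) != 0. Since the determinant is a
# continuous function of the point, it stays nonzero near X, so
# full rank persists in a neighborhood of X.
print(np.linalg.det(J @ J.T))    # 4.0 at this X
print(np.linalg.matrix_rank(J))  # 2, i.e. full row rank (m = 2)
```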

It is a fact from differential geometry that---under the conditions described above---the set of solutions to the constraints (locally) is a smooth manifold, that is, a smooth surface: at $X$, this surface has a well-defined tangent space (a linearization) and a normal space (the orthogonal complement of the tangent space with respect to some inner product).

The normal space is spanned exactly by the gradients of the constraints. In other words: vectors of the form $\sum_{j=1}^m \lambda_j \nabla g_j(X)$ are vectors normal to the surface, and all vectors normal to the surface can be written (uniquely) in that way.

This gives a nice interpretation to the Lagrangian formalism: $X$ is stationary exactly if the gradient of $f$ at $X$ is in the normal space to the surface. Intuitively, this should make sense: if $\nabla f(X)$ had a nonzero component tangent to the surface, you could move along the surface in that direction and increase $f$ to first order, so $X$ could not be a constrained maximum.
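The stationarity condition can be tested numerically by fitting $\nabla f(X) = \sum_j \lambda_j \nabla g_j(X)$ in the least-squares sense and checking the residual. A small sketch with a problem of my own choosing (maximize $f = z$ on the unit sphere):

```python
import numpy as np

# Illustrative problem (my own choice, not from the slide):
# maximize f(x, y, z) = z subject to g(x, y, z) = x^2 + y^2 + z^2 = 1.
grad_f = lambda p: np.array([0.0, 0.0, 1.0])
grad_g = lambda p: 2.0 * p  # gradient of x^2 + y^2 + z^2

def stationarity_residual(p):
    # Least-squares fit of grad_f(p) = lambda * grad_g(p).
    # Residual ~ 0 means grad_f(p) lies in the normal space at p,
    # i.e. p satisfies the Lagrange condition.
    A = grad_g(p).reshape(-1, 1)
    lam, *_ = np.linalg.lstsq(A, grad_f(p), rcond=None)
    return np.linalg.norm(A @ lam - grad_f(p))

print(stationarity_residual(np.array([0.0, 0.0, 1.0])))  # ~0: the maximizer
print(stationarity_residual(np.array([1.0, 0.0, 0.0])))  # 1.0: not stationary
```

At the north pole $\nabla f = (0,0,1)$ is a multiple of $\nabla g = (0,0,2)$ (with $\lambda = 1/2$), so the residual vanishes; at $(1,0,0)$ the gradient of $f$ is entirely tangent to the sphere, so the residual is maximal.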

When the gradients are not linearly independent (that is, when the Jacobian does not have full row rank), the above discussion does not go through: we cannot be certain that the search space (the set of solutions of the constraints) looks like a smooth surface around $X$. It might look weird (have kinks, angles, cusps, ...). It is then much harder to say precisely what is going on. The simplest results from typical introductory courses in optimization simply avoid those complications by assuming linear independence of the gradients. That is not to say that things necessarily go bad when you don't have linear independence, but they could; and at any rate, the situation requires more care.
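A classic example of such a failure (a standard textbook case, not from the slide): minimize $f(x, y) = x$ subject to $g(x, y) = y^2 - x^3 = 0$. The feasible curve has a cusp at the origin, which is the true minimizer, yet the Lagrange condition cannot hold there:

```python
import numpy as np

# Classic rank-deficient case: minimize f(x, y) = x subject to
# g(x, y) = y**2 - x**3 = 0. On the curve, x = y**(2/3) >= 0, so the
# constrained minimizer is the origin -- exactly where the curve has a cusp.
grad_f = lambda p: np.array([1.0, 0.0])
grad_g = lambda p: np.array([-3.0 * p[0] ** 2, 2.0 * p[1]])

origin = np.array([0.0, 0.0])
print(grad_g(origin))  # [0. 0.]: the 1-row Jacobian has rank 0 < m = 1
# grad_f(origin) = [1, 0] cannot equal lambda * [0, 0] for any lambda,
# so the Lagrange stationarity condition fails at the true minimizer.
```

Here the rank deficiency is exactly the geometric degeneracy: the gradient of the constraint vanishes at the cusp, the "surface" has no well-defined normal space there, and the multiplier method misses the solution.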