What does it mean to set the gradient of a multilinear function to $0$?

When you have a function $f(x)$ of a single vector argument $x$, a point $x$ that satisfies $\nabla f(x) = 0$ is a stationary point, which is a minimum when $f$ is convex.


Consider a multilinear function (linear in each argument separately) defined on a compact set, so that it attains a minimum and a maximum value, for example

$$f(x,y) = x(1-y)$$

The gradient of $f$ with respect to $x$ is

$$\nabla_x f(x,y) = (1-y)$$

What does it mean to set $\nabla_x f(x,y) = 0$?

From this condition, we do not recover some $x$ that gives a minimum value of $f(x,y)$, but instead a condition on $y$, namely $y = 1$.

What does this condition tell us? Does it tell us that $f(x,y)$ is minimized at $y = 1$ no matter what $x$ is? Or that at the minimizer $(x, y)$, $y$ must be equal to $1$?

There is 1 best solution below

First, you need to distinguish between a stationary point and a minimum. Stationarity is a necessary, but not sufficient, condition for a minimum. The condition (or actually the definition) of stationarity over a set $U$ is that $\nabla f(x)=0$ for $x\in \text{int}(U)$. If the set $U$ is open, then every $x\in U$ is in the interior. For closed convex sets, however, a point $x\in U$ may lie on the boundary, and the condition for stationarity becomes $\nabla f(x)^T(z-x)\geq 0, \; \forall z\in U$. This means that the directional derivative at the point $x$ is nonnegative in every feasible direction, that is, every direction pointing from $x$ toward another point of $U$.
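
As a small illustration of the boundary case (the function and the interval are chosen only for this example and do not come from the question), take $f(x) = x$ on $U = [0,1]$. The gradient is never zero, yet $x = 0$ satisfies the inequality:

$$\nabla f(0)^T (z - 0) = 1 \cdot z = z \geq 0 \quad \forall z \in [0,1].$$

At $x = 1$ the inequality fails, since

$$\nabla f(1)^T (z - 1) = z - 1 < 0 \quad \forall z \in [0,1).$$

So the inequality picks out the minimizer $x = 0$ of $f$ over $[0,1]$, even though $\nabla f(x) = 0$ has no solution at all.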

Second, for convex functions stationarity is also a sufficient condition for optimality. This is the case for $f(x)=x^2$, for example, but not for $f(x)=x^3$, which is not convex, as pointed out in a comment.
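
To spell out the two examples on $U = \mathbb{R}$ (a quick check, nothing beyond first and second derivatives):

$$f(x) = x^2: \quad f'(0) = 0, \quad f''(x) = 2 > 0 \ \text{ for all } x,$$

so $f$ is convex and the stationary point $x = 0$ is the global minimum. In contrast,

$$f(x) = x^3: \quad f'(0) = 0, \quad f''(x) = 6x,$$

and $f''$ changes sign at $0$, so $f$ is not convex. Indeed $f(-1) = -1 < 0 = f(0)$, so the stationary point $x = 0$ is not a minimum.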

Lastly, the function you presented is indeed bilinear; however, it is neither linear nor convex jointly in $(x, y)$. Therefore, even if the derivatives w.r.t. both $x$ and $y$ equal zero, you will not necessarily have a minimum point. In this specific case, you will have a saddle point. What you do know is that in order for the derivative w.r.t. $x$ to equal zero, we must have $y=1$. This is necessary for stationarity (but again, not sufficient). Notice that by ignoring the $y$ variable, you actually look at your function as if it were $f_y(x)=x(1-y)$, where $y$ is just a parameter and not a variable. In that case the derivative is zero iff the parameter $y=1$, and then you in fact have the constant function $f_1(x)=0$. This also highlights the fact that linear functions have a zero derivative iff they are constant.
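
The saddle-point claim can be checked directly. The full gradient and Hessian of $f(x,y) = x(1-y)$ are

$$\nabla f(x,y) = \begin{pmatrix} 1-y \\ -x \end{pmatrix}, \qquad \nabla^2 f(x,y) = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix},$$

so the only point where the full gradient vanishes is $(x,y) = (0,1)$, and the Hessian has eigenvalues $+1$ and $-1$. Moving away from $(0,1)$ along the two diagonal directions gives

$$f(t,\, 1+t) = -t^2, \qquad f(t,\, 1-t) = t^2,$$

so $f$ decreases in one direction and increases in the other, which is exactly a saddle.

As an illustration on a compact set (the box below is my own assumption, picked so that $(0,1)$ lies in its interior), take $U = [-1,1]\times[0,2]$. There $|f(x,y)| = |x|\,|1-y| \leq 1$, and

$$\min_{U} f = -1 \ \text{ at } (1,2) \text{ and } (-1,0), \qquad \max_{U} f = 1 \ \text{ at } (1,0) \text{ and } (-1,2),$$

while $f(0,1) = 0$. Both extrema sit at corners of the box, not at the point where the gradient is zero.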