Gradient Descent with derivative constraints


I am trying to solve an optimization problem:

Find a parameter vector $\theta$ so that $\sum_x \log f(\theta, x) \cdot y$ is minimized (it's a probability problem), subject to $\frac{\partial f_i}{\partial x_j}(\theta, x) \leq C$ for all $x$ in the input space.
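To make the setup concrete, here is a minimal sketch in Python. Everything here is my own illustrative choice, not part of the original problem: I use a logistic model for $f$, a small random sample standing in for the input space, and a one-sided finite-difference estimate of $\partial f / \partial x_j$. The finite-difference loop is exactly the expensive part in question, since it costs one extra evaluation of $f$ per input dimension per sample point:

```python
import numpy as np

# Hypothetical stand-in for f(theta, x): a logistic model with output in (0, 1).
def f(theta, X):
    return 1.0 / (1.0 + np.exp(-X @ theta))

# The objective from the question: sum over x of log f(theta, x) * y.
def objective(theta, X, y):
    return np.sum(np.log(f(theta, X)) * y)

# Finite-difference estimate of max_j |df/dx_j| over the sampled points X.
# This is what becomes prohibitive when the input space is large: each input
# dimension j requires a full extra evaluation of f at the shifted points.
def max_input_derivative(theta, X, eps=1e-5):
    base = f(theta, X)
    worst = 0.0
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += eps
        worst = max(worst, np.max(np.abs(f(theta, Xp) - base) / eps))
    return worst

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # a small sample of the (huge) input space
y = rng.integers(0, 2, size=100)   # binary outcomes
theta = rng.normal(size=3)

print(objective(theta, X, y))
print(max_input_derivative(theta, X))  # compare against the bound C
```

In practice one would check `max_input_derivative(theta, X) <= C` at each descent step, but the question is precisely whether this can be done without evaluating the derivative densely over the input space.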

My problem is that estimating the derivative of $f$ at a point $x$ is expensive, and my input space is extremely large. In fact, even evaluating $f(\theta, x)$ on a moderately fine grid across the input space would be prohibitively expensive. Are there handy theorems that may be of use here (even if they apply only to certain classes of functions)?