Gradient of a sum of indicators

298 Views Asked by At

Say I have a function $\mathbb R^n \rightarrow \mathbb R$:

$$f(w_1,\ldots,w_n) = n^-\sum_{i\in I^-}w_ix_i$$

with fixed $x_i\in\mathbb R$ (data), $I^-$ the set of indexes with negative sum operands and $n^-$ the number of negative operands. The goal is to minimize this sum over weights $w = (w_1,\ldots,w_n)$. The reason to include $n^-$ is to penalize not only large values of $w_ix_i$ but also the number of negative values.

My question is how can I compute function gradient at a given point: $\partial f/\partial w_i|_{w=w^0}$.

The difficulty here is with $n^-$ (which can be written as a sum of indicators $\sum_i 1\{w_ix_i < 0\}$). Is there any way of obtaining partial derivatives $\partial n^-/\partial w_i|_{w=w^0}$? If not what are workarounds?

1

There are 1 best solutions below

0
On BEST ANSWER

Let $n^{-}_i$ be the number of negative operands excluding the $i$-th for $w^0$. Then (as a function of $w_i$) $$n^- = \begin{cases} n^{-}_i, & \text{if } w_i x_i \geq 0\\n^{-}_i + 1, & \text{otherwise,}\end{cases}$$ that is, $n^-$ is not continuous (so there is no $\partial n^- / \partial w_i$) at $w_i = 0$ if $x_i \neq 0$, otherwise the partial derivative is $0$.

Possible workarounds

  • More smooth penalization of negative values, for example, $n^- = \sum_i (1 + \exp(-\alpha w_i x_i))^{-1}$ (sigmoid function; if $\alpha \to +\infty$, this tends to $\sum_i 1\{w_ix_i < 0\}$).
  • As above, but gradually reduce smoothness during optimization steps (in the above example, $\alpha \to + \infty$).
  • Minimize $\sum_{i\in I^-}w_ix_i$, treating $n^- = m$ as a fixed constraint, for various $m$ and use the minimum over $m$.