Say I have a function $\mathbb R^n \rightarrow \mathbb R$:
$$f(w_1,\ldots,w_n) = n^-\sum_{i\in I^-}w_ix_i$$
with fixed $x_i\in\mathbb R$ (data), $I^-$ the set of indexes with negative sum operands and $n^-$ the number of negative operands. The goal is to minimize this sum over weights $w = (w_1,\ldots,w_n)$. The reason to include $n^-$ is to penalize not only large values of $w_ix_i$ but also the number of negative values.
My question is how can I compute function gradient at a given point: $\partial f/\partial w_i|_{w=w^0}$.
The difficulty here is with $n^-$ (which can be written as a sum of indicators $\sum_i 1\{w_ix_i < 0\}$). Is there any way of obtaining partial derivatives $\partial n^-/\partial w_i|_{w=w^0}$? If not what are workarounds?
Let $n^{-}_i$ be the number of negative operands excluding the $i$-th for $w^0$. Then (as a function of $w_i$) $$n^- = \begin{cases} n^{-}_i, & \text{if } w_i x_i \geq 0\\n^{-}_i + 1, & \text{otherwise,}\end{cases}$$ that is, $n^-$ is not continuous (so there is no $\partial n^- / \partial w_i$) at $w_i = 0$ if $x_i \neq 0$, otherwise the partial derivative is $0$.
Possible workarounds