Maximizing a sum with ReLu functions.

147 Views Asked by At

Let $f$ be a vector valued function and $W$ a matrix. We denote the rows of the matrix by $w_j$. Suppose that $\sum_{j=1}^h \lVert w_j\rVert^2\leq R^2$. Show that the sum

$$\sum_{j=1}^h \lVert w_j\rVert^2\left(\sum_{i=1}^m \epsilon_i\sigma\left(\frac{w_j^T}{\lVert w_j\rVert}f(x_i)\right)\right)^2$$

achieves its supremum only if $\lVert w_j\rVert=R$ for some $j$ and $\lVert w_i\rVert=0$ for all $i\neq j$. Now I tried Lagrange multipliers but wasn't able to get the desired result. Is there a cleaner way to show this.

EDIT: $\sigma$ is the ReLu function and $\epsilon_i\in \{-1,1\}$ fixed. And $x_i$ are just fixed values.

1

There are 1 best solutions below

0
On BEST ANSWER

Note that to maximize the sum $$\sum_{j=1}^h \lVert w_j\rVert^2\left(\sum_{i=1}^m \epsilon_i\sigma\left(\frac{w_j^T}{\lVert w_j\rVert}f(x_i)\right)\right)^2$$

we need to maximize each of the individual sums $\left(\sum_{i=1}^m \epsilon_i\sigma\left(\frac{w_j^T}{\lVert w_j\rVert}f(x_i)\right)\right)^2$ first. Since $\frac{w_j^T}{||w_j||}f(x_i)$ is a function over the unit sphere, which is compact, it attains a maximum at some $\tilde w_j$. Notice that this maximum is invariant under any positive scaling of $\tilde{w}_j$. Let $M=\max_j\left(\sum_{i=1}^m \epsilon_i\sigma\left(\frac{\tilde{w}_j^T}{\lVert \tilde{w}_j\rVert}f(x_i)\right)\right)^2$ then we have

$$\sum_{j=1}^h \lVert \tilde{w}_j\rVert^2\left(\sum_{i=1}^m \epsilon_i\sigma\left(\frac{\tilde{w}_j^T}{\lVert \tilde{w}_j\rVert}f(x_i)\right)\right)^2\leq \sum_{j=1}^h ||\tilde w_j|^2 M\leq R^2M$$

Note that there exists a $k$ such that $M=\left(\sum_{i=1}^m \epsilon_i\sigma\left(\frac{\tilde{w}_k^T}{\lVert \tilde{w}_k\rVert}f(x_i)\right)\right)^2$. Thus, if $||\tilde{w}_k||^2=R^2$ then our inequality become an equality. Therefore we have maximized the sum.