How can we efficiently optimise $\sum_{i=1}^N \|a_i -x\|_\infty ^2$ for $N$ very large?


Suppose $n$ is very large and we want to minimise the convex function $\sum_{i=1}^n \|a_i -x\|_\infty ^2$ over some compact convex set $X \subset \mathbb R^d$. The difficulty with attempting this numerically is that each query of the objective function (or its gradient) has cost $O(n)$.

It is much easier, for example, to minimise a linear function $\sum_{i=1}^n b_i \cdot x $ or a sum of squared Euclidean norms $\sum_{i=1}^n \|a_i -x\| ^2$. In those cases we can aggregate the $n$ terms once at the very start. For the linear case we just add up the coefficient vectors. For the quadratic case we can expand the square to get

$$\sum_{i=1}^n \|a_i -x\| ^2 = \sum_{i=1}^n \big( \|a_i\|^2 -2\,a_i \cdot x + \|x\|^2 \big) = n\|x\|^2 -2\,x \cdot \sum_{i=1}^n a_i + \sum_{i=1}^n \|a_i\|^2 $$

So we only have to compute $\sum_{i=1}^n a_i $ and $\sum_{i=1}^n \|a_i\|^2$ once at the very start. After that, each query of the function or gradient has cost independent of $n$.
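To make the bookkeeping concrete, here is a minimal sketch of the precomputation trick for the Euclidean case (function and helper names are mine, not from any particular library):

```python
import numpy as np

def make_fast_objective(a):
    """Precompute the aggregates in the expansion above, so that each
    query of f(x) = sum_i ||a_i - x||^2 costs O(d) rather than O(n d).

    a: (n, d) array whose rows are the points a_1, ..., a_n.
    """
    n = a.shape[0]
    s = a.sum(axis=0)            # sum_i a_i, computed once at cost O(n d)
    q = float((a ** 2).sum())    # sum_i ||a_i||^2, computed once

    def f(x):
        # n ||x||^2 - 2 x . (sum_i a_i) + sum_i ||a_i||^2
        return n * float(np.dot(x, x)) - 2.0 * float(np.dot(x, s)) + q

    def grad(x):
        # gradient of f: 2 n x - 2 sum_i a_i
        return 2.0 * n * x - 2.0 * s

    return f, grad
```

The returned `f` and `grad` can then be handed to any standard first-order solver, with per-iteration cost independent of $n$.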

This is particularly useful if we want to solve a sequence of problems for $n=1,2,\ldots, N$. Then we can update the running sums, e.g. $ \sum_{i=1}^n a_i \to \sum_{i=1}^{n+1} a_i $, at cost $O(1)$ each round (for fixed dimension $d$) and get a total cost of $O(N)$.
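The incremental version can be sketched as follows (again just an illustration under my own naming, evaluating the prefix objectives at a fixed query point $x$):

```python
import numpy as np

def prefix_objectives(a, x):
    """Evaluate f_n(x) = sum_{i<=n} ||a_i - x||^2 for every prefix
    n = 1, ..., N by maintaining the running aggregates, at O(d) per
    round instead of recomputing the whole sum each time."""
    s = np.zeros(a.shape[1])   # running sum_i a_i
    q = 0.0                    # running sum_i ||a_i||^2
    xx = float(np.dot(x, x))
    vals = []
    for n, ai in enumerate(a, start=1):
        s += ai                          # O(d) update each round
        q += float(np.dot(ai, ai))
        vals.append(n * xx - 2.0 * float(np.dot(x, s)) + q)
    return vals
```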

I wonder whether there are any tricks for the sum of $\infty$-norms. One idea is to approximate the $\infty$-norm by a $p$-norm for large $p$, then approximate that by its Taylor polynomial of degree $r$, and proceed as above. Each round we update the polynomial coefficients at cost $O(r)$ and get total cost $O(rN)$.
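One piece of the error budget is pinned down by the standard sandwich $\|v\|_\infty \le \|v\|_p \le d^{1/p}\|v\|_\infty$, which says $p \ge \log d / \log(1+\varepsilon)$ suffices for relative error $\varepsilon$ in the norm itself (the Taylor truncation error is a separate question). A small numeric check, with an illustrative helper name of my own:

```python
import numpy as np

def pnorm_bounds(v, p):
    """Return (||v||_inf, ||v||_p, d**(1/p) * ||v||_inf) so the sandwich
    ||v||_inf <= ||v||_p <= d**(1/p) ||v||_inf can be verified; taking
    p >= log(d)/log(1+eps) makes d**(1/p) <= 1+eps."""
    d = len(v)
    inf = float(np.max(np.abs(v)))
    lp = float(np.sum(np.abs(v) ** p) ** (1.0 / p))
    return inf, lp, d ** (1.0 / p) * inf
```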

The difficult part is figuring out how large $r$ must be to keep the error within a given tolerance. If we can bound the $r$th derivatives of the $p$-norm, we can do this. This is where I get stuck.

Another idea is to take random samples. Each time we query, we select indices $i(1),\ldots, i(m) \in \{1,2,\ldots, n\}$ uniformly at random and instead query $\sum_{j=1}^m \|a_{i(j)} -x\|_\infty ^2$. Something like this might lead to solving the problem in expectation.
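A minimal sketch of that subsampling estimator (the $n/m$ rescaling is my addition, included so the sample average is an unbiased estimate of the full sum):

```python
import numpy as np

def sampled_objective(a, x, m, rng):
    """Unbiased estimate of f(x) = sum_i ||a_i - x||_inf^2 from m indices
    drawn uniformly with replacement, rescaled by n/m so that
    E[estimate] = f(x).  Cost per query: O(m d) instead of O(n d)."""
    n = a.shape[0]
    idx = rng.integers(0, n, size=m)                 # uniform sample
    terms = np.max(np.abs(a[idx] - x), axis=1) ** 2  # ||a_i - x||_inf^2
    return n * float(terms.sum()) / m
```

Plugging such an estimator into a stochastic gradient method is the usual way to trade the $O(n)$ per-query cost for variance.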

Has anyone thought about this before, or does anyone have ideas about bounding the derivatives?