Suppose that we have an autonomous discrete dynamical system $x_{k+1} = Ax_k \in \mathbb{R}^n$, and a finite set of points $X_0 \subseteq \mathbb{R}^n$. We can assume for the sake of simplicity that $\rho(A) \leq 1$. Then we have the recurrence relation $X_{k+1} = AX_k$, where $AX_k$ is the image of $X_k$ under $A$.
Now, given an orthogonal matrix $U$ with columns $u_i$, we define the bounding box of any finite point set $X$ as $\DeclareMathOperator{\box}{box} \box_U(X) = \{v\in \mathbb{R}^n : \min_{x\in X}\{u_i^Tx\}\leq u_i^Tv\leq \max_{x\in X}\{u_i^Tx\}, i=1,\ldots,n\}$.
Given a finite time $N > 1$, I would like to solve the following optimization problem: \begin{align} \min_U\quad&\sum_{k=1}^N R(A^k\box_U(X_0)),\\\text{subject to}\quad&U^TU=I, \end{align} where $R(X) = \max_{x \in X}\{\|x\|_\infty\}$.
The purpose of this is to approximate a possible large set of points $X_0$ by a simple box, such that its images under $A_k$ form a reachable set that is as "tight" as possible, in the sense of contraction rate, while also reducing the computation required, since computing $A^k\box_U(X_0)$ is easy. This will be used eventually in a tool to compute overapproximations of the reachable set for symbolic verification purposes.
I have so far attempted two approaches: 1) using a PCA approximation and 2) a PSO algorithm. But the first usually gives me a very bad initial box approximation, while the latter is too computationally intensive.
I would therefore like to ask for some ideas or references on how to solve this optimization problem efficiently, if that's even possible. Any help would be appreciated.