Solving a matrix optimization problem (part 2)


Problem definition

This post is a follow-up to this previous one. Consider the following $m\times(n+1)$ matrix \begin{equation*} B(\sigma)\triangleq \left[\begin{array}{ccc} b_0^n(s_1) & \cdots & b_n^n(s_1) \\ \vdots & \ddots & \vdots \\ b_0^n(s_m) & \cdots & b_n^n(s_m) \\ \end{array}\right] \end{equation*}

where $\sigma\triangleq \left[s_1\,\dots\,s_m\right]'$ (with $'$ being the transpose operator) and, for all $i=0,\dots,n$, the functions $b_i^n(s)$ are the Bernstein polynomials of degree $n \in \mathbb{N}$, i.e. \begin{equation*} b_i^n(s) \triangleq \binom{n}{i}\,s^i\,(1-s)^{n-i} \end{equation*} For a given $n$ and a given $m \gg n$, I'm searching for a vector $\sigma\in[0,1]^m$ that makes the product \begin{equation*} B(\sigma)\,\underbrace{(B'(\sigma)B(\sigma))^{-1}B'(\sigma)}_{\triangleq B^{+}(\sigma)} \end{equation*} as close as possible to the $m\times m$ identity matrix $I_m$. To make this problem precise, I define the following optimization problem \begin{equation*} \min_{\sigma \in [0,1]^m} \underbrace{\lVert B(\sigma)B^{+}(\sigma)-I_m\rVert_p^2}_{\triangleq J(\sigma)} \end{equation*} where $\lVert\cdot\rVert_p$ is a matrix norm, e.g. $p=1,2,\textrm{F},\infty$ ($\textrm{F}$ being the Frobenius norm).
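For concreteness, here is a minimal Python/NumPy sketch of the cost $J(\sigma)$ for the Frobenius norm (the function names `bernstein_matrix` and `cost` are my own, and $B^{+}$ is applied via a linear solve rather than an explicit inverse):

```python
import numpy as np
from math import comb

def bernstein_matrix(sigma, n):
    # m x (n+1) matrix with entries b_i^n(s_j) = C(n,i) s_j^i (1-s_j)^(n-i)
    s = np.asarray(sigma, dtype=float).reshape(-1, 1)   # m x 1 column of nodes
    i = np.arange(n + 1)                                # exponents 0, ..., n
    binom = np.array([comb(n, k) for k in i], dtype=float)
    return binom * s**i * (1.0 - s)**(n - i)            # broadcasts to m x (n+1)

def cost(sigma, n):
    # J(sigma) = || B B^+ - I_m ||_F^2  with  B^+ = (B'B)^{-1} B'
    B = bernstein_matrix(sigma, n)
    P = B @ np.linalg.solve(B.T @ B, B.T)               # B (B'B)^{-1} B'
    return np.linalg.norm(P - np.eye(len(sigma)), "fro") ** 2
```

For the other norms, replace `"fro"` by `1`, `2`, or `np.inf` in `np.linalg.norm`.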

Naive solution

The simplest approach that comes to mind is to run a gradient descent where, at each iteration, the current solution $\sigma$ is projected onto $[0,1]^m$. I solve the projection problem by simply saturating the components $s_i$ in $[0,1]$, i.e. \begin{equation*} \textrm{proj}_{[0,1]^m}(\sigma)\triangleq \left[\begin{array}{c} \min(\max(s_1, 0), 1) \\ \vdots \\ \min(\max(s_m, 0), 1) \\ \end{array}\right] \end{equation*}
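This componentwise saturation is exactly what NumPy's `np.clip` does; a one-line sketch (the helper name is my own):

```python
import numpy as np

def proj_unit_box(sigma):
    # componentwise saturation: each s_i is clamped into [0, 1]
    return np.clip(sigma, 0.0, 1.0)
```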

To obtain the gradient of the cost easily, I just consider the central finite-difference approximation with step $\tau\triangleq 10^{-3}$, that is \begin{equation*} \nabla J(\sigma) \triangleq \frac{1}{\tau}\left[\begin{array}{c} J(\sigma+0.5\,\tau\,\textrm{e}_1)-J(\sigma-0.5\,\tau\,\textrm{e}_1)\\ \vdots \\ J(\sigma+0.5\,\tau\,\textrm{e}_m)-J(\sigma-0.5\,\tau\,\textrm{e}_m)\\ \end{array}\right] \end{equation*} where, for all $i=1,\dots, m$, the vector $\textrm{e}_i\triangleq \left[0_{1\times(i-1)}\,\,1\,\,0_{1\times(m-i)}\right]'$ is the unit vector of the $i$-th coordinate $s_i$ of $\sigma$. Hence, I use the following update rule to search for the optimum $\sigma^*$ (assuming it is unique) \begin{equation*} \sigma \leftarrow \textrm{proj}_{[0,1]^m}\left(\sigma - \alpha\,\nabla J(\sigma)\right) \end{equation*} where the step $\alpha$ is chosen by the Armijo rule. In particular, starting from $\alpha=1$, I search for the largest step $\alpha$ satisfying the sufficient-decrease condition \begin{equation*} J(\sigma-\alpha \nabla J(\sigma)) \leq J(\sigma)-\gamma\,\alpha \lVert \nabla J(\sigma)\rVert^2 \end{equation*} and, whenever the current step does not satisfy this condition, I shrink it according to $\alpha\leftarrow \delta\,\alpha$. I consider the standard choice $\delta=\gamma\triangleq 0.5$. As a starting guess to initialize my solver, I take $\sigma$ with uniformly spaced components, i.e. \begin{equation*} \sigma \triangleq \left[\begin{array}{ccc} \frac{0}{m-1} & \cdots & \frac{m-1}{m-1} \end{array}\right]' \end{equation*}
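Putting the pieces together, a minimal sketch of the solver (function names are my own; the cost `J` is passed as a callable so any of the norms above can be plugged in, and the backtracking loop enforces the sufficient-decrease condition $J(\sigma-\alpha\nabla J)\le J(\sigma)-\gamma\,\alpha\lVert\nabla J\rVert^2$):

```python
import numpy as np

def fd_gradient(J, sigma, tau=1e-3):
    # central finite differences: (J(s + tau/2 e_i) - J(s - tau/2 e_i)) / tau
    g = np.zeros_like(sigma)
    for i in range(len(sigma)):
        e = np.zeros_like(sigma)
        e[i] = 1.0
        g[i] = (J(sigma + 0.5 * tau * e) - J(sigma - 0.5 * tau * e)) / tau
    return g

def solve(J, sigma0, gamma=0.5, delta=0.5, max_iter=100, tol=1e-9):
    # projected gradient descent on [0,1]^m with Armijo backtracking
    sigma = np.clip(np.asarray(sigma0, dtype=float), 0.0, 1.0)
    for _ in range(max_iter):
        g = fd_gradient(J, sigma)
        g2 = float(g @ g)
        if g2 < tol:
            break                      # (numerically) null gradient: stop
        alpha, J0 = 1.0, J(sigma)
        # shrink the step until sufficient decrease holds
        while J(sigma - alpha * g) > J0 - gamma * alpha * g2 and alpha > 1e-12:
            alpha *= delta
        sigma = np.clip(sigma - alpha * g, 0.0, 1.0)
    return sigma
```

Note that, as in the description above, the Armijo test is evaluated on the unprojected step, while the update itself is projected back onto $[0,1]^m$.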

Implementation problem

Except for $p=\infty$, the gradient is always null, regardless of the value of $\sigma$. I have checked this fact by randomly generating $\sigma \in [0,1]^m$ and counting how many times $\nabla J(\sigma)\neq 0_{m\times 1}$ occurs. As a consequence, my solver does nothing and just returns the initial guess $\sigma$ with uniformly spaced components; hence, I cannot find anything better than my initial guess.
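For reference, a sketch of the randomized check just described, for the Frobenius norm (helper names, sizes $m=15$, $n=3$, and the $10^{-5}$ threshold separating floating-point noise from a genuinely non-null gradient are my own choices):

```python
import numpy as np
from math import comb

def J(sigma, n=3):
    # J(sigma) = || B B^+ - I ||_F^2, with B the Bernstein matrix of degree n
    s = np.asarray(sigma, dtype=float).reshape(-1, 1)
    i = np.arange(n + 1)
    B = np.array([comb(n, k) for k in i]) * s**i * (1.0 - s)**(n - i)
    P = B @ np.linalg.solve(B.T @ B, B.T)
    return np.linalg.norm(P - np.eye(len(sigma)), "fro") ** 2

rng = np.random.default_rng(0)
m, tau, trials, hits = 15, 1e-3, 20, 0
for _ in range(trials):
    sigma = rng.uniform(size=m)
    g = np.array([(J(sigma + 0.5 * tau * np.eye(m)[i])
                   - J(sigma - 0.5 * tau * np.eye(m)[i])) / tau
                  for i in range(m)])
    hits += int(np.max(np.abs(g)) > 1e-5)  # count non-null gradients
print(f"non-null gradients in {trials} trials: {hits}")
```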

Questions

  1. I would like some help understanding why the gradient seems to be null everywhere.
  2. I would like some suggestions on how to find a $\sigma$ such that $B(\sigma)B^{+}(\sigma)$ is reasonably close to $I_m$.