Question about the FG algorithm (convex optimization) (Flury 1984)


I am currently trying to understand the FG algorithm which was presented by Flury in 1984.

You can find a description of the algorithm here: https://www.researchgate.net/publication/248861527_An_Algorithm_for_Simultaneous_Orthogonal_Transformation_of_Several_Positive_Definite_Symmetric_Matrices_to_Nearly_Diagonal_Form

And here is the background for the algorithm (where my issue arises): https://www.jstor.org/stable/2288721?seq=2#metadata_info_tab_contents

Its objective is to find a common orthogonal matrix that makes several positive definite symmetric (p.d.s.) $p \times p$ matrices $A_1, A_2, \dots, A_g$ as diagonal as possible.

The goal is to minimize $\phi$ defined as:

\begin{equation} \phi(F_1,\dots,F_g)=\prod_{i=1}^{g}{\left(\frac{|\text{diag}(F_i)|}{|F_i|}\right)^{n_i}} \end{equation} where $F_i=B'A_iB$ for $B$ an orthogonal matrix, $\text{diag}(F_i)$ is the $p \times p$ diagonal matrix with the same diagonal entries as $F_i$, and the $n_i$ are fixed weights.
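For concreteness, here is a minimal numerical sketch of $\phi$ (my own Python with NumPy, not code from the paper; the names are illustrative):

```python
import numpy as np

def phi(B, As, ns):
    """Flury's deviation-from-diagonality criterion.

    B  : orthogonal p x p matrix
    As : list of p.d.s. p x p matrices A_1, ..., A_g
    ns : list of fixed positive weights n_1, ..., n_g
    """
    val = 1.0
    for A, n in zip(As, ns):
        F = B.T @ A @ B
        # |diag(F)| is the product of the diagonal entries of F
        val *= (np.prod(np.diag(F)) / np.linalg.det(F)) ** n
    return val
```

Note that for p.d.s. $F_i$, Hadamard's inequality gives $|\text{diag}(F_i)| \ge |F_i|$, so $\phi \ge 1$, with equality exactly when every $F_i$ is diagonal; e.g. for diagonal $A_i$ and $B = I$ the sketch above returns $\phi = 1$.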

It can be shown (Flury 1983) that the above problem is equivalent to minimizing the following function: \begin{equation} g(\Sigma_1,\dots,\Sigma_g)=g(B_{1},\dots,B_p,\lambda_{1,1},\dots,\lambda_{1,p},\dots,\lambda_{g,1},\dots,\lambda_{g,p})=\sum_{i=1}^{g}{n_{i} \left[ \sum_{j=1}^{p}{\log\lambda_{i,j}+B_{j}'A_{i} B_{j}/\lambda_{i,j}} \right]} \end{equation} where $B_j$ is the $j$-th column of an orthogonal matrix $B$, the $\lambda_{i,j}$ are strictly positive, and the $\Sigma_i$ are p.d.s. matrices for which we want to find $B$ such that $B'\Sigma_iB=\text{diag}(\lambda_{i,1},\dots,\lambda_{i,p})$.
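One way to see where this surrogate comes from: for fixed $B$, each term $\log\lambda_{i,j} + B_j'A_iB_j/\lambda_{i,j}$ is minimized (set the derivative $1/\lambda - B_j'A_iB_j/\lambda^2$ to zero) at $\lambda_{i,j} = B_j'A_iB_j$, i.e. at the diagonal entries of $F_i = B'A_iB$. A hedged Python sketch of this objective (my own code, NumPy assumed):

```python
import numpy as np

def g_objective(B, lambdas, As, ns):
    """Surrogate objective: sum over groups i and coordinates j of
    n_i * (log(lambda_ij) + B_j' A_i B_j / lambda_ij)."""
    total = 0.0
    for lam_i, A, n in zip(lambdas, As, ns):
        for j in range(B.shape[1]):
            b = B[:, j]  # j-th column of the orthogonal matrix
            total += n * (np.log(lam_i[j]) + b @ A @ b / lam_i[j])
    return total
```

With $\lambda_{i,j}$ set to $B_j'A_iB_j$, the objective cannot be decreased by perturbing the $\lambda_{i,j}$, which is why the algorithm can alternate between updating $B$ and setting the $\lambda_{i,j}$ in closed form.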

Up to this point I understand the reasoning; my issue comes next. To solve the above problem, the author uses the method of Lagrange multipliers, with the following constraints:

$B_i'B_j=0$ if $i \neq j$,

$B_i'B_i=1$ otherwise.

I have two problems:

First, for $\Sigma_i$ to be p.d.s. the $\lambda_{i,j}$ must be strictly positive, so why is this not included among the constraints of the Lagrange multiplier method?

Second, I do not understand why the Lagrange method leads to a necessary condition, because the constraint $B_i'B_i=1$ is not convex, I believe ($f: (u,v) \mapsto u'v-1$ is not convex). The theorem I know for Lagrange multipliers on convex functions relies on the constraints being convex.

I know that when we minimize a convex function over a convex set (such as the set of p.d.s. matrices), global minima are found where the gradient vanishes. But in the Lagrange multiplier method, if we add non-convex constraint functions, I do not see why this result would still hold.

Thank you very much in advance, and of course feel free to ask me if anything is not clear.


Best answer:

It seems that you are confusing the Lagrange multiplier method for equality-constrained optimization problems with the Lagrange multiplier method for convex optimization problems.

The former deals with problems in the following form (e.g., see http://en.wikipedia.org/wiki/Lagrange_multiplier): $$\begin{align}\min\;\;\;& f(x)\\ \mathrm{s.t.}\;\;\;& g(x)=0, \end{align}$$ where $f,g$ are continuous functions. This does not require either $f$ or $g$ to be convex, so it is quite general. On the other hand, it only gives you a necessary condition. What you get are just candidates for optimal points.
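A toy illustration of the "candidates only" point (my own example, not from Flury's paper): minimize $f(x,y)=xy$ subject to $x^2+y^2=2$. Neither $f$ nor the feasible set is convex, yet the condition $\nabla f = \lambda \nabla g$ still produces all the stationary candidates; comparing their objective values is what separates minima from maxima.

```python
import numpy as np

# Toy equality-constrained problem (illustrative only):
#   minimize f(x, y) = x * y   subject to   g(x, y) = x^2 + y^2 - 2 = 0.
# Neither f nor the feasible set is convex, but the first-order Lagrange
# condition grad f = lam * grad g still yields all candidate points.

def grad_f(x, y):
    return np.array([y, x])

def grad_g(x, y):
    return np.array([2.0 * x, 2.0 * y])

def is_candidate(x, y, lam, tol=1e-10):
    """Check the Lagrange condition at (x, y) with multiplier lam."""
    on_constraint = abs(x**2 + y**2 - 2.0) < tol
    stationary = np.allclose(grad_f(x, y), lam * grad_g(x, y), atol=tol)
    return on_constraint and stationary
```

Here $(\pm 1, \pm 1)$ with $\lambda = 1/2$ give $f = +1$ (maxima), while $(\pm 1, \mp 1)$ with $\lambda = -1/2$ give $f = -1$ (minima); the multiplier condition alone does not tell them apart, which is exactly why it is only necessary.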

The latter is used for convex programs with inequality constraints. Under a suitable constraint qualification (e.g., Slater's condition), the KKT conditions are then necessary and sufficient for optimality.