Gradient of Coefficient of Variation of weighted geometric mean


This question is another version of the one here, which I've previously asked. The difference is that instead of WM we now have exp(WM). exp(W×log(M)) is equivalent to the weighted geometric mean of M when W sums to 1 (Wikipedia); the exp operation is applied element-wise. I have simply replaced log(M) with M for simplicity.
I have a d×n matrix called M. What is the best 1×d vector W that minimizes CoV(exp(WM)), the coefficient of variation of exp(W×M), given that W sums to 1?

$$\underset{W}{\operatorname{argmin}}\frac{\operatorname{SD}(\exp(WM))}{\operatorname{Mean}(\exp(WM))}$$ $$\sum_{i=1}^{d} w_{i}=1$$

I can rewrite this as: $$\underset{W}{\operatorname{argmin}} \frac{\sqrt{\frac{1}{n}\left\|\exp(WM)-\frac{1}{n} \exp(WM) AA^T\right\|_{2}^{2}}}{\frac{1}{n} \exp(WM) A}$$ $$\sum_{i=1}^{d} w_{i}=1$$

Here $A$ is an n×1 matrix of ones.
I hope to get a closed form for $W^*$, or at least the gradient, so I can use gradient descent to find the minimum.
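As a sanity check, the matrix form above can be compared numerically against a direct computation of the coefficient of variation (a small NumPy sketch; the sizes $d$, $n$ and the random $M$ are placeholders of my choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 50
M = rng.normal(size=(d, n))      # d x n data matrix
W = rng.normal(size=(1, d))
W = W / W.sum()                  # enforce sum(w_i) = 1

v = np.exp(W @ M).ravel()        # exp(WM), a vector of length n

# Direct coefficient of variation (population SD, matching the 1/n factor)
cov_direct = v.std() / v.mean()

# Matrix form: sqrt((1/n)||v - (1/n) v A A^T||^2) / ((1/n) v A), A = ones(n,1)
A = np.ones(n)
mean = (v @ A) / n
cov_matrix = np.sqrt(np.sum((v - mean) ** 2) / n) / mean
```

Both quantities agree, confirming that the rewritten objective is the CoV.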


Edit: Here is my attempt, based on this answer, but I couldn't solve it. I think the main problem is the Hadamard product ($\circ$), which appears when differentiating $\exp(WM)$.

Introduce an unconstrained vector $x$ and use it to construct a column vector $w$ which satisfies the constraint. $$\eqalign{ &w = \frac{x}{{\tt1}^Tx} \quad\implies\quad {\tt1}^Tw = \frac{{\tt1}^Tx}{{\tt1}^Tx} \doteq {\tt1} \\ }$$ Then for algebraic convenience, define some auxiliary variables $$\eqalign{ {\tt1} &= A,\quad &J = AA^T \\ C &= I-\tfrac{1}{n}J\quad &({\rm Centering\ Matrix}) \\ w &= W^T \quad &({\rm column\ vector\ constructed\ from\ }x)\\ B &= d\times 1\ {\rm matrix\ of\ ones} \\ Q &= M\circ By^T \\ y &= \exp(M^Tw) \quad&\implies dy = \Big(M^T\circ \exp(M^Tw)B^T\Big)\,dw = Q^Tdw \\ z &= Cy \quad&\implies dz = CQ^Tdw \\ \alpha &= {\tt1}^Tx \quad&\implies d\alpha = {\tt1}^Tdx \\ \beta &= {\tt1}^Ty \quad&\implies d\beta = {\tt1}^Tdy = {\tt1}^TQ^Tdw \\ w &= \alpha^{-1}x \quad&\implies dw = \alpha^{-1}dx - x\alpha^{-2}d\alpha \\ & \quad&\implies dw = \alpha^{-1}(I - w{\tt1}^T)\,dx \\ }$$ Note that $C^T=C=C^2\;$ and $\;\beta={\tt1}^T\exp(M^Tw)=\exp(w^TM){\tt1}=\exp(WM)A$;
these properties will be used in several of the steps below.

Use the new variables to simplify the vector appearing in the numerator. $$\eqalign{ &\Big(\exp(WM)-\tfrac{1}{n}\exp(WM)AA^T\Big)^T = \Big(\exp(M^Tw) - \tfrac{1}{n}J\exp(M^Tw)\Big) = Cy = z \\ }$$ Call the objective function $\phi$, and start by differentiating its square. $$\eqalign{ \phi^2 &= n\beta^{-2}z^Tz \\ 2\phi\,d\phi &= 2n\beta^{-2}z^Tdz - 2n\beta^{-3}z^Tz\,d\beta \\ d\phi &= n\phi^{-1}\beta^{-3}z^T\Big(\beta\,dz - z\,d\beta\Big) \\ &= n\phi^{-1}\beta^{-3}z^T\Big(\beta\,CQ^T - z{\tt1}^TQ^T\Big)\,dw \\ &= n\phi^{-1}\alpha^{-1}\beta^{-3}z^T\Big(\beta\,CQ^T - z{\tt1}^TQ^T\Big)\,\Big(I - w{\tt1}^T\Big)\,dx \\ \frac{\partial\phi}{\partial x} &= n\phi^{-1}\alpha^{-1}\beta^{-3}\Big(I - {\tt1}w^T\Big)\,\Big(\beta\,QC - Q{\tt1}z^T\Big)z \\ }$$ Set the gradient to zero. $$\eqalign{ \Big({\tt1}w^T\Big)\,\Big(\beta\,QC - Q{\tt1}z^T\Big)\,z &= I\Big(\beta\,QC - Q{\tt1}z^T\Big)\,z \\ }$$ Eliminate $z$. $$\eqalign{ \Big({\tt1}w^T\Big)\,\Big(\beta QC - Q{\tt1}y^TC\Big)\,Cy &= \Big(\beta QC - Q{\tt1}y^TC\Big)\,Cy \\ \Big({\tt1}w^T\Big)\,Q\Big(\beta I - {\tt1}y^T\Big)\,Cy &= Q\Big(\beta I - {\tt1}y^T\Big)\,Cy \\ }$$
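The gradient expression $\frac{\partial\phi}{\partial x}$ can be verified against central finite differences before it is trusted in an optimizer (a NumPy sketch; the random $M$, its $0.1$ scaling, and the problem sizes are placeholders of my choosing, the scaling just keeps the exponentials well-conditioned):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 40
M = rng.normal(size=(d, n)) * 0.1            # d x n placeholder data
one_n, one_d = np.ones(n), np.ones(d)
C = np.eye(n) - np.outer(one_n, one_n) / n   # centering matrix C = I - (1/n)J

def phi(x):
    w = x / x.sum()                  # w = x / (1^T x), so 1^T w = 1
    y = np.exp(M.T @ w)
    z = C @ y
    beta = one_n @ y
    return np.sqrt(n * (z @ z)) / beta        # phi = SD/Mean

def grad(x):
    alpha = x.sum()
    w = x / alpha
    y = np.exp(M.T @ w)
    z = C @ y
    beta = one_n @ y
    Q = M * y                        # Q = M Diag(y), broadcasting y over rows
    H = beta * (Q @ C) - np.outer(Q @ one_n, z)      # beta*QC - Q*1*z^T
    return n / (phi(x) * alpha * beta**3) * \
        (np.eye(d) - np.outer(one_d, w)) @ (H @ z)

x = rng.normal(size=d) + 2.0
g = grad(x)

# central finite differences in each coordinate
eps = 1e-6
g_fd = np.array([(phi(x + eps * np.eye(d)[i]) - phi(x - eps * np.eye(d)[i]))
                 / (2 * eps) for i in range(d)])
```

The two vectors match to finite-difference accuracy, which supports the derivation up to this point.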


There is 1 best solution below

On BEST ANSWER

$ \def\a{\alpha} \def\b{\beta} \def\c#1{\color{red}{#1}} \def\o{{\large\tt1}} \def\g{\sigma} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\Diag#1{\operatorname{Diag}\LR{#1}} $You were doing great until the final step where you "eliminated $z$".
That's not valid, so you must find a different way to solve the equation.
Here is an idea, but not a complete solution.

Define the matrix $$H = {\b QC -Q\o z^T}$$ Then the zero gradient condition becomes $$Hz = \LR{\o w^T}Hz \;=\; \LR{w^THz}\o \;=\; \g\o$$ This eliminates $z$ from one side of the equation $$\eqalign{ \LR{\beta QC -Q\o z^T}z &= \g\o }$$ Other substitutions can be used $$\eqalign{ \b &= \o^Ty \\ Cz &= C^2y = {Cy = z} \\ Y &= \Diag{y} \quad&\implies\quad Q=MY,\quad z=Cy=CY\o \\ }$$ to rewrite the equation entirely in terms of $y$ $\,($or $Y)$ $$\eqalign{ \g\o_d &= M\Diag{y}\Big(\LR{\o^T_ny}C\Diag{y} -\LR{y^TCy}I_n\Big)\o_n \\ &= M\Big(\LR{\o^T_n{Y}\o_n}\c{YCY} - \LR{\o^T_n\c{YCY}\o_n}{Y}\Big)\o_n \\ &= S\o_n \\ S\o_n &= \g\o_d \quad\implies\quad {\frac{S\o_n}{\|S\o_n\|}} = \frac{\o_d}{\sqrt d} \\ }$$ This looks promising. It almost looks like an eigenvalue equation, however...
$S$ is rectangular, so the $\o$ vectors on the RHS and LHS have different lengths.
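The substitution above can be checked numerically: for any $w$ (not only a stationary one), the vector $S\,\o_n$ built from $Y=\Diag{y}$ should coincide with $Hz$. A NumPy sketch, with a random placeholder $M$ of my choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 7
M = rng.normal(size=(d, n))
one_n = np.ones(n)
C = np.eye(n) - np.outer(one_n, one_n) / n   # centering matrix

w = rng.uniform(0.1, 1.0, size=d)
w /= w.sum()                                 # feasible weights
y = np.exp(M.T @ w)
z = C @ y
beta = one_n @ y                             # beta = 1^T y
Y = np.diag(y)
Q = M @ Y                                    # Q = M Diag(y)

# H z  versus  S 1_n
H = beta * (Q @ C) - np.outer(Q @ one_n, z)
YCY = Y @ C @ Y
S = M @ ((one_n @ Y @ one_n) * YCY - (one_n @ YCY @ one_n) * Y)
```

Here `H @ z` and `S @ one_n` agree, and $S$ is indeed $d\times n$, which is exactly the dimension mismatch noted above.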

The next step is to come up with an iteration formula akin to the classic power iteration to solve for a $y$ vector (or $S$ matrix) such that the pseudo-eigenvalue equation is satisfied.


Update

Since a closed-form eigenvalue-like solution seems improbable, and since we already know how to calculate the gradient and the cost function for any value of $x,\,$ i.e. $$\eqalign{ \phi(x) &= \LR{n\beta^{-2}z^Tz}^{1/2} \\ g(x) &= n\phi^{-1}\a^{-1}\b^{-3}\Big(I-{\tt1}w^T\Big)\,\Big(\b\,QC - Q{\tt1}z^T\Big)z \\ }$$ a better idea is a numerical solution using a gradient-based method.

The Barzilai-Borwein method is particularly simple and effective.
Initialize $$\eqalign{ x_0 &= random \qquad\qquad\qquad\qquad\qquad\quad \\ }$$ First step $$\eqalign{ g_0 &= g(x_0) \\ x_1 &= x_0 - \LR{\frac{0.05\;\phi(x_0)}{g_0^Tg_0}} g_0 \qquad\qquad\quad \\ k &= 1\\ }$$ Subsequent steps $$\eqalign{ g_k &= g(x_k) \\ x_{k+1} &= x_k - \LR{\frac{\LR{x_k - x_{k-1}}^T\LR{g_k - g_{k-1}}}{\LR{g_k - g_{k-1}}^T\LR{g_k - g_{k-1}}}}g_k \\ k &= k+1 \\ }$$ Stop when $g_k\approx 0$.
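For concreteness, the scheme above can be sketched in NumPy (a minimal, unsafeguarded implementation; the random placeholder $M$, its scaling, the iteration cap, and the tolerance are choices of mine, and `phi_and_grad` re-implements the cost $\phi(x)$ and gradient $g(x)$ derived earlier):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 40
M = rng.normal(size=(d, n)) * 0.1
one_n, one_d = np.ones(n), np.ones(d)
C = np.eye(n) - np.outer(one_n, one_n) / n

def phi_and_grad(x):
    alpha = x.sum()
    w = x / alpha                    # w = x / (1^T x)
    y = np.exp(M.T @ w)
    z = C @ y
    beta = one_n @ y
    p = np.sqrt(n * (z @ z)) / beta  # cost phi(x)
    Q = M * y                        # Q = M Diag(y)
    H = beta * (Q @ C) - np.outer(Q @ one_n, z)
    g = n / (p * alpha * beta**3) * \
        (np.eye(d) - np.outer(one_d, w)) @ (H @ z)
    return p, g

# Barzilai-Borwein iteration
x = rng.normal(size=d) + 2.0             # x_0 = random (1^T x away from 0)
p0, g = phi_and_grad(x)
x_new = x - (0.05 * p0 / (g @ g)) * g    # first step, as above
for _ in range(500):                     # subsequent steps
    _, g_new = phi_and_grad(x_new)
    if np.linalg.norm(g_new) < 1e-10:    # stop when g_k ~ 0
        break
    s, t = x_new - x, g_new - g
    if t @ t == 0:
        break
    x, g = x_new, g_new
    x_new = x_new - ((s @ t) / (t @ t)) * g_new

w_star = x_new / x_new.sum()             # recover the constrained weights
```

Note that BB steps are non-monotone, so for a production solver one would add a safeguard (e.g. a non-monotone line search); this sketch relies on the method's typical fast convergence on smooth small problems.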