Optimization of non-negative parameter matrix (with a column-sum term)


I have a matrix of parameters $\boldsymbol{X}$, with $F$ rows and $K$ columns, where each parameter is non-negative ($x_{fk} \geq 0$). I want to optimize this function: \begin{align} \arg\max_{\boldsymbol{X}} \sum_k \left(-a \ln\left(\beta + \sum_{f} x_{fk}\right) + \sum_f c_{fk} \ln x_{fk} \right)\\ = \sum_k -a \ln\left(\beta + \sum_{f} x_{fk}\right) + \sum_k\sum_f c_{fk} \ln x_{fk} \end{align}

where $\boldsymbol{C}$ is an $F \times K$ matrix of data.

A possible and straightforward solution is to take the partial derivative with respect to each $x_{fk}$ and set it to zero. This gives me:

\begin{align} x_{fk} = \frac{\beta + \sum_{i\neq f} x_{ik}}{a - c_{fk}} c_{fk} \end{align}
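As a quick sanity check (not part of the original question), the analytic partial derivative behind this update, $\partial J/\partial x_{fk} = -a/(\beta + \sum_i x_{ik}) + c_{fk}/x_{fk}$, can be compared against a finite difference of the objective on a small random instance; the dimensions and constants below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K, a, beta = 4, 3, 5.0, 0.7
C = rng.uniform(0.1, 1.0, (F, K))   # data matrix C
X = rng.uniform(0.5, 2.0, (F, K))   # a strictly positive test point

def J(X):
    # sum_k [ -a*ln(beta + sum_f x_{fk}) ] + sum_{f,k} c_{fk}*ln x_{fk}
    col = beta + X.sum(axis=0)
    return np.sum(-a * np.log(col)) + np.sum(C * np.log(X))

# analytic partials: -a/(beta + sum_i x_{ik}) + c_{fk}/x_{fk}
grad = -a / (beta + X.sum(axis=0)) + C / X

# central finite difference for one entry (f, k) = (1, 2)
eps = 1e-6
E = np.zeros_like(X)
E[1, 2] = eps
fd = (J(X + E) - J(X - E)) / (2 * eps)
print(abs(fd - grad[1, 2]))  # should be tiny
```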

Derivation with matrices (or vectors)

The above is just a core part of the function I want to optimize. Unfortunately, things become very messy when deriving parameter by parameter. Thus, I wonder whether I can optimize this function using matrix notation (ideally avoiding updates like the one above, where a parameter depends on all the other parameters in the same column).

What I do:

Let $\ln \boldsymbol{\tilde{X}}$ be the matrix containing the $\ln x_{fk}$ terms. Then I can re-express the function as:

$$ \sum_k - a \ln\left(\beta + \sum_{f} x_{fk}\right) \quad+\quad \operatorname{tr}\left(\boldsymbol{C}^T \ln \boldsymbol{\tilde{X}}\right) $$

I think the derivative of the second term with respect to $\boldsymbol{X}$ is: $$ \operatorname{tr}\left(\boldsymbol{C}^T \frac{\partial \ln \boldsymbol{\tilde{X}}}{\partial \boldsymbol{X}}\right) $$ where the derivative of $\ln \boldsymbol{\tilde{X}}$ is a matrix with elements $1/x_{fk}$.

For the first term, I don't know how to proceed.

Alternatively, I also tried differentiating with respect to each column of $\boldsymbol{X}$. The objective for column $k$ would be:

\begin{align} \sum_k - a \ln\left(\beta + \sum_i x_{ik}\right) + \boldsymbol{C}_{:,k}^T \ln \boldsymbol{X}_{:,k} \end{align}
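The column-wise form is just a regrouping of the original objective; a small numerical check (illustrative values only) confirms the two expressions agree:

```python
import numpy as np

rng = np.random.default_rng(1)
F, K, a, beta = 4, 3, 5.0, 0.7
C = rng.uniform(0.1, 1.0, (F, K))
X = rng.uniform(0.5, 2.0, (F, K))

# full objective over the whole matrix
full = np.sum(-a * np.log(beta + X.sum(axis=0))) + np.sum(C * np.log(X))

# column-by-column: -a*ln(beta + sum_i x_{ik}) + C[:,k]^T ln X[:,k]
by_col = sum(-a * np.log(beta + X[:, k].sum()) + C[:, k] @ np.log(X[:, k])
             for k in range(K))
print(np.isclose(full, by_col))  # True
```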

But I'm still not sure how to deal with the first term.

Question: How can I optimize this function using matrix notation?

Accepted answer

Define two vectors whose elements are all equal to unity $$u\in{\mathbb R}^{F}, \,\,\,\,v\in{\mathbb R}^{K}$$ then summations can be replaced by multiplications by these vectors.


The following definitions will be useful $$\eqalign{ B &= \frac{\beta uv^T}{u^Tu} &= \frac{\beta uv^T}{F} \cr y &= X^Tu &\implies y^T = u^TX \cr p &= \frac{C^Tu}{a} &\implies p^T = \frac{u^TC}{a} \cr P &= {\rm Diag}(p) \cr }$$ Let's also define the notations used for

  • the regular/matrix product $(AB)$
  • the inner/Frobenius product $(A:B=\operatorname{tr}(A^TB)\,)$
  • the elementwise/Hadamard product $(A\odot B)$
  • the elementwise/Hadamard division $(A\oslash B)\,$ or $\,(\frac{A}{B})$
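For readers more comfortable with code than notation, these four products map directly onto `numpy` operations (a small illustrative sketch, not part of the answer itself):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 4))
M = rng.normal(size=(4, 2))

regular   = A @ M               # regular/matrix product (A M)
frobenius = np.trace(A.T @ B)   # inner/Frobenius product A:B = tr(A^T B)
hadamard  = A * B               # elementwise/Hadamard product A ⊙ B
division  = A / B               # elementwise/Hadamard division A ⊘ B

# the Frobenius product is simply the elementwise sum of A ⊙ B
print(np.isclose(frobenius, hadamard.sum()))  # True
```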


Now write the cost function in matrix form, and find its differential and gradient $$\eqalign{ J &= C:\log(X) - av^T:\log(u^T(X+B)) \cr\cr dJ &= C:d\log(X) - av^T:d\log(u^T(X+B)) \cr &= C:{dX}\oslash{X} - av^T:{u^TdX}\oslash{u^T(X+B)} \cr &= C:{dX}\oslash{X} - av^T:{u^TdX}\oslash{(y^T+\beta v^T)} \cr &= {C}\oslash{X}:dX - {av^T}\oslash(y^T+\beta v^T):u^TdX \cr &= {C}\oslash{X}:dX - au\Big[v^T\oslash(y^T+\beta v^T)\Big]:dX \cr &= \Bigg[\frac{C}{X} - au\Bigg(\frac{v^T}{y^T+\beta v^T}\Bigg)\Bigg]:dX \cr \cr \nabla J &= \frac{C}{X} - au\Bigg(\frac{v^T}{y^T+\beta v^T}\Bigg) \cr \cr }$$ Set the gradient to zero and solve for $X$ $$\eqalign{ \frac{C}{X} &= u\Bigg(\frac{av^T}{y^T+\beta v^T}\Bigg) \cr C &= X\odot u\Bigg(\frac{av^T}{y^T+\beta v^T}\Bigg) \cr &= X\,\,{\rm Diag}\Bigg(\frac{av}{y+\beta v}\Bigg) \cr &= X\,\,{\rm Diag}(av)\,\,{\rm Diag}(y+\beta v)^{-1} \cr &= a\,X(Y+\beta I)^{-1} \cr \cr X &= \frac{1}{a}\,C\,(Y+\beta I) \cr &= \frac{C}{a}\,\bigg({\rm Diag}(u^TX)+\beta I\bigg) \cr }$$ This is almost the result we want.
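The matrix gradient derived above can be checked against finite differences of $J$ on a small random instance (dimensions and constants below are made up; `u` is the all-ones vector from the answer):

```python
import numpy as np

rng = np.random.default_rng(3)
F, K, a, beta = 4, 3, 5.0, 0.7
C = rng.uniform(0.1, 1.0, (F, K))
X = rng.uniform(0.5, 2.0, (F, K))
u = np.ones(F)

def J(X):
    # J = C:log(X) - a * sum_k log(beta + y_k), with y = X^T u
    return np.sum(C * np.log(X)) - a * np.sum(np.log(u @ X + beta))

# gradient from the derivation: C/X - a*u*(v^T / (y^T + beta*v^T))
y = X.T @ u
grad = C / X - a * np.outer(u, 1.0 / (y + beta))

# central finite differences, entry by entry
eps = 1e-6
fd = np.zeros_like(X)
for f in range(F):
    for k in range(K):
        E = np.zeros_like(X)
        E[f, k] = eps
        fd[f, k] = (J(X + E) - J(X - E)) / (2 * eps)
print(np.max(np.abs(fd - grad)))  # should be tiny
```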
Multiply by $u^T$ to produce a recursive equation for $y$ $$\eqalign{ u^TX &= \frac{u^TC}{a}\,\bigg({\rm Diag}(u^TX)+\beta I\bigg) \cr y^T &= p^T\bigg({\rm Diag}(y^T)+\beta I\bigg) \cr y &= {\rm Diag}(y)\,p + \beta p \cr &= y\odot p + \beta p \cr \cr y\odot(v-p) &= \beta p \cr \cr \frac{y}{\beta} &= \frac{p}{v-p} \cr \frac{Y}{\beta} &= {\rm Diag}\bigg(\frac{p}{v-p}\bigg) = P(I-P)^{-1} \cr \cr }$$ Substituting this into the previous result yields the optimal value for the parameter matrix $$\eqalign{ X &= \Big(\frac{\beta}{a}\Big)\,C\,\Big(\frac{Y}{\beta}+I\Big) \cr &= \Big(\frac{\beta}{a}\Big)\,C\,\Big(P(I-P)^{-1}+I\Big) \cr &= \Big(\frac{\beta}{a}\Big)\,C\,\Big(P+(I-P)\Big)\,(I-P)^{-1} \cr &= \Big(\frac{\beta}{a}\Big)\,C\,(I-P)^{-1} \cr\cr }$$
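To confirm the closed form, one can build $X = \frac{\beta}{a}\,C\,(I-P)^{-1}$ numerically and verify that the gradient from the derivation vanishes there (example values are illustrative; note this requires $p_k < 1$, i.e. $a > \sum_f c_{fk}$ for every column):

```python
import numpy as np

rng = np.random.default_rng(4)
F, K, a, beta = 4, 3, 5.0, 0.7
C = rng.uniform(0.1, 1.0, (F, K))   # with a=5, column sums of C stay below a
u = np.ones(F)

p = C.T @ u / a                     # p = C^T u / a
P = np.diag(p)

# closed-form optimum: X = (beta/a) * C * (I - P)^{-1}
X = (beta / a) * C @ np.linalg.inv(np.eye(K) - P)

# gradient of J at this X should vanish
y = X.T @ u
grad = C / X - a * np.outer(u, 1.0 / (y + beta))
print(np.max(np.abs(grad)))  # effectively zero (machine precision)
```

In the scalar case ($F = K = 1$) this reduces to $x = \beta c/(a - c)$, matching the coordinate-wise stationarity condition in the question.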