Problem
As part of an optimization problem, I have a matrix $\Gamma$ that is the response in a multivariate regression problem
$$\Gamma = \mathbf{AB}$$
where $\mathbf{A}$ is an $m \times r$ matrix of coefficients for the $m$ polynomials of order $r-1$ and $\mathbf{B}$ is the $r \times n$ design matrix.
This is part of the maximum likelihood estimation of $\mathbf{A}$ in order to fit a basis expansion model, where $\Gamma$ contains the coefficients of the $m$ basis functions at $n$ points in time.
However, there are constraints to this problem, which are
- Each column of $\Gamma$ must sum to 1
- Every entry of $\Gamma$ must lie in the interval $[0,1]$
In order to incorporate these constraints there are several options:
- Apply the isometric log-ratio (ilr) transformation (see this document on compositional data analysis) to the regression problem, which implicitly handles both constraints
- Approach this as a constrained optimization problem and use appropriate solvers
- Turn this into an unconstrained problem using the penalty method, i.e. add penalty terms to the objective that punish violation of the constraints.
Currently I have used the first option and modeled $\Gamma$ via the ilr transformation $$\Gamma = \operatorname{ilr}^{-1}(\mathbf{AB})$$
because it seems to be the most elegant way to incorporate the constraints. However, since the coefficients in $\mathbf{A}$ now live in the ilr-transformed space, I find it hard to apply regularization to them in order to improve the fit of the model. I suspect this would be easier with $\mathbf{A}$ in the original space, which is why I am considering options 2 and 3.
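For reference, this is roughly how I apply the inverse ilr map column-wise. The sketch below is Python/NumPy for illustration (my names `helmert_basis` and `ilr_inv` are my own; any orthonormal contrast matrix whose columns sum to zero would do):

```python
import numpy as np

def helmert_basis(m):
    """Orthonormal (m x (m-1)) contrast matrix for the ilr transformation.
    Columns are orthonormal and each sums to zero (a Helmert-type basis,
    one standard choice among many)."""
    V = np.zeros((m, m - 1))
    for j in range(1, m):
        V[:j, j - 1] = 1.0 / j      # first j entries
        V[j, j - 1] = -1.0          # balancing entry
        V[:, j - 1] /= np.linalg.norm(V[:, j - 1])
    return V

def ilr_inv(Z):
    """Map ilr coordinates Z ((m-1) x n) back to compositions (m x n).
    Each returned column is strictly inside (0, 1) and sums to 1
    by construction, which is what enforces both constraints."""
    m = Z.shape[0] + 1
    V = helmert_basis(m)
    X = np.exp(V @ Z)               # back to positive values
    return X / X.sum(axis=0, keepdims=True)  # closure: normalize columns
```

Note that in this parameterization $\mathbf{AB}$ lives in the $(m-1)$-dimensional ilr space, so $\mathbf{A}$ has $m-1$ rows rather than $m$.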
Questions
- Which option would you recommend in this case?
- I understand that the penalty method only approximates the actual constrained problem, but is it still worth considering? I have tried to come up with suitable penalty functions for this case, but since they also need to be differentiable, I am struggling.
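To make that attempt concrete, this is the kind of penalty I have been experimenting with: a quadratic penalty for the sum-to-one equality and squared hinges for the box bounds, both of which are differentiable everywhere ($C^1$). A Python/NumPy sketch, where `rho` is a weight I would increase over a schedule:

```python
import numpy as np

def penalty(A, B, rho=1e3):
    """Smooth (C^1) penalty for the constraints on Gamma = A @ B.
    Returns 0 exactly when Gamma is feasible, positive otherwise."""
    G = A @ B
    # quadratic penalty on the equality constraint: column sums equal 1
    eq = np.sum((G.sum(axis=0) - 1.0) ** 2)
    # one-sided squared-hinge penalties for the box constraints 0 <= G <= 1;
    # each term vanishes where the constraint holds and is differentiable
    lo = np.sum(np.minimum(G, 0.0) ** 2)
    hi = np.sum(np.maximum(G - 1.0, 0.0) ** 2)
    return rho * (eq + lo + hi)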
Edit:
In the constrained case it is not straightforward to me whether I can actually pass these constraints to the solver; I would appreciate some input on this.
What is a bit annoying is that MATLAB reshapes $\mathbf{A}$ into a column vector before passing it to the solver:
$$\mathbf{A} = \begin{pmatrix} \alpha_{1,1} & \alpha_{1,2} & \cdots & \alpha_{1,r} \\ \alpha_{2,1} & \alpha_{2,2} & \cdots & \alpha_{2,r} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{m,1} & \alpha_{m,2} & \cdots & \alpha_{m,r} \end{pmatrix}; \quad \mathbf{a} = \begin{pmatrix} \alpha_{1,1}\\ \alpha_{2,1}\\ \vdots\\ \alpha_{m,1}\\ \alpha_{1,2}\\ \vdots\\ \alpha_{m,2}\\ \vdots\\ \alpha_{m,r} \end{pmatrix}$$
and that the solver only allows one linear inequality and one linear equality constraint:
- $\mathbf{C}\cdot \mathbf{a} \leq \mathbf{d}$
- $\mathbf{D}\cdot \mathbf{a} = \mathbf{e}$
I am not sure I can even make my constraints fit that form — can I?
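One thing I tried after writing this down: using the identity $\operatorname{vec}(\mathbf{AB}) = (\mathbf{B}^\top \otimes \mathbf{I}_m)\operatorname{vec}(\mathbf{A})$ (with column-major vec, matching MATLAB's `A(:)`), both constraints appear to become linear in $\mathbf{a}$ after all, though I am not certain this is the intended usage. A quick numerical check of the construction in Python/NumPy:

```python
import numpy as np

m, r, n = 3, 4, 6
rng = np.random.default_rng(1)
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))

# column-major stacking, matching MATLAB's A(:)
a = A.flatten(order="F")

# vec(A @ B) = (B.T kron I_m) @ vec(A)
K = np.kron(B.T, np.eye(m))
assert np.allclose(K @ a, (A @ B).flatten(order="F"))

# box constraint 0 <= Gamma <= 1 stacked as a single C a <= d
C = np.vstack([K, -K])
d = np.concatenate([np.ones(m * n), np.zeros(m * n)])

# column-sum constraint 1^T Gamma = 1^T as a single D a = e
D = np.kron(B.T, np.ones((1, m)))
e = np.ones(n)
```

So, if this is right, the box constraint and the column-sum constraint each collapse into one linear system, which would match the solver's interface.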