Minimize Variances of multiple depending datasets


Let $A=(a_{i,j})$ be an $n \times m$ matrix. I'm looking for values $C_1,\dots,C_m > 0$ such that the variance of each row of $A$, meaning $\sum_j \big(a_{ij}-\frac{1}{m}\sum_k a_{ik}\big)^2$ for row $i$, is minimized.

$$\left( \begin{array}{cccc} a_{11}C_1 & a_{12}C_2 & \dots & a_{1m}C_m \\ \vdots & & & \vdots \\ a_{n1}C_1 & \dots & \dots & a_{nm}C_m \end{array} \right) $$

So, I know the values of the matrix above, but neither the entries of $A$ nor the values of the $C_j$. Of course the problem is solvable when $n=1$: I can then fix one of the $C_j$ and compute the others so that the variance is $0$. But in my case $m=18$ and $n=300$.

Also, in the ideal case the entries of each row of $A$ would be the same.

I was thinking about minimizing the sum of all the variances, or maybe using some variant of the least-squares method, but I'm a bit stuck.

EDIT: a small example. Given the values $$(x_{ij})=\left(\begin{array}{ccc} 1 & 1.43 & 0.95 \\ 0.7 & 0.75 & 0.82 \\ 0.24 & 0.3 & 0.31 \\ 0.46 & 0.5 & 0.48 \end{array}\right)$$

I want to find a factor $C_j$ per column such that, with $x_{ij}=a_{ij}C_j$ (i.e. $a_{ij}=x_{ij}/C_j$), the sum of the variances of the rows of the corresponding matrix $A=(a_{ij})$ is minimized. So: $$\min_{C\in \mathbb{R}^m} \sum_{i=1}^n\sum_{j=1}^m\Bigg(\frac{x_{ij}}{C_j}-\frac{1}{m}\sum_{k=1}^m\frac{x_{ik}}{C_k}\Bigg)^2 \quad \text{s.t. } C_j>0.$$

Ideally the $C_j$ should be somewhere around 1.
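The objective above can be attacked directly with a general-purpose optimizer. A minimal sketch, assuming SciPy is available; note the objective is driven to zero by scaling all $C_j$ to infinity, so I fix $C_1 = 1$ (an assumption, any normalization would do):

```python
import numpy as np
from scipy.optimize import minimize

# Example data from the question (rows i, columns j).
X = np.array([
    [1.00, 1.43, 0.95],
    [0.70, 0.75, 0.82],
    [0.24, 0.30, 0.31],
    [0.46, 0.50, 0.48],
])

def total_row_variance(c_free):
    # Fix C_1 = 1 to remove the trivial scaling direction:
    # multiplying all C_j by t scales the objective by 1/t^2.
    C = np.concatenate(([1.0], c_free))
    A = X / C                                   # a_ij = x_ij / C_j
    dev = A - A.mean(axis=1, keepdims=True)     # deviations from row means
    return np.sum(dev ** 2)

# Start from C_j = 1 and keep the remaining factors positive.
res = minimize(total_row_variance,
               x0=np.ones(X.shape[1] - 1),
               bounds=[(1e-6, None)] * (X.shape[1] - 1))
C_opt = np.concatenate(([1.0], res.x))
print(C_opt, res.fun)
```

This finds a local minimum of the stated objective; it does not by itself keep the $C_j$ "around 1" beyond the chosen normalization.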


There are 2 best solutions below


Let's define $c := [c_1 ~~ \dots ~~ c_m]^T$ and $C := \operatorname{diag}\{c\}$. Presumably you want to minimize the sum of the variances of the rows of the matrix $AC$. Note that $AC\textbf{1}/m=Ac/m$ is the vector of row means, where $\textbf{1}$ is the all-ones vector. Define

$$ X := AC - (1/m) Ac \textbf{1}^T $$

where the mean vector is repeated $m$ times. Your problem is equivalent to minimizing $\operatorname{tr}(X X^T) = \operatorname{vec}(X)^T \operatorname{vec}(X)$ where $\operatorname{vec}(\cdot)$ is the vectorization operator. Note that

$$ \operatorname{vec}(X) = (\tilde{A} - (1/m) (\textbf{1} \otimes A))c $$

where $\otimes$ is the Kronecker product and $\tilde{A}$ is the $nm \times m$ block-diagonal matrix whose $j$-th diagonal block is the $j$-th column of $A$, i.e. $\tilde{A} := \operatorname{blkdiag}\{a_1, \dots, a_m\}$ where $a_j$ denotes the $j$-th column of $A$.

Let

$$ B := \tilde{A} - (1/m) (\textbf{1} \otimes A) $$

Now the problem becomes minimizing $c^TB^TBc$. Under a normalization such as $\|c\|=1$ (without one, $c=0$ is trivially optimal), it is well known that the minimizer is the eigenvector associated with the minimum eigenvalue of $B^T B$.

Note that this may not guarantee that $c_i>0$. For that, you would have to turn to the literature on quadratic optimization with inequality constraints.
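The recipe above can be sketched numerically. This is a minimal NumPy illustration, assuming the question's $4 \times 3$ example matrix stands in for $A$; the sign flip at the end is my addition, since an eigenvector is only determined up to sign:

```python
import numpy as np

A = np.array([
    [1.00, 1.43, 0.95],
    [0.70, 0.75, 0.82],
    [0.24, 0.30, 0.31],
    [0.46, 0.50, 0.48],
])
n, m = A.shape

# A_tilde: nm x m block-diagonal matrix whose j-th block is column a_j.
A_tilde = np.zeros((n * m, m))
for j in range(m):
    A_tilde[j * n:(j + 1) * n, j] = A[:, j]

# B = A_tilde - (1/m) (1 (x) A), with 1 the all-ones m-vector.
ones = np.ones((m, 1))
B = A_tilde - (1.0 / m) * np.kron(ones, A)

# Minimizer of c^T B^T B c over unit-norm c: eigenvector of the
# smallest eigenvalue of B^T B (eigh returns eigenvalues ascending).
w, V = np.linalg.eigh(B.T @ B)
c = V[:, 0]
if np.all(c <= 0):
    c = -c          # flip sign; positivity is not guaranteed in general
print(c, w[0])
```

One can check that $Bc$ equals the column-stacked deviation matrix $X = AC - (1/m)\,Ac\,\textbf{1}^T$, so $c^TB^TBc$ is exactly the sum of the row variances.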


I'd try an NLP model with variables $C_j$ and $a_{i,j}$ and the constraints

$ C_j a_{i,j} = x_{i,j} \quad \forall i,j$: this is a non-linear constraint, but most solvers will be able to handle it.

$ 1 - \epsilon \le C_j \le 1+\epsilon$, where $0 \le \epsilon \le 1$ is a small number depending upon how close to $1$ you want the $C_j$ to be.

If you use the variance, your objective will be nonlinear. You can use absolute values instead, linearizing them with additional variables $z_i$ and the constraints
$$ m a_{i,j} - \sum_k a_{i,k} \le m z_i \quad \forall i,j$$
$$ \sum_k a_{i,k} - m a_{i,j} \le m z_i \quad \forall i,j$$
(so $z_i$ bounds the largest absolute deviation from the mean in row $i$).

Then use $ \sum_i z_i$ as your objective.
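The bilinear constraint $C_j a_{i,j} = x_{i,j}$ can actually be eliminated: substituting $d_j := 1/C_j$ (my addition, not part of the answer above) gives $a_{i,j} = x_{i,j} d_j$, which is linear in $d$, so the whole linearized model becomes an ordinary LP. A sketch with SciPy's `linprog`, using the question's example data and an assumed $\epsilon = 0.5$:

```python
import numpy as np
from scipy.optimize import linprog

X = np.array([
    [1.00, 1.43, 0.95],
    [0.70, 0.75, 0.82],
    [0.24, 0.30, 0.31],
    [0.46, 0.50, 0.48],
])
n, m = X.shape
eps = 0.5   # assumed bound on how far C_j may stray from 1

# Variables: d_1..d_m (with d_j = 1/C_j, so a_ij = x_ij * d_j),
# followed by z_1..z_n.  Objective: minimize sum_i z_i.
num_vars = m + n
obj = np.concatenate([np.zeros(m), np.ones(n)])

A_ub, b_ub = [], []
for i in range(n):
    for j in range(m):
        # m*a_ij - sum_k a_ik <= m*z_i
        row = np.zeros(num_vars)
        row[:m] = -X[i]
        row[j] += m * X[i, j]
        row[m + i] = -m
        A_ub.append(row); b_ub.append(0.0)
        # sum_k a_ik - m*a_ij <= m*z_i
        row = np.zeros(num_vars)
        row[:m] = X[i]
        row[j] -= m * X[i, j]
        row[m + i] = -m
        A_ub.append(row); b_ub.append(0.0)

# 1-eps <= C_j <= 1+eps translates to 1/(1+eps) <= d_j <= 1/(1-eps).
bounds = [(1 / (1 + eps), 1 / (1 - eps))] * m + [(0, None)] * n
res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
C = 1.0 / res.x[:m]
print(C, res.fun)
```

This solves the absolute-deviation surrogate, not the original variance objective, but it needs no nonlinear solver.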