Least-Squares Solution that Minimizes Two Expressions

107 Views Asked by At

I am trying to solve for $B_k$ in the following equation, as the function of two separate least-squares expressions:

$$ B_k = \arg \min ||X_k - B_kD_kA^T||^2 + \mu_k||B_k - P_kB^*||^2 $$

This is part of a larger algorithm for non-negative PARAFAC2 that I am trying to implement, and is applicable for 3rd order tensors. However, for the purposes of this expression we are only considering one slice of the tensor.

I understand that to solve for $B_k$, we would set $X_k = B_kD_kA^T$ and take the right inverse of $D_kA^T$ to isolate $B_k$. However, I am unsure of how to set up the expression such that $B_k$ is the least-squares solution to both of the expressions. It must be solved using non-negative least squares, as per the article.

Link to the paper

1

There are 1 best solutions below

0
On BEST ANSWER

After some more research, I think that I have come up with a solution to the problem. First, take the derivative with respect to $B_k$, and set it to 0.

$$ \frac{\partial}{\partial B_{k}}(||X_k - B_kD_kA^T||^2 + \mu_k||B_k-P_kB^*||^2) = 0 $$

Understanding that the sum of squared residuals can also be described as the square of the Frobenius Norm of the difference between $X_k$ and $B_kD_kA^T$, or $B_k$ and $P_kB^*$ via:

$$ ||A||_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n|a_{ij}|^2} = \sqrt{tr(A^TA)} = \sqrt{tr(AA^T)} $$

We can rearrange the original equation, adding an arbitrary constant, $\frac{1}{2}$, to aid with simplification later on.

$$ \frac{\partial}{\partial B_{k}}\frac{1}{2}tr((X_k - B_kD_kA^T)(X_k - B_kD_kA^T)^T) + \frac{\partial}{\partial B_{k}}\frac{1}{2}\mu_k(tr((B_k - P_kB^*)(B_k-P_kB^*)^T) = 0 $$

Expanding the equations, where $(X_k - B_kD_kA^T)^T = X_k^T - AD_kB_k^T$ and $(B_k - P_kB^*)^T = B_k^T - B^{*T}P_k^T$:

$$ \frac{\partial}{\partial B_k}\frac{1}{2}(tr(X_kX_k^T) -tr(X_kAD_kB_k^T) - tr(B_kD_kA^TX_k^T) + tr(B_kD_kA^TAD_kB_k^T)) +\frac{\partial}{\partial B_k}\frac{1}{2}\mu_k(tr(B_kB_k^T)-tr(B_kB^{*T}P_k^T) - tr(P_kB^*B_k^T) + tr(P_kB^*B^{*T}P_k^T)) = 0 $$

This equation can be simplified by using some convenient identities from the Matrix Cookbook: $$ \frac{\partial}{\partial{B_k}}tr(X_kX_k^T) = 0 $$

$$ \frac{\partial}{\partial{B_k}}tr(X_kAD_kB_k^T) = X_kAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^T) = X_kAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^T) = X_kAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^TAD_kB_k^T) = 2B_kD_kA^TAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kB^{*T}P_k^T) = PkB^* $$

$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B_k^T) = P_kB^* $$

$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B^{*T}P_k^T) = 0 $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kB_k^T) = 2B_k $$

Substituting into the previous equation yields:

$$ -X_kAD_k + B_kD_kA^TAD_k + \mu_kB_k - \mu_kP_kB^* = 0 $$

Multiplying by the inverse of $D_kA^TAD_k$:

$$ -X_kAD_k(D_kA^TAD_k)^{-1} + B_k + \mu_kB_k(D_kA^TAD_k)^{-1} - \mu_kP_kB^*(D_kA^TAD_k)^{-1} = 0 $$

Solving for $B_k$:

$$ -X_kAD_k + \mu_kB_k - \mu_kP_kB^* = -B_k(D_kA^TAD_k) $$

$$ -X_kAD_k - \mu_kP_kB^* = -B_k(D_kA^TAD_k) - \mu_kB_k $$

Yields the final form of the equation: $$ B_k = \frac{X_kAD_k + \mu_kP_kB^*}{D_kA^TAD_k + \mu_kI_R} $$

Although the notation used differs from the notation used in the paper (i.e. $B_k$ is described as belonging to the first mode and not the second as was described in the paper), the solution is identical to the solution found in the published code.