I am trying to solve for $B_k$ in the following equation, as the function of two separate least-squares expressions:
$$ B_k = \arg \min ||X_k - B_kD_kA^T||^2 + \mu_k||B_k - P_kB^*||^2 $$
This is part of a larger algorithm for non-negative PARAFAC2 that I am trying to implement, and is applicable for 3rd order tensors. However, for the purposes of this expression we are only considering one slice of the tensor.
I understand that to solve for $B_k$, we would set $X_k = B_kD_kA^T$ and take the right inverse of $D_kA^T$ to isolate $B_k$. However, I am unsure of how to set up the expression such that $B_k$ is the least-squares solution to both of the expressions. It must be solved using non-negative least squares, as per the article.
After some more research, I think that I have come up with a solution to the problem. First, take the derivative with respect to $B_k$, and set it to 0.
$$ \frac{\partial}{\partial B_{k}}(||X_k - B_kD_kA^T||^2 + \mu_k||B_k-P_kB^*||^2) = 0 $$
Understanding that the sum of squared residuals can also be described as the square of the Frobenius Norm of the difference between $X_k$ and $B_kD_kA^T$, or $B_k$ and $P_kB^*$ via:
$$ ||A||_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n|a_{ij}|^2} = \sqrt{tr(A^TA)} = \sqrt{tr(AA^T)} $$
We can rearrange the original equation, adding an arbitrary constant, $\frac{1}{2}$, to aid with simplification later on.
$$ \frac{\partial}{\partial B_{k}}\frac{1}{2}tr((X_k - B_kD_kA^T)(X_k - B_kD_kA^T)^T) + \frac{\partial}{\partial B_{k}}\frac{1}{2}\mu_k(tr((B_k - P_kB^*)(B_k-P_kB^*)^T) = 0 $$
Expanding the equations, where $(X_k - B_kD_kA^T)^T = X_k^T - AD_kB_k^T$ and $(B_k - P_kB^*)^T = B_k^T - B^{*T}P_k^T$:
$$ \frac{\partial}{\partial B_k}\frac{1}{2}(tr(X_kX_k^T) -tr(X_kAD_kB_k^T) - tr(B_kD_kA^TX_k^T) + tr(B_kD_kA^TAD_kB_k^T)) +\frac{\partial}{\partial B_k}\frac{1}{2}\mu_k(tr(B_kB_k^T)-tr(B_kB^{*T}P_k^T) - tr(P_kB^*B_k^T) + tr(P_kB^*B^{*T}P_k^T)) = 0 $$
This equation can be simplified by using some convenient identities from the Matrix Cookbook: $$ \frac{\partial}{\partial{B_k}}tr(X_kX_k^T) = 0 $$
$$ \frac{\partial}{\partial{B_k}}tr(X_kAD_kB_k^T) = X_kAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^T) = X_kAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^T) = X_kAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^TAD_kB_k^T) = 2B_kD_kA^TAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kB^{*T}P_k^T) = PkB^* $$
$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B_k^T) = P_kB^* $$
$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B^{*T}P_k^T) = 0 $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kB_k^T) = 2B_k $$
Substituting into the previous equation yields:
$$ -X_kAD_k + B_kD_kA^TAD_k + \mu_kB_k - \mu_kP_kB^* = 0 $$
Multiplying by the inverse of $D_kA^TAD_k$:
$$ -X_kAD_k(D_kA^TAD_k)^{-1} + B_k + \mu_kB_k(D_kA^TAD_k)^{-1} - \mu_kP_kB^*(D_kA^TAD_k)^{-1} = 0 $$
Solving for $B_k$:
$$ -X_kAD_k + \mu_kB_k - \mu_kP_kB^* = -B_k(D_kA^TAD_k) $$
$$ -X_kAD_k - \mu_kP_kB^* = -B_k(D_kA^TAD_k) - \mu_kB_k $$
Yields the final form of the equation: $$ B_k = \frac{X_kAD_k + \mu_kP_kB^*}{D_kA^TAD_k + \mu_kI_R} $$
Although the notation used differs from the notation used in the paper (i.e. $B_k$ is described as belonging to the first mode and not the second as was described in the paper), the solution is identical to the solution found in the published code.