Given the matrices $\mathbf{A} \in \mathbb{R}^{M \times N}$ and $\mathbf{B} \in \mathbb{R}^{M \times M}$, consider the problem
\begin{align} \arg \min_{\mathbf{X} \in \mathbb{R}^{N \times M}} \operatorname{Tr} \left( (\mathbf{A} \mathbf{X})^T \mathbf{B} ( \mathbf{A} \mathbf{X}) \right) - 2 \operatorname{Tr} ( \mathbf{B} \mathbf{A} \mathbf{X} ) \end{align}
where $\operatorname{Tr} (\cdot)$ denotes the trace operator. The above problem can be rewritten as follows
\begin{align} \arg \min_{\mathrm{vec}(\mathbf{AX})} \mathrm{vec}(\mathbf{AX})^T (\mathbf{I} \otimes \mathbf{B}) \mathrm{vec}(\mathbf{AX}) - 2 \mathrm{vec}(\mathbf{B}^T)^T \mathrm{vec}(\mathbf{AX}), \end{align}
where $\mathrm{vec}(\cdot)$ stacks the columns of its argument.
The above optimization can be solved easily for $\mathrm{vec}(\mathbf{AX})$, as it is an unconstrained quadratic program. Suppose we are given the prior information that
$$\mathbf{X}_{ik}^{\min}<\mathbf{X}_{ik}<\mathbf{X}_{ik}^{\max}$$
How do I solve it as an inequality-constrained optimization problem for $\mathrm{vec}(\mathbf{X})$, not $\mathrm{vec}(\mathbf{AX})$?
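As a quick numerical sanity check of the vectorized form, the identity is $\operatorname{Tr}(Y^T \mathbf{B} Y) - 2\operatorname{Tr}(\mathbf{B} Y) = \mathrm{vec}(Y)^T (\mathbf{I} \otimes \mathbf{B}) \mathrm{vec}(Y) - 2\,\mathrm{vec}(\mathbf{B}^T)^T \mathrm{vec}(Y)$ for $Y = \mathbf{A}\mathbf{X}$, with column-stacking $\mathrm{vec}$. A small NumPy verification with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 3
A = rng.standard_normal((M, N))
X = rng.standard_normal((N, M))
B = rng.standard_normal((M, M))

Y = A @ X
v = Y.reshape(-1, order="F")                 # column-stacking vec(Y)

lhs = np.trace(Y.T @ B @ Y) - 2 * np.trace(B @ A @ X)
rhs = v @ np.kron(np.eye(M), B) @ v - 2 * B.T.reshape(-1, order="F") @ v
```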
If $ B $ is a positive definite matrix, then there exists $ C $ with $ {C}^{T} C = B $ by the Cholesky decomposition.
So the problem can be rewritten as:
$$\begin{aligned} \arg \min_{X} \quad & \frac{1}{2} {\left\| C A X \right\|}_{F}^{2} - \operatorname{Tr} \left( D X \right) \\ \text{subject to} \quad & L \leq X \leq U \quad \text{(elementwise)} \end{aligned}$$
where $ D = B A $ (the objective has been scaled by $ \frac{1}{2} $, which does not change the minimizer).
Then the gradient of the objective function is easy:
$$ {A}^{T} {C}^{T} C A X - {D}^{T} = {A}^{T} B A X - {D}^{T} $$
Now just use Projected Gradient Descent, where the projection onto the box $ [L, U] $ is elementwise clipping, and you're done.
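A minimal sketch of this projected gradient iteration in NumPy. The data, bounds, step size, and iteration count are all illustrative assumptions; the step size is chosen as the reciprocal of the Lipschitz constant of the gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 4
A = rng.standard_normal((M, N))
G = rng.standard_normal((M, M))
B = G @ G.T + M * np.eye(M)              # a positive definite B (made up)
C = np.linalg.cholesky(B).T              # B = C^T C
D = B @ A                                # linear term

Lb, Ub = -np.ones((N, M)), np.ones((N, M))   # box constraints on X
AtBA = A.T @ B @ A                       # Hessian of the quadratic part
step = 1.0 / np.linalg.norm(AtBA, 2)     # 1 / Lipschitz constant of the gradient

X = np.zeros((N, M))
for _ in range(5000):
    grad = AtBA @ X - D.T                # gradient A^T B A X - D^T
    X = np.clip(X - step * grad, Lb, Ub) # projection = elementwise clipping
```

Accelerated variants (FISTA-style) or a line search would converge faster, but the plain iteration above already suffices for a problem of this size.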
Remark
If $ B $ is only positive semidefinite, you can use the LDL decomposition $ B = L \Lambda L^{T} $ (written with $ \Lambda $ to avoid a clash with $ D = B A $ above) and build $ C = \Lambda^{1/2} L^{T} $ in the same manner. If $ B $ is not PSD, the problem is not convex; you can still run the same iteration, but only a local solution is guaranteed.
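For the PSD case, a symmetric eigendecomposition gives the same kind of factor as the LDL route (an illustrative alternative, with tiny negative eigenvalues clamped to zero for numerical safety; the matrix here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 5
G = rng.standard_normal((M, 3))
B = G @ G.T                                        # rank-deficient PSD matrix

w, V = np.linalg.eigh(B)                           # B = V diag(w) V^T
C = np.sqrt(np.clip(w, 0.0, None))[:, None] * V.T  # C = diag(sqrt(w)) V^T, so C^T C = B
```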