I have the following optimisation problem: $$\min_{\substack{ \theta_1,\ldots,\theta_n \\ \beta_1,\ldots,\beta_m}}\quad \sum_{t=1}^T \frac{1}{2}(\boldsymbol{\theta}_t^\top \boldsymbol{x}_t + \boldsymbol{\beta}_t^\top \boldsymbol{z}_t-r_t)^2 +\frac{1}{2}\sum_i \|\boldsymbol{\theta}_i\|^2+\frac{1}{2}\sum_j \|\boldsymbol{\beta}_j\|^2$$
where $\boldsymbol{\theta}_t\in \{\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_n\}$ and $\boldsymbol{\beta}_t \in \{\boldsymbol{\beta}_1,\ldots,\boldsymbol{\beta}_m\}$.
I would like to solve this problem in the primal form; however, the parameters depend on each other, so substituting one into the other does not help. I have also tried stacking the parameters in different ways, but it is either not possible or very costly in terms of memory. What do you suggest I try?
You can use Alternating Minimisation to optimise this objective function. You are optimising two sets of vectors, the $\theta_i$ and the $\beta_j$, so you can alternate between solving for one set while keeping the other set fixed.
Let $f(\theta, \beta)$ be your objective function, where $\theta = (\theta_1,\ldots,\theta_n)$ and $\beta = (\beta_1,\ldots,\beta_m)$. To avoid a clash with your data index $t$, write $k$ for the iteration counter. You can follow these steps to optimise it:

1. Initialise $\theta^{(0)}$ and $\beta^{(0)}$ with random values, drawn e.g. from a uniform distribution on $[0, 1]$.
2. Fixing $\beta^{(k-1)}$, solve for $\theta^{(k)}$ as $$\theta^{(k)} = \arg \min_{\theta} f(\theta, \beta^{(k-1)}).$$ You can use gradient descent for this subproblem; note that with $\beta$ fixed it is a quadratic (ridge-regression-type) problem.
3. Similarly, fixing $\theta^{(k)}$, solve for $\beta^{(k)}$ as $$\beta^{(k)} = \arg \min_{\beta} f(\theta^{(k)}, \beta).$$
4. Repeat steps 2 and 3 until the objective decreases by less than some tolerance $\epsilon$ between consecutive iterations. As a sanity check, the objective should never increase between consecutive iterations.
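A minimal NumPy sketch of this procedure, assuming each sample $t$ is assigned to one of the $n$ $\theta$-vectors and one of the $m$ $\beta$-vectors (the assignment arrays `g` and `h`, and all problem sizes, are made-up illustration data). Because each subproblem is quadratic with the other block fixed, the sketch solves it in closed form rather than by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and synthetic data (assumptions, not from the question)
T, d_x, d_z, n, m = 200, 5, 4, 3, 2
X = rng.normal(size=(T, d_x))
Z = rng.normal(size=(T, d_z))
r = rng.normal(size=T)
g = rng.integers(0, n, size=T)   # which theta_i sample t uses
h = rng.integers(0, m, size=T)   # which beta_j sample t uses

def objective(Theta, Beta):
    # 0.5 * sum_t (theta_{g(t)}.x_t + beta_{h(t)}.z_t - r_t)^2 + 0.5*||Theta||^2 + 0.5*||Beta||^2
    pred = np.einsum('td,td->t', X, Theta[g]) + np.einsum('td,td->t', Z, Beta[h])
    return 0.5 * np.sum((pred - r) ** 2) + 0.5 * np.sum(Theta ** 2) + 0.5 * np.sum(Beta ** 2)

# Step 1: random initialisation in [0, 1]
Theta = rng.uniform(size=(n, d_x))
Beta = rng.uniform(size=(m, d_z))

prev = objective(Theta, Beta)
init_obj = prev
for it in range(100):
    # Step 2: with Beta fixed, each theta_i solves a ridge regression on its samples
    for i in range(n):
        sel = (g == i)
        resid = r[sel] - np.einsum('td,td->t', Z[sel], Beta[h[sel]])
        A = X[sel].T @ X[sel] + np.eye(d_x)
        Theta[i] = np.linalg.solve(A, X[sel].T @ resid)
    # Step 3: with Theta fixed, each beta_j solves the analogous ridge regression
    for j in range(m):
        sel = (h == j)
        resid = r[sel] - np.einsum('td,td->t', X[sel], Theta[g[sel]])
        A = Z[sel].T @ Z[sel] + np.eye(d_z)
        Beta[j] = np.linalg.solve(A, Z[sel].T @ resid)
    cur = objective(Theta, Beta)
    assert cur <= prev + 1e-9   # sanity check: the objective never increases
    if prev - cur < 1e-8:       # step 4: stop once the decrease is below epsilon
        break
    prev = cur
```

Each inner solve is exact here, so the outer loop typically converges in a handful of iterations; you could swap in a few gradient-descent steps per subproblem if the closed-form solves are too expensive.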
Alternating Minimisation (AM) is a popular method for objective functions where the dependencies between variables are tight. Matrix completion is one example where AM is efficient; see Algorithm 1 in this paper for an idea of how such objective functions are solved.