Least Squares for a vector-valued function


I want to use least squares estimation (LSE) on a vector-valued function. Is it true that if the unknown parameters are independent, I can split the LSE problem into several smaller ones? Are there any good references for this kind of problem?

Consider the following example:

Let $$ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \theta x $$ where $x \in \mathbb{R}^n$ and $\theta$ is of size $2\times n$ and is to be determined from $N$ samples. Denote the $i$-th sample by $([y_{1,i},y_{2,i}], x_i)$.

Let $$ Y = \begin{bmatrix} y_{1,1} & y_{2,1} \\ y_{1,2} & y_{2,2} \\ \vdots & \vdots \\ y_{1,N} & y_{2,N} \end{bmatrix} , X = \begin{bmatrix} x^\top_1 \\ x_2^\top \\ \vdots \\ x_N^\top \end{bmatrix} $$ and $\theta_1$ and $\theta_2$ be respectively the first and second row of $\theta$.

Then, analogously to the scalar case, we can form the minimization problem $$ \min_\theta \|Y - [X\theta_1~~X \theta_2]\|_F^2 $$ which, if the parameters in $\theta_1$ and $\theta_2$ are independent (no entry appears in both), I believe is equivalent to $$ \left(\min_{\theta_1} \|Y_1 - X\theta_1\|^2 \right) + \left( \min_{\theta_2} \|Y_2 - X\theta_2\|^2 \right), $$ where $Y_1$ and $Y_2$ are respectively the first and second columns of $Y$.
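As a numerical sanity check (my own sketch, not part of the question), a short NumPy snippet can confirm that the stacked Frobenius-norm objective equals the sum of the two column-wise objectives at any candidate $\theta$; all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 3                      # N samples, n features, as in the question
X = rng.normal(size=(N, n))       # rows are the x_i^T
theta_true = rng.normal(size=(n, 2))
Y = X @ theta_true + 0.1 * rng.normal(size=(N, 2))

theta = rng.normal(size=(n, 2))   # arbitrary candidate parameters

# Stacked objective ||Y - [X theta_1  X theta_2]||_F^2 ...
joint = np.linalg.norm(Y - X @ theta, "fro") ** 2
# ... equals the sum of the per-column objectives.
split = (np.linalg.norm(Y[:, 0] - X @ theta[:, 0]) ** 2
         + np.linalg.norm(Y[:, 1] - X @ theta[:, 1]) ** 2)

assert np.isclose(joint, split)
```

Since the objective splits for every $\theta$, minimizing it jointly or column by column gives the same result.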

Accepted answer

If you mean that there are separate parameter vectors $\theta_1$ and $\theta_2$ for your two responses $y_1$ and $y_2$, then the answer is yes. The least squares loss function can be partitioned into the sum of two components $$ \|Y_1 - X\theta_1\|^2 + \|Y_2 - X\theta_2\|^2 $$ by plain algebra. And the minimum of a function of the form $f(a,b) = g(a) + h(b)$ is achieved at the pair $(a^*,b^*)$ where $g$ and $h$ are individually minimized. Why is this the case? If $(\bar a, \bar b)$ is the point where $f(a,b)$ is minimized over all pairs, then $$g(\bar a)+h(\bar b) = f(\bar a, \bar b) \le f(\bar a, b^*) = g(\bar a) + h(b^*)\le g(\bar a) + h(\bar b),$$ which implies $h(b^*)=h(\bar b)$. Similarly $g(a^*)=g(\bar a)$.
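A toy numeric illustration of the separability argument (my own example, not from the answer): for a function $f(a,b) = g(a) + h(b)$, a joint grid search and two one-dimensional searches reach the same minimum value.

```python
import numpy as np

g = lambda a: (a - 2.0) ** 2          # example g, minimized at a = 2
h = lambda b: (b + 1.0) ** 2          # example h, minimized at b = -1

grid = np.linspace(-5, 5, 1001)       # grid contains both minimizers exactly

# Joint minimization of f(a, b) = g(a) + h(b) over all grid pairs.
F = g(grid)[:, None] + h(grid)[None, :]
joint_min = F.min()

# Separate one-dimensional minimizations of g and h.
separate_min = g(grid).min() + h(grid).min()

assert np.isclose(joint_min, separate_min)
```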

Another answer

Yes. Define $X = [\boldsymbol{x}_1, \cdots, \boldsymbol{x}_N]$ (so $X^\top$ has the samples as rows, matching the $X$ of the question), $Y = [\boldsymbol{Y}^{(1)}, \boldsymbol{Y}^{(2)}]$, and $\theta = [\boldsymbol{\theta}^{(1)}, \boldsymbol{\theta}^{(2)}]$, and denote by $u_k$ the $k$-th component of a vector $\boldsymbol{u}$. The optimization then becomes:

\begin{align} \min_{\theta} \left \lVert Y - X^\top \theta\right \rVert^2_F &= \min_{\boldsymbol{\theta}^{(1)}, \boldsymbol{\theta}^{(2)}} \sum_{i=1}^N \sum_{j=1}^2 \left(Y^{(j)}_i - \boldsymbol{x}_i^\top \boldsymbol{\theta}^{(j)}\right)^2 \\ &= \min_{\boldsymbol{\theta}^{(1)}, \boldsymbol{\theta}^{(2)}} \sum_{i=1}^N \left(Y^{(1)}_i - \boldsymbol{x}_i^\top \boldsymbol{\theta}^{(1)}\right)^2 + \sum_{i=1}^N \left(Y^{(2)}_i - \boldsymbol{x}_i^\top \boldsymbol{\theta}^{(2)}\right)^2 \\ &= \min_{\boldsymbol{\theta}^{(1)}} \sum_{i=1}^N \left(Y^{(1)}_i - \boldsymbol{x}_i^\top \boldsymbol{\theta}^{(1)}\right)^2 + \min_{\boldsymbol{\theta}^{(2)}} \sum_{i=1}^N \left(Y^{(2)}_i - \boldsymbol{x}_i^\top \boldsymbol{\theta}^{(2)}\right)^2 \end{align}

So you can find $\boldsymbol{\theta}^{(1)}$ and $\boldsymbol{\theta}^{(2)}$ separately.
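In practice an off-the-shelf solver exploits exactly this: NumPy's `np.linalg.lstsq` accepts a 2-D right-hand side and solves each column as an independent least squares problem. A sketch with simulated data (names and sizes are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 100, 4
X = rng.normal(size=(N, n))       # rows are the samples x_i^T
theta_true = rng.normal(size=(n, 2))
Y = X @ theta_true + 0.05 * rng.normal(size=(N, 2))

# One call with the matrix right-hand side Y ...
theta_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# ... matches two separate single-column solves.
theta_1, *_ = np.linalg.lstsq(X, Y[:, 0], rcond=None)
theta_2, *_ = np.linalg.lstsq(X, Y[:, 1], rcond=None)

assert np.allclose(theta_joint[:, 0], theta_1)
assert np.allclose(theta_joint[:, 1], theta_2)
```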