I'm trying to rewrite a weighted least squares problem in matrix form for my research. Let $X$ be an $n \times d$ data matrix, let $Y$ be the corresponding $n \times 1$ target vector, and let $\Lambda$ be a diagonal matrix with weights $\lambda_1, \dots, \lambda_n$ on its diagonal. Let $w$ be the $d \times 1$ weights vector that we wish to find.
The risk function is given by
$$ \sum_{i=1}^n \lambda_i (w^Tx_i - y_i)^2 $$
where $x_i$ is the $i^{th}$ row of $X$. I'm trying to rewrite this more compactly using matrices (i.e. just in terms of the matrices $\Lambda, X, Y$ and the weight vector $w$).
I'm having some issues doing this because I can't figure out how to deal with the squared term. Could someone show me how to do this?
Let's build this up step-by-step. Consider the function (without $\lambda_i$s at first):
$$J(w) = \sum_{i=1}^n (w^T x_i -y_i)^2.$$
This is the scalar product with itself of an $n$-dimensional vector $e$ whose entries are the prediction errors for each measurement $i$, namely $e_i = w^Tx_i - y_i$. Staring intensely at the equation, we actually get $e = Xw - Y$, which means that $$J(w)= \langle e,e \rangle = (Xw - Y)^T(Xw - Y).$$
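As a quick sanity check, the sum form and the matrix form can be compared numerically on made-up data (the dimensions and random values below are purely illustrative, not from the question):

```python
import numpy as np

# Hypothetical small example: n = 4 samples, d = 2 features.
rng = np.random.default_rng(0)
n, d = 4, 2
X = rng.standard_normal((n, d))
Y = rng.standard_normal(n)
w = rng.standard_normal(d)

# Sum form: sum_i (w^T x_i - y_i)^2
J_sum = sum((w @ X[i] - Y[i]) ** 2 for i in range(n))

# Matrix form: (Xw - Y)^T (Xw - Y)
e = X @ w - Y
J_mat = e @ e

print(np.isclose(J_sum, J_mat))  # prints True: the two forms agree
```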
Now, we introduce the $\lambda_i$s. Coming from
$$J(w, \Lambda) = \sum_{i=1}^n \lambda_i(w^T x_i -y_i)^2,$$ we will first rewrite it so it's properly formulated in terms of $\Lambda$ rather than its diagonal entries. Having identified the earlier sum as a scalar product of $e$ with itself, we see $$J(w, \Lambda) = \sum_{i=1}^n e_i \lambda_i e_i,$$ which looks almost like a scalar product of a vector with itself, just a weighted one. In fact, if we introduce the weighted scalar product $[u,v]=\langle u,\Lambda v\rangle = u^T \Lambda v$, then $J(w,\Lambda) = [e,e]$. Thus we get
$$J(w,\Lambda) = (Xw - Y)^T\Lambda (Xw - Y).$$
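The weighted identity can be verified numerically in the same way; again, all data below is a made-up example for illustration:

```python
import numpy as np

# Hypothetical example: n = 5 samples, d = 3 features.
rng = np.random.default_rng(1)
n, d = 5, 3
X = rng.standard_normal((n, d))
Y = rng.standard_normal(n)
w = rng.standard_normal(d)
lam = rng.random(n)   # positive weights lambda_1, ..., lambda_n
Lam = np.diag(lam)    # Lambda as an n x n diagonal matrix

# Sum form: sum_i lambda_i (w^T x_i - y_i)^2
J_sum = sum(lam[i] * (w @ X[i] - Y[i]) ** 2 for i in range(n))

# Matrix form: (Xw - Y)^T Lambda (Xw - Y)
e = X @ w - Y
J_mat = e @ Lam @ e

print(np.isclose(J_sum, J_mat))  # prints True: the two forms agree
```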