Derivative of matrix expression $(Y − A\beta)^TW(Y − A\beta)$ wrt $\beta$.


$Y$ and $\beta$ are $n \times 1$ matrices (column vectors) and $W$ is a diagonal $n \times n$ matrix.

What is the best way to think about how to simplify this expression and its derivative to get the expression below? What are the simple rules I should remember to get this?

$2A^TWA\beta − 2A^TW^TY$


There are 2 best solutions below


Throughout, I implicitly sum over repeated indices. You're differentiating the scalar $Z_i W_{ij}Z_j$, where $Z_i:=Y_i-A_{il}\beta_l$, with respect to a vector $\beta$. The result is a vector whose $k$th component is obtained by differentiating with respect to $\beta_k$. Writing $\partial_k:=\tfrac{\partial}{\partial\beta_k}$, we have $\partial_k Z_i=-A_{ik}$, so the product rule gives $\partial_k(Z_i W_{ij}Z_j)=-A_{ik}W_{ij}Z_j-Z_i W_{ij}A_{jk}$. The first term is $-[Z^TW^TA]_k$ and the second is $-[Z^TWA]_k$, so the rules of matrix multiplication let you rewrite the sum neatly as $-[Z^T(W+W^T)A]_k$, making the derivative $-Z^T(W+W^T)A=(A\beta-Y)^T(W+W^T)A$.

Now for a sanity check, which is worthwhile with any calculation this complex. If $W$ were a scalar instead, we'd have $\partial_k (Z^TWZ)=W\partial_k (Z^TZ)=-2W[Z^TA]_k$, i.e. the derivative would be $-2Z^TWA$. But when we reinstate $W$'s matrix status, we note its antisymmetric part doesn't even contribute to the scalar we're differentiating, so without loss of generality $W$ should be replaced throughout with its symmetric part $(W+W^T)/2$. That gives $-Z^T(W+W^T)A$ instead, as we've found.
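The same sanity check can also be run numerically. The sketch below is only an illustration in plain Python: the values of `A`, `W`, `Y` and `beta`, and all helper names, are made up for the example and are not part of the question. It compares $-Z^T(W+W^T)A$ against central finite differences for a deliberately non-symmetric $W$, and confirms that swapping $W$ for its symmetric part leaves the scalar unchanged.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def residual(A, Y, beta):
    """Z = Y - A beta."""
    return [y - ab for y, ab in zip(Y, matvec(A, beta))]

def quad_form(A, W, Y, beta):
    """The scalar Z^T W Z."""
    Z = residual(A, Y, beta)
    return sum(z * wz for z, wz in zip(Z, matvec(W, Z)))

def grad_formula(A, W, Y, beta):
    """Entries of -Z^T (W + W^T) A, computed as -A^T (W + W^T) Z."""
    n = len(W)
    S = [[W[i][j] + W[j][i] for j in range(n)] for i in range(n)]
    SZ = matvec(S, residual(A, Y, beta))
    A_T = [list(col) for col in zip(*A)]
    return [-x for x in matvec(A_T, SZ)]

def grad_numeric(A, W, Y, beta, eps=1e-4):
    """Central finite differences, one component of beta at a time."""
    grad = []
    for k in range(len(beta)):
        up, down = list(beta), list(beta)
        up[k] += eps
        down[k] -= eps
        diff = quad_form(A, W, Y, up) - quad_form(A, W, Y, down)
        grad.append(diff / (2 * eps))
    return grad

# Arbitrary illustrative values (assumptions, not from the question).
A = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
W = [[2.0, 1.0, 0.0], [-1.0, 3.0, 0.5], [0.0, 2.0, 1.0]]  # not symmetric
Y = [1.0, -2.0, 0.5]
beta = [0.3, -0.7]

analytic = grad_formula(A, W, Y, beta)
numeric = grad_numeric(A, W, Y, beta)
print("analytic:", analytic)
print("numeric: ", numeric)

# Replacing W by its symmetric part should not change the scalar.
W_sym = [[(W[i][j] + W[j][i]) / 2 for j in range(3)] for i in range(3)]
print(quad_form(A, W, Y, beta), quad_form(A, W_sym, Y, beta))
```

Because the function is quadratic in $\beta$, central differences agree with the formula up to rounding error, so any visible mismatch would point to a sign or transpose slip.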


Define the vectors (writing $b$ and $y$ for the question's $\beta$ and $Y$)

$$\eqalign{ g &= (Ab-y) &\implies dg=A\,db \cr h &= (W\!Ab-Wy) &\implies dh=W\!A\,db \cr }$$

Write the function in terms of these new variables and find its differential and gradient.

$$\eqalign{ f &= g^Th \cr df &= h^Tdg + g^Tdh \cr &= (h^TA+g^TWA)\,db \cr &= (A^Th+A^TW^Tg)^T\,db \cr \frac{\partial f}{\partial b} &= A^Th+A^TW^Tg \cr &= A^T(WAb-Wy)+A^TW^T(Ab-y) \cr &= A^T(W+W^T)Ab - A^T(W+W^T)y \cr }$$

If $W=W^T$ (as for the diagonal $W$ in the question), this can be simplified to

$$\eqalign{ \frac{\partial f}{\partial b} &= 2A^TWAb - 2A^TWy \cr }$$
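To see the differential at work, here is a small plain-Python sketch for a diagonal $W$. The numbers and the names `w`, `db`, `f` and `grad` are hypothetical choices for illustration. For a finite step $db$, expanding $f(b+db)$ gives the first-order term $\left(\tfrac{\partial f}{\partial b}\right)^T db$ plus the second-order remainder $(A\,db)^TW(A\,db)$, and the code checks that these two pieces account for the whole change in $f$.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * c for a, c in zip(u, v))

def f(A, w, y, b):
    """(Ab - y)^T W (Ab - y) with W = diag(w)."""
    g = [ab - yi for ab, yi in zip(matvec(A, b), y)]
    return sum(wi * gi * gi for wi, gi in zip(w, g))

def grad(A, w, y, b):
    """2 A^T W A b - 2 A^T W y, evaluated as 2 A^T W (Ab - y)."""
    g = [ab - yi for ab, yi in zip(matvec(A, b), y)]
    Wg = [wi * gi for wi, gi in zip(w, g)]
    A_T = [list(col) for col in zip(*A)]
    return [2 * x for x in matvec(A_T, Wg)]

# Arbitrary illustrative values (assumptions, not from the question).
A = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
w = [2.0, 0.5, 1.5]      # diagonal entries of W
y = [1.0, -2.0, 0.5]
b = [0.3, -0.7]
db = [1e-3, -2e-3]       # a small but finite step

b_new = [bi + di for bi, di in zip(b, db)]
change = f(A, w, y, b_new) - f(A, w, y, b)

first_order = dot(grad(A, w, y, b), db)                          # df
second_order = sum(wi * x * x for wi, x in zip(w, matvec(A, db)))  # (A db)^T W (A db)

print("change in f:        ", change)
print("first + second order:", first_order + second_order)
```

As $db$ shrinks, the second-order piece falls off quadratically, which is why only the first-order term survives in $df$.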