I want to differentiate the following expression with respect to $b$
$(Y-Xb)'(Y-Xb)$
where $Y$ is $n\times 1$, $X$ is $n\times k$, $b$ is $k\times 1$, and $'$ denotes transpose. If I expand the expression and differentiate term by term, I get the right result,
but somehow when I compute $(Y-Xb)' \frac{d}{db}(Y-Xb)$ I get $(Y-Xb)'(X)'$, whereas the right solution should be $(Y-Xb)'X$. I'm slightly confused: is there anything I should pay attention to when using the product rule on this?
Thanks
The derivative is a linear map. So $F(b)=Y-Xb$ has derivative $F'(b)$ whose action on a $k\times 1$ matrix $y$ is $F'(b)y = -Xy$, and $G(b)=(Y-Xb)'$ has derivative $G'(b)y=(-Xy)'$. By the product rule, the derivative of $H(b)=G(b)F(b)$ is
$$ \begin{align} H'(b)y & = (G'(b)y)F(b)+G(b)(F'(b)y) \\ & =(-Xy)'(Y-Xb)+(Y-Xb)'(-Xy) \\ & = -y'X'Y+y'X'Xb-Y'Xy+b'X'Xy \end{align} $$

This must be the same as the derivative obtained by first expanding and then differentiating:
$$ H(b) = (Y-Xb)'(Y-Xb) = Y'Y-b'X'Y-Y'Xb+b'X'Xb. $$
The derivative of the constant term $Y'Y$ is $0$. So,
$$ H'(b)y=-y'X'Y-Y'Xy+y'X'Xb+b'X'Xy. $$

The expression on the right is linear in $y$, but it is not in the form $Ay$ one usually expects for a linear operator. For a general product of matrix-valued functions there is no simple way to write the derivative without applying it to $y$, which is typical because of the non-commutative nature of matrices: in the term $(G'(b)y)F(b)$ the increment $y$ sits inside a transpose on the left, and stripping it off carelessly is what produces a stray $X'$. In this particular case, however, every term is $1\times 1$ and therefore equal to its own transpose, so $y'X'Y=Y'Xy$ and $y'X'Xb=b'X'Xy$, which gives $H'(b)y=-2(Y-Xb)'Xy$, that is, $H'(b)=-2(Y-Xb)'X$.

In general, if you were to choose a basis, the expression on the right would have a matrix form $A[y]_{B}$, where $[y]_{B}$ is the column vector of $y$ with respect to the basis $B$, but this complexity is really not needed. Coordinate representations of linear operators with respect to bases spoil us into not wanting to deal with linear operators in other forms, and that's the downside of learning linear theory through coordinate-dependent matrices.
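The term-by-term formula for $H'(b)y$ can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated $X$, $Y$, $b$, $y$ (the sizes, seed, and helper names `H` and `termwise_derivative` are illustrative choices, not part of the question). It compares the formula with a central finite difference of $H$ in the direction $y$, and also with the compact form $-2(Y-Xb)'Xy$, which follows because each term is a $1\times 1$ matrix and so equals its own transpose.

```python
import numpy as np

def H(b, X, Y):
    """H(b) = (Y - Xb)'(Y - Xb), returned as a Python float."""
    r = Y - X @ b
    return (r.T @ r).item()  # r.T @ r is a 1x1 array

def termwise_derivative(b, y, X, Y):
    """H'(b)y written term by term: -y'X'Y - Y'Xy + y'X'Xb + b'X'Xy."""
    val = (-y.T @ X.T @ Y
           - Y.T @ X @ y
           + y.T @ X.T @ X @ b
           + b.T @ X.T @ X @ y)
    return val.item()

# Illustrative random data (assumed sizes n = 6, k = 3).
rng = np.random.default_rng(0)
n, k = 6, 3
X = rng.normal(size=(n, k))
Y = rng.normal(size=(n, 1))
b = rng.normal(size=(k, 1))
y = rng.normal(size=(k, 1))  # direction in which we differentiate

# Central finite difference of H at b in the direction y.
eps = 1e-4
fd = (H(b + eps * y, X, Y) - H(b - eps * y, X, Y)) / (2 * eps)

termwise = termwise_derivative(b, y, X, Y)
compact = (-2 * (Y - X @ b).T @ X @ y).item()  # -2 (Y - Xb)' X y

print("finite difference:", fd)
print("term-by-term     :", termwise)
print("compact form     :", compact)
```

Up to floating-point rounding, the three printed values should agree, which is consistent with reading the derivative as the linear map $y \mapsto -2(Y-Xb)'Xy$.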