Differentiating with respect to a vector transpose


Consider a normal-error regression model. The least-squares criterion, written in matrix form, is $$Q=(Y-X\beta)^T(Y-X\beta)$$ where $Q$ is the quantity we minimize to obtain the least-squares estimates, $Y$ is an ($n \times 1$) vector, $X$ is an ($n \times 2$) matrix, and $\beta$ is a ($2 \times 1$) vector.

To actually obtain the least square estimates, we need to differentiate Q with respect to $\beta$ as follows:

  1. First expand the quantity Q: $$Q=Y^TY-\beta^TX^TY-Y^TX\beta+\beta^TX^TX\beta$$
  2. Utilize $Y^TX\beta = \beta^TX^TY$ and manipulate terms: $$Q=Y^TY-2\beta^TX^TY+\beta^TX^TX\beta$$
  3. Then take the derivative of Q with respect to $\beta$ and equate the result to the zero vector, $0$: $${\partial Q\over\partial\beta}=-2X^TY+2X^TX\beta=0$$
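The steps above can be checked numerically. This is a minimal sketch with made-up data (the design matrix, true coefficients, and noise level are all illustrative): solving the normal equations $X^TX\beta = X^TY$ from step 3 should agree with NumPy's least-squares solver.

```python
import numpy as np

# Illustrative simple-linear-regression data (all values are made up).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # (n x 2) design matrix
beta_true = np.array([1.5, -0.7])
Y = X @ beta_true + rng.normal(0, 0.1, n)                 # (n x 1) response

# Solve the normal equations  X^T X beta = X^T Y  from step 3.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```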

However, I can't see how the transpose $\beta^T$ disappeared in the above equation when the derivative of $Q$ was taken with respect to $\beta$.

I believe the steps I am missing are elementary matrix calculus, but how do you differentiate an expression that contains both $\beta^T$ and $\beta$ in its terms (for example, $Q=Y^TY-2\beta^TX^TY+\beta^TX^TX\beta$ from our earlier example) with respect to a vector $\beta$?


There are 3 best solutions below

---

It is indeed a basic result of matrix calculus. For a vector $x$ and symmetric matrix $A$ you have $$ \frac{\partial }{ \partial x} x' A x = 2x'A. $$

You can derive this result by writing out the quadratic form $x' A x$ explicitly, i.e., $$ x'A x = \sum_i \sum_j x_i x_j a_{ij} = \sum_i x_i^2a_{ii} + 2\sum_{i > j}x_i x_ja_{ij}, $$ now by taking the partial derivative w.r.t. $x_k$ you'll get $$ \frac{\partial }{ \partial x_k} x'Ax = 2 x_k a_{kk} + 2 \sum_{i \neq k} x_ia_{ik} = 2 x ^ T a_{\cdot k}, $$ where $a_{\cdot k}$ is the $k$th column of $A$. Doing the same for every $k$, you'll get $$ 2x^TA. $$ In the least-squares derivation $x = \beta$ and $A = X'X$, and instead of a row vector you're working with a column vector (the derivative is taken w.r.t. $\beta^T$), but the procedure is the same.
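The identity $\frac{\partial}{\partial x} x'Ax = 2x'A$ for symmetric $A$ can be sketched numerically via central finite differences; the matrix and vector below are arbitrary illustrative values.

```python
import numpy as np

# Check d/dx (x' A x) = 2 x' A for a symmetric A by central finite differences.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2            # symmetrize so the identity applies
x = rng.normal(size=4)

q = lambda v: v @ A @ v      # the quadratic form x' A x
eps = 1e-6
grad_fd = np.array([(q(x + eps * np.eye(4)[k]) - q(x - eps * np.eye(4)[k])) / (2 * eps)
                    for k in range(4)])
print(np.allclose(grad_fd, 2 * x @ A, atol=1e-6))  # True
```

Central differences are exact (up to roundoff) for a quadratic, so the match is tight.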

---

As everyone knows, the dot product of two vectors $\,(x,y\in{\mathbb R}^n)\,$ is commutative $$x\cdot y = y\cdot x$$ But when restated in matrix notation, it looks odd to claim that $$x^Ty=y^Tx$$ probably because we are so alert to the non-commutative behavior of matrices, for which it is definitely not true, i.e. $$X^TY \ne Y^TX$$ Given the function $\,\phi=y\cdot x,\,$ the gradient with respect to $x$ is $$\frac{\partial\phi}{\partial x}=y$$ and it remains so, whether you choose to write the function as a dot product or a matrix product $$\phi = x^Ty = y^Tx = y\cdot x$$
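A quick numerical sketch of this point, with illustrative values: the finite-difference gradient of $\phi = x\cdot y$ with respect to $x$ comes out as $y$, no matter how the product is written.

```python
import numpy as np

# Gradient of phi = x . y with respect to x should be y itself.
rng = np.random.default_rng(2)
x, y = rng.normal(size=3), rng.normal(size=3)

eps = 1e-6
grad_fd = np.array([(np.dot(x + eps * np.eye(3)[k], y)
                     - np.dot(x - eps * np.eye(3)[k], y)) / (2 * eps)
                    for k in range(3)])
print(np.allclose(grad_fd, y))  # True
```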

---

The plain simple answer is that differentiating $\beta^T(X^TX)\beta$ with respect to $\beta$ gives $2(X^TX)\hat\beta$ (where $\hat\beta$ is the estimate of $\beta$).

$\beta^T$ disappears just as the exponent drops when differentiating the scalar $x^2$ to get $2x$.
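Putting the pieces together, a minimal numerical sketch (with made-up data): at the least-squares estimate, the gradient $-2X^TY+2X^TX\hat\beta$ from the question should vanish.

```python
import numpy as np

# Illustrative data; only the shapes matter here.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(20), rng.uniform(size=20)])  # (n x 2) design matrix
Y = rng.normal(size=20)                                   # (n x 1) response

# Least-squares estimate from the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The gradient of Q evaluated at beta_hat should be (numerically) zero.
grad = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat
print(np.allclose(grad, 0))  # True
```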