Taking derivative with respect to a vector


From time to time, I come across derivatives that are taken with respect to a vector. For example, the least squares estimation method with more than one explanatory variable is written as: $$y_i = \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i $$

In matrix form, with estimates in place of the true coefficients, this becomes: $$ y = Xb + e $$

Here $y$ is the $N \times 1$ column vector of target variables, $X$ is the $N \times k$ matrix of observed explanatory variables, $b$ is the $k \times 1$ column vector of estimates of the $\beta$ values, and $e$ is the $N \times 1$ column vector of residuals.

When arranged as $e =y - Xb$, the aim is to minimize the sum of the squares of residuals: $\sum_i e_i^2 = ||e||^2$.

Now, $||e||^2$ is a function of the vector $b$ and according to my naive understanding, we have to find partial derivatives $\dfrac{\partial(||e||^2)}{\partial b_i }$ for each $b_i$ component of $b$, set each of them to zero and solve the system of equations simultaneously.

But an alternative way of differentiating is given as $\dfrac{\partial(||e||^2)}{\partial b } = -2X^Ty + 2X^TXb$. Here the derivative is taken w.r.t. the vector $b$ as a whole.

Derivatives with respect to a vector are a new concept for me. Is this a brand new thing, or is it just a reorganization of the partial derivatives with respect to the separate components of $b$ into a unified matrix form? What exactly is going on here?

Thanks in advance.


There are 2 best solutions below

4
On BEST ANSWER

You might have seen this as $\nabla (\|e\|^2)$, the gradient. It is simply a vector consisting of the partial derivatives $\frac\partial{\partial b_i}$ in its $i$-th component.

$$(\frac\partial{\partial b} f)_i = \frac\partial{\partial b_i} f$$
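To make the connection with the formula in the question concrete, here is a sketch of the component-wise calculation. The index names $j$ and $l$ are introduced here only for illustration; $X_{ij}$ denotes the entry of $X$ in row $i$ and column $j$.

$$\frac{\partial}{\partial b_j} \sum_{i=1}^{N} \Big(y_i - \sum_{l=1}^{k} X_{il} b_l\Big)^2 = -2 \sum_{i=1}^{N} X_{ij} \Big(y_i - \sum_{l=1}^{k} X_{il} b_l\Big) = \big({-2} X^T (y - Xb)\big)_j$$

Stacking these $k$ partial derivatives into a column gives $-2X^T(y - Xb) = -2X^Ty + 2X^TXb$, so the vector formula is the same set of partial derivatives written in matrix form.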

0
On

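To find the derivative, expand the objective at $b+h$ and keep only the terms that are at most linear in $h$: $$\begin{aligned}(y-X(b+h))^T(y-X(b+h)) &\approx (y-Xb)^T(y-Xb) - h^TX^T(y-Xb) - (y-Xb)^TXh \\ &= (y-Xb)^T(y-Xb) - 2(y-Xb)^TXh,\end{aligned}$$ where the two $h$-linear terms are scalars that are transposes of each other and hence equal. So the derivative is $-2(y-Xb)^TX$.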

The same result follows from the chain rule together with $D(x^Tx)=2x^T$: with $x = y - Xb$ we have $Dx = -X$, so the derivative is $2(y-Xb)^T(-X)$.

Note that the derivative here is not a column vector but a row vector (or covector, or dual vector). Its transpose, $-2X^T(y-Xb) = -2X^Ty + 2X^TXb$, is the column vector given in the question.
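The claims above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`sse`, `derivative_row`, `numerical_partials`) and the random test data are made up for illustration and are not part of the original discussion. It compares the row-vector derivative with the column formula from the question, with finite-difference partial derivatives, and with the actual change of the objective for a small step $h$.

```python
import numpy as np

def sse(X, y, b):
    """Sum of squared residuals ||y - Xb||^2."""
    e = y - X @ b
    return float(e @ e)

def derivative_row(X, y, b):
    """Derivative -2 (y - Xb)^T X, stored as a 1-D array of length k."""
    return -2.0 * (y - X @ b) @ X

def numerical_partials(X, y, b, eps=1e-6):
    """Central finite differences of sse with respect to each b_i."""
    grad = np.zeros_like(b)
    for i in range(b.size):
        step = np.zeros_like(b)
        step[i] = eps
        grad[i] = (sse(X, y, b + step) - sse(X, y, b - step)) / (2 * eps)
    return grad

# Hypothetical random data, used only to exercise the formulas.
rng = np.random.default_rng(0)
N, k = 20, 3
X = rng.normal(size=(N, k))
y = rng.normal(size=N)
b = rng.normal(size=k)
h = 1e-5 * rng.normal(size=k)  # small perturbation of b

D = derivative_row(X, y, b)
gradient_formula = -2 * X.T @ y + 2 * X.T @ X @ b  # formula from the question
partials = numerical_partials(X, y, b)

# The derivative acts on h as a linear map: f(b + h) - f(b) is close to D h.
linear_change = D @ h
actual_change = sse(X, y, b + h) - sse(X, y, b)

print("matches question's formula:", np.allclose(D, gradient_formula))
print("matches partial derivatives:", np.allclose(partials, gradient_formula, atol=1e-4))
print("first-order error:", abs(actual_change - linear_change))
```

NumPy's 1-D arrays do not distinguish rows from columns, so the row/column distinction only shows up in how the result is used: applied to a step `h` it gives the first-order change, and read as a column it gives the gradient.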