Matrix Differentiation


Consider a differentiable function $f: \mathbb R \to \mathbb R$ and two $p\times 1$ vectors $x$ and $\theta$. Define a new function of $\theta$ by $$ f\left( x^T\theta \right)x. $$ I want to find the derivative of this function with respect to $\theta$. My attempt: $$ \frac{d}{d\theta}f\left( x^T\theta \right)x = f'\left( x^T\theta \right) \frac{d}{d\theta}\left( x^T\theta \right) x = f'\left( x^T\theta \right) x x. $$ Here $f'\left( x^T\theta \right)$ is a scalar, and $\frac{d}{d\theta}\left( x^T\theta \right)$ should be a column vector. However, this clearly is not right, since the product $xx$ of two $p\times 1$ vectors is not defined. The correct answer has $xx^T$ in place of $xx$, but I cannot see why this should be the case. Is this some kind of convention? Could anyone help me, please? Thank you!


There are 2 best solutions below

4
On BEST ANSWER

You can do this systematically as follows.

Use the Chain Rule together with the fact that the derivative of a linear transformation $A$ is $A$ itself at every point. Define the linear transformations: $$\varphi_x\colon\mathbb{R}^p\to\mathbb{R},\ \varphi_x(\theta) = x^T\theta$$ $$\psi_x\colon\mathbb{R}\to\mathbb{R}^p,\ \psi_x(y) = xy$$ As matrices, $\varphi_x' = x^T$ is $1\times p$ and $\psi_x' = x$ is $p\times 1$. Then, your function is $g(\theta) = (\psi_x\circ f\circ \varphi_x)(\theta)$ with derivative \begin{align} g'(\theta) &= \psi_x'(f(x^T\theta)) \cdot f'(x^T\theta) \cdot \varphi_x'(\theta) \\ &= x \cdot f'(x^T\theta) \cdot x^T \\ &= f'(x^T\theta) xx^T \end{align} which is a $p\times p$ matrix, as the Jacobian of a map $\mathbb{R}^p\to\mathbb{R}^p$ must be.
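As a sanity check, the formula $f'(x^T\theta)\,xx^T$ can be compared against a finite-difference Jacobian. The following is only a minimal sketch: the choice $f = \tanh$, the dimension $p = 3$, the random seed, the step size `h`, and the function names are all assumptions made for illustration, not part of the question.

```python
import numpy as np

def g(theta, x, f=np.tanh):
    """g(theta) = f(x^T theta) * x, a map from R^p to R^p (f = tanh is an assumed example)."""
    return f(x @ theta) * x

def jacobian_closed_form(theta, x, fprime=lambda t: 1.0 - np.tanh(t) ** 2):
    """Chain-rule result: f'(x^T theta) * x x^T, a p x p matrix."""
    return fprime(x @ theta) * np.outer(x, x)

def jacobian_finite_diff(theta, x, h=1e-6):
    """Central differences; column j approximates the partial derivative of g in theta_j."""
    p = theta.size
    J = np.empty((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = h
        J[:, j] = (g(theta + e, x) - g(theta - e, x)) / (2 * h)
    return J

rng = np.random.default_rng(0)
x = rng.normal(size=3)
theta = rng.normal(size=3)

J_exact = jacobian_closed_form(theta, x)
J_numeric = jacobian_finite_diff(theta, x)

print(J_exact.shape)                                # shape of the Jacobian
print(np.allclose(J_exact, J_numeric, atol=1e-6))   # do the two Jacobians agree?
```

Note that `np.outer(x, x)` is exactly $xx^T$; the undefined product $xx$ from the question has no matrix counterpart here.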

0
On

I assume that by $\frac{d}{d\theta}$ you mean the Jacobian.

For a function $\phi:\mathbb R^p\to\mathbb R$ notice that $$\frac{d}{d\theta} [\phi(\theta) x] \ne \frac{d\phi}{d\theta} (\theta) x, $$ but rather $$\frac{d}{d\theta} [\phi(\theta) x] = x\frac{d\phi}{d\theta} (\theta). $$ You get this by differentiating the component functions of $\phi(\theta)x$, namely $\phi(\theta)x_i$, $1\le i \le p$.
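Written out entry by entry, assuming the usual Jacobian layout (row $i$ for the $i$-th output component, column $j$ for $\theta_j$), the argument reads roughly as follows: $$\frac{\partial}{\partial\theta_j}\bigl[\phi(\theta)x_i\bigr] = x_i\,\frac{\partial\phi}{\partial\theta_j}(\theta), \qquad 1\le i,j\le p,$$ which is the $(i,j)$ entry of the $p\times p$ matrix $x\frac{d\phi}{d\theta}(\theta)$, where $\frac{d\phi}{d\theta}(\theta)$ is a $1\times p$ row vector. For the function in the question, $\phi(\theta) = f(x^T\theta)$, so $$\frac{\partial\phi}{\partial\theta_j}(\theta) = f'(x^T\theta)\,x_j \quad\Longrightarrow\quad \frac{d}{d\theta}\bigl[f(x^T\theta)x\bigr] = f'(x^T\theta)\,xx^T.$$ By contrast, $\frac{d\phi}{d\theta}(\theta)\,x$ is a $1\times p$ row times a $p\times 1$ column, which is a scalar and so cannot be the Jacobian of a map $\mathbb R^p\to\mathbb R^p$ when $p>1$.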