What is the derivative of $\exp(X\beta)$ w.r.t. $\beta$?


I am using the denominator layout, i.e. $$\frac{\partial X\beta}{\partial \beta} = X^T,$$ where $X$ is $n\times p$ and $\beta$ is $p\times 1$.

What is the result of $$\frac{\partial \exp(X\beta)}{\partial \beta} \text{ ?}$$

Since $\exp(X\beta)$ is $n\times1$ and $\beta$ is $p\times 1$, the derivative should be a $p\times n$ matrix. However, this is what I derived:

$$\frac{\partial \exp(X\beta)}{\partial \beta} = \frac{\partial X\beta}{\partial \beta}\frac{\partial \exp(X\beta)}{\partial X\beta} = X^T\exp(X\beta),$$ which is $p\times 1$. Where did I make a mistake?


There are 2 best solutions below

1
On BEST ANSWER

This is an application of the chain rule (written here in numerator layout, so the derivative of an $n$-vector with respect to a $p$-vector is $n\times p$):

$$f(\mathbf{x})=[e^{x_1},\ldots,e^{x_n}]^\top$$ $$f'(\mathbf{x})=\operatorname{diag}(e^{x_1},\ldots,e^{x_n})$$

Denoting the $i$-th row of $X$ by $\mathbf{x}_i^\top$, $1\leq i\leq n$, $$g(\boldsymbol{\beta})=X\boldsymbol{\beta}=[\mathbf{x}^\top_1\boldsymbol{\beta},\ldots,\mathbf{x}^\top_n\boldsymbol{\beta}]^\top$$ $$g'(\boldsymbol{\beta})=X$$

we obtain $$h(\boldsymbol{\beta})=f\circ g(\boldsymbol{\beta})=[\exp(\mathbf{x}^\top_1\boldsymbol{\beta}),\ldots,\exp(\mathbf{x}^\top_n\boldsymbol{\beta})]^\top$$

and so,

$$
\begin{align}
h'(\boldsymbol{\beta})&=f'(g(\boldsymbol{\beta}))\,g'(\boldsymbol{\beta})=\operatorname{diag}\big(\exp(\mathbf{x}^\top_1\boldsymbol{\beta}),\ldots,\exp(\mathbf{x}^\top_n\boldsymbol{\beta})\big)\,X\\
&=\begin{pmatrix}
e^{\mathbf{x}_1^\top\boldsymbol{\beta}}x_{11} &\ldots&e^{\mathbf{x}_1^\top\boldsymbol{\beta}}x_{1p}\\
\vdots & \ddots & \vdots\\
e^{\mathbf{x}^\top_n\boldsymbol{\beta}}x_{n1} &\ldots& e^{\mathbf{x}^\top_n\boldsymbol{\beta}}x_{np}
\end{pmatrix}
\end{align}
$$

The last matrix can be expressed more compactly in terms of the Kronecker product, which is widely used in higher-level languages such as MATLAB and R.
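As a sanity check on the formula $h'(\boldsymbol{\beta})=\operatorname{diag}(\exp(X\boldsymbol{\beta}))\,X$, here is a minimal NumPy sketch that compares it against a central finite-difference approximation. The function names, the random test data, and the step size `eps` are illustrative assumptions, not part of the answer above. Note that the sketch uses broadcasting (row-wise scaling) rather than a Kronecker product to form the matrix.

```python
import numpy as np

def jacobian_exp_linear(X, beta):
    """Numerator-layout Jacobian of exp(X @ beta) w.r.t. beta, shape (n, p)."""
    e = np.exp(X @ beta)      # vector of exp(x_i^T beta), shape (n,)
    return e[:, None] * X     # scales row i of X by e_i; equals np.diag(e) @ X

def finite_difference_jacobian(X, beta, eps=1e-6):
    """Central-difference approximation; column j is d exp(X beta) / d beta_j."""
    n, p = X.shape
    J = np.empty((n, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = eps
        J[:, j] = (np.exp(X @ (beta + step)) - np.exp(X @ (beta - step))) / (2 * eps)
    return J

# Hypothetical test data: n = 5 observations, p = 3 coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
beta = rng.normal(size=3)

J_analytic = jacobian_exp_linear(X, beta)
J_numeric = finite_difference_jacobian(X, beta)
print("shape:", J_analytic.shape)
print("max abs difference:", np.max(np.abs(J_analytic - J_numeric)))
```

The printed difference should be small relative to the entries of the Jacobian, which is what one would expect if the closed form is right.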

2
On

For typing convenience, define the variables
$$\eqalign{
y &= X\beta \quad&\implies\quad &dy = X\,d\beta \\
e &= \exp(y) \quad&\implies\quad &E = {\rm Diag}(e) \\
}$$
The differential of an elementwise function requires an elementwise/Hadamard product, which can be replaced by the standard product with a diagonal matrix:
$$\eqalign{
de &= e\odot dy = E\,dy \\
}$$
Substitute $dy$ to obtain
$$\eqalign{
de &= EX\,d\beta \\
\frac{\partial e}{\partial\beta} &= EX \\
}$$
or, in your preferred layout convention, $X^TE^T = X^TE$ (since $E$ is diagonal), which is $p\times n$ as expected. The derivation in the question goes wrong by using the vector $e$ where the diagonal matrix $E$ belongs.
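To make the shapes concrete, the following NumPy sketch builds both layouts and also evaluates the $p\times 1$ expression $X^T\exp(X\beta)$ from the question. The dimensions and random data are assumptions chosen only for illustration. The comparison at the end suggests how to read the question's result: it is $X^TE\,\mathbf{1}_n$, the gradient of the scalar $\sum_i \exp(\mathbf{x}_i^T\beta)$, rather than the derivative of the whole vector.

```python
import numpy as np

# Hypothetical sizes: n = 4 rows, p = 2 coefficients.
rng = np.random.default_rng(1)
n, p = 4, 2
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

e = np.exp(X @ beta)              # vector e, shape (n,)
E = np.diag(e)                    # diagonal matrix E = Diag(e), shape (n, n)

numerator_layout = E @ X          # de/dbeta in numerator layout, shape (n, p)
denominator_layout = X.T @ E      # same derivative in denominator layout, shape (p, n)

# The p-vector from the question: X^T exp(X beta).
question_expr = X.T @ e

# Summing the columns of X^T E (multiplying by a vector of ones) recovers it.
column_sum = denominator_layout @ np.ones(n)

print("numerator layout shape:  ", numerator_layout.shape)
print("denominator layout shape:", denominator_layout.shape)
print("question expression equals X^T E 1:", np.allclose(question_expr, column_sum))
```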