I am using the Denominator layout, i.e. $$\frac{\partial X\beta}{\partial \beta} = X^T,$$ where $X$ is $n\times p$ and $\beta$ is $p\times 1$.
What is the result of $$\frac{\partial \exp(X\beta)}{\partial \beta} \text{ ?}$$
Since $\exp(X\beta)$ is $n\times1$ and $\beta$ is $p\times 1$, the derivative should be a $p\times n$ matrix. However, this is what I derived:
$$\frac{\partial \exp(X\beta)}{\partial \beta} = \frac{\partial X\beta}{\partial \beta}\frac{\partial \exp(X\beta)}{\partial X\beta} = X^T\exp(X\beta),$$ which is $p\times1$, not $p\times n$. Where did I make a mistake?
This is an application of the chain rule:
$$f(\mathbf{x})=[e^{x_1},\ldots,e^{x_n}]^\intercal$$ $$f'(\mathbf{x})=\operatorname{diag}(e^{x_1},\ldots,e^{x_n})$$
Denoting the $i$-th row of $X$ by $\mathbf{x}^\top_i$, $1\leq i\leq n$, $$g(\boldsymbol{\beta})=X\boldsymbol{\beta}=[\mathbf{x}^\top_1\boldsymbol{\beta},\ldots,\mathbf{x}^\top_n\boldsymbol{\beta}]^\intercal$$ $$g'(\boldsymbol{\beta})=X$$
we obtain $$h(\boldsymbol{\beta})=f\circ g(\boldsymbol{\beta})=[\exp(\mathbf{x}^\top_1\boldsymbol{\beta}),\ldots,\exp(\mathbf{x}^\top_n\boldsymbol{\beta})]^\intercal$$
and so,
$$ \begin{aligned} h'(\boldsymbol{\beta})&=f'(g(\boldsymbol{\beta}))\,g'(\boldsymbol{\beta})=\operatorname{diag}\big(\exp(\mathbf{x}^\top_1\boldsymbol{\beta}),\ldots,\exp(\mathbf{x}^\top_n\boldsymbol{\beta})\big)\,X\\ &=\begin{pmatrix} e^{\mathbf{x}_1^\top\boldsymbol{\beta}}x_{11} &\ldots&e^{\mathbf{x}_1^\top\boldsymbol{\beta}}x_{1p}\\ \vdots & \ddots & \vdots\\ e^{\mathbf{x}^\top_n\boldsymbol{\beta}}x_{n1} &\ldots& e^{\mathbf{x}^\top_n\boldsymbol{\beta}}x_{np} \end{pmatrix} \end{aligned} $$
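The last matrix can be expressed more compactly as a row-wise scaling of $X$, $$h'(\boldsymbol{\beta})=\big(\exp(X\boldsymbol{\beta})\,\mathbf{1}_p^\top\big)\odot X,$$ where $\odot$ is the elementwise (Hadamard) product and $\mathbf{1}_p$ is the $p\times1$ vector of ones. This is a row-wise variant of the Kronecker product rather than the Kronecker product itself, and elementwise operations of this kind are widely used in higher-level languages such as MATLAB, R, etc.

Note that $h'(\boldsymbol{\beta})$ is the $n\times p$ Jacobian, i.e. numerator layout. In the denominator layout used in the question, the derivative is its transpose, $$\frac{\partial \exp(X\beta)}{\partial \beta}=X^T\operatorname{diag}\big(\exp(X\beta)\big),$$ which is $p\times n$ as expected. The mistake in the question is treating $\frac{\partial \exp(X\beta)}{\partial X\beta}$ as the $n\times1$ vector $\exp(X\beta)$: for an elementwise function of an $n$-vector, it is the $n\times n$ diagonal matrix $\operatorname{diag}(\exp(X\beta))$.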
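As a sanity check, the formula $h'(\boldsymbol{\beta})=\operatorname{diag}(\exp(X\boldsymbol{\beta}))\,X$ can be compared against a finite-difference approximation. The following is only a minimal sketch in Python with NumPy; the function names `jacobian_exp_linear` and `numerical_jacobian`, the random test data, and the step size `eps` are my own choices for illustration and not part of the derivation.

```python
import numpy as np

def jacobian_exp_linear(X, beta):
    """Analytic Jacobian of h(beta) = exp(X @ beta), numerator layout (n x p)."""
    w = np.exp(X @ beta)      # shape (n,)
    # Scale row i of X by w[i]; equivalent to np.diag(w) @ X
    return w[:, None] * X

def numerical_jacobian(X, beta, eps=1e-6):
    """Central finite-difference approximation of the same Jacobian."""
    n, p = X.shape
    J = np.empty((n, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = eps
        # Column j holds the partial derivatives with respect to beta[j]
        J[:, j] = (np.exp(X @ (beta + step)) - np.exp(X @ (beta - step))) / (2 * eps)
    return J

# Arbitrary test data (assumed purely for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # n = 4, p = 3
beta = rng.normal(size=3)

J = jacobian_exp_linear(X, beta)
print(J.shape)                                               # (4, 3): numerator layout
print(np.allclose(J, np.diag(np.exp(X @ beta)) @ X))         # matches diag(...) X
print(np.allclose(J, numerical_jacobian(X, beta), atol=1e-6))
print(J.T.shape)                                             # (3, 4): denominator layout
```

The transpose `J.T` is the $p\times n$ matrix $X^T\operatorname{diag}(\exp(X\beta))$, which is the shape the question expected under the denominator layout.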