Differentiating a column with respect to a matrix


Let $\mathbf{X} = [\mathbf{x}_1 | \dots | \mathbf{x}_n]$ be an $m \times n$ matrix. I would like to differentiate $\mathbf{x}_i = \mathbf{X} \mathbf{e}_i$ (where $\mathbf{e}_i \in \mathbb{R}^{n \times 1}$ is the unit vector with $1$ in the $i$th position and $0$ elsewhere) with respect to $\mathbf{X}$. Then $$ d\mathbf{x}_i = d(\mathbf{X}\mathbf{e}_i) = (\mathbf{X} + d\mathbf{X})\mathbf{e}_i - \mathbf{X}\mathbf{e}_i = (d\mathbf{X})\mathbf{e}_i $$ and therefore $$ \frac{d\mathbf{x}_i}{d\mathbf{X}} = \mathbf{e}_i \in \mathbb{R}^{n \times 1} $$ However, I suspect that this is not dimensionally consistent. For example, let $f(\mathbf{X}) = \mathbf{a} \mathbf{x}_i$ where $\mathbf{a} \in \mathbb{R}^{1 \times m}$. Simply using the result above gives $$ \frac{d f(\mathbf{X})}{d\mathbf{X}} = \frac{d(\mathbf{a}\mathbf{x}_i)}{d\mathbf{X}} = \mathbf{a} \mathbf{e}_i \implies \mbox{Dimension mismatch!} $$ since $\mathbf{a} \in \mathbb{R}^{1 \times m}$ and $\mathbf{e}_i \in \mathbb{R}^{n \times 1}$.

How can this be fixed? One idea is to insert a pseudo-identity matrix, $$ \frac{d\mathbf{x}_i}{d\mathbf{X}} = \mathbf{I}_{m \times n} \mathbf{e}_i \in \mathbb{R}^{m \times 1} $$ such that $\mathbf{X} = \mathbf{X} \circ \mathbf{I}_{m \times n}$ under the Hadamard product. But is this the right way to go?

Best answer

$\def\p#1#2{\frac{\partial #1}{\partial #2}}$Use $(\star)$ to denote the dyadic product and a colon to denote the double-dot product, i.e. $$\eqalign{ \Gamma &= A\star B \quad&\implies\quad \Gamma_{ijk\ell} = A_{ij}B_{k\ell} \\ Y &= \Gamma:X \quad&\implies\quad Y_{ij}= \sum_{k,\ell}\;\Gamma_{ijk\ell}X_{k\ell} \\ }$$ The same conventions apply when the second factor is a vector $a$: then $(A\star a)_{ijk} = A_{ij}a_k$, and the double-dot product contracts the last two indices with those of $X$. First, rewrite the linear equation $b=Xa\,$ using index notation (with summation over repeated indices) $$\eqalign{ b_i &= X_{ik}\,a_k \\ &= \delta_{ij} X_{jk}\,a_k \\ &= \delta_{ij} a_k\,X_{jk} \\ }$$ where $\delta_{ij}$ is the Kronecker delta, i.e. the components of the identity matrix $I$.
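The index expression above can be checked numerically. The following is a minimal NumPy sketch, not part of the original answer; the sizes `m`, `n`, the random data, and the variable name `Gamma` are assumptions chosen for illustration. It builds the third-order tensor with components $\delta_{ij}a_k$ and contracts its last two indices with $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                      # illustrative sizes (assumption)
X = rng.standard_normal((m, n))
a = rng.standard_normal(n)

I = np.eye(m)
# Third-order tensor with components Gamma[i, j, k] = delta_ij * a_k
Gamma = np.einsum("ij,k->ijk", I, a)
# Double-dot product: contract the last two indices of Gamma with X
b = np.einsum("ijk,jk->i", Gamma, X)

# The contraction should reproduce the ordinary matrix-vector product X a
print(np.allclose(b, X @ a))
```

The `einsum` subscripts mirror the index notation directly, which makes it easy to see which indices are contracted.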

Rewrite the linear equation using the dyadic and double-dot products, and then calculate its differential and gradient. $$\eqalign{ b &= (I\star a):X \\ db &= (I\star a):dX \\ \p{b}{X} &= (I\star a) \\ }$$ Finally, substitute $(a=e_i,\;b=Xa=x_i)\;$ to obtain $$\eqalign{ \p{x_i}{X} &= I\star e_i \\ }$$
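Read componentwise, the result is a third-order tensor of shape $m \times m \times n$ with entries $\delta_{pq}\,(e_i)_r$, which is why no single vector can represent it. The sketch below is an illustration under assumptions, not part of the original answer: it compares that tensor with a finite-difference Jacobian of $x_i = Xe_i$, then applies the chain rule to the question's $f(X) = \mathbf{a}\mathbf{x}_i$. The names `a_row`, `G`, `J`, the zero-based column index `i`, and the step size `eps` are all hypothetical choices.

```python
import numpy as np

m, n, i = 3, 4, 1                # illustrative sizes; i is a zero-based column index
rng = np.random.default_rng(1)
X = rng.standard_normal((m, n))
a_row = rng.standard_normal(m)   # plays the role of the 1 x m row vector in the question

e_i = np.zeros(n)
e_i[i] = 1.0

# Analytic gradient: G[p, q, r] = delta_pq * (e_i)_r, shape (m, m, n)
G = np.einsum("pq,r->pqr", np.eye(m), e_i)

# Finite-difference Jacobian of x_i = X e_i with respect to each entry X[q, r]
eps = 1e-6
J = np.zeros((m, m, n))
for q in range(m):
    for r in range(n):
        dX = np.zeros((m, n))
        dX[q, r] = eps
        J[:, q, r] = ((X + dX) @ e_i - X @ e_i) / eps

# Chain rule for f(X) = a_row @ x_i: contract a_row with the first index of G
grad_f = np.einsum("p,pqr->qr", a_row, G)

print(np.allclose(J, G, atol=1e-6))
print(np.allclose(grad_f, np.outer(a_row, e_i)))
```

Under these assumptions the gradient of $f$ comes out as the $m \times n$ outer product of $\mathbf{a}^T$ and $\mathbf{e}_i^T$, which has the same shape as $X$, so the dimension mismatch from the question does not arise.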