Succinctly express Jacobian of a simple vector-valued function as a matrix


For any $U \in \mathbb R^{m \times d}$ and $v \in \mathbb R^m$, let $\theta = \mathrm{cat}(\mathrm{vec}(U),\mathrm{vec}(v)) \in \mathbb R^{N}$ be the concatenation of the vectorization of $U$ and $v$, where $N:=m(d+1)$. Note that $(U,v)$ and $\theta$ are equivalent representations of the same object. Consider the function $T:\mathbb R^N \to \mathbb R^d$ defined by $T(\theta) := U^\top v \in \mathbb R^d$.

Question 1. How can the Jacobian $\nabla T (\theta)$ be written succinctly as an $N \times d$ matrix in terms of $U$ and $v$?

I think this should be possible via some clever usage of Kronecker products, Hadamard products, block matrices, etc., but my matrix calculus skills are a bit rusty.

Question 2. Is it true that $\nabla T(\theta)$ has full rank $d$ for almost all $\theta$?

Best answer

$ \def\bbR#1{{\mathbb R}^{#1}} \def\bs{\boldsymbol} \def\t{\theta}\def\e{\varepsilon}\def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\qiq{\quad\implies\quad} \def\m#1{\left[\begin{array}{c|c}#1\end{array}\right]} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} $Define the zero vector $z\in\bbR{d}$, the identity matrix $I\in\bbR{d\times d}$, and the augmented matrix $A\in\bbR{m\times(d+1)}$ together with its Gramian $$\eqalign{ A &= {\bs[}\,U\;v\,{\bs]} = {\bs[}\,u_1\;u_2\:\cdots\ u_d\:\,v\,{\bs]} \\ A^TA &= \m{U^TU&U^Tv\\\hline v^TU&v^Tv} \\ }$$ where $u_1,\ldots,u_d$ are the columns of $U$ and the $(1,2)$ block of the Gramian, $U^Tv$, is the required vector function $T$.
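As a quick sanity check of this setup, here is a minimal NumPy sketch. The sizes `m = 5`, `d = 3`, the random data, and the variable names are assumptions chosen for illustration, and `vec` is taken to mean column-major stacking. It checks that the top-right block of $A^TA$ equals $U^Tv$, and that $\mathrm{vec}(A)$ coincides with the concatenation $\mathrm{cat}(\mathrm{vec}(U),\mathrm{vec}(v))$ from the question.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
m, d = 5, 3
U = rng.standard_normal((m, d))
v = rng.standard_normal(m)

# Augmented matrix A = [U v] and its Gramian A^T A.
A = np.column_stack([U, v])      # shape (m, d+1)
gram = A.T @ A                   # shape (d+1, d+1)

# The (1,2) block: first d rows, last column.
block_12 = gram[:d, d]

# Column-major vec of A versus cat(vec(U), vec(v)).
theta = A.flatten(order="F")
theta_cat = np.concatenate([U.flatten(order="F"), v])

print(np.allclose(block_12, U.T @ v))    # Gramian block matches U^T v
print(np.array_equal(theta, theta_cat))  # vec(A) matches the concatenation
```

Both checks should print `True`. The second one is why $\theta$ can be treated as $\mathrm{vec}(A)$ in the rest of the derivation.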

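Now define block-matrix analogs of the standard Cartesian basis vectors
$$\eqalign{ E_1 = \begin{bmatrix} I \\ z^T \end{bmatrix} \in \bbR{(d+1)\times d}, \qquad E_2 = \begin{bmatrix} z \\ \o \end{bmatrix} \;\equiv\; e \in \bbR{d+1} \\ }$$
so that $AE_1 = U$ and $Ae = v$. These allow us to extract the desired $(1,2)$ block as
$$\eqalign{ T \;=\; U^Tv \;=\; E_1^T\LR{A^TA}E_2 \\ }$$
Calculate the differential of $T$
$$\eqalign{ dT &= E_1^T\LR{A^TdA+dA^TA}e \\&= E_1^TA^TdA\,e + E_1^TdA^TAe \\ }$$
Then vectorize it and isolate the gradient with respect to $\t$. Both terms are column vectors, so the second can be replaced by its transpose $e^TA^T\,dA\,E_1$ before applying the identity $\vc{XYZ} = \LR{Z^T\otimes X}\vc{Y}$:
$$\eqalign{ \t &= \vc{A} \\ d\t &= \vc{dA} \\ dT &= \LR{e\otimes AE_1}^T d\t + \LR{E_1\otimes Ae}^T d\t \\ \grad{T}{\t} &= \LR{e\otimes AE_1 \;+\; E_1\otimes Ae}^T \\ }$$
This is the $d\times N$ layout. The $N\times d$ layout requested in the question is its transpose, which in terms of $U$ and $v$ reads
$$\eqalign{ e\otimes AE_1 + E_1\otimes Ae \;=\; e\otimes U + E_1\otimes v \;=\; \begin{bmatrix} I\otimes v \\ U \end{bmatrix} \\ }$$
Choose whichever of the two matches your preferred layout convention.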
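The Kronecker formula can be checked numerically. The following is a minimal NumPy sketch, assuming column-major `vec` and small illustrative sizes; the helper names `jacobian_formula`, `T_of_theta`, and `numerical_jacobian` are hypothetical and introduced only for this check. It compares $\LR{e\otimes AE_1 + E_1\otimes Ae}^T$ against a central-difference Jacobian of $T(\t) = U^Tv$.

```python
import numpy as np

def jacobian_formula(U, v):
    """d x N matrix (e kron A E_1 + E_1 kron A e)^T from the derivation."""
    m, d = U.shape
    A = np.column_stack([U, v])                     # m x (d+1)
    E1 = np.vstack([np.eye(d), np.zeros((1, d))])   # (d+1) x d
    e = np.zeros((d + 1, 1))
    e[-1, 0] = 1.0                                  # (d+1) x 1
    G = np.kron(e, A @ E1) + np.kron(E1, A @ e)     # N x d
    return G.T

def T_of_theta(theta, m, d):
    """Evaluate T(theta) = U^T v after undoing the column-major vec."""
    A = theta.reshape((m, d + 1), order="F")
    return A[:, :d].T @ A[:, d]

def numerical_jacobian(theta, m, d, h=1e-6):
    """Central-difference Jacobian of T, shape d x N."""
    N = theta.size
    J = np.zeros((d, N))
    for k in range(N):
        step = np.zeros(N)
        step[k] = h
        J[:, k] = (T_of_theta(theta + step, m, d)
                   - T_of_theta(theta - step, m, d)) / (2 * h)
    return J

# Illustrative random instance (sizes are assumptions).
rng = np.random.default_rng(0)
m, d = 4, 3
U = rng.standard_normal((m, d))
v = rng.standard_normal(m)
theta = np.column_stack([U, v]).flatten(order="F")

J_formula = jacobian_formula(U, v)
J_numeric = numerical_jacobian(theta, m, d)

print(np.allclose(J_formula, J_numeric, atol=1e-6))  # formula vs. finite differences
print(np.linalg.matrix_rank(J_formula))              # rank for this one sample
```

Because $T$ is quadratic in $\t$, central differences agree with the exact Jacobian up to rounding, so the first check should print `True`. The printed rank concerns only this single random draw. It is an illustration related to Question 2, not a proof of it.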