Simple vector derivative using chain rule


I am aware that I could factor out $\boldsymbol{v_c}$ first, in which case the answer is $\boldsymbol{u_1}^T + \boldsymbol{u_2}^T$. But I am doing this exercise mainly to convince myself of how the chain rule works when vectors are involved.

Suppose all vectors are column vectors. Here are the variables

$ J = \boldsymbol{u_1}^T \boldsymbol{v_c} + \boldsymbol{u_2}^T \boldsymbol{v_c} $

$ \theta_1 = \boldsymbol{u_1}^T \boldsymbol{v_c} $

$ \theta_2 = \boldsymbol{u_2}^T \boldsymbol{v_c} $

${\boldsymbol{\theta} = [\theta_1, \theta_2]^T}$

I would like to know answer to the following derivatives:

1. $\frac{\partial{J}}{\partial{\boldsymbol{\theta}}}$

2. $\frac{\partial\boldsymbol{\theta}}{\partial{\boldsymbol{v_c}}}$

3. $\frac{\partial{J}}{\partial{\boldsymbol{v_c}}}$

and I wonder if the dot product of the first two would equal the third one, according to chain rule? Or am I using chain rule correctly at all?

I am somewhat confused about the chain rule in the context of matrices, and matrix calculus reads as quite overwhelming. If there is a more introductory source on this subject, please let me know.



BEST ANSWER

I think I've learned enough to answer the question myself, so let me share my answer.

Short answer: yes, in the form $\frac{\partial \theta}{\partial v} \frac{\partial J}{\partial \theta} = \frac{\partial J}{\partial v}$. The other order, $\frac{\partial J}{\partial \theta} \frac{\partial \theta}{\partial v}$, won't work out given that these gradients are laid out as column vectors.

More details:

For simplicity and clarity, I will

  1. unbold symbols. Whether a symbol denotes a vector or a scalar should be easy to infer from context
  2. use $v$ instead of $v_c$
  3. use superscript ($^{(i)}$) to indicate the $i$th element within a vector. So $\theta^{(1)}$ and $\theta^{(2)}$ are the same as $\theta_1$ and $\theta_2$ in the question.
  4. suppose $v$ is $n$-dimensional

Actually, row vectors are more natural when working with Jacobians.

In terms of row vector:

\begin{align*} \frac{\partial J}{\partial \theta^T} &= [1, 1] \\ \\ \frac{\partial \theta}{\partial v^T} &= \begin{bmatrix} \frac{\partial \theta^{(1)}}{\partial v^{(1)}} & \cdots & \frac{\partial \theta^{(1)}}{\partial v^{(n)}} \\ \frac{\partial \theta^{(2)}}{\partial v^{(1)}} & \cdots & \frac{\partial \theta^{(2)}}{\partial v^{(n)}} \\ \end{bmatrix} \\ \\ \frac{\partial J}{\partial v^T} &= \begin{bmatrix} \frac{\partial J}{\partial v^{(1)}} & \cdots & \frac{\partial J}{\partial v^{(n)}} \\ \end{bmatrix} \end{align*}

So $$ \frac{\partial J}{\partial \theta^T} \frac{\partial \theta}{\partial v^T} = \frac{\partial J}{\partial v^T} $$
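This identity is easy to verify numerically. A minimal sketch with NumPy, where $u_1$, $u_2$, $v$ are random vectors and $n = 4$ is an arbitrary choice:

```python
import numpy as np

# Numerical check of the row-vector (Jacobian) chain rule.
# u1, u2, v and n = 4 are illustrative choices, not from the original post.
rng = np.random.default_rng(0)
n = 4
u1, u2, v = rng.standard_normal((3, n))

# theta = [u1.v, u2.v], J = theta[0] + theta[1]
dJ_dthetaT = np.array([[1.0, 1.0]])   # 1 x 2
dtheta_dvT = np.stack([u1, u2])       # 2 x n, rows are u1^T and u2^T
dJ_dvT = dJ_dthetaT @ dtheta_dvT      # (1 x 2)(2 x n) = 1 x n

# Agrees with the directly computed gradient (u1 + u2)^T
assert np.allclose(dJ_dvT, (u1 + u2)[None, :])
```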

Now, it's easy to transpose both sides, and see the result in terms of column vector.

\begin{align*} (\frac{\partial J}{\partial \theta^T} \frac{\partial \theta}{\partial v^T})^T &= (\frac{\partial J}{\partial v^T})^T \\ \frac{\partial \theta}{\partial v} \frac{\partial J}{\partial \theta} &= \frac{\partial J}{\partial v} \end{align*} The shapes of matrices on the LHS are $(n \times 2) \cdot (2 \times 1)$.

The matrix shape on the RHS is also $(n \times 1)$, so that checks out.

What I originally intended to ask in the question was effectively whether

$$ \frac{\partial J}{\partial \theta} \frac{\partial \theta}{\partial v_c} =? \frac{\partial J}{\partial v_c} $$

The answer is NO, because the matrix shapes on the LHS won't check out: the product would be $(2 \times 1) \cdot (n \times 2)$, whose inner dimensions don't match.
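The shape mismatch can be seen directly in NumPy; only the shapes matter here, so the entries below are placeholders (and $n = 4$ is arbitrary):

```python
import numpy as np

# Shapes under the column-vector convention; placeholder values, shapes only.
n = 4
dJ_dtheta = np.ones((2, 1))      # 2 x 1 column vector
dtheta_dv = np.zeros((n, 2))     # n x 2

# dJ/dtheta @ dtheta/dv is (2 x 1)(n x 2): inner dimensions 1 and n differ.
try:
    dJ_dtheta @ dtheta_dv
    mismatch = False
except ValueError:
    mismatch = True              # NumPy rejects the product, as expected

# The order that works is dtheta/dv @ dJ/dtheta, an (n x 2)(2 x 1) product.
grad = dtheta_dv @ dJ_dtheta     # shape (n, 1)
```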

ANSWER

Denote the $k^{th}$ column of the matrix $U$ by $u_k$.
Denote the vector, all of whose components are equal to one, by $e$.

Then you can write the quantities of interest as $$\eqalign{ \theta &= U^Tv_c \cr J &= e^T\theta \cr }$$ and calculate their differentials $$\eqalign{ d\theta &= U^T\,dv_c \cr dJ &= e^T\,d\theta \cr &= e^TU^T\,dv_c \cr }$$ The requested gradients are therefore $$\eqalign{ \frac{\partial\theta}{\partial v_c} &= U^T \cr\cr \frac{\partial J}{\partial\theta} &= e^T \cr\cr \frac{\partial J}{\partial v_c} &= e^TU^T \cr\cr }$$
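The gradients above can be checked numerically. A sketch assuming an $n \times 2$ matrix `U` with columns $u_1, u_2$ (here $n = 5$ and the random values are illustrative), comparing $e^TU^T$ against a finite-difference estimate of $\partial J / \partial v_c$:

```python
import numpy as np

# Numerical check of the differential-based gradients.
# n = 5 and the random U, v are arbitrary test data.
rng = np.random.default_rng(1)
n = 5
U = rng.standard_normal((n, 2))  # columns are u1 and u2
e = np.ones(2)                   # vector of all ones
v = rng.standard_normal(n)

theta = U.T @ v                  # theta = U^T v_c
J = e @ theta                    # J = e^T theta

# dJ/dv_c = e^T U^T, i.e. (u1 + u2)^T
grad_J = e @ U.T
assert np.allclose(grad_J, U[:, 0] + U[:, 1])

# Finite-difference check of dJ/dv_c, one coordinate at a time
h = 1e-6
fd = np.array([(e @ (U.T @ (v + h * np.eye(n)[i])) - J) / h
               for i in range(n)])
assert np.allclose(fd, grad_J, atol=1e-4)
```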