Derivative using chain rule


I have the following function $$f=\sum_{i=1}^{n} \sum_{j=1}^{p} \bigg \lbrace y_{ij}(\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j}) - \frac{1}{2} (\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j}) - \frac{1}{4} \lbrace (\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j})^2+\boldsymbol{\lambda_j^{T}V_i\lambda_j} \rbrace \bigg \rbrace$$ where $y_{ij}$ and $\beta_{0j}$ are real numbers, $\boldsymbol{\lambda_j}$ and $\boldsymbol{m_i}$ are $q\times 1$ vectors, and $\boldsymbol{V_i}$ is a $q\times q$ matrix. I want to calculate $\frac{\partial f} {\partial{\lambda_{jk}}}$, where $\lambda_{jk}$ is the $k$-th element of the vector $\boldsymbol{\lambda_j}$. I started with the chain rule, $\frac{\partial f} {\partial{\lambda_{jk}}}=\frac{\partial f} {\partial{\boldsymbol{\lambda_j}}} \frac{\partial{\boldsymbol{\lambda_j}}} {\partial{\lambda_{jk}}}$, but the first factor is a $q\times 1$ vector and the second is $1\times q$, whereas the final result should be a real number. Is my use of the chain rule wrong?


There are 2 best solutions below


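You're using denominator layout. For the dimensions to stay consistent, the chain rule then has to be built up to the left, because a denominator-layout expression is the transpose of the same expression written in numerator layout, and $(AB)^T=B^TA^T$: $$\frac{\partial f}{\partial \lambda _{jk}}=\underbrace{\frac{\partial \lambda_j}{\partial \lambda_{jk}}}_{1\times q}\underbrace{\frac{\partial f}{\partial \lambda_j}}_{q\times 1}$$ Here $\frac{\partial \lambda_j}{\partial \lambda_{jk}}$ is the row vector with a 1 in position $k$ and zeros elsewhere, so the product is a scalar: the $k$-th element of $\frac{\partial f}{\partial \lambda_j}$.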

Or, you could directly differentiate the expression wrt $\lambda_{jk}$.
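A sketch of that direct route, using only the definition of $f$ in the question. The symbol $m_{ik}$ (the $k$-th element of $\boldsymbol{m_i}$) and the bracket notation $[\,\cdot\,]_k$ for the $k$-th element of a vector are introduced here for illustration. Only the terms with the chosen $j$ depend on $\lambda_{jk}$, and $\frac{\partial}{\partial \lambda_{jk}}(\boldsymbol{\lambda_j^{T}m_i}) = m_{ik}$, so

$$\eqalign{ \frac{\partial f}{\partial \lambda_{jk}} &= \sum_{i=1}^{n} \bigg \lbrace y_{ij}\,m_{ik} - \tfrac 12 m_{ik} - \tfrac 12 (\boldsymbol{\lambda_j^{T}m_i}+\beta_{0j})\,m_{ik} - \tfrac 14 \big[(\boldsymbol{V_i}+\boldsymbol{V_i^{T}})\boldsymbol{\lambda_j}\big]_k \bigg \rbrace \\ }$$

If each $\boldsymbol{V_i}$ happens to be symmetric (an assumption, the question does not say so), the last term reduces to $-\tfrac 12 [\boldsymbol{V_i\lambda_j}]_k$. Either way the result is a single real number, as expected.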


$\def\e{\epsilon}\def\v{\varepsilon}\def\R#1{{\mathbb R}^{#1}}\def\o{{\tt1}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Let $\{e_i,\v_j,\e_k\}$ denote vectors from the standard basis for $\{\R{n},\R{p},\R{q}\}$ and define the all-ones vector/matrix variables $$\eqalign{ \o_n = \sum_{i=1}^n e_i \quad \o_p = \sum_{j=1}^p \varepsilon_j \quad \o_q = \sum_{k=1}^q \epsilon_k \qquad J_{np} = \o_n\o_p^T \quad J_{pp} = \o_p\o_p^T \\ }$$ and the double-dot product (of identically dimensioned matrices) $$A:B = \sum_{i=1}^n\sum_{j=1}^p A_{ij}B_{ij}$$

Then define the following vector/matrix variables and map them to the indexed quantities appearing in the problem statement $$\eqalign{ Y &\implies y_{ij} &= Y:e_i\varepsilon_j^T = e_i^TY\varepsilon_j \\ M &\implies m_i &= Me_i\\ L &\implies \lambda_j &= L\v_j \\ b &\implies \beta_{0j} &= b^T\v_j \\ W &\implies W &= \sum_{i=1}^n V_i \\ }$$ In other words, $\{M,L\}$ are matrices whose columns are the $\{m_i,\lambda_j\}$ vectors, while the individual components of $\{Y,b\}$ are the $\{y_{ij},\beta_{0j}\}$ scalars.

The following auxiliary matrix variables will be very convenient $$\eqalign{ A &= M^TL + \o_nb^T \quad&\implies\quad dA = M^TdL \\ S &= \tfrac 12\left(W+W^T\right) \quad&\implies\quad S = {\rm Sym}(W) \\ }$$ Write the objective function in a pure matrix form using these new variables.
Then calculate its differential and gradient. Because the quadratic term only pairs each $\lambda_j$ with itself, $\sum_{j}\lambda_j^TW\lambda_j = {\rm tr}(L^TWL) = I_p:L^TWL$, where $I_p$ is the $p\times p$ identity matrix. $$\eqalign{ f &= Y:A - \tfrac 12 J_{np}:A - \tfrac 14 A:A - \tfrac 14 I_p:L^TWL \\ df &= Y:dA - \tfrac 12 J_{np}:dA - \tfrac 12 A:dA - \tfrac 14 I_p:(L^TW\,dL+dL^TWL) \\ &= \left(Y-\tfrac 12J_{np}-\tfrac 12A\right):M^TdL - \tfrac 14 \left(W+W^T\right)L:dL \\ &= \left(MY-\tfrac 12MJ_{np}-\tfrac 12MA - \tfrac 12SL\right):dL \\ \p{f}{L} &= MY-\tfrac 12MJ_{np}-\tfrac 12MA - \tfrac 12SL \;\;\doteq\;\; G\quad\{{\rm the\,gradient}\} \\ }$$ This gradient is a $(q\times p)$ matrix. To obtain individual components, simply contract it with the standard basis vectors $$\eqalign{ G_{kj} = \e_k^TG\v_j = G:\e_k\v_j^T }$$
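A gradient like this is easy to sanity-check numerically. The following is a minimal NumPy sketch, not a definitive implementation: the sizes, the random data, and the helper names (`f_double_sum`, `f_matrix`, `gradient`, `numerical_gradient`) are all assumptions made for illustration. It evaluates $f$ term by term as in the question, compares that with the matrix form, and compares the closed-form gradient with central finite differences. The quadratic term is coded as a trace, since $\sum_j \lambda_j^TW\lambda_j = {\rm tr}(L^TWL)$, which contributes $-\tfrac 12 SL$ to the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 5, 3, 4                  # arbitrary small sizes (assumption)

M = rng.normal(size=(q, n))        # columns are the m_i
L = rng.normal(size=(q, p))        # columns are the lambda_j
Y = rng.normal(size=(n, p))        # entries are the y_ij
b = rng.normal(size=p)             # entries are the beta_0j
V = rng.normal(size=(n, q, q))     # V[i] is V_i, deliberately not symmetric


def f_double_sum(L):
    """Objective evaluated term by term, as written in the question."""
    total = 0.0
    for i in range(n):
        for j in range(p):
            eta = L[:, j] @ M[:, i] + b[j]
            quad = L[:, j] @ V[i] @ L[:, j]
            total += Y[i, j] * eta - 0.5 * eta - 0.25 * (eta**2 + quad)
    return total


def f_matrix(L):
    """Same objective in matrix form; the quadratic term is a trace."""
    W = V.sum(axis=0)
    A = M.T @ L + np.outer(np.ones(n), b)
    return (np.sum(Y * A) - 0.5 * A.sum() - 0.25 * np.sum(A * A)
            - 0.25 * np.trace(L.T @ W @ L))


def gradient(L):
    """Closed-form q x p gradient: M(Y - J/2 - A/2) - S L / 2."""
    W = V.sum(axis=0)
    S = 0.5 * (W + W.T)
    A = M.T @ L + np.outer(np.ones(n), b)
    return M @ (Y - 0.5 - 0.5 * A) - 0.5 * S @ L


def numerical_gradient(func, L, h=1e-6):
    """Central finite differences, perturbing one entry of L at a time."""
    G = np.zeros_like(L)
    for k in range(L.shape[0]):
        for j in range(L.shape[1]):
            E = np.zeros_like(L)
            E[k, j] = h
            G[k, j] = (func(L + E) - func(L - E)) / (2 * h)
    return G


G = gradient(L)
f_gap = abs(f_matrix(L) - f_double_sum(L))
max_abs_error = np.max(np.abs(G - numerical_gradient(f_double_sum, L)))
print("objective mismatch:", f_gap)
print("largest gradient mismatch:", max_abs_error)
```

Both printed numbers should be at rounding-error level. Entry `G[k, j]` (zero-based indices) is the scalar $\partial f/\partial\lambda_{jk}$ asked for in the question.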