Understanding the following differentiation


I am trying to understand how the partial derivative in eq (2) follows from taking the partial derivative of eq (1).

$J(\theta) = \textbf{x}^T\textbf{x}- 2\textbf{x}^T\textbf{H}\theta+\theta^T \textbf{H}^T \textbf{H}\theta$ ---(1)

where $T$ denotes the transpose operator, $\textbf{H}$ is an $N\times p$ matrix, $\theta$ is a $p \times 1$ vector, and $\textbf{x}$ is an $N \times 1$ vector.

$ \frac{\partial J(\theta)}{\partial \theta} = -2 (\textbf{x}^T\textbf{H})^T+ 2\textbf{H}^T\textbf{H}\theta $ ---(2)

My question is why, in eq (2), a transpose appears in the first term and a factor of 2 appears in the second term.


There are 2 best solutions below

1
On BEST ANSWER

Note that $\sf Eq(1)$ is really a squared Frobenius norm $$\eqalign{ \def\L{\left} \def\R{\right} \def\t{\theta} \def\p{\partial} J &= \L\|H\t-x\R\|_F^2 \\ &= \L(H\t-x\R)^T\L(H\t-x\R) \\ }$$ Substituting $\,w=\L(H\t-x\R)\,$ creates an equation that's easy to differentiate $$\eqalign{ J &= w^Tw \\ dJ &= dw^Tw \;+\; w^Tdw \\ &= 2\,w^Tdw \\ &= 2\,w^T\L(H\,d\t\R) \\ &= 2\L(H^Tw\R)^T\,d\t \\ \frac{\p J}{\p \t} &= 2\,H^Tw \\ &= 2\L(H^TH\t-H^Tx\R) \\ }$$ Since $H^Tx = (x^TH)^T$, this is exactly $\sf Eq(2)$: the transpose in the first term comes from writing the gradient as a column vector, and the factor of 2 comes from the two equal terms $dw^Tw = w^Tdw$.
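If you want to convince yourself numerically, here is a minimal sketch that compares the closed-form gradient with a central finite-difference estimate. The sizes `N` and `p`, the random data, and the helper names (`J`, `grad_J`, `numerical_grad`) are illustrative assumptions of mine, not part of the question.

```python
import numpy as np

# Assumed toy problem: random H (N x p), x (N,), theta (p,)
rng = np.random.default_rng(0)
N, p = 6, 3
H = rng.standard_normal((N, p))
x = rng.standard_normal(N)
theta = rng.standard_normal(p)

def J(theta):
    # J = ||H theta - x||^2, the same cost as eq (1)
    r = H @ theta - x
    return r @ r

def grad_J(theta):
    # Closed form from the derivation: 2 H^T (H theta - x)
    return 2 * H.T @ (H @ theta - x)

def numerical_grad(f, theta, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

# Eq (2) written literally: -2 (x^T H)^T + 2 H^T H theta
eq2 = -2 * (x @ H).T + 2 * H.T @ H @ theta

max_err = np.max(np.abs(grad_J(theta) - numerical_grad(J, theta)))
print("max abs difference vs finite differences:", max_err)
print("matches eq (2):", np.allclose(eq2, grad_J(theta)))
```

With 1-D NumPy arrays the `.T` on `x @ H` is a no-op; it is kept only to mirror the transpose in eq (2).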

2
On

You can use the following simple procedure.

Consider a function $J(\theta)$ with $\theta \in {\bf R}^p$, for which you want to compute $\partial J(\theta) / \partial \theta$.

Using the notion of a differential, we can write:

$$J(\theta + \Delta\theta) \eqsim J(\theta) + [\partial J(\theta) / \partial \theta]^\top \Delta\theta.$$

In other words, we take a first-order approximation of $J(\theta)$. By substituting $\theta + \Delta \theta$ for the argument of $J(\cdot)$, we can read off the term that is linear in the increment $\Delta \theta$; its coefficient is the transposed gradient.

\begin{align} J(\theta + \Delta\theta) &= x^\top x -2x^\top H (\theta + \Delta\theta) + (\theta + \Delta\theta)^\top H^\top H (\theta + \Delta\theta) \\ &=x^\top x -2x^\top H \theta -2x^\top H \Delta\theta + \theta^\top H^\top H \theta + 2 \theta^\top H^\top H \Delta \theta + \Delta\theta^\top H^\top H \Delta\theta \\ &= x^\top x -2x^\top H \theta + \theta^\top H^\top H \theta - 2x^\top H \Delta\theta + 2 \theta^\top H^\top H \Delta \theta +\Delta\theta^\top H^\top H \Delta\theta \\ &= J(\theta) + (2 \theta^\top H^\top H -2 x^\top H)\Delta \theta + \Delta\theta^\top H^\top H \Delta\theta \\ & = J(\theta) + (2 H^\top H \theta -2 H^\top x)^\top\Delta \theta + \mathcal{O}(\|\Delta \theta\|^2) \end{align}

The factor of 2 in the cross term arises because $\theta^\top H^\top H \Delta\theta$ and $\Delta\theta^\top H^\top H \theta$ are scalars that are transposes of each other, hence equal.

From this you obtain:

$$\partial J(\theta)/\partial \theta = 2 H^\top H \theta -2 H^\top x$$

Notice that we keep only the term linear in $\Delta \theta$ and discard the higher-order contribution $\mathcal{O}(\|\Delta \theta\|^2)$.
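As a quick sanity check of this expansion, the following sketch confirms that what remains after subtracting $J(\theta)$ and the linear term is the quadratic term $\Delta\theta^\top H^\top H \Delta\theta$. The dimensions, the random data, and the variable names (`d` for $\Delta\theta$, `g` for the gradient) are assumptions chosen for illustration.

```python
import numpy as np

# Assumed toy data: H is N x p, x has length N, theta and d have length p
rng = np.random.default_rng(1)
N, p = 5, 2
H = rng.standard_normal((N, p))
x = rng.standard_normal(N)
theta = rng.standard_normal(p)
d = 1e-3 * rng.standard_normal(p)  # small increment, plays the role of Delta theta

def J(theta):
    # Eq (1) written term by term
    return x @ x - 2 * x @ H @ theta + theta @ H.T @ H @ theta

# Coefficient of the linear term, i.e. the gradient 2 H^T H theta - 2 H^T x
g = 2 * H.T @ H @ theta - 2 * H.T @ x

# Whatever is left after removing J(theta) and the linear term ...
remainder = J(theta + d) - J(theta) - g @ d
# ... should equal the second-order term Delta theta^T H^T H Delta theta
quadratic = d @ H.T @ H @ d

print("remainder:", remainder)
print("quadratic:", quadratic)
```

Because $J$ is exactly quadratic in $\theta$, the two printed numbers should agree up to floating-point rounding, and both shrink like $\|\Delta\theta\|^2$ as `d` is scaled down.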