Derivative of $J(\theta) = \frac 1 {2m} (X \theta - \mathbf{y})^{\intercal} (X \theta - \mathbf{y})$ with respect to $\theta$


I have

$$ J(\theta) = \frac 1 {2m} (X \theta - \mathbf{y})^{\intercal} (X \theta - \mathbf{y}) $$

where $X$ is an $m \times n$ matrix, $\theta$ is an $n \times 1$ vector, and $\mathbf{y}$ is an $m \times 1$ vector.
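For readers who want to experiment, here is a minimal NumPy sketch of $J(\theta)$ with the shapes stated above. The function name `cost`, the sizes $m = 5$, $n = 3$ and the random data are illustrative assumptions, not part of the question.

```python
import numpy as np

def cost(X, theta, y):
    """J(theta) = (1/(2m)) (X theta - y)^T (X theta - y), returned as a Python float."""
    m = X.shape[0]
    r = X @ theta - y              # residual, shape (m, 1)
    return (r.T @ r).item() / (2 * m)  # r.T @ r has shape (1, 1)

# Hypothetical example data with the shapes from the question.
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.standard_normal((m, n))      # m x n
theta = rng.standard_normal((n, 1))  # n x 1
y = rng.standard_normal((m, 1))      # m x 1

print((X @ theta - y).shape)  # (5, 1)
print(cost(X, theta, y))
```

Keeping $\theta$ and $\mathbf{y}$ as explicit column vectors (shape `(n, 1)` and `(m, 1)`) mirrors the matrix notation, so every transpose in the derivation corresponds to a `.T` in the code.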

I need to calculate $ \frac d {d\theta} J(\theta) $. Here is what I did:

$$\begin{eqnarray} \frac d {d\theta} J(\theta) & = & \frac 1 {2m} \frac d {d\theta} [ (X\theta - \mathbf{y})^{\intercal} (X\theta - \mathbf{y}) ] \\& = & \frac 1 {2m} [ \frac d {d\theta} (X\theta - \mathbf{y})^{\intercal} \cdot (X\theta - \mathbf{y}) + (X\theta - \mathbf{y})^{\intercal} \cdot \frac d {d\theta} (X\theta - \mathbf{y}) ] \\ & = & \frac 1 {2m} [ X^{\intercal}(X\theta - \mathbf{y}) + (X\theta - \mathbf{y})^{\intercal}X ] \end{eqnarray}$$

From here, I don't know how to proceed, because inside the square brackets the first term is an $n \times 1$ vector while the second is a $1 \times n$ vector, so they cannot simply be added. Some people say the expression inside the square brackets equals:

$$ 2 (X \theta - \mathbf{y})^{\intercal} X $$

That result does solve my problem, but I want to know how it was obtained.
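A quick numerical sanity check of the shape issue described above, using random data (the sizes and seed are arbitrary assumptions): the two bracketed terms contain the same $n$ numbers, one being the transpose of the other, so once both are written in the same layout they combine into $2(X\theta - \mathbf{y})^{\intercal} X$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
X = rng.standard_normal((m, n))
theta = rng.standard_normal((n, 1))
y = rng.standard_normal((m, 1))

r = X @ theta - y
term1 = X.T @ r   # X^T (X theta - y), an n x 1 column
term2 = r.T @ X   # (X theta - y)^T X, a 1 x n row
print(term1.shape, term2.shape)  # (4, 1) (1, 4)

# Same numbers, transposed layout.
print(np.allclose(term1, term2.T))  # True

# Written both as rows, their sum is 2 (X theta - y)^T X.
row_sum = term1.T + term2
print(np.allclose(row_sum, 2 * r.T @ X))  # True
```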



Best answer

Let $A=X\theta-\mathbf{y}$, an $m \times 1$ vector with components $a_1, \ldots, a_m$. Observe that $A^TA$ is a scalar; call it $\alpha$. Thus $\alpha=A^TA=\sum_{j=1}^{m}a_j^2$.

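Let $\theta_k$ be the $k^{\text{th}}$ component of the vector $\theta$. Then
\begin{align*} \frac{\partial \alpha}{\partial \theta_k} & =\sum_{j=1}^{m}2a_j \frac{\partial a_j}{\partial \theta_k}\\ & =2\sum_{j=1}^{m}a_j \frac{\partial a_j}{\partial \theta_k}. \end{align*}
This holds for all $k \in \{1,2,3, \ldots ,n\}$. Collecting the $n$ partial derivatives into a $1 \times n$ row vector, we get $$\frac{\partial \alpha}{\partial \theta}=2A^T\frac{\partial A}{\partial \theta}.$$
Since $a_j = \sum_{k=1}^{n} X_{jk}\theta_k - y_j$, we have $\frac{\partial a_j}{\partial \theta_k} = X_{jk}$, i.e. $\frac{\partial A}{\partial \theta} = X$, and therefore $$\frac{\partial \alpha}{\partial \theta} = 2(X\theta - \mathbf{y})^T X,$$ which is the expression quoted in the question.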
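The component-wise argument above can be checked numerically. This sketch assumes NumPy and random data; `alpha` and `numeric_partials` are hypothetical helper names. It estimates each $\partial \alpha / \partial \theta_k$ by central differences and compares the resulting row with $2A^T X$, using $\partial A / \partial \theta = X$ (because $a_j = \sum_k X_{jk}\theta_k - y_j$).

```python
import numpy as np

def alpha(X, theta, y):
    """alpha = A^T A with A = X theta - y; returns a Python float."""
    A = X @ theta - y
    return (A.T @ A).item()

def numeric_partials(f, theta, h=1e-6):
    """Central-difference estimate of d f / d theta_k for every k, as a 1 x n row."""
    n = theta.shape[0]
    row = np.zeros((1, n))
    for k in range(n):
        e = np.zeros_like(theta)
        e[k, 0] = h  # perturb only the k-th component
        row[0, k] = (f(theta + e) - f(theta - e)) / (2 * h)
    return row

rng = np.random.default_rng(2)
m, n = 5, 3
X = rng.standard_normal((m, n))
theta = rng.standard_normal((n, 1))
y = rng.standard_normal((m, 1))

A = X @ theta - y
analytic = 2 * A.T @ X  # 2 A^T (dA/dtheta) with dA/dtheta = X, shape (1, n)
numeric = numeric_partials(lambda t: alpha(X, t, y), theta)
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Because $\alpha$ is quadratic in $\theta$, central differences are exact apart from floating-point rounding, so the agreement is tight.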

Another answer

Let

$$J (\theta) := \frac{1}{2 m} \| \mathrm X \theta - \mathrm y \|_2^2 = \frac{1}{2 m} (\mathrm X \theta - \mathrm y)^T (\mathrm X \theta - \mathrm y) = \frac{1}{2 m} (\theta^T \mathrm X^T \mathrm X \theta - 2 \theta^T \mathrm X^T \mathrm y + \mathrm y^T \mathrm y)$$
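The expansion in the last equality relies on the two cross terms $\theta^T \mathrm X^T \mathrm y$ and $\mathrm y^T \mathrm X \theta$ being equal: each is a $1 \times 1$ matrix, and one is the transpose of the other. A small NumPy check with arbitrary random data (an assumption for illustration) confirms both the expansion and this equality.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 7, 3
X = rng.standard_normal((m, n))
theta = rng.standard_normal((n, 1))
y = rng.standard_normal((m, 1))

r = X @ theta - y
lhs = (r.T @ r).item()  # (X theta - y)^T (X theta - y)
rhs = (theta.T @ X.T @ X @ theta - 2 * theta.T @ X.T @ y + y.T @ y).item()
print(np.isclose(lhs, rhs))  # True

# The cross terms are 1 x 1 and transposes of each other, hence equal,
# which is why they merge into the single term -2 theta^T X^T y.
cross1 = (theta.T @ X.T @ y).item()
cross2 = (y.T @ X @ theta).item()
print(np.isclose(cross1, cross2))  # True
```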

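Using $\nabla_\theta (\theta^T \mathrm M \theta) = 2 \mathrm M \theta$ for a symmetric matrix $\mathrm M$ (here $\mathrm M = \mathrm X^T \mathrm X$) and $\nabla_\theta (\theta^T \mathrm b) = \mathrm b$ (here $\mathrm b = \mathrm X^T \mathrm y$), and noting that $\mathrm y^T \mathrm y$ does not depend on $\theta$, we get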

$$\nabla J (\theta) = \frac{1}{2 m} (2 \mathrm X^T \mathrm X \theta - 2 \mathrm X^T \mathrm y + 0) = \frac{1}{2 m} 2 \mathrm X^T (\mathrm X \theta - \mathrm y) = \frac 1m \mathrm X^T (\mathrm X \theta - \mathrm y)$$
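As a final sanity check, here is a sketch (assuming NumPy and random data; `cost`, `grad` and `numeric_grad` are hypothetical names) that compares $\nabla J(\theta) = \frac 1m \mathrm X^T (\mathrm X \theta - \mathrm y)$ with a finite-difference estimate. It also checks that the gradient vanishes at the least-squares solution returned by `np.linalg.lstsq`, which is the same statement as the normal equations $\mathrm X^T \mathrm X \theta = \mathrm X^T \mathrm y$.

```python
import numpy as np

def cost(X, theta, y):
    """J(theta) = (1/(2m)) ||X theta - y||^2."""
    m = X.shape[0]
    r = X @ theta - y
    return (r.T @ r).item() / (2 * m)

def grad(X, theta, y):
    """Closed-form gradient (1/m) X^T (X theta - y), an n x 1 column."""
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m

def numeric_grad(X, theta, y, h=1e-6):
    """Central-difference estimate of the gradient, same shape as theta."""
    g = np.zeros_like(theta)
    for k in range(theta.shape[0]):
        e = np.zeros_like(theta)
        e[k, 0] = h
        g[k, 0] = (cost(X, theta + e, y) - cost(X, theta - e, y)) / (2 * h)
    return g

rng = np.random.default_rng(4)
m, n = 8, 3
X = rng.standard_normal((m, n))
theta = rng.standard_normal((n, 1))
y = rng.standard_normal((m, 1))

print(np.allclose(grad(X, theta, y), numeric_grad(X, theta, y), atol=1e-6))  # True

# At a least-squares minimizer the gradient is (numerically) zero.
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(grad(X, theta_star, y), 0, atol=1e-10))  # True
```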