I'm working through the Coursera Machine Learning course and have a calculus question.
Let
$x^{(i)} = \begin{bmatrix}1 & x^{(i)}_1 & x^{(i)}_2 & \cdots & x^{(i)}_p\end{bmatrix}$ (a row vector, where we write $x^{(i)}_0 = 1$ for the leading entry),
$\theta = \begin{bmatrix}\theta_0 \\ \theta_1 \\ \vdots \\ \theta_p \end{bmatrix}$ (a column vector),
and let $y_i$ be a scalar.
I'm not sure I understand how to do this:
$\frac{\partial}{\partial \theta_j} \left( y_i\, x^{(i)} \theta \right) = y_i x_j^{(i)}$
When taking the derivative of $y_i\, x^{(i)} \theta$ with respect to $\theta_j$, it makes some intuitive sense that the result is $y_i x_j^{(i)}$, since $\theta_j$ is only ever multiplied by the $j$th entry of $x^{(i)}$. However, I don't understand how to derive that formally.
Write out the product $x^{(i)} \theta$ as a sum (recalling $x_0^{(i)} = 1$), pull the constant $y_i$ out, and differentiate term by term. Every term is constant with respect to $\theta_j$ except the $j$th, which is linear in $\theta_j$:
$$\begin{align*} \frac{\partial}{\partial \theta_j} \left( y_i\, x^{(i)} \theta \right) & = y_i \frac{\partial}{\partial \theta_j} \left( x^{(i)} \theta \right) \\ & = y_i \frac{\partial}{\partial \theta_j} \left( \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \dotsb + \theta_p x_p^{(i)} \right) \\ & = y_i \left[ \frac{\partial}{\partial \theta_j} \left( \theta_0 x_0^{(i)} \right) + \frac{\partial}{\partial \theta_j} \left( \theta_1 x_1^{(i)} \right) + \dotsb + \frac{\partial}{\partial \theta_j} \left( \theta_p x_p^{(i)} \right) \right] \\ & = y_i \left( 0 + \dotsb + 0 + x_j^{(i)} + 0 + \dotsb + 0 \right) \\ & = y_i x_j^{(i)} \end{align*}$$
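If it helps to see the result numerically, here is a small sanity check (my own illustration, not from the course): it compares the analytic partial derivative $y_i x_j^{(i)}$ against a central finite difference of $f(\theta) = y_i\, x^{(i)}\theta$ for every $j$. The specific values of $p$, $y_i$, and the random entries are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
# x^{(i)}: row vector with leading entry x_0 = 1
x = np.concatenate(([1.0], rng.standard_normal(p)))
theta = rng.standard_normal(p + 1)  # theta_0 .. theta_p
y_i = 2.5
eps = 1e-6

def f(t):
    # f(theta) = y_i * (x^{(i)} theta), a scalar
    return y_i * (x @ t)

for j in range(p + 1):
    tp, tm = theta.copy(), theta.copy()
    tp[j] += eps
    tm[j] -= eps
    numeric = (f(tp) - f(tm)) / (2 * eps)   # central difference in theta_j
    analytic = y_i * x[j]                   # the derived formula y_i * x_j
    assert abs(numeric - analytic) < 1e-6, (j, numeric, analytic)
```

Each pass of the loop perturbs only $\theta_j$, which is exactly why only the $j$th term of the sum survives differentiation.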