Nabla/del of $(y_i-\hat y(\mathbf x,t_i))^2$?


I want to take the gradient (nabla) with respect to $\mathbf x$ of the function $$ f(\mathbf x)=\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))^2 \tag 1 $$

Is the following correct? \begin{align} \nabla f(\mathbf x) &= \nabla\bigg(\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))^2\bigg)\\ &= \nabla\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))(y_i-\hat y(\mathbf x,t_i))\\ &= \sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))\nabla(y_i-\hat y(\mathbf x,t_i))\\ &= -\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))\nabla\hat y(\mathbf x,t_i)\tag 2 \\ &\dots \text{Multiply?} \end{align}

Or must I expand the parentheses first? \begin{align} \nabla f(\mathbf x) &= \nabla\bigg(\sum_{i=1}^m (y_i-\hat y(\mathbf x,t_i))^2 \bigg)\\ &= \nabla\sum_{i=1}^m ( y_i^2-y_i\hat y(\mathbf x,t_i)-\hat y(\mathbf x,t_i)y_i +\hat y(\mathbf x,t_i)^2 )\\ &= \nabla\sum_{i=1}^m ( y_i^2-2y_i\hat y(\mathbf x,t_i) +\hat y(\mathbf x,t_i)^2 )\\ &= \sum_{i=1}^m \big( -2y_i\nabla\hat y(\mathbf x,t_i) +\nabla\big(\hat y(\mathbf x,t_i)^2\big) \big) \tag 3\\ &\dots\text{Stuck here} \end{align}


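Both approaches give the same result. Your first derivation is only missing the factor of $2$ that the product (chain) rule produces, since $\nabla(u^2)=2u\,\nabla u$: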

\begin{align} \nabla f(\mathbf x) &= \nabla\bigg(\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))^2\bigg)\\ &= \nabla\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))(y_i-\hat y(\mathbf x,t_i))\\ &= \color{blue}{2}\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))\nabla(y_i-\hat y(\mathbf x,t_i))\\ &= -\color{red}{2\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))\nabla\hat y(\mathbf x,t_i)}\tag 1 \end{align}
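A quick numerical sanity check of $(1)$ may help. The sketch below assumes a hypothetical model $\hat y(\mathbf x,t)=x_0e^{x_1t}$ and made-up data (neither comes from the question), evaluates $(1)$, and compares it with central finite differences of $f$. Dropping the factor $2$ would make the two columns disagree by a factor of two.

```python
import math

# Hypothetical model, assumed only for illustration: y_hat(x, t) = x[0] * exp(x[1] * t)
def y_hat(x, t):
    return x[0] * math.exp(x[1] * t)

def grad_y_hat(x, t):
    # Analytic gradient of y_hat with respect to x = (x[0], x[1])
    e = math.exp(x[1] * t)
    return [e, x[0] * t * e]

def f(x, ts, ys):
    # Sum of squared residuals, as in equation (1) of the question
    return sum((y - y_hat(x, t)) ** 2 for t, y in zip(ts, ys))

def grad_f(x, ts, ys):
    # Chain-rule result (1): -2 * sum_i (y_i - y_hat_i) * grad(y_hat_i)
    g = [0.0] * len(x)
    for t, y in zip(ts, ys):
        r = y - y_hat(x, t)
        for k, d in enumerate(grad_y_hat(x, t)):
            g[k] += -2.0 * r * d
    return g

def grad_f_fd(x, ts, ys, h=1e-6):
    # Central finite differences of f, used only as an independent check
    g = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp, ts, ys) - f(xm, ts, ys)) / (2 * h))
    return g

# Made-up sample times and observations (assumptions, not from the question)
ts = [0.0, 0.5, 1.0, 1.5]
ys = [1.1, 1.9, 2.6, 4.2]
x = [1.0, 0.8]

analytic = grad_f(x, ts, ys)
numeric = grad_f_fd(x, ts, ys)
for a, n in zip(analytic, numeric):
    print(f"{a:.6f}  {n:.6f}")
```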

If you expand instead, the last term needs the same rule, $\nabla\big(\hat y^2\big)=2\hat y\,\nabla\hat y$:

\begin{align} \nabla f(\mathbf x) &= \nabla\bigg(\sum_{i=1}^m (y_i-\hat y(\mathbf x,t_i))^2 \bigg)\\ &= \nabla\sum_{i=1}^m ( y_i^2-y_i\hat y(\mathbf x,t_i)-\hat y(\mathbf x,t_i)y_i +\hat y(\mathbf x,t_i)^2 )\\ &= \nabla\sum_{i=1}^m ( y_i^2-2y_i\hat y(\mathbf x,t_i) +\hat y(\mathbf x,t_i)^2 )\\ &= \sum_{i=1}^m \big( -2y_i\nabla\hat y(\mathbf x,t_i) +\nabla\big(\hat y(\mathbf x,t_i)^2\big) \big) \\ &= \sum_{i=1}^m ( -2y_i\nabla\hat y(\mathbf x,t_i) +2 \hat y(\mathbf x,t_i)\nabla\hat y(\mathbf x,t_i) ) \\ &= \color{red}{-2\sum_{i=1}^m(y_i-\hat y(\mathbf x,t_i))\nabla\hat y(\mathbf x,t_i)}\tag 2 \end{align}
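The only new ingredient in the expanded route is $\nabla(\hat y^2)=2\hat y\,\nabla\hat y$. A minimal check for a single summand, assuming the hypothetical model $\hat y(\mathbf x,t)=x_0e^{x_1t}$ and an arbitrary data point $(t_i, y_i)$, differentiates the expanded summand numerically and compares it with $-2(y_i-\hat y)\nabla\hat y$:

```python
import math

# Hypothetical model used only for this check: y_hat(x, t) = x[0] * exp(x[1] * t)
def y_hat(x, t):
    return x[0] * math.exp(x[1] * t)

def grad_y_hat(x, t):
    # Analytic gradient of y_hat with respect to x
    e = math.exp(x[1] * t)
    return [e, x[0] * t * e]

def fd_grad(fun, x, h=1e-6):
    # Central finite-difference gradient of a scalar function of x
    out = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        out.append((fun(xp) - fun(xm)) / (2 * h))
    return out

# Arbitrary parameter vector and single data point (assumptions)
x, t, y = [1.0, 0.8], 1.5, 4.2

# Gradient of the expanded summand y^2 - 2*y*y_hat + y_hat^2, computed numerically
expanded = fd_grad(lambda v: y**2 - 2 * y * y_hat(v, t) + y_hat(v, t) ** 2, x)

# Closed form after using grad(y_hat^2) = 2 * y_hat * grad(y_hat)
closed = [-2 * (y - y_hat(x, t)) * d for d in grad_y_hat(x, t)]
print(expanded, closed)
```

The $y_i^2$ term contributes nothing to the gradient, which the finite differences confirm implicitly.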

$(1) \equiv (2)$


Instead of diving into indices, you can handle this problem using vector/matrix notation.

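For convenience, let $y$ and ${\hat y}$ denote the $m$-vectors with components $y_i$ and ${\hat y}(\mathbf x,t_i)$, and define the vector/matrix variables $$\eqalign{ w=({\hat y}-y),\,\,\,\, J = \frac{\partial{\hat y}}{\partial x} = \nabla{\hat y} }$$ so that $J$ is the Jacobian of ${\hat y}$, whose $i$-th row is $\nabla{\hat y}(\mathbf x,t_i)^T$. Write the function in terms of these, and find its differential and gradient $$\eqalign{ f &= w^Tw \cr df &= 2w^Tdw = 2w^Td{\hat y} = 2w^TJ\,dx = 2(J^Tw)^Tdx \cr \frac{\partial f}{\partial x} &=2J^Tw = 2\,(\nabla{\hat y})^T({\hat y}-y) }$$ Expanding $J^Tw$ row by row gives $2\sum_i({\hat y}(\mathbf x,t_i)-y_i)\nabla{\hat y}(\mathbf x,t_i)$, the same result as the index computation.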
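The matrix form also suggests a direct implementation: stack the model outputs into a vector, build the Jacobian $J$, and return $2J^Tw$. This sketch assumes the same kind of hypothetical model, $\hat y(\mathbf x,t)=x_0e^{x_1t}$, and made-up data; it estimates $J$ by finite differences and compares $2J^T(\hat y-y)$ with the index formula written out by hand.

```python
import math

def y_hat_vec(x, ts):
    # Hypothetical model (assumption): stacks y_hat(x, t_i) = x[0]*exp(x[1]*t_i) into an m-vector
    return [x[0] * math.exp(x[1] * t) for t in ts]

def jacobian_fd(fun, x, h=1e-6):
    # m x n Jacobian J[i][k] = d fun_i / d x_k, estimated by central differences
    cols = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = fun(xp), fun(xm)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # cols[k][i] is column k; transpose so that J[i] is the row for sample i
    return [list(row) for row in zip(*cols)]

def grad_matrix_form(x, ts, ys):
    yh = y_hat_vec(x, ts)
    w = [a - b for a, b in zip(yh, ys)]  # w = y_hat - y
    J = jacobian_fd(lambda v: y_hat_vec(v, ts), x)
    # 2 J^T w: entry k is 2 * sum_i J[i][k] * w[i]
    return [2 * sum(J[i][k] * w[i] for i in range(len(w))) for k in range(len(x))]

def grad_index_form(x, ts, ys):
    # -2 * sum_i (y_i - y_hat_i) * grad(y_hat_i), with grad(y_hat_i) written out by hand
    g = [0.0, 0.0]
    for t, y in zip(ts, ys):
        e = math.exp(x[1] * t)
        r = y - x[0] * e
        g[0] += -2 * r * e
        g[1] += -2 * r * x[0] * t * e
    return g

# Made-up sample times, observations, and parameters (assumptions)
ts = [0.0, 0.5, 1.0, 1.5]
ys = [1.1, 1.9, 2.6, 4.2]
x = [1.0, 0.8]

matrix_grad = grad_matrix_form(x, ts, ys)
index_grad = grad_index_form(x, ts, ys)
print(matrix_grad, index_grad)
```

In practice $J$ would come from the model's analytic derivatives or from automatic differentiation; finite differences are used here only to keep the sketch dependency-free.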