Consider a function $f(x,y)$ where $x$ is itself a function of $y$, that is, $f(x(y), y)$. I would like to compute the gradient of this function, $\nabla f = (\partial_x f, \partial_y f)$. Using the chain rule to change the variable of differentiation, $$\frac{\partial}{\partial x} = \frac{\partial y}{\partial x}\frac{\partial}{\partial y}.$$ Applying this partial derivative to $f$, we find $$\frac{\partial}{\partial x}f(x(y),y) = \frac{\partial y}{\partial x}\frac{\partial}{\partial y} f(x(y),y) = \frac{\partial y}{\partial x} \left[\frac{\partial f}{\partial x}\frac{\partial x}{\partial y} + \frac{\partial f}{\partial y} \right] = \frac{\partial f}{\partial x} + \frac{\partial y}{\partial x} \frac{\partial f}{\partial y}, $$ which gives an extra term $\frac{\partial y}{\partial x} \frac{\partial f}{\partial y}$ in addition to $\frac{\partial f}{\partial x}$. Where does this extra term come from? Since $x$ is really a function of $y$, is the gradient $$ \nabla f = (\partial_x f, \partial_y f) $$ or $$\nabla f = \left(\frac{\partial y}{\partial x} \frac{\partial f}{\partial y} + \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$$ or even $$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}+\frac{\partial f}{\partial x}\frac{\partial x}{\partial y} \right) ?$$
Any help is appreciated.
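Let $f:\mathbb R^2\to \mathbb R$, and let $g:\mathbb R \to \mathbb R^2$ be given by $g(y) = (h(y),y)$ for some function $h:\mathbb R \to \mathbb R$. Our goal is to compute the derivative of $f\circ g$. By the chain rule, it is $$f'(g(y))\,g'(y) = \partial_1 f(g(y))\,h'(y)+\partial_2f(g(y)),$$ where $$\partial_1 f(g(y)) = \lim_{t\to 0}\frac{f(g(y)+te_1)-f(g(y))}{t}, \qquad e_1=(1,0),$$ and similarly for $\partial_2 f(g(y))$. In particular, the gradient of $f$ itself is $(\partial_1 f, \partial_2 f)$, whose components are computed as if $x$ and $y$ were independent; the term $\partial_1 f(g(y))\,h'(y)$ appears only in the derivative of the one-variable composite $y\mapsto f(h(y),y)$.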
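As a quick numerical sanity check of the chain-rule formula for $(f\circ g)'$, here is a small Python sketch. The functions $f(x,y)=x^2y+\sin y$ and $h(y)=y^3$ are hypothetical choices for illustration only (they are not from the question), and the partial derivatives $\partial_1 f = 2xy$ and $\partial_2 f = x^2+\cos y$ are worked out by hand. The composite is $f(h(y),y)=y^7+\sin y$, so its derivative should be $7y^6+\cos y$.

```python
import math

# Hypothetical example (not from the original post):
# f(x, y) = x**2 * y + sin(y) and h(y) = y**3, so g(y) = (h(y), y).
def f(x, y):
    return x**2 * y + math.sin(y)

def h(y):
    return y**3

def dh(y):
    return 3 * y**2

# Partial derivatives of f, computed by hand. They treat x and y as
# independent inputs and know nothing about h.
def d1f(x, y):
    return 2 * x * y

def d2f(x, y):
    return x**2 + math.cos(y)

def chain_rule_derivative(y):
    """d/dy f(h(y), y) = d1f(g(y)) * h'(y) + d2f(g(y))."""
    x = h(y)
    return d1f(x, y) * dh(y) + d2f(x, y)

def finite_difference_derivative(y, step=1e-6):
    """Central-difference estimate of the derivative of y -> f(h(y), y)."""
    return (f(h(y + step), y + step) - f(h(y - step), y - step)) / (2 * step)

y0 = 0.7
print(chain_rule_derivative(y0))
print(finite_difference_derivative(y0))
print(d2f(h(y0), y0))  # the partial d2f alone, without the d1f * h' term
```

The first two printed values agree up to finite-difference error, while the third (the partial $\partial_2 f$ on its own) differs from them by exactly $\partial_1 f(g(y))\,h'(y) = 6y^6$ for this choice of $f$ and $h$.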