Motivation: in a computer science book, there are two $\mathbb{R}^d\to\mathbb{R}$ functions $f$ and $g$ related by
$$f(\textbf{x}) = x_kA\left(g(\textbf{x})\right)+ B(\textbf{x})$$ for functions $A:\mathbb{R}\to\mathbb{R}$ and $B:\mathbb{R}^d\to\mathbb{R}$. The author casually writes
$$\frac{\partial f}{\partial g}(\textbf{x}) = x_kA'\left(g(\textbf{x})\right)$$
and, while I understand the intuition, I'm struggling to formally and generally define $\partial f/\partial g$.
The question: given two $\mathbb{R}^d\to\mathbb{R}$ functions $f$ and $g$, when and how are we to define $$\frac{\partial f}{\partial g}?$$
My attempts: here are two definitions that have failed so far:
- Definition $1$: we say $f$ is differentiable with respect to $g$ if there is a function $\phi$ such that $$f = \phi\circ g$$ in which case we define $$\frac{\partial f}{\partial g}(\textbf{x}) = \phi'(g(\textbf{x})).$$
The issue here is that such a $\phi$ (almost) never exists. Furthermore, the original relation in the book does not follow this form. We need something more general.
- Definition $2$: we say $f$ is differentiable with respect to $g$ iff there is a function $\phi$ such that $$f(\textbf{x}) = \phi\left(g(\textbf{x}),\textbf{x}\right)$$ for any $\textbf{x}\in\mathbb{R}^d$, in which case we define $$\frac{\partial f}{\partial g}(\textbf{x}) = \frac{\partial \phi}{\partial x^0}(g(\textbf{x}),\textbf{x})$$ in coordinates $\phi(x^0,x^1,\ldots,x^d) = \phi(x^0,\textbf{x})$.
Letting $\phi : (x^0,\textbf{x})\mapsto x_kA(x^0)+B(\textbf{x})$ we find that the example at the beginning is covered by this definition. The issue now has to do with the domain of $\phi$; I did not specify it in the definition because I do not know what its domain should be.
If we set the domain of $\phi$ as $$\bigg\{(g(\textbf{x}),\textbf{x}) : \textbf{x}\in\mathbb{R}^d\bigg\}$$ then it is often too small to differentiate on: if $g$ is given by $(x,y)\mapsto x$, we'd ideally wish for $$\frac{\partial f}{\partial g} = \frac{\partial f}{\partial x},$$ but in that case $\text{dom}(\phi)$ is $$\bigg\{ (t,x,y) : t = x\bigg\},$$ on which $t$ cannot vary while $x$ and $y$ are held fixed, so the partial derivative with respect to $t$ is not defined.
On the other hand, if we require the domain of $\phi$ to be $\mathbb{R}^{d+1}$, then the derivative is no longer unique: simply let $f,g:\mathbb{R}\to\mathbb{R}$ be the identity. Then both $$\phi:(t,x) \mapsto x \ \ \ \ \text{ and } \psi:(t,x) \mapsto t$$ comply with the property $$x = \phi(x,x) = \psi(x,x),$$ yet $$0 = \frac{\partial\phi}{\partial x^0} \neq \frac{\partial\psi}{\partial x^0} = 1.$$
Speaking as someone who studied physics and has since spent more time on math than on physics:
I think both of your approaches $1$ and $2$ differ from what’s usually meant by this notation (which, I agree with Ted Shifrin, is bad notation) in the same way: you regard the expression by which $f$ is given as incidental and focus on $f$ as an abstract function. You then ask whether it can be represented as depending on $g$, and rightly find that there may be different such representations that lead to different “derivatives”.
The notation doesn’t intend to represent the derivative of an abstract function but of the given functional form. The rule is quite simply to treat every instance of $g$ occurring in the expression as if it were an instance of a variable $g$, and to differentiate the resulting function with respect to that variable. This is not a derivative of the abstract function $f(x)$ in the mathematical sense; it’s more like a shorthand for applying the chain rule to the given expression.
So your question, which introduces $f$ as a function from $\mathbb R^d$ to $\mathbb R$ and asks how to define a derivative of this function, already starts off on the wrong foot. Rather than considering all possible representations of such a function as a function that depends on intermediate variables, the given expression is implicitly treated as one particular such representation $\phi$, and the derivative, though written as a derivative of $f$, is really a derivative of this representing function $\phi$.
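To make this concrete for the expression in the question: the representation the book implicitly uses is the $\phi$ you wrote down under Definition $2$, with the first slot standing in for $g$. Here is a sketch of the computation (the index $j$ and the Kronecker delta $\delta_{jk}$ are my notation, and I assume $A$, $B$ and $g$ are differentiable):
$$\phi(x^0,\textbf{x}) = x_kA(x^0)+B(\textbf{x}),\qquad \frac{\partial\phi}{\partial x^0}(x^0,\textbf{x}) = x_kA'(x^0),$$
so that
$$\frac{\partial f}{\partial g}(\textbf{x}) := \frac{\partial\phi}{\partial x^0}(g(\textbf{x}),\textbf{x}) = x_kA'\left(g(\textbf{x})\right).$$
This is exactly the factor that multiplies $\partial g/\partial x_j$ when the chain rule is applied to the given expression:
$$\frac{\partial f}{\partial x_j}(\textbf{x}) = x_kA'\left(g(\textbf{x})\right)\frac{\partial g}{\partial x_j}(\textbf{x}) + \delta_{jk}\,A\left(g(\textbf{x})\right) + \frac{\partial B}{\partial x_j}(\textbf{x}).$$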