Suppose we're trying to differentiate the function $f(x)=x^x$. Now the textbook method would be to notice that $f(x)=e^{x \log{x}}$ and use the chain rule to find $$f'(x)=(1+\log{x})\ e^{x \log{x}}=(1+\log{x})\ x^x.$$
But suppose that I didn't make this observation and instead tried to apply the following differentiation rules:
$$\frac{d}{dx}x^c=cx^{c-1} \qquad (1)$$
$$\frac{d}{dx}c^x = \log{c}\cdot c^x \qquad (2)$$
which are valid for any constant $c$. Obviously neither rule is applicable to the form $x^x$, because in this case neither the base nor the exponent is constant. But if I pretend that the exponent is constant and apply rule $(1)$, I would get $f'(x)\stackrel{?}{=}x\cdot x^{x-1}=x^x.$ Likewise, if I pretend that the base is constant and apply rule $(2)$, I obtain $f'(x)\stackrel{?}{=}\log{x}\cdot x^x$.
It isn't hard to see that neither of these derivatives is correct. But here's where the magic happens: if we sum the two “derivatives” we end up with $$x^x+ \log{x}\cdot x^x=(1+\log{x})\ x^x,$$ which is the correct expression for $f'(x)$.
This same trick yields correct results in other contexts as well. In fact, in some cases it turns out to be a more efficient way of taking derivatives. For example, consider $$g(x)=x^2 = \color{blue} x\cdot \color{red} x.$$ If we pretend the blue $\color{blue} x$ is a constant we would get $g'(x)\stackrel{?}{=}\color{blue}x\cdot 1=x$. Now if we pretend the red $\color{red}x$ is constant we get $g'(x)\stackrel{?}{=}1\cdot \color{red} x=x$. Summing both expressions we end up with $2x$ which is of course a correct expression for the derivative.
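Both computations can be checked symbolically. Here is a quick sketch using Python's `sympy` library (the variable names are my own):

```python
import sympy as sp

x, c = sp.symbols('x c', positive=True)

# x^x: differentiate pretending one part is constant, then set c = x
f = x**x
pretend_exponent_const = sp.diff(x**c, x).subs(c, x)   # rule (1): x * x**(x-1)
pretend_base_const = sp.diff(c**x, x).subs(c, x)       # rule (2): log(x) * x**x
assert sp.simplify(pretend_exponent_const + pretend_base_const - sp.diff(f, x)) == 0

# x^2 = x * x: freeze one factor at a time and sum
g = x * x
freeze_first = sp.diff(c * x, x).subs(c, x)            # x * 1
freeze_second = sp.diff(x * c, x).subs(c, x)           # 1 * x
assert sp.simplify(freeze_first + freeze_second - sp.diff(g, x)) == 0
```

In both cases the sum of the two "pretend" derivatives agrees with the true derivative.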
These observations have led me to the following conjecture:
Let $f(x,y)$ be a differentiable function mapping $\mathbb{R}^2$ to $\mathbb{R}.$ Let $f'_1 (x,y)=\frac{\partial}{\partial x} f(x,y)$ and $f'_2 (x,y)=\frac{\partial}{\partial y} f(x,y)$. Then for any $t$ we have: $$\frac{d}{dt}f(t,t)=f'_1 (t,t) + f'_2 (t,t).$$
(I apologise for the somewhat awkward notation which I could not seem to get around without causing undue ambiguity.)
This formulation also seems to lend itself to the following generalisation:
Let $f:\mathbb{R}^N \to \mathbb{R}$ be a function differentiable in each of its variables $x_1,x_2,\ldots,x_N$. For $n=1,2,\ldots,N$ define $f'_n(x_1,x_2,\ldots,x_N)=\frac{\partial}{\partial x_n}f(x_1,x_2,\ldots,x_N)$. Let $t$ be any real number and define the $N$-tuple $T=(t,t,\ldots,t)$. Then one has: $$\frac{d}{dt} f(T)=\sum_n f'_n(T).$$
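As a sanity check (not a proof), the $N=3$ case of this identity can be tested symbolically with `sympy` on an arbitrarily chosen $f$:

```python
import sympy as sp

t = sp.Symbol('t')
x1, x2, x3 = sp.symbols('x1 x2 x3')

# An arbitrary sample function f: R^3 -> R, chosen for illustration
f = x1*x2 + sp.sin(x3)*x1**2

# Left-hand side: differentiate f(t, t, t) as a one-variable function of t
lhs = sp.diff(f.subs({x1: t, x2: t, x3: t}), t)

# Right-hand side: sum of the partials, each evaluated on the diagonal
rhs = sum(sp.diff(f, v).subs({x1: t, x2: t, x3: t}) for v in (x1, x2, x3))

assert sp.simplify(lhs - rhs) == 0
```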
Thus my question is:
- Is this true?
- How can it be proven? (Specifically in the case $N=2$ but also in the general case.)
Your observation is true and follows from the multivariable chain rule. To see why, let $f \colon \mathbb{R}^2 \rightarrow \mathbb{R}$ be differentiable and let $\gamma \colon \mathbb{R} \rightarrow \mathbb{R}^2$ be a differentiable curve. Set $\gamma(t) = (\gamma_1(t),\gamma_2(t))$ and consider the composition $h(t) = f(\gamma(t))$ which is a differentiable function from $\mathbb{R}$ to $\mathbb{R}$. The chain rule implies that
$$ h'(t) = \frac{d}{dt} f(\gamma_1(t),\gamma_2(t)) = \frac{\partial f}{\partial x}(\gamma(t)) \cdot \gamma_1'(t) + \frac{\partial f}{\partial y}(\gamma(t)) \cdot \gamma_2'(t). $$
If we take $\gamma(t) = (t,t)$, then $\gamma_1'(t) = \gamma_2'(t) = 1$ and we recover exactly your observation; the same argument generalizes to arbitrary $N$.
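For a concrete illustration, here is a `sympy` check of the chain-rule identity above for one arbitrarily chosen $f$ and curve $\gamma$:

```python
import sympy as sp

t, x, y = sp.symbols('t x y')

# Sample f and curve gamma (arbitrary choices for illustration)
f = x**2 * y + sp.exp(y)
g1, g2 = sp.cos(t), t**3          # gamma(t) = (cos t, t^3)

# h(t) = f(gamma(t)), differentiated directly
h_prime = sp.diff(f.subs({x: g1, y: g2}), t)

# Chain rule: f_x(gamma(t)) * gamma_1'(t) + f_y(gamma(t)) * gamma_2'(t)
chain = (sp.diff(f, x).subs({x: g1, y: g2}) * sp.diff(g1, t)
         + sp.diff(f, y).subs({x: g1, y: g2}) * sp.diff(g2, t))

assert sp.simplify(h_prime - chain) == 0
```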
A direct proof is also possible using the definition of differentiability. Write $$f(x,y) = f(t_0,t_0) + \frac{\partial f}{\partial x}(t_0,t_0)(x - t_0) + \frac{\partial f}{\partial y}(t_0,t_0)(y - t_0) + r(x,y)$$
where
$$ \lim_{(x,y) \to (t_0,t_0)} \frac{r(x,y)}{\sqrt{(x - t_0)^2 + (y - t_0)^2}} = 0 $$
and then
$$ \frac{f(t,t) - f(t_0,t_0)}{t - t_0} = \frac{\partial f}{\partial x}(t_0,t_0) + \frac{\partial f}{\partial y}(t_0,t_0) + \frac{r(t,t)}{t - t_0} \xrightarrow[t \to t_0]{} \frac{\partial f}{\partial x}(t_0,t_0) + \frac{\partial f}{\partial y}(t_0,t_0), $$
where the last term vanishes because the distance from $(t,t)$ to $(t_0,t_0)$ is $\sqrt{2}\,|t - t_0|$.
BTW, I agree with calling your observation "a trick", but I wouldn't call it obscure. In fact, it is useful in various contexts. For example, in differential geometry it is used to prove that the Lie bracket of two vector fields measures how an infinitesimal parallelogram obtained from the flows fails to close, and how the curvature contributes to parallel transport along a closed loop. In both cases, one defines a function $f \colon (-\varepsilon, \varepsilon)^4 \rightarrow V$ that depends on four parameters (so $f = f(t_1,t_2,t_3,t_4)$), and one wants to compute the second derivative of $h(t) = f(t,t,t,t)$ at $t = 0$. Applying the chain rule twice, we have
$$ h''(0) = \sum_{i,j} \frac{\partial^2 f}{\partial t_i \partial t_j}(0,0,0,0) $$
and then one uses various symmetries to compute the partial derivatives. For more details, see here.
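This four-parameter second-derivative formula can likewise be sanity-checked symbolically on an arbitrary example function:

```python
import sympy as sp

t = sp.Symbol('t')
ts = sp.symbols('t1 t2 t3 t4')

# An arbitrary sample function of four variables, chosen for illustration
f = ts[0]*ts[1] + sp.cos(ts[2])*ts[3]**2 + ts[0]**3

# h(t) = f(t, t, t, t); take its second derivative at t = 0
h = f.subs({v: t for v in ts})
lhs = sp.diff(h, t, 2).subs(t, 0)

# Sum of all second partials of f, evaluated at the origin
rhs = sum(sp.diff(f, vi, vj).subs({v: 0 for v in ts})
          for vi in ts for vj in ts)

assert sp.simplify(lhs - rhs) == 0
```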