$\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$ Hi,
I would like to ask whether the following is a valid and rigorous proof of the product rule of differentiation in the multivariate case. The book I am using for multivariate calculus doesn't provide one.
Theorem. Let $f,g:X\subseteq \mathbf{R}^n \to \mathbf{R}$ be two scalar-valued functions that are differentiable at $\mathbf{a}\in X$. Then, the product function $fg$ is also differentiable at $\mathbf{a}$, and
$$ D(fg)(\mathbf{a})=Df(\mathbf{a})g(\mathbf{a})+f(\mathbf{a})Dg(\mathbf{a}) $$
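As a quick sanity check of the formula (a toy example of my own, not from the book): take $n=2$, $f(x,y)=x^2$ and $g(x,y)=y$, so $(fg)(x,y)=x^2y$. Differentiating directly and via the formula,
$$ D(fg)(x,y) = \left[2xy,\; x^2\right], \qquad Df\,g + f\,Dg = [2x,\,0]\,y + x^2\,[0,\,1] = \left[2xy,\; x^2\right], $$
and the two agree, as expected.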
Sketch.
Step 1.
Let's first verify that the matrix of partial derivatives of the function $fg$ is indeed given by the expression above. I have:
\begin{align*} D(fg)(\mathbf{x}) &= \left[ \frac{\partial}{\partial x_1}(fg),\frac{\partial}{\partial x_2}(fg),\ldots,\frac{\partial}{\partial x_n}(fg) \right]\\ &=\left[ \frac{\partial f}{\partial x_1}g + f\frac{\partial g}{\partial x_1},\ldots,\frac{\partial f}{\partial x_n}g + f\frac{\partial g}{\partial x_n} \right] \quad [\text{product rule for functions of one variable}]\\ &=g\left[\frac{\partial f}{\partial x_1} , \ldots,\frac{\partial f}{\partial x_n} \right] + f\left[\frac{\partial g}{\partial x_1} , \ldots,\frac{\partial g}{\partial x_n} \right] \\ &=Df(\mathbf{x})g(\mathbf{x}) + f(\mathbf{x})Dg(\mathbf{x}) \end{align*}
Step 2.
We now prove that the product function $(fg)(\mathbf{x})=f(\mathbf{x})g(\mathbf{x})$ is indeed differentiable at $\mathbf{x}=\mathbf{a}$.
Let's explore the expression
$$ \frac{\norm{f(\mathbf{x})g(\mathbf{x})-[f(\mathbf{a})g(\mathbf{a}) + D(fg)(\mathbf{a})(\mathbf{x}-\mathbf{a})]}}{\norm{\mathbf{x}- \mathbf{a}}} $$
and show that it tends to $0$ as $\mathbf{x}\to\mathbf{a}$.
We can write:
\begin{align*} &\frac{\norm{f(\mathbf{x})g(\mathbf{x})-[f(\mathbf{a})g(\mathbf{a}) + D(fg)(\mathbf{a})(\mathbf{x}-\mathbf{a})]}}{\norm{\mathbf{x}- \mathbf{a}}}\\ =& \frac{\norm{f(\mathbf{x})g(\mathbf{x}) - f(\mathbf{a})g(\mathbf{x}) + f(\mathbf{a})g(\mathbf{x}) - [f(\mathbf{a})g(\mathbf{a}) + D(fg)(\mathbf{a})(\mathbf{x}-\mathbf{a})] }}{\norm{\mathbf{x}- \mathbf{a}}}\\ =& \frac{\norm{f(\mathbf{x})g(\mathbf{x}) - f(\mathbf{a})g(\mathbf{x}) + f(\mathbf{a})g(\mathbf{x}) - [f(\mathbf{a})g(\mathbf{a}) + \{Df(\mathbf{a})g(\mathbf{a}) + f(\mathbf{a})Dg(\mathbf{a})\}(\mathbf{x}-\mathbf{a})] }}{\norm{\mathbf{x}- \mathbf{a}}}\\ =& \small{\frac{\norm{[f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{x}-\mathbf{a})]\,g(\mathbf{x}) + f(\mathbf{a})[g(\mathbf{x}) - g(\mathbf{a}) - Dg(\mathbf{a})(\mathbf{x}-\mathbf{a})] + Df(\mathbf{a})(\mathbf{x}-\mathbf{a})\,[g(\mathbf{x}) - g(\mathbf{a})]}}{\norm{\mathbf{x}- \mathbf{a}}}}\\ \le& \small{ \frac{\vert g(\mathbf{x})\vert \,\norm{f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{x} - \mathbf{a})}}{\norm{\mathbf{x}- \mathbf{a}}} + \frac{\vert f(\mathbf{a})\vert \cdot\norm{g(\mathbf{x}) - g(\mathbf{a}) - Dg(\mathbf{a})(\mathbf{x}-\mathbf{a})} }{\norm{\mathbf{x}- \mathbf{a}}} + \norm{Df(\mathbf{a})}\, \vert g(\mathbf{x}) -g(\mathbf{a})\vert} \end{align*}
Here the fourth line adds and subtracts $Df(\mathbf{a})(\mathbf{x}-\mathbf{a})\,g(\mathbf{x})$ and regroups, and the last step uses the triangle inequality together with the Cauchy–Schwarz bound $\vert Df(\mathbf{a})(\mathbf{x}-\mathbf{a})\vert \le \norm{Df(\mathbf{a})}\,\norm{\mathbf{x}-\mathbf{a}}$, since $Df(\mathbf{a})$ is a row vector.
Since $f$ is differentiable at $\mathbf{x}=\mathbf{a}$, there exists a $\delta_1>0$ such that the second factor of the first term can be made smaller than $\epsilon/\bigl(3(\vert g(\mathbf{a})\vert + 1)\bigr)$ whenever $0 < \norm{\mathbf{x} - \mathbf{a}} < \delta_1$.
Since $g$ is differentiable at $\mathbf{x}=\mathbf{a}$, there exists a $\delta_2>0$ such that the second term can be made smaller than $\epsilon/\bigl(3(\vert f(\mathbf{a})\vert + 1)\bigr)$ whenever $0 < \norm{\mathbf{x} - \mathbf{a}} < \delta_2$ (the $+1$ guards against division by zero when $f(\mathbf{a})=0$).
Since $g$ is continuous at $\mathbf{x}=\mathbf{a}$ (differentiability implies continuity), there exists a $\delta_3>0$ such that the third term can be made smaller than $\epsilon/\bigl(3(\norm{Df(\mathbf{a})} + 1)\bigr)$ whenever $0 < \norm{\mathbf{x} - \mathbf{a}} < \delta_3$.
If I pick $\delta = \min\{\delta_1,\delta_2,\delta_3\}$, I can argue that each of the three terms is smaller than $\epsilon/3$, so the sum is smaller than an arbitrary $\epsilon>0$, provided the coefficient $\vert g(\mathbf{x})\vert$ of the first term is at most $\vert g(\mathbf{a})\vert + 1$.
This is the part I am not sure about: how can I argue that the coefficient $\vert g(\mathbf{x})\vert$ of the first term is bounded? $g$ is continuous on the neighbourhood $\norm{\mathbf{x} - \mathbf{a}} < \delta_1$, so perhaps it is bounded there, but I'm still studying analysis. :)
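For completeness, I believe the following standard continuity argument closes the gap (my own sketch, not from the book). Taking $\epsilon = 1$ in the definition of continuity of $g$ at $\mathbf{a}$ gives a $\delta_0 > 0$ such that
$$ \norm{\mathbf{x}-\mathbf{a}} < \delta_0 \;\implies\; \vert g(\mathbf{x}) - g(\mathbf{a})\vert < 1 \;\implies\; \vert g(\mathbf{x})\vert \le \vert g(\mathbf{x}) - g(\mathbf{a})\vert + \vert g(\mathbf{a})\vert < \vert g(\mathbf{a})\vert + 1. $$
Shrinking the final choice to $\delta = \min\{\delta_0,\delta_1,\delta_2,\delta_3\}$ then bounds the first term's coefficient as required.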