Objective
My objective is to use a neural network to construct a tensor field in 2D that is divergence free by construction.
Background Math
Many parts of physics require some field to be divergence free. Let us consider a second-order tensor field $\underline{\underline{\tau}}(\underline{x})$ in 2D, $\underline{x}=(x,y)$, that has to be divergence free $$ \underline{\nabla}\cdot\underline{\underline{\tau}} = \frac{\partial \tau_{ij}}{\partial x_i} \underline{e}_j = \underline{0} $$ and symmetric $$ \underline{\underline{\tau}} = \underline{\underline{\tau}}^T . $$
Traditionally, one can choose any sufficiently smooth scalar field, called the potential $\varphi(\underline{x})$, and create a symmetric and divergence-free tensor from it in the following way: $$ \tau_{xx} = \frac{\partial^2 \varphi}{\partial y^2}, \;\; \tau_{yy} = \frac{\partial^2 \varphi}{\partial x^2}, \;\; \tau_{xy} = -\frac{\partial^2 \varphi}{\partial y \partial x} \;\; \text{and} \;\; \tau_{yx} = -\frac{\partial^2 \varphi}{\partial x \partial y} $$ Symmetry is easy to verify: Clairaut's theorem states that the mixed second partial derivatives are equal, $$ \frac{\partial^2 \varphi}{\partial x \partial y} = \frac{\partial^2 \varphi}{\partial y \partial x} \;\; \text{thus} \;\; \tau_{xy} = \tau_{yx}, $$ provided that $\varphi$ has continuous second partial derivatives.
Similarly, we can verify that it is divergence free: \begin{align} \underline{\nabla}\cdot\underline{\underline{\tau}} &= \left(\frac{\partial \tau_{xx}}{\partial x} + \frac{\partial \tau_{yx}}{\partial y}\right) \underline{e}_x + \left(\frac{\partial \tau_{xy}}{\partial x} + \frac{\partial \tau_{yy}}{\partial y}\right)\underline{e}_y \\ &= \left(\frac{\partial^3 \varphi}{\partial x \partial y^2} - \frac{\partial^3 \varphi}{\partial y \partial x \partial y} \right)\underline{e}_x + \left(-\frac{\partial^3 \varphi}{\partial x \partial y \partial x} + \frac{\partial^3 \varphi}{\partial y \partial x^2}\right)\underline{e}_y \\ &= 0\,\underline{e}_x + 0\,\underline{e}_y \end{align}
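The construction above can also be checked symbolically. A minimal SymPy sketch, using an arbitrary smooth test potential (the Gaussian here is just an example, any $C^3$ function would do):

```python
# Symbolic check that the potential-based construction is
# symmetric and divergence free, for an example potential.
import sympy as sp

x, y = sp.symbols("x y")
phi = sp.exp(-(x**2 + y**2))  # any sufficiently smooth potential

tau_xx = sp.diff(phi, y, 2)
tau_yy = sp.diff(phi, x, 2)
tau_xy = -sp.diff(phi, y, x)
tau_yx = -sp.diff(phi, x, y)

# components of div(tau); both simplify to zero
div_x = sp.simplify(sp.diff(tau_xx, x) + sp.diff(tau_yx, y))
div_y = sp.simplify(sp.diff(tau_xy, x) + sp.diff(tau_yy, y))

print(div_x, div_y)                    # 0 0
print(sp.simplify(tau_xy - tau_yx))    # 0 (symmetric)
```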
Background ML
The idea is to let $\varphi$ be represented by a neural network. In PyTorch it has been formulated with:
- 2 inputs, $x$ and $y$
- 1 output, $\varphi$
- 3 hidden layers with 32 neurons each
- Tanh activation function (ReLU or other non-smooth activations might not work, since e.g. the second derivative of ReLU is zero almost everywhere)
while the derivatives are calculated with automatic differentiation.
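A minimal PyTorch sketch of this setup (the names `phi_net` and `tau` are mine, and the training data and loss are not part of the question, so they are omitted):

```python
# Potential network: 2 inputs (x, y), 3 hidden Tanh layers of 32, 1 output (phi).
import torch

torch.manual_seed(0)
phi_net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),  # phi(x, y)
)

def tau(xy):
    """Assemble tau from second derivatives of phi via autograd."""
    xy = xy.clone().requires_grad_(True)
    phi = phi_net(xy)
    g = torch.autograd.grad(phi.sum(), xy, create_graph=True)[0]
    phi_x, phi_y = g[:, 0], g[:, 1]
    # tau_xx = phi_yy, tau_yy = phi_xx, tau_xy = -phi_yx, tau_yx = -phi_xy
    tau_xx = torch.autograd.grad(phi_y.sum(), xy, create_graph=True)[0][:, 1]
    tau_yy = torch.autograd.grad(phi_x.sum(), xy, create_graph=True)[0][:, 0]
    tau_xy = -torch.autograd.grad(phi_y.sum(), xy, create_graph=True)[0][:, 0]
    tau_yx = -torch.autograd.grad(phi_x.sum(), xy, create_graph=True)[0][:, 1]
    return tau_xx, tau_yy, tau_xy, tau_yx

txx, tyy, txy, tyx = tau(torch.rand(8, 2))
```

Note that `tau_xy` and `tau_yx` are computed along different differentiation paths here, so they agree only up to round-off rather than being identical objects.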
Problem
I had expected that automatic differentiation would result in a divergence-free and symmetric tensor field, up to some numerical error. But this is not the case. Over $I$ sample points $\underline{x}_i$ I calculated the following errors: $$ E_\text{div} = \frac{1}{I} \sum_{i=1}^{I} \| \underline{\nabla}\cdot\underline{\underline{\tau}}(\underline{x}_i) \| \;\; \text{and} \;\; E_\text{sym} = \frac{1}{I}\sum_{i=1}^{I} \big(\tau_{xy}(\underline{x}_i)-\tau_{yx}(\underline{x}_i)\big)^2$$ At the initial guess these are already $E_\text{div}\approx 0.0004$, clearly non-zero, while $E_\text{sym}\approx 1\times 10^{-19}$ is accurate enough. During training these errors grow, to $E_\text{div}\approx 0.04$ and $E_\text{sym}\approx 1\times 10^{-15}$. Notice that these errors are not part of the loss function being minimized; after all, they should vanish automatically because the tensor is derived from the potential. I only printed them because I observed non-physical results.
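For reference, both diagnostics can be computed end to end with autograd on an untrained network. This is a self-contained sketch (the architecture follows the text; double precision and the sample count are my choices): with the potential-based construction implemented correctly, both errors sit at round-off level.

```python
# Divergence and symmetry errors of tau via higher-order autograd.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
).double()  # float64 keeps the round-off floor low

xy = torch.rand(64, 2, dtype=torch.float64, requires_grad=True)

def d(f, col):
    """Derivative of f w.r.t. input column `col` of xy, graph retained."""
    return torch.autograd.grad(f.sum(), xy, create_graph=True)[0][:, col]

phi = net(xy).squeeze(-1)
phi_x, phi_y = d(phi, 0), d(phi, 1)

# tau from the potential, as in the Background Math section
tau_xx, tau_yy = d(phi_y, 1), d(phi_x, 0)
tau_xy, tau_yx = -d(phi_y, 0), -d(phi_x, 1)

# third derivatives give the divergence; both components should vanish
div_x = d(tau_xx, 0) + d(tau_yx, 1)
div_y = d(tau_xy, 0) + d(tau_yy, 1)

E_div = torch.sqrt(div_x**2 + div_y**2).mean()
E_sym = ((tau_xy - tau_yx) ** 2).mean()
print(float(E_div), float(E_sym))  # both at round-off level in float64
```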
Question
My question is about the smoothness of a NN, since that is the only thing that could cause these discrepancies: if the second derivatives of the potential are not continuous, the approach will not work.
Thus, the question: what can be said about the smoothness of a NN with respect to its input variables? I imagine that the activation function and the number of layers play a role.
Update
The mistake was in the programming itself; the logic does hold. After fixing a $-$ sign, my fields are symmetric and divergence free to $E_\text{div} \sim E_\text{sym} \sim 10^{-15}$. For my purpose, this is an acceptable error.
The training here is tricky, however: there can be significant overfitting even with very simple networks. The final network for my particular problem seemed to fit the training data well, but when plotted on my very fine test dataset it clearly did not behave appropriately. The second derivatives of a network seem to be very sensitive.