I'm implementing a normalization function in C# that applies mean and variance normalization to an input matrix. The mean and variance are computed over the entire matrix, not row by row. I'm calculating the gradients of this function using the chain rule, but the gradients I compute don't match the approximations obtained with the finite-differences method. I'd appreciate help understanding why they don't match and whether there's an issue in my calculations.
The normalization function applies mean and variance normalization in two steps:
Mean normalization: $y_1 = x - \mu$

Variance normalization: $y_2 = \frac{y_1}{\sqrt{\sigma^2 + \epsilon}}$

Here's the gradient calculation for each step:
Mean normalization:
Gradient of $y_1$ with respect to $x$: $\frac{dy_1}{dx} = 1$
Gradient of $y_1$ with respect to $\mu$: $\frac{dy_1}{d\mu} = -1$
Gradient of $\mu$ with respect to $x$: $\frac{d\mu}{dx} = \frac{1}{N}$, where $N$ is the number of elements in $x$.
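As a quick sanity check on this step, here's a small Python sketch (mine, not part of the C# code; the test values of `x` and the step `h` are arbitrary) confirming $\frac{d\mu}{dx_i} = \frac{1}{N}$ by finite differences:

```python
# Hypothetical sanity check: finite-difference approximation of d(mean)/dx_i
# for the mean of a flat vector; it should come out to 1/N for every i.

def mean(xs):
    return sum(xs) / len(xs)

x = [0.5, -1.2, 3.0, 0.7]  # arbitrary test values
N = len(x)
h = 1e-6

for i in range(N):
    bumped = list(x)
    bumped[i] += h
    numeric = (mean(bumped) - mean(x)) / h
    assert abs(numeric - 1.0 / N) < 1e-5
print("d(mean)/dx_i == 1/N for every i")
```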
Variance normalization:
Gradient of $y_2$ with respect to $y_1$: $\frac{dy_2}{dy_1} = \frac{1}{\sqrt{\sigma^2 + \epsilon}}$
Gradient of $\sigma^2$ with respect to $x$: $\frac{d\sigma^2}{dx} = \frac{2(x - \mu)}{N}$
Gradient of $y_2$ with respect to $\sigma^2$: $\frac{dy_2}{d\sigma^2} = -\frac{1}{2}\left(\frac{y_1}{(\sigma^2 + \epsilon)^{\frac{3}{2}}}\right)$
Gradient of $y_2$ with respect to $\mu$: $\frac{dy_2}{d\mu} = -\frac{1}{N \cdot (\sigma^2 + \epsilon)^{\frac{3}{2}}}$
Gradient of $\sigma^2$ with respect to $\mu$: $\frac{d\sigma^2}{d\mu} = -\frac{2}{N}\sum(x - \mu)$
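The $\frac{d\sigma^2}{dx}$ formula can be checked numerically the same way. A small Python sketch (again mine, with arbitrary test values), bumping one element at a time:

```python
# Hypothetical check of d(variance)/dx_i = 2*(x_i - mu)/N. Bumping x_i also
# shifts the mean, but that effect cancels because sum(x - mu) == 0, so the
# simple formula still matches the finite-difference value.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)

x = [0.5, -1.2, 3.0, 0.7]  # arbitrary test values
N = len(x)
mu = sum(x) / N
h = 1e-6

for i in range(N):
    bumped = list(x)
    bumped[i] += h
    numeric = (variance(bumped) - variance(x)) / h
    assert abs(numeric - 2.0 * (x[i] - mu) / N) < 1e-4
print("d(variance)/dx_i matches 2*(x_i - mu)/N")
```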
Combining the gradients:
Using the chain rule, I computed the gradients of the output with respect to $x$ and $\mu$:
Gradient of the output with respect to $x$: $\frac{d\text{Output}}{dx} = \left(\frac{dy_2}{dy_1}\right)\left(\frac{dy_1}{dx}\right) + \left(\frac{dy_2}{d\sigma^2}\right)\left(\frac{d\sigma^2}{dx}\right)$
Gradient of the output with respect to $\mu$: $\frac{d\text{Output}}{d\mu} = \left(\frac{dy_2}{dy_1}\right)\left(\frac{dy_1}{d\mu}\right) + \left(\frac{dy_2}{d\sigma^2}\right)\left(\frac{d\sigma^2}{d\mu}\right) + \frac{dy_2}{d\mu}$
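For comparison, the finite-differences side of the check approximates the full Jacobian of $y_2$ with respect to $x$. Here is a Python sketch of that reference quantity (`normalize` is my flat-vector port of the forward pass, not the original C#):

```python
# Numerical Jacobian J[i][j] ~= d(y2_i)/d(x_j) of the full normalization,
# via central differences. `normalize` is a flat-vector sketch of the
# forward pass; eps plays the same role as epsilon in the formulas above.

def normalize(xs, eps=1e-5):
    n = len(xs)
    mu = sum(xs) / n
    var = sum((v - mu) ** 2 for v in xs) / n
    s = (var + eps) ** 0.5
    return [(v - mu) / s for v in xs]

def numerical_jacobian(xs, h=1e-6):
    n = len(xs)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(xs), list(xs)
        plus[j] += h
        minus[j] -= h
        yp, ym = normalize(plus), normalize(minus)
        for i in range(n):
            jac[i][j] = (yp[i] - ym[i]) / (2 * h)
    return jac

x = [0.5, -1.2, 3.0, 0.7]  # arbitrary test values
J = numerical_jacobian(x)
# Off-diagonal entries of J are nonzero: perturbing one input moves mu and
# sigma^2, and through them every output element.
```

Because $\mu$ and $\sigma^2$ depend on every element of $x$, this Jacobian is dense, so any combined analytic gradient has to account for cross-element contributions as well as the per-element terms.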
The issue is that the computed gradients don't match the finite-difference approximations: the differences are far outside an acceptable range (the max difference is around 3.18), and I'm not sure why.
Could you please help me understand if there's an issue in my gradient calculations or if there's a better way to compute these gradients?
Here's the NormalizationForward function for reference:
```csharp
public static double[,] NormalizationForward(double[,] input, int outputHeight, int outputWidth, double epsilon)
{
    double[,] output = (double[,])input.Clone();
    double mean = 0;
    double variance = 0;
    double N = outputHeight * outputWidth;

    // Calculate mean over the entire matrix
    for (int i = 0; i < outputHeight; i++)
    {
        for (int j = 0; j < outputWidth; j++)
        {
            mean += input[i, j];
        }
    }
    mean /= N;

    // Calculate variance over the entire matrix
    for (int i = 0; i < outputHeight; i++)
    {
        for (int j = 0; j < outputWidth; j++)
        {
            double diff = input[i, j] - mean;
            variance += diff * diff;
        }
    }
    variance /= N;

    // Normalize activations
    for (int i = 0; i < outputHeight; i++)
    {
        for (int j = 0; j < outputWidth; j++)
        {
            output[i, j] = (input[i, j] - mean) / Math.Sqrt(variance + epsilon);
        }
    }
    return output;
}
```
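For debugging, here is a minimal Python port of `NormalizationForward` together with a central-difference reference gradient (the upstream gradient `g`, `num_grad`, and the seed are illustrative choices of mine, not from the original code):

```python
import random

# Minimal Python port of NormalizationForward (list-of-rows instead of
# double[,]) plus a central-difference reference gradient. The upstream
# gradient g, num_grad, and the seed are illustrative choices.

def normalization_forward(x, eps=1e-5):
    flat = [v for row in x for v in row]
    n = len(flat)
    mu = sum(flat) / n
    var = sum((v - mu) ** 2 for v in flat) / n
    s = (var + eps) ** 0.5
    return [[(v - mu) / s for v in row] for row in x]

def loss(x, g, eps=1e-5):
    # Scalar proxy: sum of g * y2, so dL/dx is the gradient to check.
    y = normalization_forward(x, eps)
    return sum(gv * yv for grow, yrow in zip(g, y)
               for gv, yv in zip(grow, yrow))

random.seed(0)
rows, cols = 2, 3
x = [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
g = [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

h = 1e-6
num_grad = [[0.0] * cols for _ in range(rows)]
for i in range(rows):
    for j in range(cols):
        plus = [row[:] for row in x]
        minus = [row[:] for row in x]
        plus[i][j] += h
        minus[i][j] -= h
        num_grad[i][j] = (loss(plus, g) - loss(minus, g)) / (2 * h)
```

An analytic backward pass should reproduce `num_grad` to within roughly `1e-6` here; a max difference around 3.18 is far beyond finite-difference noise, which points at the analytic formula rather than the choice of `h`.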