Consider the following linear least squares problem: minimize the loss function $$L = \frac{1}{N} \sum_{i=1}^N (\vec{c} \cdot \vec{x}_i - y_i)^2$$ where $N$ is the number of samples, $\vec{c}$ is a $d$-dimensional vector of unknowns, and $(\vec{x}_i, y_i)$ for $i=1,\dots,N$ are the data.
I am well aware that this is a convex optimization problem with a known closed-form solution, but I am running numerical experiments and trying to understand their results. Instead of using the closed form, I solve the problem with an iterative optimizer such as Adam. For the "optimal" solution returned by the optimizer, I compute $\|\nabla_{\vec{c}} L\|_2$, the norm of the gradient of the loss, and check whether it is close to 0. Since the problem is convex, a near-zero gradient would suggest we are at the global minimum.
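For concreteness, here is a minimal NumPy sketch of the kind of experiment I mean: a hand-rolled Adam fitting $\vec{c}$ on synthetic data, then reporting the final gradient norm for several values of $N$. The hyperparameters and the synthetic data generation are illustrative, not my exact setup.

```python
import numpy as np

def adam_least_squares(X, y, steps=5000, lr=1e-2,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize L = (1/N) * sum_i (c . x_i - y_i)^2 with plain Adam.

    X: (N, d) data matrix, y: (N,) targets.
    Returns the fitted c and the final gradient norm ||grad_c L||_2.
    """
    N, d = X.shape
    c = np.zeros(d)
    m = np.zeros(d)  # first-moment (mean) estimate
    v = np.zeros(d)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        grad = (2.0 / N) * X.T @ (X @ c - y)  # full-batch gradient of L
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)  # bias-corrected moments
        v_hat = v / (1 - beta2**t)
        c -= lr * m_hat / (np.sqrt(v_hat) + eps)
    grad = (2.0 / N) * X.T @ (X @ c - y)
    return c, np.linalg.norm(grad)

# Synthetic data: y = X @ c_true + small noise; compare the final
# gradient norm as N grows.
rng = np.random.default_rng(0)
c_true = np.arange(1.0, 6.0)  # d = 5
for N in (100, 1000, 10000):
    X = rng.normal(size=(N, len(c_true)))
    y = X @ c_true + 0.1 * rng.normal(size=N)
    c_hat, gnorm = adam_least_squares(X, y)
    print(N, gnorm)
```

Note this uses the full-batch gradient at every step (no minibatching), so any dependence of the final gradient norm on $N$ is not a stochastic-gradient effect.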
My observation is that for larger $N$ (more samples), $\|\nabla_{\vec{c}} L\|_2$ ends up closer to 0. Is there a reason why this is the case? The only vague explanation I can think of is that the gradient of the loss can be thought of as an average, because it is a sum over $N$ terms divided by $N$; thus increasing $N$ gives a more accurate (less noisy) result.
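To make that averaging intuition concrete, the gradient of the loss above can be written out explicitly as $$\nabla_{\vec{c}} L = \frac{2}{N} \sum_{i=1}^N (\vec{c} \cdot \vec{x}_i - y_i)\,\vec{x}_i,$$ i.e. the average over the $N$ per-sample gradients $2(\vec{c} \cdot \vec{x}_i - y_i)\,\vec{x}_i$.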
However, I am not sure about this intuition and would be curious to hear other perspectives.