Gradient of linear least squares problem as number of samples increases


Consider the following linear least squares problem with the objective of minimizing the loss function $$L = \frac{1}{N} \sum_{i=1}^N (\vec{c} \cdot \vec{x}_i - y_i)^2$$ where $N$ is the number of samples, $\vec{c}$ is a $d$-dimensional vector of unknowns, and $(\vec{x}_i, y_i)$ for $i=1,\dots,N$ are the data.

I am well aware that this is a convex optimization problem with a known analytical solution, but I am running numerical experiments and trying to understand their results. Instead of using the closed-form solution, I solve the problem with an optimizer such as Adam. For the "optimal" solution returned by the optimizer, I compute $\|\nabla_{\vec{c}} L\|_2$, the norm of the gradient of the loss, and check whether it is close to 0. Since the problem is convex, a near-zero gradient suggests we are at the global minimum.
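For concreteness, here is a minimal sketch of that check in numpy (the synthetic data, the coefficient vector `c_true`, and the noise level are my assumptions, not from the question). The gradient of $L$ works out to $\nabla_{\vec{c}} L = \frac{2}{N} X^\top (X\vec{c} - \vec{y})$, and at the analytical least-squares solution its norm is zero up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 3
c_true = np.array([1.0, -2.0, 0.5])     # assumed ground-truth coefficients
X = rng.normal(size=(N, d))
y = X @ c_true + 0.1 * rng.normal(size=N)  # assumed noisy synthetic labels

def loss(c):
    # L = (1/N) * sum_i (c . x_i - y_i)^2
    r = X @ c - y
    return np.mean(r**2)

def grad(c):
    # gradient of L with respect to c: (2/N) X^T (Xc - y)
    return (2.0 / N) * X.T @ (X @ c - y)

# analytical least-squares solution, standing in for the optimizer's output
c_star = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(grad(c_star)))  # essentially 0 at the global minimum
```

In the actual experiment `c_star` would come from Adam rather than `lstsq`, so its gradient norm is small but not machine-precision zero.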

My observation is that for larger $N$ (more samples), $\|\nabla_{\vec{c}} L\|_2$ is closer to 0. Is there a reason why this is the case? The only vague explanation I can think of is that the gradient of the loss is itself an average, since it is a sum over $N$ terms divided by $N$; increasing $N$ should therefore give a more accurate, less noisy result.
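The averaging intuition can be probed numerically. In the sketch below (again with assumed synthetic data: `c_true`, noise level 0.1), the gradient is evaluated at the population-optimal coefficients, where it reduces to $-\frac{2}{N} X^\top \vec{\varepsilon}$, an average of $N$ zero-mean terms, so its norm should shrink roughly like $1/\sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
c_true = np.array([1.0, -2.0, 0.5])  # assumed ground-truth coefficients

def grad_norm_at_truth(N):
    X = rng.normal(size=(N, d))
    y = X @ c_true + 0.1 * rng.normal(size=N)  # noisy labels, noise std 0.1
    # gradient of L at c_true: (2/N) X^T (X c_true - y) = -(2/N) X^T eps,
    # i.e. an average of N independent zero-mean terms
    g = (2.0 / N) * X.T @ (X @ c_true - y)
    return np.linalg.norm(g)

g_small_N = grad_norm_at_truth(100)
g_large_N = grad_norm_at_truth(100_000)
print(g_small_N, g_large_N)  # the norm at the larger N should be much smaller
```

This only illustrates the sampling-noise part of the story; whether it fully explains the behavior of the optimizer's output is exactly the question being asked.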

However, I am not so sure about this intuition and was curious to hear other opinions.