In this article, at the bottom of the page, for the overdetermined case, it is said that
An analogous technique can be used to solve an overdetermined set of equations.[...] In such a case, write the $n$ different functions as $f_i(\lambda_1,...,\lambda_n)$ for $i=1, ..., n$ and call their actual values $y_i$ [...]
However, I couldn't understand how we should define those $f_i$'s.
I mean, I'm trying to do the same fitting for the same type of Gaussian, and I have more than $100$ data points but only $3$ parameters. How should I define the $f_i$'s so that the fitted function minimizes the difference over all of those data points?
Take the example in your link (I changed $x_0$ there to $b$ in order to avoid confusion with the data points $x_i$). You are trying to find the parameters $A, b, \sigma$ that best fit the observed data to $$Ae^{-(x-b)^2/(2\sigma^2)}$$ For each of your 100 data points, the $i$th of which we'll call $x_i$, there is a corresponding observation $y_i$. In this case, $$f_i = Ae^{-(x_i-b)^2/(2\sigma^2)}$$ The $i$th residual is $$r_i = y_i - Ae^{-(x_i-b)^2/(2\sigma^2)}$$ Nonlinear least squares then finds the values of $A, b, \sigma$ that minimize $$\frac{1}{2}\sum_i r_i^2$$
Note: the $\frac{1}{2}$ is optional; it is included for convenience so that certain expressions (e.g. the gradient) work out more cleanly.
In your case, all $f_i$ are of the same form, but this framework allows different data points to have different functional forms. Also, instead of the $i$th input data value being a scalar $x_i$, it could be a vector of multiple inputs per data point.
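To make this concrete, here is a minimal sketch in Python using `scipy.optimize.least_squares` (one common solver for this; the thread does not prescribe a particular implementation). The synthetic data, the true parameter values, and the initial guess are all assumptions for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Synthetic data: a Gaussian with A=2.0, b=1.0, sigma=0.5 plus small noise.
# (These values are chosen for illustration only.)
x = np.linspace(-2.0, 4.0, 100)
y = 2.0 * np.exp(-(x - 1.0) ** 2 / (2 * 0.5 ** 2)) \
    + 0.02 * rng.standard_normal(x.size)

def residuals(params, x, y):
    """Return the vector of residuals r_i = y_i - f_i(A, b, sigma)."""
    A, b, sigma = params
    return y - A * np.exp(-(x - b) ** 2 / (2 * sigma ** 2))

# least_squares minimizes (1/2) * sum(r_i**2) over the parameters,
# starting from an initial guess (here [1, 0, 1]).
fit = least_squares(residuals, x0=[1.0, 0.0, 1.0], args=(x, y))
A, b, sigma = fit.x
print(A, b, sigma)  # should be close to 2.0, 1.0, 0.5
```

The key point is that the solver works with the whole residual vector $(r_1, \dots, r_{100})$ at once; you never write out the $100$ functions $f_i$ individually, since they are all the same expression evaluated at different $x_i$.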