I know what the Residual Sum of Squares (RSS) is. The problem is that I don't understand why it works.
For example, you have a set of discrete values, and you want to describe them with a linear, quadratic, or some other function. To choose the best function you calculate the RSS (the sum of squared deviations) instead of the sum of absolute deviations.
Why, to choose the most precise function, do you have to sum squared deviations (instead of absolute deviations)? How is that proved?
To be specific, suppose your model is $Y_i = \beta_0 + \beta_1 x_i + e_i,$ where the $e_i$ are iid $Norm(0, \sigma_e).$ Also suppose that you have used the least squares method to get estimates $\hat \beta_0$ of $\beta_0$ and $\hat \beta_1$ of $\beta_1.$ Then points on the least squares line are $(x_i, \hat Y_i),$ where $\hat Y_i = \hat \beta_0 + \hat \beta_1 x_i.$
As mentioned in the Comments, for this model, the best estimate of $\sigma_e^2$ is $S_e^2 = \frac{\sum_i (Y_i - \hat Y_i)^2}{n-2}.$ This means that $S_e^2$ has the smallest variance among unbiased estimators of $\sigma_e^2.$ Among other advantages, this means that confidence and prediction intervals based on $S_e^2$ are shorter than for other estimates of $\sigma_e^2$, and that hypothesis tests have the greatest power (other things being equal).
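To make this concrete, here is a small numerical sketch (the coefficients, sample size, and random seed are my own illustration, not from the question or answer): simulate data from the model above, fit it by least squares, and compute $S_e^2 = \mathrm{RSS}/(n-2)$, which should land near the true $\sigma_e^2$.

```python
import numpy as np

# Illustrative sketch: simulate Y_i = beta0 + beta1*x_i + e_i with
# e_i ~ Normal(0, sigma_e), then estimate sigma_e^2 via RSS/(n - 2).
rng = np.random.default_rng(42)
n, beta0, beta1, sigma_e = 200, 2.0, 0.5, 1.5

x = np.linspace(0.0, 10.0, n)
y = beta0 + beta1 * x + rng.normal(0.0, sigma_e, size=n)

# Least squares estimates of (beta0, beta1) via the design matrix.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat
rss = np.sum((y - y_hat) ** 2)   # residual sum of squares
s2_e = rss / (n - 2)             # unbiased estimate of sigma_e^2

print(beta_hat)  # roughly (2.0, 0.5)
print(s2_e)      # roughly sigma_e^2 = 2.25
```

The divisor $n-2$ (rather than $n$) accounts for the two estimated coefficients, which is what makes $S_e^2$ unbiased.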
If the errors $e_i$ are not normal with constant variance $\sigma_e^2,$ then other estimates of variability about the regression line may give shorter CIs and more powerful tests. In particular, an estimator based on $\sum_i |Y_i - \hat Y_i|$ has been used in some applications.
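One simple version of such an absolute-residual estimator (my own hedged sketch, not necessarily the exact estimator the answer has in mind) rescales the mean absolute residual using the normal-theory fact $E|e| = \sigma_e\sqrt{2/\pi}$, so both it and the RSS-based estimate target $\sigma_e$:

```python
import numpy as np

# Hedged sketch: compare the RSS-based scale estimate with one built from
# sum |Y_i - Yhat_i|.  Under normal errors E|e| = sigma*sqrt(2/pi), so the
# mean absolute residual times sqrt(pi/2) also estimates sigma.
rng = np.random.default_rng(0)
n, sigma_e = 500, 2.0
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 0.3 * x + rng.normal(0.0, sigma_e, n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

sigma_l2 = np.sqrt(np.sum(resid ** 2) / (n - 2))        # RSS-based
sigma_l1 = np.mean(np.abs(resid)) * np.sqrt(np.pi / 2)  # |residual|-based

print(sigma_l2, sigma_l1)  # both near the true sigma_e = 2.0
```

With normal errors the two agree closely; with heavy-tailed errors the absolute-residual version is less sensitive to outlying points, which is why such estimators appear in robust applications.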