I want to model data with the exponential equation
$$ y=Ae^{Bx}. $$
On the website https://mathworld.wolfram.com/LeastSquaresFittingExponential.html, equation (5) is
$$ \sum_{i=1}^{n}{y_i\left(\ln{\left(y_i\right)}-a-b x_i\right)^2} \tag{5} $$
where
$$ \ln{\left(A\right)}=a $$
$$ B=b $$
It says this sum should be minimized, and this works better than linearizing the data and doing an ordinary linear regression on the equation below.
$$ \ln(y)=\ln(A)+Bx $$
I know that this works better, but what is the justification for using equation (5), and how was it derived?
Answering your last question: I showed that, if you express the residual as $$r_i=\log(\hat y_i)-\log( y_i)$$ then, assuming small errors, you have $$r_i \sim \frac{\hat y_i-y_i }{y_i }$$ which shows that plain linearization leads to minimizing the sum of squared *relative* errors.
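For completeness, the small-error approximation is just the first-order expansion of the logarithm:
$$ r_i=\log(\hat y_i)-\log( y_i)=\log\!\left(1+\frac{\hat y_i-y_i}{y_i}\right)\approx \frac{\hat y_i-y_i}{y_i}\qquad\text{when }\left|\frac{\hat y_i-y_i}{y_i}\right|\ll 1. $$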
So if, instead, you express the residual as $$r_i=y_i \Big[\log(\hat y_i)-\log( y_i)\Big]$$ then, with the same assumptions, you have $$r_i \sim y_i \times \frac{\hat y_i-y_i }{y_i }=\hat y_i-y_i$$ so this weighting factor leads to something extremely close to minimizing the sum of squared *absolute* errors.
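Numerically, minimizing equation (5) is just a weighted linear least-squares fit of $\ln y$ on $x$ with weights $y_i$. A minimal sketch with `numpy` (the data, the true parameters $A=2$, $B=0.5$, and the additive noise model are illustrative choices, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = A * exp(B x), with additive noise (illustrative).
A_true, B_true = 2.0, 0.5
x = np.linspace(0.0, 5.0, 50)
y = A_true * np.exp(B_true * x) + rng.normal(0.0, 0.5, x.size)
y = np.clip(y, 1e-9, None)  # logs require positive values

# Plain linearization: minimize sum (ln y_i - a - b x_i)^2.
b_plain, a_plain = np.polyfit(x, np.log(y), 1)

# Weighted fit per equation (5): minimize sum y_i (ln y_i - a - b x_i)^2.
# np.polyfit squares its weights internally, so pass sqrt(y_i).
b_w, a_w = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))

print("plain:    A =", np.exp(a_plain), " B =", b_plain)
print("weighted: A =", np.exp(a_w), " B =", b_w)
```

The weighted version typically recovers $A$ and $B$ more accurately here, because the plain linearized fit lets the noisy small-$y$ points (where the relative error is largest) dominate.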