$$f(a_1, b_1, \ldots , a_m, b_m) = \sum_{j=1}^{n} (y_j - \sum_{k=1}^m a_kx_j^{b_k})^2$$ $$2m < n$$
$x$ and $y$ are constants, and $a$ and $b$ are variables to find.
I took deviation out of it and somehow solved it and found when it's $0$, but it didn't ended as the minimum. (There was variables that makes the value of $f$ smaller) It's hard for me to work on function I don't know much about.
Note: I am trying to find the variables for the equation which best fits a data, but I can't fit it like $y_1 = a_1x_1^{b_1} + \cdots,\ldots$ because I have more variables than data.
Fitting a model such $$ y= \sum_{k=1}^{m} a_k\,x^{b_k}$$ using $n$ data points $(x_i,y_i)$ , $n \gt 2m$, is difficult because the model is nonlinear with respect to its parameters (the $b_k$'s) and nonlinear regression require reasonable starting guesses.
As you noticed, the problem is simple if you assign specific values to the $b_k$'s since the problem becomes a simple multilinear regression.
There is brute force solution which uses this fact. Consider the function $$\Phi(b_1,b_2,\cdots,b_m)=\sum_{i=1}^{n} \Big(y_i - \sum_{k=1}^m a_k\,x_i^{b_k}\Big)^2$$ and generate values of the function on a regular grid; you do not need to compute all points because of the symmetry. Set for example $$b_1=b_0+(i_1-1) \Delta b$$ $$b_2=b_1+i_2 \Delta b$$ $$b_3=b_2+i_3 \Delta b$$ $$b_m=b_{m-1}+i_m \Delta b$$ For sure, you need to select $b_0$ and fix $\Delta b$ in order to limit the $b_k$'s to a maximum value.
For each set of the $b_k$'s, you perform the linear regression and you keep the points which make function $\Phi$ lower. When you finished the generation of the grid, you then know the best approximate values (keep track of the corresponding $a_k$'s) and you are ready for the full nonlinear regression.
It is sure that it could be time consuming if $m$ is large but it works. But if $m \leq 4$ it is quite fast and you can automate the process using Excel.