I would like to non-heuristically (dis)prove the following statement:
"The degree of the optimal polynomial to fit to some data corresponds with the closest integer to the resulting exponent from a power regression."
In other words, the best polynomial model to least-squares fit a given data set $\{(x_i, y_i) | 1\leq i\leq N\}$ is $$\hat{f}(x_i;\{a\})=\sum_{n=0}^{\lceil\beta\rceil}a_n x_i^n,$$ where the parameter $\beta\in \mathbb{R}$ is, in turn, obtained from least-squares fitting of the model $$ \hat{g}(x_i;\alpha,\beta,\gamma)=\alpha x_i^\beta+\gamma.$$
The statement is trivially true in the particular case where $\beta\in\mathbb{N}$, $a_0=\gamma$, $a_\beta =\alpha$, and $a_{0<n<\beta}=0$. But what about anything else?
I am sorry, but the question is not clear enough: it lacks a precise mathematical definition of what "optimal" means with respect to a chosen criterion.
By comparison, to quantify how well a model fits given data, one computes for example the MSE (Mean Square Error), or another criterion of this kind with a well-established mathematical definition, and the goal is to adjust the model so as to reach the LMSE (Least Mean Square Error).
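For reference, a sketch of the usual definitions in the notation of the question, with $\hat{f}$ standing for whichever model is being fitted: $$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{f}(x_i)\right)^2,\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}}.$$ The least-squares fit is the choice of parameters that minimises either quantity, since both have the same minimiser.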
The OP should have given a representative example of data to make the calculation more concrete.
An example data set is given below, with a small number of points (only 7) for the sake of brevity.
The fitting results, characterised by the LRMSE (Least Root Mean Square Error), are given for increasing degree $n$ of the polynomial model.
If the fitting criterion is the least mean square error, the best fit is obtained with a polynomial of degree 6 (as expected with 7 points: with distinct $x_i$, a polynomial of degree 6 passes exactly through all of them).
If the fitting criterion is the smallest amplitude of oscillation over the range of the data, the best fit is obviously obtained with a polynomial of degree 1 (linear).
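The behaviour of the least-squares criterion can be checked numerically. The following is only a minimal sketch: the seven points are hypothetical (a degree-2 polynomial with arbitrary coefficients plus seeded random scatter), not the data of the example above, and `rmse_by_degree` is a helper name introduced here for illustration.

```python
import numpy as np

# Hypothetical data: 7 points from an arbitrary degree-2 polynomial
# plus random scatter. These are NOT the points of the original example.
rng = np.random.default_rng(0)
x = np.arange(1.0, 8.0)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.5, size=x.size)


def rmse_by_degree(x, y, max_degree):
    """Return the least-squares RMSE for polynomial degrees 1..max_degree."""
    errors = []
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)          # least-squares coefficients
        residuals = y - np.polyval(coeffs, x)      # misfit at the data points
        errors.append(float(np.sqrt(np.mean(residuals**2))))
    return errors


rmse = rmse_by_degree(x, y, 6)
for degree, err in enumerate(rmse, start=1):
    print(f"degree {degree}: RMSE = {err:.3e}")
```

Because each polynomial model contains all those of lower degree, the RMSE can only decrease as the degree grows, and it falls to zero (up to rounding) at degree 6. The error criterion alone therefore always selects the highest degree available.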
I suppose that one is looking for something in between. This implies that the fitting criterion must be a mixture of the two above. But which criterion exactly? Without a clear definition of the criterion, this opens the floodgates to arbitrariness.
Of course the above example is a caricature. It is only meant to show where the difficulty lies and why a definitive answer to the question (as presently stated) cannot be given.
In addition: fitting with the power function.
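One possible way to carry out such a fit of $\alpha x^\beta+\gamma$ is sketched here. It is not necessarily the method used for the example above. It relies on the fact that, for fixed $\beta$, the model is linear in $\alpha$ and $\gamma$, so one can scan a grid of $\beta$ values and solve a linear least-squares problem for each. The function name `fit_power_model`, the grid bounds and the data are assumptions made for illustration, and the sketch assumes $x_i>0$.

```python
import numpy as np


def fit_power_model(x, y, betas):
    """Least-squares fit of alpha * x**beta + gamma over a grid of beta.

    Returns (alpha, beta, gamma, sse) for the grid value of beta that
    gives the smallest sum of squared errors. Assumes all x > 0.
    """
    best = None
    for beta in betas:
        # For fixed beta the model is linear in (alpha, gamma).
        design = np.column_stack([x**beta, np.ones_like(x)])
        solution = np.linalg.lstsq(design, y, rcond=None)[0]
        sse = float(np.sum((design @ solution - y) ** 2))
        if best is None or sse < best[3]:
            best = (float(solution[0]), float(beta), float(solution[1]), sse)
    return best


# Hypothetical data: arbitrary degree-2 polynomial plus seeded scatter.
rng = np.random.default_rng(0)
x = np.arange(1.0, 8.0)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.5, size=x.size)

beta_grid = np.linspace(0.1, 5.0, 491)   # step 0.01, arbitrary bounds
alpha, beta, gamma, sse = fit_power_model(x, y, beta_grid)
print(f"alpha = {alpha:.3f}, beta = {beta:.2f}, gamma = {gamma:.3f}")
print("nearest integer to beta:", round(beta))
```

The fitted $\beta$ depends on the scatter and on the lower-order terms of the underlying polynomial, so its nearest integer need not equal the degree used to generate the data.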
Note, for information: the seven points were generated from a polynomial of degree 2 plus random scatter.