I'm attempting to determine the best approximation for the a model for a data set with power law characteristics as illustrated below (blue scatter). The model (the orange line) has been determined using the weighted average of the slope as:
$m_{weighted} = \frac{\sum_{i=1}^{N} m_i*w_i}{\sum_{i=1}^{N} w_i}$ where $m = \frac{log(F_2/F_1)}{log(x_2/x_1)}$, which is calculated for each pair of data points.
The error term in the chart is derived from $\frac{\sum(\text{data points})-\sum(\text{modelled data points})}{\sum(\text{data points})}$
Question 1: Is there a method with lower computational complexity that delivers the same result, i.e. without iterating over each pair of data points?
Following that:
Question 2: Is such a method robust when calculating on small samples?
Forgive me, but I'm not completely confident whether my mathjax notation leaves space for error in interpretation, so to avoid misunderstandings I'm adding the python code below:
def powerlaw_appx(xs,ys):
assert isinstance(xs, list)
assert isinstance(ys, list)
assert len(xs) == len(ys)
slopes = []
for comb in itertools.combinations(range(len(xs)), 2):
i1, i2 = comb
upper = math.log(ys[i2] / ys[i1]) # log(y2 / y1)
lower = math.log(xs[i2] / xs[i1]) # log(x2 / x1)
m = upper / lower
weight = (ys[i2] + ys[i1]) / 2
slopes.append((weight, m))
total_weight = [w for w, m in slopes]
weighted_slope = [m * w for w, m in slopes]
m = sum(weighted_slope) / sum(total_weight)
k = ys[0]
return k, m

I don't think there is a $ O(n) $ algorithm for computing the value you've presented. If you're ok with a slightly different value of $m_{weighted}$ you might try doing a linear fit to the data after taking $log$ of both list of x coordinates and y coordinates, which has linear complexity in number of points (you compute sum and sum of squares and sum of products which in numpy can be done incredibly fast due to vectorisation of some operations). For description of this method look here.
If you have a particular error function that you want to minimize (which I'm unable to figure out from your question as $\sum(data points)$ is a vector for me) you can use a different method that will minimize such error.