Least square fit $a(\lg(n))^2+b(\lg(n))+c$

55 Views Asked by At

Using an ordering algorithm for prime numbers less than 1,000,000 I get the following execution times:

|Prime numbers|--Time

100--.039s

1000--.039s

10,000--.209s

50,000--.299s

78498--.301s

The problem says "fit a curve via least squares type $a(\lg(n))^2+b(\lg(n))+c$".

Theoretically the algorithm has execution time $O((\lg(n))^2)$.

I am somewhat lost on the subject of least squares,my algorithm class does not cover this topic, I do not ask to solve the problem, only guidance on how to start with the adjustment or what things should be done.

1

There are 1 best solutions below

1
On BEST ANSWER

Since the algorithm is $O(lg(n))^2$, we can express the time complexity as $T(n)= a(lg(n)^2+b(lg(n))+c$, for some constants $a,b,c$ and try to estimate the constants from the data using non-linear least squares model.

For example, the below code snippet shows how we can estimate $a,b,c$ with R, where the runtime data is stored in a dataframe df:

df   # the runtime data stored in a data frame
      n     T     log2n
1   100 0.039  6.643856
2  1000 0.039  9.965784
3 10000 0.209 13.287712
4 50000 0.299 15.609640
5 78498 0.301 16.260368

fitmodel <- nls(T ~ a*(log(n))^2 + b*log(n) + c,
            data = df,
            start=list(a=1,b=1,c=1),      
            trace=TRUE)
summary(fitmodel)
params <- fitmodel$m$getAllPars()
params 
#           a            b            c 
# 0.005714032 -0.046468590  0.120629736 

Now, let's visualize how the NLS model predicted runtime compare with the actual runtime values for different values of $n$, using the following code snippet:

n <- seq(100, 100000, 1)
pred <- params[1]*(log(n))^2 + params[2]*(log(n)) + params[3]
df1 <- data.frame(n = n, pred = pred)
df %>% ggplot(aes(n, T)) + geom_point() + geom_line() +
       geom_line(data=df1, aes(n, pred, col='pred'), lty=2) +
       theme_bw()

enter image description here

The black datapoints represent the original runtimes for the corresponding $n$s, whereas the red dotted curve represents the one predicted by the fitted NLS model.