Experimental Analysis of an Algorithm - How to prove that the graph is $O(n\log n)$?


This question is probably stupid, but I've been trying to figure this out for hours and still haven't found anything about it. I'm probably just lost.

So basically, I'm analysing an algorithm by doing both an asymptotic and an experimental analysis. The asymptotic analysis went well and I concluded that my algorithm is $O(n\log n)$. The problem is the experimental analysis. First, I ran some tests to measure the time taken for each input size. For example:

$n = 1, t = 0;\quad n = 2, t = 2;\quad n = 4, t = 8;\quad n = 8, t = 24;\quad n = 16, t = 64;\quad n = 32, t = 160;\quad \dots$

If I plot this data, I can see that the graph looks $O(n\log n)$/linearithmic, but how do I prove it? Do I have to calculate the order of growth? If so, how do I do that?

Thanks for the help!

Best answer:

You can check (e.g., by plotting) whether the function $\frac{T(n)}{n}$ has logarithmic growth; working with $\frac{T(n)}{n}$ instead of $T(n)$ itself reduces the impact of the hidden constants.
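As a quick sanity check, this idea can be applied directly to the measurements from the question (a minimal R sketch; the vectors `n` and `t` just restate the timings above). If $T(n) = c \cdot n \log_2 n$, then $\frac{T(n)}{n}$ should increase by the same constant $c$ every time $n$ doubles:

```r
# Timings from the question: each doubling of n.
n <- c(1, 2, 4, 8, 16, 32)
t <- c(0, 2, 8, 24, 64, 160)

ratio <- t / n     # T(n)/n: 0 1 2 3 4 5
diff(ratio)        # constant increments per doubling => T(n)/n grows like log2(n)
```

Here the increments are all equal to 1, i.e. $\frac{T(n)}{n} = \log_2 n$ exactly, which is consistent with $T(n) = n\log_2 n$.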

For example, you can use nonlinear least-squares curve fitting (e.g., with R's `nls`) to fit a logarithmic function of the form $\mathit{Const} + A\log(n)$ and compare the growth rates. The following R code shows an example on synthetically generated timings (the fitted values below will vary with the random draw):

n <- 1:100
T <- 5*n*log(10*n) + 2 + 10*runif(1) # synthetic timings: n*log(n) growth plus a random constant offset

nlmod <- nls(T/n ~  Const + A * log(n), start = list(A = 1, Const=1), trace=TRUE)
summary(nlmod)
# Formula: T/n ~ Const + A * log(n)

# Parameters:
#  Estimate Std. Error t value Pr(>|t|)    
#  A      4.14962    0.07108   58.38   <2e-16 ***
#  Const 15.05845    0.26675   56.45   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 0.6563 on 98 degrees of freedom

# Number of iterations to convergence: 1 
# Achieved convergence tolerance: 3.671e-09

plot(n, T/n, main = "nls(*): data and fitted curve, n = 100", pch = 19)
lines(n, predict(nlmod), col = 2, lwd = 2)
legend("bottomright", legend = c("T(n)/n", "fitted"),
       col = c("black", "red"), pch = c(19, NA), lty = c(NA, 1), lwd = 2)

*(Plot: the $T(n)/n$ data points with the fitted logarithmic curve overlaid.)*

You can also compute a distance (e.g., MSE, KL divergence, etc.) between the fitted logarithmic curve and the actual $\frac{T(n)}{n}$ values, to check that it is sufficiently small for large enough $n$.