Complexity of inversion counting


Let $X \in \mathbb{Z}^n$. I need to find the number of pairs $ (i, j)$ with $ 0 < i < j \le n$ such that $ X_i>X_j$ as well as the number of pairs $ (i, j),\ 0 < i < j \le n$, such that $ X_i=X_j$.

I know how to compute these in $O(n \log(n))$ time. How can one prove that no comparison-based algorithm with better complexity exists (see the "update" section at the bottom)?


An algorithm with $O(n \log(n))$ time complexity (one of the known methods):

We store:

1) Two accumulators, one for each of the two counts (both initially 0).

2) A balanced search tree (AVL, for example) augmented with the subtree size at each node (so the operation "add element" costs $O(\log \text{size})$). Elements of this tree are 2-tuples in $(\{-\infty\}\cup\mathbb{Z}\cup\{\infty\}) \times \mathbb{Z}$; comparison is by the first component, with ties broken by the second. Initially the tree is empty.

Subprocedure: find the number of elements in the tree that lie in the interval $[X, Y]$, at cost $O(\log n)$ (straightforward given the subtree sizes).

Call this procedure count$(X, Y)$.

So, for $i$ from $1$ to $n$:

  1. add to the first accumulator count$((X_i+1, 1), (\infty, n))$, i.e. the number of earlier elements strictly greater than $X_i$
  2. add to the second accumulator count$((X_i, 1), (X_i, i-1))$, i.e. the number of earlier elements equal to $X_i$
  3. add $(X_i, i)$ to the balanced tree

So, $n$ steps with $O(\log{n})$ cost each.
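As a sketch of the same one-pass idea, here is a version that substitutes a Fenwick (binary indexed) tree over coordinate-compressed values for the AVL tree; `count_pairs` is a name of my choosing, not the poster's code:

```python
def count_pairs(xs):
    """One left-to-right pass over xs.  For each i it counts the earlier
    elements strictly greater than xs[i] (inversions) and the earlier
    elements equal to xs[i] (ties), in O(n log n) total."""
    # coordinate compression: map each distinct value to a 1-based rank
    ranks = {v: r + 1 for r, v in enumerate(sorted(set(xs)))}
    m = len(ranks)
    tree = [0] * (m + 1)               # Fenwick (binary indexed) tree

    def add(pos):                      # insert one occurrence of rank `pos`
        while pos <= m:
            tree[pos] += 1
            pos += pos & -pos

    def prefix(pos):                   # how many inserted ranks are <= pos
        s = 0
        while pos > 0:
            s += tree[pos]
            pos -= pos & -pos
        return s

    inversions = equal = seen = 0
    for x in xs:
        r = ranks[x]
        inversions += seen - prefix(r)      # earlier and strictly greater
        equal += prefix(r) - prefix(r - 1)  # earlier and equal
        add(r)
        seen += 1
    return inversions, equal
```

The Fenwick tree plays the role of the size-augmented AVL tree: both support "how many stored elements are $\le x$" in $O(\log n)$.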

Update: Trivially it cannot be computed faster than $\Omega(n)$, since every element must be read. The second count can be computed in $O(n)$ time (with a simple hash table), so I need to prove the lower bound only for counting the pairs $(i, j)$, $0 < i < j \le n$, with $X_i > X_j$.
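The hash-table computation of the second count might look like this (a sketch with a name of my choosing; each value occurring $c$ times contributes $\binom{c}{2}$ equal pairs):

```python
from collections import Counter

def equal_pairs(xs):
    """Number of pairs (i, j), i < j, with xs[i] == xs[j], in expected O(n)."""
    counts = Counter(xs)              # hash table: value -> multiplicity
    # a value occurring c times contributes c*(c-1)/2 equal pairs
    return sum(c * (c - 1) // 2 for c in counts.values())
```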


BEST ANSWER

Inversion counting is a problem with a long history, and of great importance even today (particularly in e.g. streaming models).

It is natural to see its connection to sorting, since sorting effectively "erases" inversions; and in fact, it is straightforward to see how one can "piggyback" inversion counting on MergeSort without asymptotically increasing its complexity: what happens when I have already sorted two halves of an array, and I am merging them, and I pick an element from the second half while $j$ elements from the first still remain unmerged?
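The MergeSort piggyback can be sketched as follows (a minimal Python sketch; `count_inversions` is a name of my choosing): each time an element is taken from the second half while elements of the first half remain unmerged, it forms an inversion with every one of them.

```python
def count_inversions(xs):
    """Return (sorted copy of xs, number of pairs i < j with xs[i] > xs[j])."""
    n = len(xs)
    if n <= 1:
        return list(xs), 0
    mid = n // 2
    left, inv_left = count_inversions(xs[:mid])
    right, inv_right = count_inversions(xs[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # ties are not inversions
            merged.append(left[i]); i += 1
        else:
            # right[j] jumps ahead of the len(left) - i elements still
            # waiting in the first half: one inversion with each of them
            merged.append(right[j]); j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv
```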

In fact, if we restrict ourselves to comparison-based counting, i.e. the only way to access our array $x_1,\dots,x_n$ is by asking, for any given pair $(i,j)$, whether $x_i< x_j$, inversion counting requires at least as many comparisons as sorting, making its complexity $\Omega(n\lg(n))$. To see this, note that any set of comparisons implicitly defines a directed acyclic graph $G$ whose nodes correspond to $x_1,\dots,x_n$, and whose arcs correspond to comparisons made; and any in-order visit of $G$ (one that does not visit $x_i$ before $x_j$ if there is an arc from $x_j$ to $x_i$, i.e. if $x_j<x_i$) defines a total ordering of $x_1,\dots,x_n$ compatible with all the comparisons made. If $G$ is not a line, i.e. does not admit a unique total ordering, then there is at least one point in a visit when the next element of the visit can be either of two elements $x_i$ or $x_j$. But if I choose either one, I can choose the other as the following element, and then continue my visit of $G$ independently of my choice. This induces two total orderings that are both compatible with all comparisons made so far, and differ by exactly one inversion ($x_i$ with $x_j$), meaning that the parity of their inversion counts is different.

NOTE: I believe this is the same argument given above by kotomord, but I hope it's clearer written this way - because I for one was confused.

This connection with sorting is double-edged, however, in the sense that just like the complexity of sorting varies between $\Theta(n)$ and $\Theta(n \lg(n))$ depending on the machine model one is using, so does the complexity of inversion counting. For example, in the popular and reasonable Word RAM model (in which sorting can be done in time $O(n\lg\lg(n))$), inversion counting can be done in time $O(n\sqrt{\lg(n)})$ (check the tech report here, which also sports a fair bibliography on the topic).

Note that in the Word RAM model the bound for inversion counting is worse than that for sorting, showing that in general you can't reduce inversion counting to sorting. Nor is the converse true: note that the argument on inversion counting requiring at least as many comparisons as sorting, above, does not hold in general if one considers cases when both sorting and inversion counting have the same side-information - e.g. if one knows that the permutation holds exactly $1$ inversion! One can, however, always reduce inversion counting to (2D) dominance counting, and that's how many people approach it.

ANSWER

I found a proof

Consider the subcase of this problem in which all $X_i$ are distinct.

Suppose there exists an algorithm $F$ with time complexity $O(f(n))$, where $n \le f(n) \le n\log(n)$.

Now create a new array $Y$ of pairs $(X_i, i)$ and run the algorithm on $Y$.
At every comparison made by $F$, record in a log the pair $(i, j)$ such that $X_i < X_j$ among the two values $F$ compared.

There are $O(f(n))$ entries in the log. View them as a directed graph $G$ with $n$ vertices, where each logged pair is a directed edge. $G$ contains no cycles.

Lemma 1. The time complexity of topologically sorting a graph $G$ with $n$ vertices and $m$ edges is at most $O(n+m)$.

Lemma 2. If a graph $G$ admits two different topological orders, then it admits two topological orders that differ only by a transposition of two adjacent elements.

Both lemmas are proved by the straightforward algorithm; http://codepad.org/fLDdQjpt is a simple Java implementation.
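Both lemmas can be sketched in one routine (assuming Kahn's algorithm; `topo_order_unique` is a hypothetical name, not the linked implementation): the $O(n+m)$ bound is Lemma 1, and whenever two or more vertices simultaneously have in-degree $0$, the next two candidates can be swapped, witnessing two orders that differ by adjacent elements as in Lemma 2.

```python
from collections import deque

def topo_order_unique(n, edges):
    """Kahn's algorithm in O(n + m) (Lemma 1).  Returns (order, unique);
    `unique` is False iff at some step two or more vertices had in-degree 0,
    i.e. the DAG admits two topological orders differing by a swap of two
    adjacent elements (the witness needed for Lemma 2)."""
    indeg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:                  # an edge u -> v records "x_u < x_v"
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order, unique = [], True
    while queue:
        if len(queue) > 1:              # two candidates for the next position
            unique = False
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order, unique
```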

So, fix two such orders $Y_1$ and $Y_2$ and build from them two input arrays $X_1$ and $X_2$; the difference between them is a transposition of two adjacent elements, hence $X_1$ and $X_2$ have different inversion counts, but $F$ must behave identically on both. So the topological order of $G$ must be unique, i.e. it determines the sorted order of $X$.

So, we can sort $X$ with $O(n) + O(f(n)) + O(f(n) + n) = O(f(n))$ time complexity (the third term is the cost of topologically sorting $G$).

So, by the sorting lower bound, $n\log(n) \le f(n)$ asymptotically, i.e. $f(n) = \Omega(n\log(n))$; together with the assumption $f(n) \le n\log(n)$, this gives $f(n) = \Theta(n\log(n))$.