Time complexity of taking duplicates out of a set

56 Views Asked by At

Imagine an algorithm, that takes a set as input, and returns that set, but without duplicates, as output.

The algorithm is bruteforce, it goes through each element in the input set, and one-by-one compares it to all elements in the output set. If the current element doesn't have a dupe in the output, it will be placed there.

Say, we have $\{5, 7, 2, 6, 2, 8, 4, 5\}$ as input (the elements could be anything), and the algorithm already scanned through $5$, $7$ and $2$.

It moves on to 6:
$6$ is compared to the first element of the output set, $5$. They aren't the same, so the algorithm moves on.
$6$ is then compared to $7$, and $2$.
There wasn't another $6$ found in the output set, so $6$ is added there ($\{5, 7, 2, 6\}$).

Next, the algorithm moves on to $2$, comparing it to $5$ , then $7$. But when $2$ is compared to the next element, $2$, they're the same, so the $2$ being compared to all elements in the output array is dupe, and is dropped. The algorithm then moves on to the next element...

The main question is: What is the algorithm's time complexity?

2

There are 2 best solutions below

0
On

That is of order $n^2$.

I would sort the set. The time would then be order $n \log n$.

If the elements were bounded relative to the number of them, it could be done in order $n$ by using an array with size of the maximum possible value.

2
On

The algorithm you described checks all elements in the output set and if the element is not present in the output set you put it in the set. If the input set has $n$ elements this algorithm has a time complexity of $O(n^2)$.

In fact close to linear time complexity can be achieved using a hash table. We traverse the input set and while doing so store the elements of the set in a hash table. If an element is present in the table we do not insert into the table. Afterwards we traverse the hash table to construct a set without duplicates. This gives us an average time complexity of $O(n)$, assuming a reasonable implementation of a hash table.