Constructing hash table to elect the representative

175 Views Asked by At

I'm having difficulty with the following question:

There are 338 seats in the house of common. We are trying to elect speaker for the house of common. Suppose that m politicians are candidates and each candidates has a unique ID consisting of the first eight letters of his surname(padded with blanks for short surnames). An ID is stores as an 8 byte ASCII character string and the votes are stored in an array of 8-byte character strings. Design an algorithm that inputs the array and determines who is the Speaker. Your algorithm should be based on hashing but not any sorting algorithm. During construction of a hash table, the algorithm should use O(1) expected time to determine the winner. The algorithm should work for any n and m but it is okay to assume that m is less than or equal to 125, and there is no tie for the winner.

I'm having hard time because of the two time bounds. I have O(1) expected time to process each vote, so budget is O(n) expected time to build a hash table with all n votes. Then I have an additional O(1) expected time to determine the winner. Looking through the constructed hash table to determine the winner would take at least O(m) time and this exceeds the budge of O(1). I think I need to do some of the work of determining the winner during construction of the hash table to meet the time bounds. Can anyone please clarify this? P.S. It says I may not assume m is a constant. The solution should work for any m and n. <- I'm also confused about this. It says I can assume m is less than or equal to 125 as well.

2

There are 2 best solutions below

7
On

The bound on $m$ means that you can fix the size of the hash table.
We will also assume that we have a $O(1)$ perfect hash function for the key space, i.e. a function $$H: \{A-Z,a-z,\ldots\}^8 \to [1, M]$$ wich is injective on the input data (not the key space). This is a very reasonable assumption for the given setting and appears to be okay from the context of the problem. The important thing in practice is to chose $M$ much larger than $m$ (real implementations try to keep the "load factor", $m/M$ at around $0.7$ for a time/memory tradeoff. Here we don't have memory constraints so we might as well take a higher $M$. A safe bet would be $M \approx m^2$, so a $16$-bit Hash function and $M = 16384$ is fine for practical applications.

Algorithm

  1. Allocate an array $C$ of size $M \gg 125$, initialized with $0$'s
  2. Set $C_\max = 0$ and $V_\max =\ ""$
  3. For each vote $V$...
    • Compute the hash $H(V)$ with $1\le H(V) \le M$.
    • Increment the cell $C_{H(V)}$ by $1$
    • If $C_\max < C_{H(V)}$: Set $C_\max = C_{H(V)}$ and $V_\max = V$
  4. Return $V_\max$

Runtime

Step 1 takes $O(1)$ time, because it's the same for all input sizes. Step 2 dito. Step 3 is repeated $n$ times and contains

  • A Hash computation ($O(1)$)
  • An increment ($O(1)$)
  • A branch, with a fixed instruction independent of $n$ ($O(1)$)

Since all substeps are $O(1)$ and it's repeated $n$ times, we get $O(n)$ runtime.

Step 4 is again a fixed instruction, so it costs $O(1)$

Summing up, the complexity is dominated by the loop in step 3, wich is $O(n)$ as required.

Remark

For guaranteed correctness, we need to check for the unlikely event of a hash collision.
In principle, this is achieved by storing in the table not only the vote count, but also some metadata:

$$C_{H(V)} = (\text{count}, V, \text{next})$$ $\text{count}$ is the vote count, $V$ the candidate name and $\text{next}$ is an index to another cell for a potential other candidate $V'$ with $H(V) = H(V')$.

Finding $V'$ is then done by

  1. Computing $i = H(V')$
  2. Checking if $C_i.V$ is equal to $V'$ and if not, repeat with $i = C_i.\text{next}$

Storage is done by finding $V'$ and if we can't find it, we end up in a cell without a $\text{next}$-Attribute. In this case we employ some* strategy to allocate a new cell for us and reference this in the $\text{next}$ attribute of the last visited cell $C_i$

*: For example, $i_{\text{next}} = i \bmod M + 1$ is a possibilty, or $i_{\text{next}} = H(i)$ if the hash function is also defined on $[1, M]$.

0
On

In practice a hash table lookup generally takes constant time, assuming that you're using a good hash function. However this cannot be guaranteed, because for any practical hash function you can come up with inputs that result in collisions.

Since the length of the keys is fixed as 8 ASCII characters, you could use a trie with a height of 8 and a constant fanout of 95 at each level (there are 95 printable ASCII characters), which will provide guaranteed constant-time lookup. But since the problem explicitly specifies hashing you're probably assumed rather to rely on the constant-time "practical" behavior of a hash table.

As for determining the winner, simply keep track of who has received the most votes and how many that is, at any time during the counting. When you increment a counter past the most-votes-so-far counter, update these values.