I'm having difficulty with the following question:
There are 338 seats in the house of common. We are trying to elect speaker for the house of common. Suppose that m politicians are candidates and each candidates has a unique ID consisting of the first eight letters of his surname(padded with blanks for short surnames). An ID is stores as an 8 byte ASCII character string and the votes are stored in an array of 8-byte character strings. Design an algorithm that inputs the array and determines who is the Speaker. Your algorithm should be based on hashing but not any sorting algorithm. During construction of a hash table, the algorithm should use O(1) expected time to determine the winner. The algorithm should work for any n and m but it is okay to assume that m is less than or equal to 125, and there is no tie for the winner.
I'm having hard time because of the two time bounds. I have O(1) expected time to process each vote, so budget is O(n) expected time to build a hash table with all n votes. Then I have an additional O(1) expected time to determine the winner. Looking through the constructed hash table to determine the winner would take at least O(m) time and this exceeds the budge of O(1). I think I need to do some of the work of determining the winner during construction of the hash table to meet the time bounds. Can anyone please clarify this? P.S. It says I may not assume m is a constant. The solution should work for any m and n. <- I'm also confused about this. It says I can assume m is less than or equal to 125 as well.
The bound on $m$ means that you can fix the size of the hash table.
We will also assume that we have a $O(1)$ perfect hash function for the key space, i.e. a function $$H: \{A-Z,a-z,\ldots\}^8 \to [1, M]$$ wich is injective on the input data (not the key space). This is a very reasonable assumption for the given setting and appears to be okay from the context of the problem. The important thing in practice is to chose $M$ much larger than $m$ (real implementations try to keep the "load factor", $m/M$ at around $0.7$ for a time/memory tradeoff. Here we don't have memory constraints so we might as well take a higher $M$. A safe bet would be $M \approx m^2$, so a $16$-bit Hash function and $M = 16384$ is fine for practical applications.
Algorithm
Runtime
Step 1 takes $O(1)$ time, because it's the same for all input sizes. Step 2 dito. Step 3 is repeated $n$ times and contains
Since all substeps are $O(1)$ and it's repeated $n$ times, we get $O(n)$ runtime.
Step 4 is again a fixed instruction, so it costs $O(1)$
Summing up, the complexity is dominated by the loop in step 3, wich is $O(n)$ as required.
Remark
For guaranteed correctness, we need to check for the unlikely event of a hash collision.
In principle, this is achieved by storing in the table not only the vote count, but also some metadata:
$$C_{H(V)} = (\text{count}, V, \text{next})$$ $\text{count}$ is the vote count, $V$ the candidate name and $\text{next}$ is an index to another cell for a potential other candidate $V'$ with $H(V) = H(V')$.
Finding $V'$ is then done by
Storage is done by finding $V'$ and if we can't find it, we end up in a cell without a $\text{next}$-Attribute. In this case we employ some* strategy to allocate a new cell for us and reference this in the $\text{next}$ attribute of the last visited cell $C_i$
*: For example, $i_{\text{next}} = i \bmod M + 1$ is a possibilty, or $i_{\text{next}} = H(i)$ if the hash function is also defined on $[1, M]$.