Q: I have 14 columns, how many indexes do i need to create to cover all possibilities?
Examples of possibilities:
- col1
- col12
- col5, col3, col4
- col7, col2, col12, col1
- all 14 columns
--
- Order can be vary (but will be optimized -- see rules)
- Length can vary (min = 1, max = 14)
Rules:
- When given a set of columns (this can range from 1 to 14 columns) they will be automatically put in the most efficient order (so it matches an already existing index). This might not be applicable in case all possible indexes are created.
- An index can span one or multiple columns
- An index of multiple columns cover the first columns. Example: index(col1, col2, col3) can be used as index for (col1, col2) but not for (col2, col3)
- Using multiple indexes is not allowed in the answer (database do use multiple indexes in a single query when useful though)
Bonus question:
- When a requirement puts a maximum on the amount of indexes that can be used.
- And there is a list available of combinations and how often they are used.
How would one calculate the most efficient set of indexes?
I don't know the most efficient algorithm, but here's one that should work to calculate the most efficient indexes:
Get the Hasse diagram using subset ordering
For your problem, the Hasse diagram looks like this:
$ \{1..14\} \left\{ \begin{array}{l} \{1, 2, 7, 12\} \left\{ \begin{array}{l} \{1\} \\ \{12\} \end{array} \right. \\ \{3, 4, 5\} \end{array} \right. $
Note that not all diagrams are treelike. Sometimes two queries on the left are both supersets of a query on the right and it forms a sort of lattice.
Find all paths
Find all paths that start with a query with no superqueries (a root query) and end with a query with no subqueries (a leaf query).
Delete unnecessary paths
In each path, mark any queries that appear in any other path.
{1..14}⇒{1, 2, 7, 12}⇒ {1}{1..14}⇒{1, 2, 7, 12}⇒ {12}{1..14}⇒ {3, 4, 5}If there are any entire paths where all queries are marked, choose one, delete the entire path, and redo the markings. Delete all the entire paths you can.
The number of paths remaining is the number of indexes. You may be able to delete a different sequence of paths in order to end up with less indexes at the end.
Your problem in particular has no entire paths that can be deleted, so you know the number of indexes needed is three.
Shrink remaining paths
You can then delete any queries on the left of a path as long as they appear in another path. Repeat with the new leftmost queries until there are no such duplicates on the left. There are a few different solutions you might come to:
The number of columns in the leftmost query tells the number of columns in that index if you're trying to minimize that in some way.
Convert paths to indexes
So, with the following paths:
You'll have the indexes: