Optimize database indexes

90 Views Asked by At

Q: I have 14 columns, how many indexes do i need to create to cover all possibilities?

Examples of possibilities:

  • col1
  • col12
  • col5, col3, col4
  • col7, col2, col12, col1
  • all 14 columns

--

  • Order can be vary (but will be optimized -- see rules)
  • Length can vary (min = 1, max = 14)

Rules:

  • When given a set of columns (this can range from 1 to 14 columns) they will be automatically put in the most efficient order (so it matches an already existing index). This might not be applicable in case all possible indexes are created.
  • An index can span one or multiple columns
  • An index of multiple columns cover the first columns. Example: index(col1, col2, col3) can be used as index for (col1, col2) but not for (col2, col3)
  • Using multiple indexes is not allowed in the answer (database do use multiple indexes in a single query when useful though)

Bonus question:

  • When a requirement puts a maximum on the amount of indexes that can be used.
  • And there is a list available of combinations and how often they are used.

How would one calculate the most efficient set of indexes?

2

There are 2 best solutions below

2
On BEST ANSWER

I don't know the most efficient algorithm, but here's one that should work to calculate the most efficient indexes:

Get the Hasse diagram using subset ordering

  • Draw a graph of each query as a set of columns.
  • Connect lines between each superset-subset pair.
  • Find the transitive reduction by removing any lines where the superset and subset have a set in between (that is, the in-between set is a subset of the superset and a superset of the subset).
    • For example, if you have the lines:
      • {1, 2, 3} ⇒ {1, 2}
      • {1, 2} ⇒ {1}
      • {1, 2, 3} ⇒ {1}
    • Remove {1, 2, 3} ⇒ {1}, because {1, 2} is connected between them.

For your problem, the Hasse diagram looks like this:

$ \{1..14\} \left\{ \begin{array}{l} \{1, 2, 7, 12\} \left\{ \begin{array}{l} \{1\} \\ \{12\} \end{array} \right. \\ \{3, 4, 5\} \end{array} \right. $

Note that not all diagrams are treelike. Sometimes two queries on the left are both supersets of a query on the right and it forms a sort of lattice.

Find all paths

Find all paths that start with a query with no superqueries (a root query) and end with a query with no subqueries (a leaf query).

  • {1..14} ⇒ {1, 2, 7, 12} ⇒ {1}
  • {1..14} ⇒ {1, 2, 7, 12} ⇒ {12}
  • {1..14} ⇒ {3, 4, 5}

Delete unnecessary paths

In each path, mark any queries that appear in any other path.

  • {1..14} ⇒ {1, 2, 7, 12} ⇒ {1}
  • {1..14} ⇒ {1, 2, 7, 12} ⇒ {12}
  • {1..14} ⇒ {3, 4, 5}

If there are any entire paths where all queries are marked, choose one, delete the entire path, and redo the markings. Delete all the entire paths you can.

The number of paths remaining is the number of indexes. You may be able to delete a different sequence of paths in order to end up with less indexes at the end.

Your problem in particular has no entire paths that can be deleted, so you know the number of indexes needed is three.

Shrink remaining paths

You can then delete any queries on the left of a path as long as they appear in another path. Repeat with the new leftmost queries until there are no such duplicates on the left. There are a few different solutions you might come to:

  1.  

    • {1..14} ⇒ {1, 2, 7, 12} ⇒ {1}
    • {12}
    • {3, 4, 5}
  2.  

    • {1}
    • {1..14} ⇒ {1, 2, 7, 12} ⇒ {12}
    • {3, 4, 5}
  3.  

    • {1, 2, 7, 12} ⇒ {1}
    • {12}
    • {1..14} ⇒ {3, 4, 5}
  4.  

    • {1}
    • {1, 2, 7, 12} ⇒ {12}
    • {1..14} ⇒ {3, 4, 5}

The number of columns in the leftmost query tells the number of columns in that index if you're trying to minimize that in some way.

Convert paths to indexes

  • For each path:
    • Start with an empty index.
    • For i from the number of queries on the path down to 1:
      • Intersect the first i queries.
      • Append to the index any of the elements of the index that aren't already in the index (order you append these is irrelevant).

So, with the following paths:

  • {1..14} ⇒ {1, 2, 7, 12} ⇒ {1}
  • {12}
  • {3, 4, 5}

You'll have the indexes:

  • (1, 2, 7, 12, 3, 4, 5, 6, 8, 9, 10, 11)
  • (12)
  • (3, 4, 5)
0
On

Answer for part 1:

$\sum\limits_{n=1}^k \binom{k}{n}$

Where n counts from 1 to k. And k is 14.