Need a concise way to query a database with limited filters

31 Views Asked by At

I have two observers identifying pets - Sam and Cindy. For each pet, Sam and Cindy have to say what kind of pet they see (dog, cat, leprechaun, etc.). They enter their results into a database and I need to confirm the quality of their work. The first thing I want to do is see when Sam and Cindy disagreed on what kind of pet they saw.

This sounds easy enough but there are a few challenges. Firstly, there are thousands of records, so I'm not able to check the results manually. I need to query the database they entered their results into.

This is how the database works:

  1. I can only use 20 filters or less in a single query
  2. An individual filter can only check Sam or Cindy's results
  3. The filter can compare their results to a specific value (e.g. Sam's answer matches 'dog')
  4. The filter can check multiple values (e.g. Sam's answer matches 'dog' or 'cat' or 'mouse')
  5. A filter can match or not match on all values (e.g. Sam's answer does not match 'dog')
  6. Filters combine with AND/OR operators (e.g. Sam matches 'dog' AND Cindy does not match 'dog')
  7. Once the query is run I will see all records that return TRUE against the filters
  8. Cindy and Sam were able to pick from the same 42 pets for all their answers
  9. I can only do one query

Here are three example queries:

  1. Sam matches 'dog' and Cindy does not match 'dog' gets me 1/42nd of the way to the answer I want
  2. (Sam matches 'dog' and Cindy does not match 'dog') OR (Sam matches 'cat' and Cindy does not match 'cat')
  3. (Sam matches 'dog' or 'cat' or 'stone') AND (Cindy does not match 'dog' or 'cat' or 'stone')

I can use the second example to cover all 42 possible values but this would require 84 filters and I only have 20.

The way I've used the third example is technically within the rules but it doesn't get the right result because it won't show me records where Sam said dog and Cindy said cat even though this is something I would want to flag as a mistake.

Is there a clever way I can expand on the 3rd example (using OR operators within a filter) to get the end result I'm after in 20 filters or less?

If not, what's the most efficient I can get to be able to do this operation in the fewest number of queries.

1

There are 1 best solutions below

5
On

I am assuming every bold clause in the query below is a filter.

If you run

(Sam matches 'x' and Cindy matches 'x')

or (Sam matches 'x' and Cindy does not match 'x')

or (Sam does not match 'x' and Cindy matches 'x')

or (Sam does not match 'x' and Cindy does not match 'x')

then we would have used a total of $15$ filters. This compound query is a tautology (i.e., always true) and every record in the database will satisfy this compound query. We don't care what the value of 'x' is (i.e., it could be 'dog', 'cat' or any valid attribute value).

It works on the principle that if we have two clauses $x, y$ then $(x \land y) \lor (x \land y') \lor (x' \land y) \lor (x' \land y')$ is always true.

You mention:

I want to ... see when Sam and Cindy disagreed on what kind of pet they saw

This is a classic XOR operation. ie., if $x,y$ are two clauses and we want when they are complements of each other, we write

$$x \oplus y \Leftrightarrow (x \land y') \lor (x' \land y)$$. The equivalent query for a single choice 'x' would be

(Sam matches 'x' and Cindy does not match 'x')

or (Sam does not match 'x' and Cindy matches 'x')

This uses $7$ filters and gives only results where there is a mismatch between Sam and Cindy for a specific choice 'x'.

If you run a slight modification of the first query

(Sam matches 'x' and Cindy does not match 'x')

or (Sam does not match 'x' and Cindy matches 'x')

or (Sam does not match 'x' and Cindy does not match 'x')

where 'x' is say 'cat' then you will get a few false positives e.g. Sam chooses 'dog' and Cindy chooses 'dog' (and any choice that is not 'cat' where they agree). This uses 10 filters.