How can I use relational algebra to determine when changes to a relation invalidate the results of an expression?

123 Views Asked by At

My goal is to take a series of modifications to relations in an SQL database, and check if I need to re-run other SQL queries because their results would change. Allowing people to listen to changes to an SQL query.

I found a similar problem is "how to incrementally refresh a materialized view".

Here is an unimplemented proposal for Postgres that uses relational algebra to optimize this process.

I am looking for something slightly different though. The above proposal calculates the delta for these queries, but I want to know whether I need to calculate the delta in the first place based on the actual query selections.

For a series of changes (update, insert, delete), taking into account the values of the tuple, how can I work out if the query results need to be updated at all.

In practice it would be:

function doesQueryNeedRefresh([changes], query): Boolean { return ... }

For example, if we look at different mutations to the list relation, here are the conditions that invalidate the query.

select name from list where list_id = 1
  • update:
    • if old list_id = 1 and list_id changes
    • name attr changes
  • insert:
    • if list_id = 1
  • delete:
    • if list_id = 1

But then it gets more complex with joins:

select l.name, i.name from list as l
inner join list_items as li on li.list_id = l.id
inner join items as i on i.id = li.item_id
where items.completed = true

For this query, if we look at a single mutation to each relation in isolation:

  • list
    • create: if list_items.list_id contains list.id
    • update: if name changes or list.id
    • delete: if list_items.list_id contains list.id
  • list_items
    • create: if list_id is in list, or item_id is in items
    • update: if foreign keys change (because of no projection)
    • delete: if list_id is in list, or item_id is in items
  • items
    • create: if completed = true and id in list_items.item_id
    • update: if id in list_items.item_id
    • delete: if id in list_items.item_id

We could do less checks by assuming that some values cannot change such as primary and foreign keys.

Also, instead of deleting rows, what if we marked things as deleted = true, maybe this would make it simpler.

Is there a way to formalize this in relational algebra? And also, does this help me efficiently perform these calculations?


I am also interested in the smallest subset of these relations that are necessary for these checks to run. Use case: I maintain a local database that syncs a portion of the data from the cloud database, and I want to run queries locally,