Datalog Rules in Quail
Section
Mix.install([
{:quail, path: "/Users/quinn/dev/beam_box/quail"}
])
Quail is a pedagogical egglog implementation in Elixir. The other notebooks demonstrate pattern rewrites (slotted e-graphs) and colored e-graphs (contextual equalities). This notebook focuses on the Datalog side: how conjunctive queries over relations enable analyses and transformations that go beyond what pattern matching alone can express.
In egglog, the e-graph’s structure is automatically mirrored as queryable relations. You can also define your own relations and write Datalog rules that derive new facts, assert new relations, or merge e-classes — all within the same equality saturation loop.
Transitive Closure
The “hello world” of Datalog: computing transitive closure over a graph. Given edge facts, derive all reachable pairs.
Two rules do the work:
| Rule | Body | Action |
|---|---|---|
| Base case |
edge(x, y) |
assert reachable(x, y) |
| Transitive step |
reachable(x, y) ∧ edge(y, z) |
assert reachable(x, z) |
The saturation loop applies these rules until no new facts are derived.
import Quail, only: [v: 1]
alias Quail.{Database, Rule}
db = Quail.new()
# Seed the graph with edges: a → b → c → d, plus a shortcut a → c.
db = Database.insert_relation(db, :edge, {:a, :b})
db = Database.insert_relation(db, :edge, {:b, :c})
db = Database.insert_relation(db, :edge, {:c, :d})
db = Database.insert_relation(db, :edge, {:a, :c})
rules = [
# Base: every edge is a reachable pair.
Rule.rule(:reach_base,
body: [{:edge, [v(:x), v(:y)]}],
actions: [{:assert, :reachable, [v(:x), v(:y)]}]
),
# Step: reachable composes with edges.
Rule.rule(:reach_step,
body: [
{:reachable, [v(:x), v(:y)]},
{:edge, [v(:y), v(:z)]}
],
actions: [{:assert, :reachable, [v(:x), v(:z)]}]
)
]
result = Quail.run(db, rules, iter_limit: 10)
# Query: what can :a reach?
reachable_from_a =
Database.lookup_relation(result.database, :reachable, 0, :a)
|> Enum.map(fn {_from, to} -> to end)
|> Enum.sort()
IO.puts("Edges: a→b, b→c, c→d, a→c")
IO.puts("Reachable from :a — #{inspect(reachable_from_a)}")
IO.puts("Saturated in #{result.iterations} iterations (#{result.stop_reason})")
The key ideas:
-
Database.insert_relation/3seeds a named relation with ground facts. No e-graph needed — these are pure Datalog relations. -
Rule bodies are conjunctive queries:
[{:edge, [v(:x), v(:y)]}]means “for every row(x, y)in theedgerelation…” -
:assertactions insert derived facts into new relations. The semi-naive evaluator ensures each fact is derived only once. - The saturation loop interleaves Datalog evaluation with e-graph rebuilding, so both sides stay consistent.
Op Relations
When you add terms to the e-graph, Quail automatically mirrors the structure into op relations — one relation per operator, indexed by arity. This is what makes the e-graph queryable by Datalog rules.
For a term like {:add, {:num, 3}, {:num, 4}}, three relations are populated:
| Relation | Row | Meaning |
|---|---|---|
num/2 |
(class_3, 3) |
class_3 contains num(3) |
num/2 |
(class_4, 4) |
class_4 contains num(4) |
add/3 |
(class_add, class_3, class_4) |
class_add = add(class_3, class_4) |
A Datalog rule body can reference these directly. The first column is always the result class ID; subsequent columns are children; trailing columns are literal data.
alias Quail.{Database, Relation}
db = Quail.new()
{root, db} = Quail.add_term(db, {:add, {:num, 3}, {:num, 4}})
# Inspect op relations — each operator gets its own indexed table.
for {{op, arity}, rel} <- db.op_relations do
rows = Relation.to_list(rel)
IO.puts("#{op}/#{arity}:")
for row <- rows do
IO.puts(" #{inspect(Tuple.to_list(row))}")
end
end
IO.puts("\nRoot class: #{root}")
Constant Folding: Rewrites vs Datalog
The same optimization can be expressed as either a pattern rewrite or a Datalog rule. Comparing the two reveals what Datalog brings to the table.
Pattern rewrite — matches e-graph structure directly:
add(num(a), num(b)) → num(a + b)
Datalog rule — queries relations, then performs actions:
add(r, x, y) ∧ num(x, n) ∧ num(y, m) → union(r, num(n + m))
The Datalog version is more explicit about what it’s doing: it names intermediate variables (r, x, y) and can combine information from multiple relations in a single rule. This becomes essential for analyses that need to aggregate data across the e-graph.
alias Quail.{Rewrite, Rule}
# Approach 1: Pattern rewrite with a guard.
rewrite_fold = Rewrite.rewrite(:const_add,
{:add, {:num, v(:a)}, {:num, v(:b)}},
{:num, v(:c)},
when: fn %{a: a, b: b}, _db -> %{c: a + b} end
)
# Approach 2: Datalog rule querying op relations.
datalog_fold = Rule.rule(:const_fold,
body: [
{:add, [v(:r), v(:x), v(:y)]},
{:num, [v(:x), v(:n)]},
{:num, [v(:y), v(:m)]}
],
actions: [{:union, v(:r), v(:sum_class)}],
guard: fn %{n: n, m: m} -> %{sum_class: {:add_term, {:num, n + m}}} end
)
# Both produce the same result.
db = Quail.new()
{root, db} = Quail.add_term(db, {:add, {:num, 3}, {:num, 4}})
r1 = Quail.run(db, [rewrite_fold], iter_limit: 10)
{:ok, t1} = Quail.extract(r1.database, root)
r2 = Quail.run(db, [datalog_fold], iter_limit: 10)
{:ok, t2} = Quail.extract(r2.database, root)
IO.puts("Rewrite result: #{inspect(t1)}")
IO.puts("Datalog result: #{inspect(t2)}")
IO.puts("Equivalent? #{t1 == t2}")
The Datalog version uses a guard function to compute the sum, and {:add_term, {:num, n + m}} as a special value that tells the runner to add a new term to the e-graph and bind its class ID to sum_class. The :union action then merges the add class with the newly created num class.
Both approaches are equivalent for this simple case. The Datalog approach shines when you need to combine information from user-defined relations that aren’t part of the e-graph structure.
Mixed Rules: Rewrites + Datalog
The real power of egglog is mixing both rule types in the same saturation loop. Pattern rewrites explore the space of equivalent expressions; Datalog rules derive facts and trigger actions based on what the e-graph contains. They compose naturally because they share the same database.
Here we simplify (a + 0) * 1 using algebraic rewrites, then fold constants with a Datalog rule — all in one pass.
alias Quail.{Rewrite, Rule}
rules = [
# Algebraic rewrites (pattern-based).
Rewrite.rewrite(:add_zero, {:add, v(:x), {:num, 0}}, v(:x)),
Rewrite.rewrite(:mul_one, {:mul, v(:x), {:num, 1}}, v(:x)),
Rewrite.rewrite(:comm_add, {:add, v(:x), v(:y)}, {:add, v(:y), v(:x)}),
# Constant folding (Datalog-based).
Rule.rule(:fold_add,
body: [
{:add, [v(:r), v(:x), v(:y)]},
{:num, [v(:x), v(:n)]},
{:num, [v(:y), v(:m)]}
],
actions: [{:union, v(:r), v(:sum)}],
guard: fn %{n: n, m: m} -> %{sum: {:add_term, {:num, n + m}}} end
),
Rule.rule(:fold_mul,
body: [
{:mul, [v(:r), v(:x), v(:y)]},
{:num, [v(:x), v(:n)]},
{:num, [v(:y), v(:m)]}
],
actions: [{:union, v(:r), v(:prod)}],
guard: fn %{n: n, m: m} -> %{prod: {:add_term, {:num, n * m}}} end
)
]
db = Quail.new()
# (3 + 0) * 1 — should simplify to 3.
{root, db} = Quail.add_term(db, {:mul, {:add, {:num, 3}, {:num, 0}}, {:num, 1}})
result = Quail.run(db, rules, iter_limit: 20)
{:ok, extracted} = Quail.extract(result.database, root)
IO.puts("(3 + 0) * 1 => #{inspect(extracted)}")
IO.puts("#{result.iterations} iterations (#{result.stop_reason})")
Analyzers: Analysis as Relations
An analyzer bridges the e-graph and Datalog worlds. It watches e-nodes as they’re added and populates a function relation with analysis results. Because the relation uses function semantics, merging two e-classes automatically merges their analysis values via a user-provided merge function.
This is the egglog approach to abstract interpretation: instead of a separate analysis pass, the analysis lives inside the database and participates in saturation.
alias Quail.{Analyzer, Database, Relation, Rule}
# A constant-value analyzer: tracks which e-classes have known constant values.
# The merge function handles conflicts — if two merged classes had different
# constants, the result is :top (unknown).
const_analyzer = Analyzer.new(:const_val,
arity: 2,
merge: fn a, b -> if a == b, do: a, else: :top end,
makers: %{
num: fn _db, class_id, _enode ->
[{class_id, hd(_enode.data)}]
end
}
)
db = Quail.new()
db = Database.register_analyzer(db, const_analyzer)
# Add some terms. The analyzer automatically populates const_val rows.
{id_3, db} = Quail.add_term(db, {:num, 3})
{id_4, db} = Quail.add_term(db, {:num, 4})
{id_x, db} = Quail.add_term(db, {:var, :x})
IO.puts("const_val relation after adding terms:")
for {class, val} <- Relation.to_list(db.relations[:const_val]) do
IO.puts(" class #{class} => #{inspect(val)}")
end
# :var has no const_val row — the analyzer's maker only fires for :num.
IO.puts("\nvar :x (class #{id_x}) has no constant value")
Constant Propagation with Analyzer Rules
Analyzers can carry derived rules — Datalog rules that fire based on the analyzer’s relation. This lets the analysis feed back into the e-graph, creating a closed loop: e-nodes produce analysis facts, analysis facts trigger rules, rules merge e-classes, merging produces new e-nodes, and so on.
Here we build a constant propagation analyzer. The const_val relation tracks known constants per class. A derived rule says: if both children of an add node have known constant values, fold the result.
alias Quail.{Analyzer, Database, Rule}
# The derived rule: if add(r, x, y) and both x, y have const values, fold.
const_fold_rule = Rule.rule(:const_propagate,
body: [
{:add, [v(:r), v(:x), v(:y)]},
{:const_val, [v(:x), v(:n)]},
{:const_val, [v(:y), v(:m)]}
],
actions: [{:union, v(:r), v(:sum_class)}],
guard: fn %{n: n, m: m} ->
if is_number(n) and is_number(m) do
%{sum_class: {:add_term, {:num, n + m}}}
else
false
end
end
)
# Register analyzer with the derived rule attached.
const_analyzer = Analyzer.new(:const_val,
arity: 2,
merge: fn a, b -> if a == b, do: a, else: :top end,
makers: %{
num: fn _db, class_id, enode -> [{class_id, hd(enode.data)}] end
},
rules: [const_fold_rule]
)
db = Quail.new()
db = Database.register_analyzer(db, const_analyzer)
# Nested additions: (1 + 2) + (3 + 4)
# The analyzer should propagate constants bottom-up:
# num(1) → const_val(1), num(2) → const_val(2)
# add(1, 2) → fold to num(3) → const_val(3)
# num(3) → const_val(3), num(4) → const_val(4)
# add(3, 4) → fold to num(7) → const_val(7)
# add(num(3), num(7)) → fold to num(10)
{root, db} = Quail.add_term(db, {:add, {:add, {:num, 1}, {:num, 2}}, {:add, {:num, 3}, {:num, 4}}})
result = Quail.run(db, [], iter_limit: 20)
{:ok, extracted} = Quail.extract(result.database, root)
IO.puts("(1 + 2) + (3 + 4) => #{inspect(extracted)}")
IO.puts("#{result.iterations} iterations (#{result.stop_reason})")
Notice that we passed an empty rule list [] to Quail.run/3. The analyzer’s derived rule is automatically prepended by the runner — it’s part of the database, not something the caller needs to manage. This keeps the analysis self-contained: registering the analyzer is enough to enable constant propagation.
Call Graph Analysis
Beyond arithmetic, Datalog rules work well for program analysis tasks. Here’s a sketch: given a call graph, determine which functions are reachable from an entry point, and which are dead code.
This combines user relations (calls, entry) with derived relations (live, dead) — a pattern typical of static analysis frameworks like Doop or Soufflé.
alias Quail.{Database, Rule}
db = Quail.new()
# Seed the call graph.
# main calls foo, bar
# foo calls baz
# bar calls baz
# qux calls quux (disconnected from main)
db =
[
{:main, :foo}, {:main, :bar},
{:foo, :baz}, {:bar, :baz},
{:qux, :quux}
]
|> Enum.reduce(db, fn {caller, callee}, db ->
Database.insert_relation(db, :calls, {caller, callee})
end)
# Mark entry points and enumerate all functions.
db = Database.insert_relation(db, :entry, {:main})
all_fns = [:main, :foo, :bar, :baz, :qux, :quux]
db =
Enum.reduce(all_fns, db, fn f, db ->
Database.insert_relation(db, :function, {f})
end)
rules = [
# Entry points are live.
Rule.rule(:entry_live,
body: [{:entry, [v(:f)]}],
actions: [{:assert, :live, [v(:f)]}]
),
# If a live function calls g, then g is live.
Rule.rule(:call_live,
body: [
{:live, [v(:f)]},
{:calls, [v(:f), v(:g)]}
],
actions: [{:assert, :live, [v(:g)]}]
)
]
result = Quail.run(db, rules, iter_limit: 10)
live_fns =
Enum.filter(all_fns, fn f ->
Database.lookup_relation(result.database, :live, 0, f) != []
end)
dead_fns = all_fns -- live_fns
IO.puts("Call graph: main→{foo,bar}, foo→baz, bar→baz, qux→quux")
IO.puts("Entry: main")
IO.puts("Live: #{inspect(Enum.sort(live_fns))}")
IO.puts("Dead: #{inspect(Enum.sort(dead_fns))}")
The Anatomy of a Datalog Rule
Let’s break down the full anatomy of a Rule.rule/2 call:
Rule.rule(:name,
body: [ # Conjunctive query (all must match)
{:relation, [v(:x), v(:y)]}, # Goal: query a relation
{:op_name, [v(:result), v(:child)]}, # Can be op relation or user relation
],
actions: [ # What to do with each match
{:union, v(:a), v(:b)}, # Merge two e-classes
{:assert, :rel, [v(:x), v(:y)]}, # Insert into a relation
{:add, {:op, v(:x)}}, # Add a new term to the e-graph
],
guard: fn subst -> # Optional filter/extension
%{new_var: computed_value} # Return map to extend substitution
end
)
Body goals are evaluated left-to-right with index acceleration. Bound variables from earlier goals narrow the search in later goals — so put the most selective goals first.
Actions execute for every successful substitution. The {:add_term, term} sentinel in guard results lets you create new e-graph terms and bind their class IDs for use in :union actions.
Semi-naive evaluation ensures each derivation happens at most once per iteration. The three-layer relation design (rows / delta / new_rows) tracks which facts are new, avoiding redundant work.
What’s Next
The Datalog engine in Quail is the foundation for building real program analyses:
- Type inference — use relations to propagate type constraints through the e-graph, narrowing classes to their most specific types
- Points-to analysis — track which variables can point to which allocations, composing with e-graph equality
- Inlining heuristics — use call graph relations to decide which functions to inline, then express inlining as a rewrite rule
- Cost models — derive cost annotations via analyzers, then guide extraction toward cheaper implementations
The key insight of egglog is that these analyses don’t need a separate pass. They live inside the saturation loop, interleaving with rewrites. The e-graph discovers equivalences; Datalog rules derive facts about those equivalences; the facts trigger more rewrites. Everything composes.
See the slotted e-graphs notebook for binding-aware rewrites, and the colored e-graphs notebook for context-scoped equalities.