Planner-Worker-Reviewer: Delegating Research with Quality Gates
Setup
repo_root = Path.expand("..", __DIR__)

deps =
  if File.exists?(Path.join(repo_root, "mix.exs")) do
    [{:ptc_runner, path: repo_root}, {:llm_client, path: Path.join(repo_root, "llm_client")}]
  else
    [{:ptc_runner, "~> 0.6.0"}]
  end

Mix.install(deps ++ [{:req_llm, "~> 1.0"}, {:kino, "~> 0.14"}], consolidate_protocols: false)

local_path = Path.join(__DIR__, "llm_setup.exs")

if File.exists?(local_path) do
  Code.require_file(local_path)
else
  %{body: code} = Req.get!("https://raw.githubusercontent.com/andreasronge/ptc_runner/main/livebooks/llm_setup.exs")
  Code.eval_string(code)
end

setup = LLMSetup.setup()
setup = LLMSetup.choose_provider(setup)
my_llm = LLMSetup.choose_model(setup)
The fetch_page Tool
fetch_page = fn %{"url" => url} ->
  case Req.get(url, redirect: true, max_redirects: 3, receive_timeout: 15_000) do
    {:ok, %{status: 200, body: body}} when is_binary(body) ->
      text =
        body
        # Drop comments and non-content elements, including their contents
        |> String.replace(~r/<!--.*?-->/s, "")
        |> String.replace(~r/<script[^>]*>.*?<\/script>/s, "")
        |> String.replace(~r/<style[^>]*>.*?<\/style>/s, "")
        |> String.replace(~r/<nav[^>]*>.*?<\/nav>/s, "")
        |> String.replace(~r/<header[^>]*>.*?<\/header>/s, "")
        |> String.replace(~r/<footer[^>]*>.*?<\/footer>/s, "")
        # Turn block-level boundaries into newlines, then strip remaining tags
        |> String.replace(~r/<(br|\/p|\/div|\/li|\/h\d|\/tr|\/td|\/dt|\/dd)[^>]*>/i, "\n")
        |> String.replace(~r/<[^>]+>/, " ")
        |> String.replace(~r/&\w+;/, " ")
        # Normalize whitespace and cap the text at 6000 characters
        |> String.replace(~r/[ \t]+/, " ")
        |> String.replace(~r/\n[ \t]*/, "\n")
        |> String.replace(~r/\n{3,}/, "\n\n")
        |> String.trim()
        |> String.slice(0, 6000)

      %{url: url, text: text}

    {:ok, %{status: status}} ->
      {:error, "HTTP #{status} for #{url}"}

    {:error, reason} ->
      {:error, "Request failed for #{url}: #{inspect(reason)}"}
  end
end
fetch_page.(%{"url" => "https://elixir-lang.org"}) |> Map.update!(:text, &String.slice(&1, 0, 200))
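The cleanup steps can be tried without a network call. The sketch below is a reduced variant under stated assumptions: `HtmlText.strip/1` is a hypothetical helper name (not part of ptc_runner), it applies only a subset of the regex steps, and the sample HTML is made up for illustration.

```elixir
defmodule HtmlText do
  # Hypothetical helper (not part of ptc_runner): a reduced version of the
  # regex pipeline used by fetch_page, applied to an in-memory string.
  def strip(html) when is_binary(html) do
    html
    |> String.replace(~r/<script[^>]*>.*?<\/script>/s, "")
    |> String.replace(~r/<style[^>]*>.*?<\/style>/s, "")
    |> String.replace(~r/<(br|\/p|\/div|\/li|\/h\d)[^>]*>/i, "\n")
    |> String.replace(~r/<[^>]+>/, " ")
    |> String.replace(~r/&\w+;/, " ")
    |> String.replace(~r/[ \t]+/, " ")
    |> String.replace(~r/\n[ \t]*/, "\n")
    |> String.replace(~r/\n{3,}/, "\n\n")
    |> String.trim()
  end
end

# Made-up sample input; "1.x" is a placeholder, not a real version number.
html = "<html><script>var x = 1;</script><h1>Elixir</h1><p>Version <b>1.x</b></p></html>"
IO.puts(HtmlText.strip(html))
# prints:
# Elixir
# Version 1.x
```

The script body disappears entirely, while closing block tags become line breaks, which is what keeps headings and paragraphs on separate lines in the text handed to the worker.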
Phase 1: Generate the Plan
alias PtcRunner.SubAgent
planner = SubAgent.new(
prompt: """
Research question: {{question}}
You have access to a `fetch_page` tool that retrieves web page text, plus
`(grep pattern text)` and `(grep-n pattern text)` for searching text.
Plan the steps needed to answer this question. Return exactly 3 steps:
one per research topic, plus a final compilation step.
""",
signature: "(question :string) -> {steps [:string]}",
max_turns: 1,
retry_turns: 1,
output: :json
)
question =
"What are the latest stable versions of Elixir and Erlang/OTP, and what are the key new features in each?"
{:ok, plan_step} = SubAgent.run(planner, llm: my_llm, context: %{question: question})
plan = plan_step.return["steps"]
IO.puts("=== Generated Plan ===")
Enum.with_index(plan, 1) |> Enum.each(fn {step, i} -> IO.puts(" #{i}. #{step}") end)
plan
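Because the prompt asks for exactly 3 steps but the signature only guarantees a list of strings, it may be worth checking the plan before handing it to the executor. The following is a minimal sketch; `PlanCheck.validate/2` is a hypothetical helper, not a ptc_runner function.

```elixir
defmodule PlanCheck do
  # Hypothetical guard (not part of ptc_runner): confirms the planner
  # returned the expected number of non-empty string steps.
  def validate(steps, expected_count \\ 3)

  def validate(steps, expected_count) when is_list(steps) do
    cond do
      length(steps) != expected_count ->
        {:error, "expected #{expected_count} steps, got #{length(steps)}"}

      not Enum.all?(steps, &(is_binary(&1) and String.trim(&1) != "")) ->
        {:error, "every step must be a non-empty string"}

      true ->
        {:ok, steps}
    end
  end

  def validate(_other, _expected_count), do: {:error, "steps must be a list"}
end

# Illustrative step texts, not actual planner output.
PlanCheck.validate(["Find Elixir version", "Find Erlang/OTP version", "Compile the answer"])
```

In the notebook this could be called as `PlanCheck.validate(plan)` right after the plan is printed, so a malformed plan fails early rather than inside the executor run.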
Phase 2: The Three-Role Architecture
Three agents with distinct roles:
- Worker — fetches pages, extracts data. Multi-turn, has tools.
- Reviewer — judges worker output against criteria. Single-shot JSON, no tools.
- Planner — orchestrates worker→reviewer chains. Never sees raw results.
The planner writes PTC-Lisp programs that wire worker and reviewer together.
It defines a do-step helper that dispatches to the worker, reviews the result,
and reports progress. The planner handles retries at the orchestration level.
# Worker: multi-turn research agent for one focused task
worker = SubAgent.new(
prompt: """
{{task}}
## Rules
- Use `fetch_page` to retrieve web pages. For GitHub files, use raw URLs:
`https://raw.githubusercontent.com/OWNER/REPO/REF/PATH`
- Use `(grep pattern text)` or `(grep-n pattern text)` to search text.
Note: `grep-n` is a function name, NOT `grep -n`.
- Return a map with your findings as soon as you have useful data.
Partial results are better than no results.
""",
description: "Research worker: fetches web pages and extracts information for a specific task",
signature: "(task :string) -> :map",
tools: %{
"fetch_page" =>
{fetch_page,
signature: "(url :string) -> {url :string, text :string}",
description:
"Fetch a web page and return its text content. For GitHub files, use raw.githubusercontent.com URLs."}
},
max_turns: 3,
retry_turns: 1,
timeout: 30_000
)
# Reviewer: single-shot judge, returns structured verdict
reviewer = SubAgent.new(
prompt: """
You are a quality reviewer. Evaluate whether the research result
satisfies the acceptance criteria.
## Step
{{step}}
## Acceptance Criteria
{{criteria}}
## Result to Review
{{result}}
Be pragmatic: approve results that contain useful information even if
not perfect. Set approved=true with notes about gaps in the summary.
Only reject if the result is empty, completely wrong, or missing
critical information.
""",
description: "Reviews research results against acceptance criteria",
signature: "(step :string, criteria :string, result :string) -> {approved :bool, summary :string, feedback :string}",
max_turns: 1,
retry_turns: 2,
output: :json
)
# Planner: orchestrates worker→reviewer, reports progress
planner_executor = SubAgent.new(
prompt: """
Answer this research question: {{question}}
## Your Role
You are a planner-orchestrator. You delegate research to `research_worker`
and verify results with `reviewer`. You never fetch pages yourself.
## The `do-step` Pattern
Define a helper that dispatches to worker and reviewer (no retry — you
handle failures at the orchestration level):
(defn do-step [id task criteria]
  (let [result (tool/research_worker {:task task})
        result_str (str result)
        review (tool/reviewer {:step task :criteria criteria :result result_str})]
    (if (:approved review)
      (do (step-done id (:summary review)) result)
      (do (step-done id (str "REJECTED: " (:feedback review))) nil))))
## Rules
- Always define `do-step` first, then use it for each plan step.
- Use the step IDs from the Progress checklist.
- Batch independent steps in the same turn (e.g. steps 1 and 2 together).
- When threading results from earlier steps into later tasks, extract
specific fields with keywords (e.g. `(:version data)`, `(:features data)`)
and compose a readable task string. NEVER pass a raw map via `(str result)` —
the worker receives it as an opaque string it cannot parse.
Good: `(str "Compare Elixir " (:version elixir) " with Erlang " (:version erlang))`
Bad: `(str "Compare these: " elixir-data erlang-data)`
- Write specific but achievable acceptance criteria.
- If a step returns nil (rejected), you can retry with a refined task
on the next turn, or skip and work with what you have.
- Only `(return ...)` when all steps are done.
""",
signature: "(question :string) -> :map",
plan: plan,
tools: %{
"research_worker" => SubAgent.as_tool(worker),
"reviewer" => SubAgent.as_tool(reviewer)
},
max_turns: 8,
max_depth: 2,
timeout: 180_000
)
{result, step} =
SubAgent.run(planner_executor,
llm: my_llm,
context: %{question: question},
journal: %{},
max_heap: 2_500_000,
debug: true
)
SubAgent.Debug.print_trace(step, raw: true)
{result, step.return}
# Render interactive trace tree (agent hierarchy with expandable details)
PtcRunner.Kino.TraceTree.new(step)
if step.summaries && map_size(step.summaries) > 0 do
  IO.puts("=== Summaries (#{map_size(step.summaries)} entries) ===\n")

  Enum.each(step.summaries, fn {id, summary} ->
    IO.puts(" [done] #{id}: #{summary}")
  end)
else
  IO.puts("(no summaries)")
end

IO.puts("")

if step.journal && map_size(step.journal) > 0 do
  IO.puts("=== Journal (#{map_size(step.journal)} entries) ===\n")

  Enum.each(step.journal, fn {id, value} ->
    val_str = inspect(value, limit: 5, printable_limit: 120)
    IO.puts(" [cached] #{id}: #{String.slice(val_str, 0, 120)}")
  end)
else
  IO.puts("(no journal entries)")
end
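This livebook demonstrates the planner-worker-reviewer pattern, a three-role hierarchy in which the planner role is split across two agents (a one-shot plan generator and a multi-turn executor):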
- Planner (JSON, single-shot) generates a step list
- Planner-executor (multi-turn, PTC-Lisp) orchestrates the workflow
- Worker (multi-turn, PTC-Lisp) handles one focused research task
- Reviewer (single-shot, JSON) judges worker output against criteria
The do-step Pattern
The key insight: the planner never inspects raw worker output. Instead, it writes
a do-step helper that chains worker→reviewer mechanically:
worker produces result → reviewer judges against criteria → approved? → step-done
                                                          → rejected? → planner retries or skips
This eliminates the problem where the planner marks steps done despite poor results. The reviewer is a dedicated judge with explicit acceptance criteria.
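The same control flow can be written in plain Elixir to make the gate explicit. This is an illustrative analogue only: `worker`, `reviewer`, and `report` are stub functions standing in for the two agents and for step-done, and none of them call ptc_runner.

```elixir
defmodule DoStepSketch do
  # Illustrative Elixir analogue of the PTC-Lisp do-step helper.
  # `worker` and `reviewer` are plain functions standing in for the agents;
  # `report` stands in for step-done. All three are assumptions.
  def do_step(id, task, criteria, worker, reviewer, report) do
    result = worker.(task)

    review =
      reviewer.(%{step: task, criteria: criteria, result: inspect(result)})

    if review.approved do
      report.(id, review.summary)
      result
    else
      report.(id, "REJECTED: " <> review.feedback)
      nil
    end
  end
end

# Stub worker: returns placeholder data, not real research output.
worker = fn _task -> %{version: "1.x", features: ["a", "b"]} end

# Stub reviewer: approves only if the stringified result mentions a version.
reviewer = fn %{result: result} ->
  if String.contains?(result, "version") do
    %{approved: true, summary: "version found", feedback: ""}
  else
    %{approved: false, summary: "", feedback: "no version in result"}
  end
end

report = fn id, note -> IO.puts("step #{id}: #{note}") end

DoStepSketch.do_step("1", "Find the version", "Must include a version", worker, reviewer, report)
```

The caller only ever receives the approved result or `nil`, so the decision to retry or skip rests on the verdict rather than on the caller's own reading of the raw output.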
Three Roles, Clear Responsibilities
| Role | Mode | Sees | Decides |
|---|---|---|---|
| Planner | Multi-turn PTC-Lisp | Reviewer verdicts (via checklist) | What to research, acceptance criteria, data flow |
| Worker | Multi-turn PTC-Lisp | One focused task prompt | How to fetch and extract data |
| Reviewer | Single-shot JSON | Step + criteria + result | Whether result is sufficient |
Comparison with Other Patterns
| | Plan-and-Execute | Planner-Worker | Planner-Worker-Reviewer |
|---|---|---|---|
| Verification | Self-assessment | Planner judges | Dedicated reviewer |
| Planner sees raw results | Yes | Yes | No — only verdicts |
| Retry on poor results | Manual | Manual | Planner-controlled via do-step |
| LLM calls per step | 1 | 2 | 2-4 (worker + reviewer) |
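The 2-4 figure follows from the agent settings used earlier: the worker has `max_turns: 3` and the reviewer is single-shot. A rough budget can be sketched as below. `CallBudget` is a hypothetical helper, and the arithmetic deliberately ignores `retry_turns` and the planner's own turns, so treat the totals as a lower-bound estimate rather than a measurement.

```elixir
defmodule CallBudget do
  # Rough per-step bounds, ignoring retry_turns and planner turns
  # (a simplifying assumption, not a statement about ptc_runner internals).
  def per_step(worker_max_turns, reviewer_calls \\ 1) do
    %{min: 1 + reviewer_calls, max: worker_max_turns + reviewer_calls}
  end

  # Bounds for a whole plan in which every step goes through do-step once.
  def total(step_count, worker_max_turns) do
    %{min: lo, max: hi} = per_step(worker_max_turns)
    %{min: step_count * lo, max: step_count * hi}
  end
end

CallBudget.per_step(3)   # => %{min: 2, max: 4}
CallBudget.total(3, 3)   # => %{min: 6, max: 12}
```

For the three-step plan in this notebook that is 6 to 12 worker and reviewer calls before counting any planner turns or retried steps.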
When to Use This Pattern
- Quality matters more than speed — the reviewer catches vague or incomplete results
- Acceptance criteria are definable — each step has clear success conditions
- Results vary in quality — web scraping, API calls, data extraction
- Audit trail needed — reviewer verdicts are structured, loggable data
Batching Independent Steps
The planner can dispatch independent steps in the same turn:
(def elixir-data (do-step "1" "Find Elixir version..." "Must include version, date, 3+ features"))
(def erlang-data (do-step "2" "Find Erlang version..." "Must include version, key features"))
Steps 1 and 2 run sequentially within one planner turn (PTC-Lisp is single-threaded), but the planner doesn’t need an extra turn to see their results — the reviewer handles verification inline. Dependent steps can reference earlier results:
(do-step "3"
(str "Compare Elixir " (:version elixir-data) " and Erlang " (:version erlang-data))
"Must compare both versions side by side")
Limitations
- More LLM calls: 2-4 per step (worker turns + reviewer). The reviewer is cheap (single-shot JSON), but the cost adds up.
- Prompt complexity: the planner must write the do-step helper correctly. If the LLM struggles with this, consider providing it as a built-in function.
- step-done in defn: works because defn calls run in the same process. It would NOT work if do-step were called inside pmap/pcalls.
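The same-process caveat can be illustrated in plain Elixir. The sketch assumes progress is recorded in process-local state, modelled here with the process dictionary. The text implies something of this kind but does not say how ptc_runner actually stores it, so this shows the general failure mode, not the library's internals.

```elixir
# Assumption: progress is tracked in process-local state, modelled here
# with the process dictionary. ptc_runner's actual mechanism may differ.
Process.put(:done_steps, [])

mark_done = fn id ->
  Process.put(:done_steps, [id | Process.get(:done_steps, [])])
end

# Same process: the write is visible afterwards.
mark_done.("1")

# Separate process: the write lands in the task's own dictionary and is
# gone once the task exits.
Task.async(fn -> mark_done.("2") end) |> Task.await()

IO.inspect(Process.get(:done_steps))
# prints ["1"]
```

Step "2" is never seen by the parent, which is the kind of loss to expect if progress reporting ran inside a parallel map.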
See Also
- Plan & Execute Livebook: single-agent plan-and-execute pattern
- Navigator Pattern Guide: journaled tasks and planning design space
- Composition Patterns: dynamic agent creation and orchestration