Planner-Worker-Reviewer: Delegating Research with Quality Gates

Setup

repo_root = Path.expand("..", __DIR__)

deps =
  if File.exists?(Path.join(repo_root, "mix.exs")) do
    [{:ptc_runner, path: repo_root}, {:llm_client, path: Path.join(repo_root, "llm_client")}]
  else
    [{:ptc_runner, "~> 0.6.0"}]
  end

Mix.install(deps ++ [{:req_llm, "~> 1.0"}, {:kino, "~> 0.14"}], consolidate_protocols: false)

local_path = Path.join(__DIR__, "llm_setup.exs")

if File.exists?(local_path) do
  Code.require_file(local_path)
else
  %{body: code} = Req.get!("https://raw.githubusercontent.com/andreasronge/ptc_runner/main/livebooks/llm_setup.exs")
  Code.eval_string(code)
end

setup = LLMSetup.setup()
setup = LLMSetup.choose_provider(setup)
my_llm = LLMSetup.choose_model(setup)

The fetch_page Tool

fetch_page = fn %{"url" => url} ->
  case Req.get(url, redirect: true, max_redirects: 3, receive_timeout: 15_000) do
    {:ok, %{status: 200, body: body}} when is_binary(body) ->
      text =
        body
        # Strip HTML comments and non-content elements before removing tags
        |> String.replace(~r/<!--.*?-->/s, "")
        |> String.replace(~r/<script[^>]*>.*?<\/script>/s, "")
        |> String.replace(~r/<style[^>]*>.*?<\/style>/s, "")
        |> String.replace(~r/<nav[^>]*>.*?<\/nav>/s, "")
        |> String.replace(~r/<header[^>]*>.*?<\/header>/s, "")
        |> String.replace(~r/<footer[^>]*>.*?<\/footer>/s, "")
        |> String.replace(~r/<(br|\/p|\/div|\/li|\/h\d|\/tr|\/td|\/dt|\/dd)[^>]*>/i, "\n")
        |> String.replace(~r/<[^>]+>/, " ")
        |> String.replace(~r/&\w+;/, " ")
        |> String.replace(~r/[ \t]+/, " ")
        |> String.replace(~r/\n[ \t]*/, "\n")
        |> String.replace(~r/\n{3,}/, "\n\n")
        |> String.trim()
        |> String.slice(0, 6000)

      %{url: url, text: text}

    {:ok, %{status: status}} ->
      {:error, "HTTP #{status} for #{url}"}

    {:error, reason} ->
      {:error, "Request failed for #{url}: #{inspect(reason)}"}
  end
end

fetch_page.(%{"url" => "https://elixir-lang.org"}) |> Map.update!(:text, &String.slice(&1, 0, 200))
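
The sample call above assumes the request succeeds. `fetch_page` returns an `{:error, message}` tuple otherwise, and `Map.update!/3` raises on a tuple. A minimal sketch of handling both shapes follows; `stub_fetch` and `preview` are hypothetical names, and the stub replaces the real tool so that no network access is needed.

```elixir
# Hypothetical stand-in for fetch_page: same return shapes, no network access.
stub_fetch = fn
  %{"url" => "https://example.com/ok" = url} ->
    %{url: url, text: String.duplicate("Elixir is a dynamic, functional language. ", 20)}

  %{"url" => url} ->
    {:error, "HTTP 404 for #{url}"}
end

# Normalise both shapes into a tagged tuple with a short, printable preview.
preview = fn fetch, url ->
  case fetch.(%{"url" => url}) do
    %{text: text} = page -> {:ok, %{page | text: String.slice(text, 0, 200)}}
    {:error, message} -> {:error, message}
  end
end

{:ok, page} = preview.(stub_fetch, "https://example.com/ok")
IO.puts("#{page.url}: #{String.length(page.text)} characters")

{:error, message} = preview.(stub_fetch, "https://example.com/missing")
IO.puts(message)
```

Passing the real `fetch_page` in place of `stub_fetch` should work the same way, since both take a map with a `"url"` key.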

Phase 1: Generate the Plan

alias PtcRunner.SubAgent

planner = SubAgent.new(
  prompt: """
  Research question: {{question}}

  You have access to a `fetch_page` tool that retrieves web page text, plus
  `(grep pattern text)` and `(grep-n pattern text)` for searching text.

  Plan the steps needed to answer this question. Return exactly 3 steps:
  one per research topic, plus a final compilation step.
  """,
  signature: "(question :string) -> {steps [:string]}",
  max_turns: 1,
  retry_turns: 1,
  output: :json
)

question =
  "What are the latest stable versions of Elixir and Erlang/OTP, and what are the key new features in each?"

{:ok, plan_step} = SubAgent.run(planner, llm: my_llm, context: %{question: question})

plan = plan_step.return["steps"]
IO.puts("=== Generated Plan ===")
Enum.with_index(plan, 1) |> Enum.each(fn {step, i} -> IO.puts("  #{i}. #{step}") end)

plan
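
The planner prompt asks for exactly three steps, but the signature only promises a list of strings. A minimal sketch of checking the shape before handing the plan to the executor follows. `PlanCheck` and its error atoms are hypothetical and not part of PtcRunner.

```elixir
defmodule PlanCheck do
  # Hypothetical helper: checks that a plan is a list of `expected` non-blank strings.
  def validate(steps, expected \\ 3)

  def validate(steps, expected) when is_list(steps) do
    cond do
      length(steps) != expected -> {:error, {:wrong_step_count, length(steps)}}
      not Enum.all?(steps, &non_blank?/1) -> {:error, :blank_or_non_string_step}
      true -> {:ok, steps}
    end
  end

  def validate(_other, _expected), do: {:error, :not_a_list}

  defp non_blank?(step), do: is_binary(step) and String.trim(step) != ""
end

# Illustrative plan; a real plan comes from plan_step.return["steps"].
sample_plan = [
  "Find the latest stable Elixir version and its key features",
  "Find the latest stable Erlang/OTP version and its key features",
  "Compile both findings into a final answer"
]

IO.inspect(PlanCheck.validate(sample_plan))
IO.inspect(PlanCheck.validate(["only one step"]))
```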

Phase 2: The Three-Role Architecture

Three agents with distinct roles:

  • Worker — fetches pages, extracts data. Multi-turn, has tools.
  • Reviewer — judges worker output against criteria. Single-shot JSON, no tools.
  • Planner — orchestrates worker→reviewer chains. Never sees raw results.

The planner writes PTC-Lisp programs that wire worker and reviewer together. It defines a do-step helper that dispatches to the worker, reviews the result, and reports progress. The planner handles retries at the orchestration level.

# Worker: multi-turn research agent for one focused task
worker = SubAgent.new(
  prompt: """
  {{task}}

  ## Rules
  - Use `fetch_page` to retrieve web pages. For GitHub files, use raw URLs:
    `https://raw.githubusercontent.com/OWNER/REPO/REF/PATH`
  - Use `(grep pattern text)` or `(grep-n pattern text)` to search text.
    Note: `grep-n` is a function name, NOT `grep -n`.
  - Return a map with your findings as soon as you have useful data.
    Partial results are better than no results.
  """,
  description: "Research worker: fetches web pages and extracts information for a specific task",
  signature: "(task :string) -> :map",
  tools: %{
    "fetch_page" =>
      {fetch_page,
       signature: "(url :string) -> {url :string, text :string}",
       description:
         "Fetch a web page and return its text content. For GitHub files, use raw.githubusercontent.com URLs."}
  },
  max_turns: 3,
  retry_turns: 1,
  timeout: 30_000
)

# Reviewer: single-shot judge, returns structured verdict
reviewer = SubAgent.new(
  prompt: """
  You are a quality reviewer. Evaluate whether the research result
  satisfies the acceptance criteria.

  ## Step
  {{step}}

  ## Acceptance Criteria
  {{criteria}}

  ## Result to Review
  {{result}}

  Be pragmatic: approve results that contain useful information even if
  not perfect. Set approved=true with notes about gaps in the summary.
  Only reject if the result is empty, completely wrong, or missing
  critical information.
  """,
  description: "Reviews research results against acceptance criteria",
  signature: "(step :string, criteria :string, result :string) -> {approved :bool, summary :string, feedback :string}",
  max_turns: 1,
  retry_turns: 2,
  output: :json
)

# Planner: orchestrates worker→reviewer, reports progress
planner_executor = SubAgent.new(
  prompt: """
  Answer this research question: {{question}}

  ## Your Role
  You are a planner-orchestrator. You delegate research to `research_worker`
  and verify results with `reviewer`. You never fetch pages yourself.

  ## The `do-step` Pattern
  Define a helper that dispatches to worker and reviewer (no retry — you
  handle failures at the orchestration level):

  (defn do-step [id task criteria]
    (let [result (tool/research_worker {:task task})
          result_str (str result)
          review (tool/reviewer {:step task :criteria criteria :result result_str})]
      (if (:approved review)
        (do (step-done id (:summary review)) result)
        (do (step-done id (str "REJECTED: " (:feedback review))) nil))))

  ## Rules
  - Always define `do-step` first, then use it for each plan step.
  - Use the step IDs from the Progress checklist.
  - Batch independent steps in the same turn (e.g. steps 1 and 2 together).
  - When threading results from earlier steps into later tasks, extract
    specific fields with keywords (e.g. `(:version data)`, `(:features data)`)
    and compose a readable task string. NEVER pass a raw map via `(str result)` —
    the worker receives it as an opaque string it cannot parse.
    Good: `(str "Compare Elixir " (:version elixir) " with Erlang " (:version erlang))`
    Bad: `(str "Compare these: " elixir-data erlang-data)`
  - Write specific but achievable acceptance criteria.
  - If a step returns nil (rejected), you can retry with a refined task
    on the next turn, or skip and work with what you have.
  - Only `(return ...)` when all steps are done.
  """,
  signature: "(question :string) -> :map",
  plan: plan,
  tools: %{
    "research_worker" => SubAgent.as_tool(worker),
    "reviewer" => SubAgent.as_tool(reviewer)
  },
  max_turns: 8,
  max_depth: 2,
  timeout: 180_000
)
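
The prompt's rule about threading results (extract fields, never stringify a whole map) is easy to see outside PTC-Lisp. A small Elixir sketch follows, using made-up placeholder data; the version strings are not real results.

```elixir
# Placeholder data standing in for two approved worker results.
elixir_data = %{version: "A.B.C", features: ["feature one", "feature two"]}
erlang_data = %{version: "X.Y", features: ["feature three"]}

# Bad: the next worker receives the maps as one opaque dump.
bad_task = "Compare these: #{inspect(elixir_data)} #{inspect(erlang_data)}"

# Good: pull out the fields the next task needs and write a readable sentence.
good_task =
  "Compare Elixir #{elixir_data.version} with Erlang/OTP #{erlang_data.version}. " <>
    "Elixir features: #{Enum.join(elixir_data.features, ", ")}."

IO.puts(bad_task)
IO.puts(good_task)
```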

{result, step} =
  SubAgent.run(planner_executor,
    llm: my_llm,
    context: %{question: question},
    journal: %{},
    max_heap: 2_500_000,
    debug: true
  )

SubAgent.Debug.print_trace(step, raw: true)

{result, step.return}

# Render interactive trace tree (agent hierarchy with expandable details)
PtcRunner.Kino.TraceTree.new(step)

if step.summaries && map_size(step.summaries) > 0 do
  IO.puts("=== Summaries (#{map_size(step.summaries)} entries) ===\n")

  Enum.each(step.summaries, fn {id, summary} ->
    IO.puts("  [done] #{id}: #{summary}")
  end)
else
  IO.puts("(no summaries)")
end

IO.puts("")

if step.journal && map_size(step.journal) > 0 do
  IO.puts("=== Journal (#{map_size(step.journal)} entries) ===\n")

  Enum.each(step.journal, fn {id, value} ->
    val_str = inspect(value, limit: 5, printable_limit: 120)
    IO.puts("  [cached] #{id}: #{String.slice(val_str, 0, 120)}")
  end)
else
  IO.puts("(no journal entries)")
end

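This livebook demonstrates the planner-worker-reviewer pattern. The three roles are implemented by four agents, because the planner role is split into a single-shot plan generator and a multi-turn executor: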

  1. Planner (JSON, single-shot) generates a step list
  2. Planner-executor (multi-turn, PTC-Lisp) orchestrates the workflow
  3. Worker (multi-turn, PTC-Lisp) handles one focused research task
  4. Reviewer (single-shot, JSON) judges worker output against criteria

The do-step Pattern

The key insight: the planner never inspects raw worker output. Instead, it writes a do-step helper that chains worker→reviewer mechanically:

worker produces result → reviewer judges against criteria → approved? → step-done
                                                          → rejected? → planner retries or skips

This eliminates the problem where the planner marks steps done despite poor results. The reviewer is a dedicated judge with explicit acceptance criteria.
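
The same gate can be sketched in plain Elixir to make the control flow explicit. The worker and reviewer below are hypothetical stubs (plain functions, not SubAgents), and `StepGate.do_step/4` is an illustrative analogue of the PTC-Lisp helper rather than anything PtcRunner provides.

```elixir
defmodule StepGate do
  # Illustrative analogue of the PTC-Lisp `do-step`: run the worker, let the
  # reviewer judge the result, and release the result only if it is approved.
  def do_step(task, criteria, worker, reviewer) do
    result = worker.(task)
    review = reviewer.(%{step: task, criteria: criteria, result: inspect(result)})

    if review.approved do
      {:approved, result, review.summary}
    else
      {:rejected, review.feedback}
    end
  end
end

# Hypothetical stubs standing in for the worker and reviewer agents.
worker = fn
  "vague task" -> %{}
  _task -> %{version: "x.y.z", features: ["feature a", "feature b"]}
end

reviewer = fn %{result: result} ->
  if String.contains?(result, "version") do
    %{approved: true, summary: "Result includes a version", feedback: ""}
  else
    %{approved: false, summary: "", feedback: "No version found in result"}
  end
end

IO.inspect(StepGate.do_step("vague task", "Must include version", worker, reviewer))
IO.inspect(StepGate.do_step("Find the Elixir version", "Must include version", worker, reviewer))
```

A rejected step yields only feedback, so later steps have no result to build on. That is what keeps a poor result from being marked done.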

Three Roles, Clear Responsibilities

| Role | Mode | Sees | Decides |
|---|---|---|---|
| Planner | Multi-turn PTC-Lisp | Reviewer verdicts (via checklist) | What to research, acceptance criteria, data flow |
| Worker | Multi-turn PTC-Lisp | One focused task prompt | How to fetch and extract data |
| Reviewer | Single-shot JSON | Step + criteria + result | Whether result is sufficient |

Comparison with Other Patterns

| | Plan-and-Execute | Planner-Worker | Planner-Worker-Reviewer |
|---|---|---|---|
| Verification | Self-assessment | Planner judges | Dedicated reviewer |
| Planner sees raw results | Yes | Yes | No (only verdicts) |
| Retry on poor results | Manual | Manual | Planner-controlled via do-step |
| LLM calls per step | 1 | 2 | 2-4 (worker + reviewer) |
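
The "2-4" figure follows from the limits configured earlier: the worker runs for up to `max_turns: 3` and the reviewer is single-shot. A rough count follows, assuming one LLM call per turn and ignoring `retry_turns` and the planner's own turns.

```elixir
# Assumption: each agent turn costs exactly one LLM call.
worker_turns = 1..3   # worker: max_turns: 3
reviewer_calls = 1    # reviewer: max_turns: 1

min_per_step = worker_turns.first + reviewer_calls
max_per_step = worker_turns.last + reviewer_calls

plan_steps = 3
IO.puts("Per step: #{min_per_step}-#{max_per_step} calls")
IO.puts("For a #{plan_steps}-step plan: #{min_per_step * plan_steps}-#{max_per_step * plan_steps} calls")
```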

When to Use This Pattern

  • Quality matters more than speed — the reviewer catches vague or incomplete results
  • Acceptance criteria are definable — each step has clear success conditions
  • Results vary in quality — web scraping, API calls, data extraction
  • Audit trail needed — reviewer verdicts are structured, loggable data

Batching Independent Steps

The planner can dispatch independent steps in the same turn:

(def elixir-data (do-step "1" "Find Elixir version..." "Must include version, date, 3+ features"))
(def erlang-data (do-step "2" "Find Erlang version..." "Must include version, key features"))

Steps 1 and 2 run sequentially within one planner turn (PTC-Lisp is single-threaded), but the planner doesn’t need an extra turn to see their results — the reviewer handles verification inline. Dependent steps can reference earlier results:

(do-step "3"
  (str "Compare Elixir " (:version elixir-data) " and Erlang " (:version erlang-data))
  "Must compare both versions side by side")

Limitations

  • More LLM calls — 2-4 per step (worker turns + reviewer). The reviewer is cheap (single-shot JSON) but it adds up.

  • Prompt complexity — the planner must write the do-step helper correctly. If the LLM struggles with this, consider providing it as a built-in function.

  • step-done in defn — works because defn calls run in the same process. Would NOT work if do-step were called inside pmap/pcalls.
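
An Elixir analogy for the last limitation follows, assuming progress is recorded in process-local state (the text implies this but does not show the mechanism). `step_done` is a hypothetical stand-in that writes to the process dictionary; it is not PtcRunner's implementation.

```elixir
# Hypothetical stand-in for step-done: records progress in the calling process.
step_done = fn id, summary ->
  done = Process.get(:done_steps, %{})
  Process.put(:done_steps, Map.put(done, id, summary))
  :ok
end

# Called directly, the record lands in this process.
step_done.("1", "found version")

# Called inside a Task (a separate process), the record stays in that process
# and is lost to the caller when the task exits.
Task.async(fn -> step_done.("2", "found features") end) |> Task.await()

# Only step "1" is visible here.
IO.inspect(Process.get(:done_steps))
```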

See Also

  • Plan & Execute Livebook: single-agent plan-and-execute pattern

  • Navigator Pattern Guide: journaled tasks and planning design space

  • Composition Patterns: dynamic agent creation and orchestration