hybrid-chat-agent.livemd

Setup

This notebook builds on Build an AI Chat Agent. The key difference is that not every turn needs the same level of reasoning. Some turns should stay short and cheap. Others should slow down and think harder.

Mix.install([
  {:jido, "~> 2.1"},
  {:jido_ai, "~> 2.0"},
  {:req_llm, "~> 1.7"}
])

Logger.configure(level: :warning)

# Livebook can execute examples in generated docs as doctests.
# Disable compiler docs until the current Jido Hex release drops its invalid signal_types/0 example.
Code.put_compiler_option(:docs, false)

Configure credentials

This notebook uses one OpenAI reasoning-capable model for both quick and deep turns. In Livebook, store OPENAI_API_KEY as a secret. Livebook exposes it as LB_OPENAI_API_KEY, so the cell below checks both names.

openai_key = System.get_env("LB_OPENAI_API_KEY") || System.get_env("OPENAI_API_KEY")

configured? =
  if is_binary(openai_key) do
    ReqLLM.put_key(:openai_api_key, openai_key)
    true
  else
    IO.puts("Set OPENAI_API_KEY or LB_OPENAI_API_KEY before running the chat cells.")
    false
  end

Define the hybrid chat agent

The agent stays simple. The hybrid behavior comes from how each request is sent, not from extra lifecycle hooks.

defmodule MyApp.HybridSupportAgent do
  use Jido.AI.Agent,
    name: "hybrid_support_agent",
    description: "Support chat agent that can escalate selected turns",
    tools: [],
    model: "openai:o4-mini",
    system_prompt: """
    You are a support engineer helping a developer-tools team triage user reports.
    Keep normal replies short and concrete.
    When the user asks for diagnosis or planning, reason carefully before answering.
    """
end

defmodule MyApp.HybridSupportChat do
  def quick_reply(pid, prompt) do
    MyApp.HybridSupportAgent.ask_sync(pid, prompt, timeout: 30_000)
  end

  def deep_reply(pid, prompt) do
    MyApp.HybridSupportAgent.ask_sync(
      pid,
      prompt,
      timeout: 60_000,
      llm_opts: [reasoning_effort: :high]
    )
  end
end

quick_reply/2 and deep_reply/2 both talk to the same agent process. The only difference is that the deep turn raises the request’s reasoning effort.

If your account uses a different OpenAI reasoning-capable model, swap the model string for another supported option such as openai:o3-mini or openai:gpt-5-mini.
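If you would rather pick the effort level at one call site instead of choosing between two functions, the options can live in a small lookup. This is a minimal sketch, not part of the guide's API: the `Effort` module name and `opts_for/1` helper are hypothetical, and it assumes the same `ask_sync/3` options used above.

```elixir
defmodule MyApp.HybridSupportChat.Effort do
  # Hypothetical wrapper: resolve request options from an effort atom
  # instead of calling quick_reply/2 or deep_reply/2 directly.
  @quick_opts [timeout: 30_000]
  @deep_opts [timeout: 60_000, llm_opts: [reasoning_effort: :high]]

  def opts_for(:quick), do: @quick_opts
  def opts_for(:deep), do: @deep_opts

  def reply(pid, prompt, effort \\ :quick) do
    MyApp.HybridSupportAgent.ask_sync(pid, prompt, opts_for(effort))
  end
end
```

The behavior is unchanged; this only centralizes the quick/deep option lists so a later turn can escalate by passing `:deep`.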

Start the runtime and agent

case Jido.start() do
  {:ok, _} -> :ok
  {:error, {:already_started, _}} -> :ok
end

runtime = Jido.default_instance()
agent_id = "hybrid-chat-demo-#{System.unique_integer([:positive])}"

{:ok, pid} = Jido.start_agent(runtime, MyApp.HybridSupportAgent, id: agent_id)

Quick turn: summarize the report

Start with a lightweight turn. This should come back quickly and keep the answer short.

quick_turn =
  if configured? do
    MyApp.HybridSupportChat.quick_reply(
      pid,
      """
      A design partner says the command palette opens with Cmd+K, but arrow keys stop
      working after they enter a nested menu. Summarize the issue in one sentence and
      name the most likely affected area.
      """
    )
  else
    {:skip, :no_openai_key}
  end

IO.inspect(quick_turn, label: "Quick turn")

quick_turn_snapshot =
  if configured? do
    case Jido.AgentServer.status(pid) do
      {:ok, status} ->
        %{
          request_id: status.raw_state[:last_request_id],
          usage: status.snapshot.details[:usage] || %{},
          status: status.snapshot.status
        }

      other ->
        other
    end
  else
    {:skip, :no_openai_key}
  end

IO.inspect(quick_turn_snapshot, label: "Quick turn snapshot")

Deep turn: reason through causes and next steps

Reuse the same pid, but escalate this turn with reasoning_effort: :high. That keeps the conversation intact while asking the model to spend more effort on diagnosis.

deep_turn =
  if configured? do
    MyApp.HybridSupportChat.deep_reply(
      pid,
      """
      Based on everything in this conversation, reason through:
      1. the two most likely root causes
      2. the highest-signal debugging steps
      3. whether this should block Friday's design-partner beta

      Keep the answer structured and concrete.
      """
    )
  else
    {:skip, :no_openai_key}
  end

IO.inspect(deep_turn, label: "Deep turn")

Compare the quick-turn and deep-turn snapshots

The final answer is still just assistant text, but the runtime snapshot gives you a stable place to inspect the completed turn. On OpenAI reasoning-capable models, the deep turn usually shows much larger usage and reasoning-token counts than the quick turn.

deep_turn_snapshot =
  if configured? do
    case Jido.AgentServer.status(pid) do
      {:ok, status} ->
        %{
          request_id: status.raw_state[:last_request_id],
          usage: status.snapshot.details[:usage] || %{},
          status: status.snapshot.status
        }

      other ->
        other
    end
  else
    {:skip, :no_openai_key}
  end

turn_usage_comparison =
  case {quick_turn_snapshot, deep_turn_snapshot} do
    {%{usage: quick_usage}, %{usage: deep_usage}} ->
      %{
        quick_usage: quick_usage,
        deep_usage: deep_usage,
        reasoning_token_delta:
          (deep_usage[:reasoning_tokens] || 0) - (quick_usage[:reasoning_tokens] || 0),
        output_token_delta: (deep_usage[:output_tokens] || 0) - (quick_usage[:output_tokens] || 0)
      }

    _ ->
      %{quick_turn_snapshot: quick_turn_snapshot, deep_turn_snapshot: deep_turn_snapshot}
  end

IO.inspect(deep_turn_snapshot, label: "Deep turn snapshot")
IO.inspect(turn_usage_comparison, label: "Turn usage comparison")

Some providers may also expose separate reasoning traces, but that is not guaranteed. The snapshot and usage fields above are the stable inspection points for this guide.
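If you want a single number instead of eyeballing the raw usage maps, a small helper can report how much of a turn's output went to reasoning. This is a sketch under the assumptions above: it uses the `:reasoning_tokens` and `:output_tokens` keys from the snapshot's usage map, defaulting to 0 when a provider omits them, and the `UsageMath` module name is made up for this notebook.

```elixir
defmodule UsageMath do
  # Fraction of a turn's output tokens that were reasoning tokens.
  # Returns 0.0 when the provider reports no output tokens.
  def reasoning_share(usage) do
    reasoning = Map.get(usage, :reasoning_tokens, 0)
    output = Map.get(usage, :output_tokens, 0)

    if output > 0, do: reasoning / output, else: 0.0
  end
end

# Example with made-up numbers:
UsageMath.reasoning_share(%{reasoning_tokens: 384, output_tokens: 512})
# => 0.75
```

Running it over `quick_turn_snapshot.usage` and `deep_turn_snapshot.usage` gives a provider-agnostic way to confirm that the deep turn actually spent more of its budget on reasoning.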

Quick turn again: draft the user-facing reply

After the deeper reasoning step, drop back to a short turn on the same conversation.

final_quick_turn =
  if configured? do
    MyApp.HybridSupportChat.quick_reply(
      pid,
      """
      Draft a three-sentence update for the design partner.
      Acknowledge the bug, say what we are checking next, and avoid over-promising.
      """
    )
  else
    {:skip, :no_openai_key}
  end

IO.inspect(final_quick_turn, label: "Final quick turn")

This is the whole pattern: quick turn, deep turn, quick turn again, all on one agent pid.
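If you later want to automate the quick-versus-deep choice without reaching for plugins, a plain heuristic in your own code is enough: scan the prompt for diagnostic language and escalate only those turns. This is a minimal sketch; the `MyApp.TurnRouter` module and its keyword list are assumptions you would tune for your domain, not part of Jido.

```elixir
defmodule MyApp.TurnRouter do
  # Hypothetical heuristic: escalate turns that ask for diagnosis or planning.
  @deep_keywords ~w(diagnose root cause debug plan blocker why)

  def effort_for(prompt) do
    downcased = String.downcase(prompt)

    if Enum.any?(@deep_keywords, &String.contains?(downcased, &1)) do
      :deep
    else
      :quick
    end
  end

  def reply(pid, prompt) do
    case effort_for(prompt) do
      :deep -> MyApp.HybridSupportChat.deep_reply(pid, prompt)
      :quick -> MyApp.HybridSupportChat.quick_reply(pid, prompt)
    end
  end
end
```

A request_transformer would move this decision into the agent itself; the heuristic version keeps it visible in your calling code, which is usually the right place to start.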

Inspect the stored conversation

Once the turns work, inspect the stored context and confirm the agent kept the whole thread.

conversation =
  case Jido.AgentServer.status(pid) do
    {:ok, status} ->
      status.snapshot.details[:conversation] || []

    other ->
      other
  end

IO.inspect(conversation, label: "Conversation")

When to use this pattern

Use this pattern when:

  • most turns are ordinary chat replies
  • some turns need extra diagnostic or planning effort
  • you want one conversation thread without juggling multiple agents

Do not start with request_transformer or model-routing plugins here. Those are advanced follow-ups once the manual escalation pattern is working.

Verification

  1. Run the quick turn and confirm it returns a short summary.
  2. Run the deep turn on the same pid and confirm it gives a more structured diagnostic answer.
  3. Run the final quick turn and confirm it drafts a shorter partner-facing update.
  4. Inspect conversation and confirm it includes all three turns.
  5. Inspect turn_usage_comparison and confirm the deep turn used more tokens than the quick turn.

What to try next

  • Start with Build an AI Chat Agent if you want the simpler one-pid chat pattern first.
  • Continue to AI Agent with Tools when the deep turn should call actions instead of reasoning from text alone.
  • Reach for request_transformer only after this manual escalation pattern is clear.