
Agent Swarm Patterns

notebooks/agent_swarm_patterns.livemd

Setup

Choose one of the two cells below depending on how you started Livebook.

Standalone (default)

Use this if you started Livebook normally (livebook server).

edifice_dep =
  if File.dir?(Path.expand("~/edifice")) do
    {:edifice, path: Path.expand("~/edifice")}
  else
    {:edifice, "~> 0.2.0"}
  end

Mix.install([
  edifice_dep,
  # {:exla, "~> 0.10"},
  {:kino, "~> 0.14"}
])

# Nx.global_default_backend(EXLA.Backend)

Attached to project (recommended for Nix/CUDA)

Use this if you started Livebook via ./scripts/livebook.sh.

Nx.global_default_backend(EXLA.Backend)
IO.puts("Attached mode — using EXLA backend from project node")

Introduction

Edifice includes four agent swarm building blocks: neural architecture modules that implement multi-agent coordination patterns as differentiable, end-to-end trainable components. They are not an orchestration framework like LangGraph or CrewAI; they are the neural backbone that agents use to process information, communicate, and make decisions.

What you’ll learn

  • How each module structures inter-agent communication differently
  • The shape flow through each architecture (what goes in, what comes out)
  • Parameter counts at T400-friendly dimensions
  • How StatefulAgent maintains memory across turns
  • How MessagePassingAgents supports sparse communication topologies

All examples use tiny dimensions (hidden=32, seq=8, batch=2) so they run instantly on CPU or a 2GB GPU like the T400.

# Shared dimensions — intentionally small for fast iteration
batch = 2
seq_len = 8
embed_dim = 32
hidden = 32
num_agents = 3
num_heads = 4

IO.puts("Shared config:")
IO.puts("  batch=#{batch}  seq=#{seq_len}  embed=#{embed_dim}  hidden=#{hidden}")
IO.puts("  agents=#{num_agents}  heads=#{num_heads}")

Helper: Build, Run, Inspect

defmodule SwarmHelper do
  def run_model(name, model, inputs) do
    templates = Map.new(inputs, fn {k, v} -> {k, Nx.template(Nx.shape(v), Nx.type(v))} end)
    {init_fn, predict_fn} = Axon.build(model, mode: :inference)
    params = init_fn.(templates, Axon.ModelState.empty())
    output = predict_fn.(params, inputs)
    param_count = count_params(params)

    IO.puts("#{name}")
    IO.puts("  Inputs:")

    for {k, v} <- inputs do
      IO.puts("    #{k}: #{inspect(Nx.shape(v))}")
    end

    IO.puts("  Output: #{format_output(output)}")
    IO.puts("  Params: #{fmt(param_count)}")
    IO.puts("")

    {output, params, predict_fn}
  end

  def count_params(%Axon.ModelState{} = state) do
    state |> Axon.ModelState.trainable_parameters() |> count_nested(0)
  end

  defp count_nested(%Nx.Tensor{} = t, acc), do: acc + Nx.size(t)
  defp count_nested(map, acc) when is_map(map) do
    Enum.reduce(map, acc, fn {_k, v}, a -> count_nested(v, a) end)
  end
  defp count_nested(_other, acc), do: acc

  def fmt(n) when n >= 1_000_000, do: "#{Float.round(n / 1_000_000, 1)}M"
  def fmt(n) when n >= 1_000, do: "#{Float.round(n / 1_000, 1)}K"
  def fmt(n), do: "#{n}"

  defp format_output({a, b}), do: "{#{inspect(Nx.shape(a))}, #{inspect(Nx.shape(b))}}"
  defp format_output(%Nx.Tensor{} = t), do: inspect(Nx.shape(t))
  defp format_output(other), do: inspect(other)
end

1. AgentSwarm — Multi-Agent Debate

The AgentSwarm implements the “multi-agent debate” pattern: N independent agent transformer stacks produce proposals, then agents attend to each other across R communication rounds, and an aggregator transformer merges everything into a final output.

What to look for

  • The output is [batch, aggregator_hidden] — a single consensus vector
  • More communication rounds (R) means more debate iterations
  • Parameter count scales with num_agents * agent_layers (each agent has its own weights)
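Before running the real model, the debate loop can be sketched on plain lists, with simple averaging standing in for both cross-agent attention and the aggregator stack. `DebateSketch` below is an illustrative toy, not part of Edifice:

```elixir
# Toy debate loop: N proposals, R rounds of blending toward the group
# consensus, then a mean aggregation. Averaging is a crude stand-in for
# the learned attention and the aggregator transformer.
defmodule DebateSketch do
  def run(proposals, rounds, mix \\ 0.5) do
    debated =
      Enum.reduce(1..rounds, proposals, fn _round, props ->
        consensus = mean_vector(props)
        Enum.map(props, &blend(&1, consensus, mix))
      end)

    # "Aggregator": mean of the post-debate proposals
    mean_vector(debated)
  end

  defp blend(a, b, mix),
    do: Enum.zip_with(a, b, fn x, y -> (1 - mix) * x + mix * y end)

  defp mean_vector(vectors) do
    n = length(vectors)
    Enum.zip_with(vectors, fn xs -> Enum.sum(xs) / n end)
  end
end

DebateSketch.run([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], 2)
```

Each round pulls every proposal toward the consensus, which is the same qualitative effect that additional communication rounds have in the real module.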

alias Edifice.Meta.AgentSwarm

model = AgentSwarm.build(
  embed_dim: embed_dim,
  num_agents: num_agents,
  agent_hidden_size: hidden,
  agent_layers: 1,
  communication_rounds: 2,
  aggregator_hidden_size: hidden,
  aggregator_layers: 1,
  num_heads: num_heads,
  dropout: 0.0,
  window_size: seq_len
)

input = Nx.broadcast(0.5, {batch, seq_len, embed_dim})

{swarm_output, _params, _predict} =
  SwarmHelper.run_model("AgentSwarm (3 agents, 2 comm rounds)", model, %{
    "state_sequence" => input
  })

IO.puts("Output sample (first 8 values):")
swarm_output
|> Nx.slice_along_axis(0, 1, axis: 0)
|> Nx.flatten()
|> Nx.slice([0], [8])
|> Nx.to_list()
|> Enum.map(&Float.round(&1, 4))
|> IO.inspect()

2. RouterNetwork — Specialist Dispatch

The RouterNetwork learns to route inputs to specialist sub-models. A small MLP router scores each specialist, then either soft-mixes all outputs (weighted sum) or hard-selects top-k specialists. Think mixture-of-experts but at the full-model level.

What to look for

  • Soft routing: all specialists contribute (weighted by router scores)
  • Hard routing (top_k): only k specialists run, the rest are zeroed
  • The router is tiny (2-layer MLP on mean-pooled input) — most params are in specialists
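The two routing strategies reduce to how router scores become specialist weights. A plain-Elixir sketch (the scores below are hand-picked stand-ins for the learned router MLP's output; `RoutingSketch` is illustrative, not Edifice API):

```elixir
defmodule RoutingSketch do
  # Soft routing: softmax over all scores, so every specialist contributes.
  def soft_weights(scores) do
    exps = Enum.map(scores, &:math.exp/1)
    total = Enum.sum(exps)
    Enum.map(exps, &(&1 / total))
  end

  # Hard top-k routing: keep the k highest scores, zero the rest,
  # then softmax over the survivors.
  def top_k_weights(scores, k) do
    keep =
      scores
      |> Enum.with_index()
      |> Enum.sort_by(fn {s, _i} -> -s end)
      |> Enum.take(k)
      |> MapSet.new(fn {_s, i} -> i end)

    masked =
      scores
      |> Enum.with_index()
      |> Enum.map(fn {s, i} ->
        if MapSet.member?(keep, i), do: :math.exp(s), else: 0.0
      end)

    total = Enum.sum(masked)
    Enum.map(masked, &(&1 / total))
  end
end

RoutingSketch.soft_weights([2.0, 0.1, 1.0])
RoutingSketch.top_k_weights([2.0, 0.1, 1.0], 1)
```

With `k = 1` all the weight lands on the single best-scoring specialist, which is why hard and soft routing produce different outputs in the comparison below.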

alias Edifice.Meta.RouterNetwork

# Soft routing — all specialists contribute
soft_model = RouterNetwork.build(
  embed_dim: embed_dim,
  num_specialists: num_agents,
  specialist_hidden_size: hidden,
  specialist_layers: 1,
  num_heads: num_heads,
  dropout: 0.0,
  routing: :soft,
  window_size: seq_len
)

{soft_out, _, _} =
  SwarmHelper.run_model("RouterNetwork (soft, 3 specialists)", soft_model, %{
    "state_sequence" => input
  })

# Hard routing — only top 1 specialist
hard_model = RouterNetwork.build(
  embed_dim: embed_dim,
  num_specialists: num_agents,
  specialist_hidden_size: hidden,
  specialist_layers: 1,
  num_heads: num_heads,
  dropout: 0.0,
  routing: {:top_k, 1},
  window_size: seq_len
)

{hard_out, _, _} =
  SwarmHelper.run_model("RouterNetwork (top_k=1, 3 specialists)", hard_model, %{
    "state_sequence" => input
  })

# Compare: soft blends all, hard picks one
diff = Nx.subtract(soft_out, hard_out) |> Nx.abs() |> Nx.reduce_max() |> Nx.to_number()
IO.puts("Max difference between soft and hard routing: #{Float.round(diff, 6)}")
IO.puts("(Non-zero means the routing strategies produce different outputs — expected!)")

3. StatefulAgent — Multi-Turn Memory

The StatefulAgent wraps a backbone transformer with persistent state that carries across turns. Each turn takes the current input + previous state and produces output + updated state. Three state update modes:

  • Compressive: Gated linear blend (inspired by Infini-attention)
  • EMA: Exponential moving average (simplest)
  • GRU: Full GRU cell update (most expressive)
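The two blend-style updates can be sketched on plain lists. The scalar gate here is a simplification: the real modules operate on tensors, and the compressive gate is predicted by a learned layer rather than fixed. The GRU mode follows the standard GRU cell equations and is omitted. `StateUpdateSketch` is illustrative only:

```elixir
defmodule StateUpdateSketch do
  # EMA: fixed decay alpha pulls the state toward the new proposal.
  def ema(state, proposal, alpha \\ 0.9) do
    Enum.zip_with(state, proposal, fn s, p -> alpha * s + (1 - alpha) * p end)
  end

  # Compressive: same blend shape, but in the real module the gate g is
  # predicted from (state, proposal) by a learned layer, not a constant.
  def compressive(state, proposal, g) do
    Enum.zip_with(state, proposal, fn s, p -> g * s + (1 - g) * p end)
  end
end

StateUpdateSketch.ema([1.0, 0.0], [0.0, 1.0], 0.5)
```

Both rules interpolate between old state and new evidence; the difference is whether the interpolation weight is a hyperparameter (EMA) or learned per step (compressive).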

What to look for

  • The model returns a tuple: {output, new_state}
  • State shape is separate from output shape (state_size vs hidden_size)
  • State should change between turns — if it doesn’t, memory isn’t working
  • Try: same input both turns vs different inputs — state should diverge more with different inputs

alias Edifice.Meta.StatefulAgent

state_size = 16

model = StatefulAgent.build(
  embed_dim: embed_dim,
  hidden_size: hidden,
  num_layers: 1,
  num_heads: num_heads,
  state_size: state_size,
  state_mode: :compressive,
  dropout: 0.0,
  window_size: seq_len
)

zero_state = Nx.broadcast(0.0, {batch, state_size})

templates = %{
  "state_sequence" => Nx.template({batch, seq_len, embed_dim}, :f32),
  "agent_state" => Nx.template({batch, state_size}, :f32)
}

{init_fn, predict_fn} = Axon.build(model, mode: :inference)
params = init_fn.(templates, Axon.ModelState.empty())
param_count = SwarmHelper.count_params(params)

IO.puts("StatefulAgent (compressive, state_size=#{state_size})")
IO.puts("  Params: #{SwarmHelper.fmt(param_count)}")

# Simulate 3 turns with random input
key = Nx.Random.key(42)
state = zero_state

for turn <- 1..3, reduce: {key, state} do
  {k, prev_state} ->
    {turn_input, k} = Nx.Random.uniform(k, -1.0, 1.0,
      shape: {batch, seq_len, embed_dim}, type: {:f, 32})

    {output, new_state} = predict_fn.(params, %{
      "state_sequence" => turn_input,
      "agent_state" => prev_state
    })

    state_norm = Nx.mean(Nx.abs(new_state)) |> Nx.to_number() |> Float.round(4)
    state_delta =
      Nx.subtract(new_state, prev_state)
      |> Nx.abs() |> Nx.reduce_max() |> Nx.to_number() |> Float.round(4)

    IO.puts("  Turn #{turn}: output=#{inspect(Nx.shape(output))}  " <>
            "state_norm=#{state_norm}  state_delta=#{state_delta}")

    {k, new_state}
end

IO.puts("\n  State should change each turn (state_delta > 0) — that's memory working!")

Compare state modes

# Quick comparison of all 3 state modes
for mode <- [:compressive, :ema, :gru] do
  m = StatefulAgent.build(
    embed_dim: embed_dim,
    hidden_size: hidden,
    num_layers: 1,
    num_heads: num_heads,
    state_size: state_size,
    state_mode: mode,
    dropout: 0.0,
    window_size: seq_len
  )

  {init, pred} = Axon.build(m, mode: :inference)
  p = init.(templates, Axon.ModelState.empty())
  pc = SwarmHelper.count_params(p)

  key = Nx.Random.key(99)
  {inp, _} = Nx.Random.uniform(key, -1.0, 1.0,
    shape: {batch, seq_len, embed_dim}, type: {:f, 32})

  {_out, st1} = pred.(p, %{"state_sequence" => inp, "agent_state" => zero_state})

  state_mag = Nx.mean(Nx.abs(st1)) |> Nx.to_number() |> Float.round(4)
  IO.puts("  #{String.pad_trailing(to_string(mode), 14)} params=#{String.pad_trailing(SwarmHelper.fmt(pc), 8)} state_magnitude=#{state_mag}")
end

4. MessagePassingAgents — Graph Communication

The MessagePassingAgents models agents as nodes in a graph. Instead of all-to-all attention (AgentSwarm), communication follows edges in an adjacency matrix. Each round: project sender/receiver features, aggregate neighbor messages via the adjacency, update node state with a GRU cell.

This is the right choice when you want structured communication — not every agent needs to talk to every other agent.
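A single round can be sketched on plain lists: each receiver averages the features of its neighbors (per its adjacency row) and blends that message into its own features. The fixed blend stands in for the learned projections and GRU update; `MPRoundSketch` is not Edifice API:

```elixir
defmodule MPRoundSketch do
  # features: one vector per agent; adjacency: rows of 0/1 entries,
  # where row i lists which agents node i receives messages from.
  def round(features, adjacency, mix \\ 0.5) do
    Enum.zip_with(features, adjacency, fn feat, row ->
      neighbors =
        row
        |> Enum.zip(features)
        |> Enum.filter(fn {edge, _f} -> edge != 0 end)
        |> Enum.map(fn {_edge, f} -> f end)

      case neighbors do
        # No incoming edges: the node keeps its own features unchanged.
        [] ->
          feat

        ns ->
          message = mean(ns)
          Enum.zip_with(feat, message, fn x, m -> (1 - mix) * x + mix * m end)
      end
    end)
  end

  defp mean(vectors) do
    n = length(vectors)
    Enum.zip_with(vectors, fn xs -> Enum.sum(xs) / n end)
  end
end

# With an identity adjacency each node only hears from itself,
# so one round leaves the features unchanged.
MPRoundSketch.round([[1.0], [2.0]], [[1, 0], [0, 1]])
```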

What to look for

  • Different adjacency matrices → different communication patterns → different outputs
  • Fully-connected (all 1s) approximates AgentSwarm’s all-to-all pattern
  • Chain topology (each agent talks only to adjacent neighbors) produces sparser information flow
  • Isolated agents (no edges) only see their own proposal — no communication benefit
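Hand-writing adjacency matrices gets tedious beyond a few agents. Here is a small hypothetical helper (not part of Edifice) that generates a ring topology with self-loops at any size; note that with 3 agents a ring is already the all-ones matrix, since each node's two ring neighbors cover every other node:

```elixir
defmodule Topology do
  # Ring adjacency with self-loops: agent i connects to itself and to
  # (i ± 1) mod n, expressed as nested lists of 1.0 / 0.0.
  def ring(n) do
    for i <- 0..(n - 1) do
      for j <- 0..(n - 1) do
        d = abs(i - j)
        if d == 0 or d == 1 or d == n - 1, do: 1.0, else: 0.0
      end
    end
  end
end

Topology.ring(4)
```

Convert with `Nx.tensor(Topology.ring(num_agents))` and broadcast to `{batch, num_agents, num_agents}` before passing it as the "adjacency" input.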

alias Edifice.Meta.MessagePassingAgents

model = MessagePassingAgents.build(
  embed_dim: embed_dim,
  num_agents: num_agents,
  agent_hidden_size: hidden,
  agent_layers: 1,
  message_rounds: 3,
  output_size: hidden,
  num_heads: num_heads,
  dropout: 0.0,
  aggregation: :mean,
  pool_mode: :mean,
  window_size: seq_len
)

adj_templates = %{
  "state_sequence" => Nx.template({batch, seq_len, embed_dim}, :f32),
  "adjacency" => Nx.template({batch, num_agents, num_agents}, :f32)
}

{init_fn, predict_fn} = Axon.build(model, mode: :inference)
params = init_fn.(adj_templates, Axon.ModelState.empty())
param_count = SwarmHelper.count_params(params)

IO.puts("MessagePassingAgents (3 agents, 3 rounds)")
IO.puts("  Params: #{SwarmHelper.fmt(param_count)}\n")

# Use random input so different topologies produce meaningfully different results
key = Nx.Random.key(77)
{rand_input, _} = Nx.Random.uniform(key, -1.0, 1.0,
  shape: {batch, seq_len, embed_dim}, type: {:f, 32})

# Compare 3 topologies
topologies = %{
  "Fully-connected" => Nx.broadcast(1.0, {batch, num_agents, num_agents}),
  "Ring" => Nx.stack([
    Nx.tensor([[1, 1, 0], [1, 1, 1], [0, 1, 1]], type: :f32),
    Nx.tensor([[1, 1, 0], [1, 1, 1], [0, 1, 1]], type: :f32)
  ]),
  "Isolated" => Nx.stack([
    Nx.eye(num_agents, type: :f32),
    Nx.eye(num_agents, type: :f32)
  ])
}

outputs =
  for {name, adj} <- topologies do
    out = predict_fn.(params, %{"state_sequence" => rand_input, "adjacency" => adj})
    out_norm = Nx.mean(Nx.abs(out)) |> Nx.to_number() |> Float.round(4)
    IO.puts("  #{String.pad_trailing(name, 18)} output_norm=#{out_norm}")
    {name, out}
  end

# Pairwise differences
IO.puts("")
for {n1, o1} <- outputs, {n2, o2} <- outputs, n1 < n2 do
  diff = Nx.subtract(o1, o2) |> Nx.abs() |> Nx.reduce_max() |> Nx.to_number() |> Float.round(4)
  IO.puts("  #{n1} vs #{n2}: max_diff=#{diff}")
end

IO.puts("\nDifferent topologies should produce different outputs.")
IO.puts("Isolated agents get no messages — their output diverges most from fully-connected.")

Summary

IO.puts("=== Agent Swarm Building Blocks ===\n")
IO.puts("Module                    Pattern                         Inputs")
IO.puts("─────────────────────────────────────────────────────────────────────────")
IO.puts("AgentSwarm                Dense all-to-all attention       state_sequence")
IO.puts("RouterNetwork             Learned specialist dispatch      state_sequence")
IO.puts("StatefulAgent             Persistent memory across turns   state_sequence + agent_state")
IO.puts("MessagePassingAgents      Graph-structured communication   state_sequence + adjacency")
IO.puts("")
IO.puts("All four are neural building blocks — combine them with any Edifice")
IO.puts("backbone (Mamba, GRU, Transformer, etc.) for your specific use case.")
IO.puts("")
IO.puts("Suggested combinations:")
IO.puts("  • Melee bot: StatefulAgent + MinGRU backbone (fast, remembers game state)")
IO.puts("  • Strategy game: MessagePassingAgents (unit types as nodes, command hierarchy as edges)")
IO.puts("  • Ensemble LLM: AgentSwarm with 3-5 agents + RouterNetwork for final routing")