Bayesian Poker: Opponent Profiling via NUTS
Probabilists of all priors, unite!
What This Notebook Does
You sit down at a poker table. Five strangers. You know nothing about them. After 80 hands, you know everything — not as point estimates (“she raises 22% of the time”) but as full posterior distributions (“her raise rate is between 18% and 27% with 95% probability, and she bluffs more on the turn than the flop”).
This notebook builds a hierarchical Bayesian model of poker opponents. The model has four hidden parameters per player (VPIP, PFR, aggression, bluff frequency) and population-level hyperparameters that share information across players. NUTS explores the 20-dimensional posterior. The result: probability distributions over each opponent’s tendencies, used to compute expected value for every decision.
Why this matters beyond poker: This is the same pattern as any system where you observe behavior and infer hidden parameters. Manufacturing quality (workers have hidden defect rates), customer segmentation (customers have hidden preferences), clinical trials (patients have hidden treatment responses). The poker framing makes the math tangible.
Setup
# CPU only — no GPU required
System.put_env("EXLA_CPU_ONLY", "true")
System.put_env("CUDA_VISIBLE_DEVICES", "")
Mix.install([
{:exmc, path: Path.expand("../", __DIR__)},
{:exla, "~> 0.10"},
{:kino_vega_lite, "~> 0.1"}
])
Application.put_env(:exla, :clients, host: [platform: :host])
Application.put_env(:exla, :default_client, :host)
Nx.default_backend(Nx.BinaryBackend)
Nx.Defn.default_options(compiler: EXLA, client: :host)
alias Exmc.Poker
alias Exmc.Poker.{Cards, ActionModel, Simulator, OpponentModel, Decision}
alias VegaLite, as: Vl
:ok
The Hidden Parameters
Every poker player has hidden tendencies. You cannot observe them directly — you can only observe their actions (fold, call, raise) in response to situations. The inference problem: given 80 hands of observed actions, what are the most probable values of the hidden parameters?
| Parameter | What it means | Range | Example |
|---|---|---|---|
| VPIP | Voluntarily Put $ In Pot — how loose they play | 0–1 | 0.22 = tight, 0.45 = loose |
| PFR | Preflop Raise Rate — raise vs limp | 0–1 | 0.18 = selective, 0.28 = aggressive |
| AGG | Aggression Factor — raise-to-call ratio postflop | 0+ | 0.5 = passive, 2.2 = aggressive |
| BLUFF | Bluff Frequency — raises with weak hands | 0–1 | 0.05 = honest, 0.40 = deceptive |
These four numbers define a player’s type. A TAG (tight-aggressive) has low VPIP, moderate PFR, high AGG. A calling station has high VPIP, low PFR, low AGG — they call everything but rarely raise. A LAG (loose-aggressive) has high everything.
The challenge: you don’t see the parameters. You see folds, calls, and raises. The Bayesian model inverts this — from actions to parameters.
The Hierarchical Structure
Why hierarchical? Because players at the same stake level share characteristics. A $1/$2 table has a typical distribution of VPIPs. A new player at that table probably has a VPIP near the table average. The hierarchical model captures this: population-level hyperparameters (μ and σ for each parameter) act as informative priors for individual players.
Population: μ_VPIP, σ_VPIP, μ_PFR, σ_PFR, μ_AGG, σ_AGG, μ_BLUFF, σ_BLUFF
↓
Player i: VPIP_i ~ logit⁻¹(μ_VPIP + σ_VPIP * z_i) (NCP)
↓
Observation: action_ij ~ Softmax(VPIP_i, PFR_i, AGG_i, BLUFF_i, hand_strength_j)
The NCP (non-centered parameterization) is critical. Without it, the posterior has funnel geometry that NUTS navigates poorly. With NCP, the sampler explores efficiently even at 20 dimensions.
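To make the NCP concrete, here is a minimal plain-Elixir sketch. The hyperparameter values (`mu_vpip = -1.2`, `sigma_vpip = 0.5`) are made up for illustration; the real transform lives inside `OpponentModel`:

```elixir
# Hypothetical NCP sketch: the sampler works in z-space, and each player's
# VPIP is a deterministic transform of (mu, sigma, z_i).
inv_logit = fn x -> 1.0 / (1.0 + :math.exp(-x)) end

mu_vpip = -1.2    # population mean, logit scale (illustrative)
sigma_vpip = 0.5  # population spread, logit scale (illustrative)

# Non-centered: one standard-normal z_i per player; no direct sampling of
# the constrained VPIP, so the funnel between mu, sigma, and VPIP vanishes.
z = [-1.0, 0.0, 1.0]
vpips = Enum.map(z, fn z_i -> inv_logit.(mu_vpip + sigma_vpip * z_i) end)
```

Because `inv_logit` is monotone, larger `z_i` always means larger VPIP, and every output lands in (0, 1) regardless of the unconstrained `z_i`.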
Meet the Table
Five archetypal players. In real poker, you’d observe their actions over 50-100 hands before forming a read. Let’s simulate that.
archetypes = Simulator.archetypes()
archetype_table =
Enum.map(archetypes, fn p ->
%{
"Type" => to_string(p.label),
"VPIP" => p.vpip,
"PFR" => p.pfr,
"AGG" => p.agg,
"BLUFF" => p.bluff
}
end)
Kino.DataTable.new(archetype_table, name: "Player Archetypes")
Example 1: Simulate 80 Hands
We pick 3 opponents (TAG, LAG, Calling Station) and simulate 80 hands each. The simulator draws random hand strengths and samples actions from the softmax model. This is the “data generation” step — in real poker, this is what you observe over the first two hours at the table.
players = [
%{vpip: 0.22, pfr: 0.18, agg: 1.8, bluff: 0.25, label: :tag},
%{vpip: 0.35, pfr: 0.28, agg: 2.2, bluff: 0.40, label: :lag},
%{vpip: 0.45, pfr: 0.08, agg: 0.5, bluff: 0.05, label: :station}
]
:rand.seed(:exsss, 42)
{observations, true_params} = Simulator.simulate(players, 80)
# Action distribution per player
action_data =
Enum.zip(players, observations)
|> Enum.flat_map(fn {player, obs} ->
freqs = Enum.frequencies(obs.actions)
total = length(obs.actions)
[
%{"Player" => to_string(player.label), "Action" => "Fold", "Count" => Map.get(freqs, 0, 0), "Pct" => Map.get(freqs, 0, 0) / total * 100},
%{"Player" => to_string(player.label), "Action" => "Call", "Count" => Map.get(freqs, 1, 0), "Pct" => Map.get(freqs, 1, 0) / total * 100},
%{"Player" => to_string(player.label), "Action" => "Raise", "Count" => Map.get(freqs, 2, 0), "Pct" => Map.get(freqs, 2, 0) / total * 100}
]
end)
Vl.new(width: 500, height: 250, title: "Observed Action Frequencies (80 hands)")
|> Vl.data_from_values(action_data)
|> Vl.mark(:bar)
|> Vl.encode_field(:x, "Player", type: :nominal)
|> Vl.encode_field(:y, "Pct", type: :quantitative, title: "% of hands")
|> Vl.encode_field(:color, "Action", type: :nominal, scale: %{domain: ["Fold", "Call", "Raise"], range: ["#e45756", "#54a24b", "#4c78a8"]})
|> Vl.encode_field(:x_offset, "Action", type: :nominal)
What to notice: The TAG folds the most (tight). The LAG raises the most (aggressive). The calling station calls the most (passive). These are the surface patterns. NUTS will recover the hidden parameters that produce them.
The Action Model
Before running inference, let’s visualize what the softmax action model looks like for different player types. Given a hand strength (0 = garbage, 1 = nuts), the model outputs P(fold), P(call), P(raise).
The softmax transforms raw utilities into probabilities:
- Fold utility ∝ (1 - VPIP) × (1 - hand_strength)
- Call utility ∝ VPIP × (1 - PFR) × hand_strength
- Raise utility ∝ PFR × (AGG × hand_strength + BLUFF × (1 - hand_strength))
The BLUFF parameter is key: it adds raise probability even with weak hands. A player with BLUFF = 0.40 will sometimes raise with garbage — making them unpredictable and hard to play against.
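The three utilities above can be sketched directly as code. This is an assumed reimplementation for illustration, not the library's `ActionModel.action_probs/5`, which may scale or temper the utilities differently:

```elixir
# Sketch of the softmax action model: three raw utilities, exponentiated
# and normalized into P(fold), P(call), P(raise).
action_probs = fn vpip, pfr, agg, bluff, hs ->
  u_fold = (1.0 - vpip) * (1.0 - hs)
  u_call = vpip * (1.0 - pfr) * hs
  u_raise = pfr * (agg * hs + bluff * (1.0 - hs))

  exps = Enum.map([u_fold, u_call, u_raise], &:math.exp/1)
  z = Enum.sum(exps)
  Enum.map(exps, &(&1 / z))
end

# A tight-aggressive player holding a strong hand (hs = 0.9):
[pf, pc, pr] = action_probs.(0.22, 0.18, 1.8, 0.25, 0.9)
```

With these utilities, raising is the most likely action at high hand strength, and the probabilities always sum to one by construction.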
hs_range = for h <- 0..100, do: h / 100
model_curves =
Enum.flat_map(players, fn p ->
Enum.flat_map(hs_range, fn hs ->
{pf, pc, pr} = ActionModel.action_probs(p.vpip, p.pfr, p.agg, p.bluff, hs)
[
%{"Player" => to_string(p.label), "Hand Strength" => hs, "Probability" => pf, "Action" => "Fold"},
%{"Player" => to_string(p.label), "Hand Strength" => hs, "Probability" => pc, "Action" => "Call"},
%{"Player" => to_string(p.label), "Hand Strength" => hs, "Probability" => pr, "Action" => "Raise"}
]
end)
end)
child_curves =
Vl.new(width: 250, height: 200)
|> Vl.mark(:line, stroke_width: 2)
|> Vl.encode_field(:x, "Hand Strength", type: :quantitative)
|> Vl.encode_field(:y, "Probability", type: :quantitative)
|> Vl.encode_field(:color, "Action",
type: :nominal,
scale: %{domain: ["Fold", "Call", "Raise"], range: ["#e45756", "#54a24b", "#4c78a8"]}
)
Vl.new(title: "Action Probability vs Hand Strength")
|> Vl.data_from_values(model_curves)
|> Vl.facet([column: [field: "Player", type: :nominal]], child_curves)
What to notice: The TAG’s curves cross at hand_strength ~0.3 (selective). The calling station’s fold line is low and flat (calls almost everything). The LAG’s raise line stays high even at low hand strengths (bluffs frequently).
Run Bayesian Inference
Now the main event: feed 80 hands of observations into a hierarchical Bayesian model and let NUTS recover the hidden player parameters.
The model has 8 population hyperparams (μ and σ for VPIP, PFR, AGG, BLUFF) plus 4 raw NCP params per player = 20 dimensions.
NUTS navigates this 20-dimensional posterior with adaptive step size and mass matrix. The NCP reparameterization avoids the funnel geometry that would otherwise trap the sampler.
{ir, _data} = OpponentModel.build(observations)
init = OpponentModel.init_values(3)
{trace, stats} =
Exmc.Sampler.sample(ir, init,
num_samples: 500,
num_warmup: 500,
seed: 42,
# the model's raw per-player z parameters already encode NCP, so the
# sampler-level reparameterization is left off
ncp: false
)
profiles = OpponentModel.extract_profiles(trace, 3)
IO.puts("Sampling complete:")
IO.puts(" Divergences: #{stats.divergences}")
IO.puts(" Step size: #{Float.round(stats.step_size, 4)}")
IO.puts(" Samples: #{stats.num_samples}")
:ok
True vs Recovered Parameters
The moment of truth — how well did NUTS recover the hidden player types from 80 hands of noisy behavioral data?
A perfect recovery is impossible — 80 hands is not enough data to pin down 4 parameters exactly. But the posterior means should be close, and the uncertainty should reflect how much information 80 hands actually contains.
comparison_data =
Enum.zip([players, profiles])
|> Enum.with_index()
|> Enum.flat_map(fn {{true_p, profile}, _i} ->
label = to_string(true_p.label)
vpip_mean = profile.vpip |> Nx.mean() |> Nx.to_number()
pfr_mean = profile.pfr |> Nx.mean() |> Nx.to_number()
agg_mean = profile.agg |> Nx.mean() |> Nx.to_number()
bluff_mean = profile.bluff |> Nx.mean() |> Nx.to_number()
[
%{"Player" => label, "Param" => "VPIP", "True" => true_p.vpip, "Posterior Mean" => vpip_mean},
%{"Player" => label, "Param" => "PFR", "True" => true_p.pfr, "Posterior Mean" => pfr_mean},
%{"Player" => label, "Param" => "AGG", "True" => true_p.agg, "Posterior Mean" => agg_mean},
%{"Player" => label, "Param" => "BLUFF", "True" => true_p.bluff, "Posterior Mean" => bluff_mean}
]
end)
Kino.DataTable.new(comparison_data, name: "Parameter Recovery")
Posterior Distributions
The posterior gives us not just point estimates but full uncertainty. A player you’ve only seen 80 hands from has wide posteriors — you’re uncertain. More hands = tighter posteriors. This is the Bayesian advantage over frequentist HUD stats: you know how much you know.
posterior_data =
Enum.zip(players, profiles)
|> Enum.flat_map(fn {true_p, profile} ->
label = to_string(true_p.label)
vpip_samples = Nx.to_flat_list(profile.vpip)
pfr_samples = Nx.to_flat_list(profile.pfr)
agg_samples = Nx.to_flat_list(profile.agg)
bluff_samples = Nx.to_flat_list(profile.bluff)
Enum.flat_map(Enum.zip([vpip_samples, pfr_samples, agg_samples, bluff_samples]), fn {v, p, a, b} ->
[
%{"Player" => label, "Param" => "VPIP", "Value" => v},
%{"Player" => label, "Param" => "PFR", "Value" => p},
%{"Player" => label, "Param" => "AGG", "Value" => min(a, 5.0)},
%{"Player" => label, "Param" => "BLUFF", "Value" => b}
]
end)
end)
true_markers =
Enum.flat_map(players, fn p ->
[
%{"Player" => to_string(p.label), "Param" => "VPIP", "True" => p.vpip},
%{"Player" => to_string(p.label), "Param" => "PFR", "True" => p.pfr},
%{"Player" => to_string(p.label), "Param" => "AGG", "True" => p.agg},
%{"Player" => to_string(p.label), "Param" => "BLUFF", "True" => p.bluff}
]
end)
hist =
Vl.new(width: 160, height: 120)
|> Vl.data_from_values(posterior_data)
|> Vl.mark(:bar, opacity: 0.7)
|> Vl.encode_field(:x, "Value", type: :quantitative, bin: %{maxbins: 25})
|> Vl.encode_field(:y, "Value", type: :quantitative, aggregate: :count, title: "Count")
|> Vl.encode_field(:color, "Player", type: :nominal)
rules =
Vl.new()
|> Vl.data_from_values(true_markers)
|> Vl.mark(:rule, color: "red", stroke_width: 2, stroke_dash: [4, 4])
|> Vl.encode_field(:x, "True", type: :quantitative)
child_posterior =
Vl.new(width: 160, height: 120)
|> Vl.layers([hist, rules])
Vl.new(title: "Posterior Distributions (red dashed = true value)")
|> Vl.facet(
[column: [field: "Param", type: :nominal], row: [field: "Player", type: :nominal]],
child_posterior
)
|> Vl.resolve(:scale, x: :independent)
What to notice: The red dashed lines are the true values. Wide histograms mean high uncertainty. BLUFF is the hardest to estimate — you need many hands with weak holdings to distinguish a bluff from a value bet.
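"You know how much you know" can be made operational by reading a credible interval straight off the posterior samples. A plain-Elixir sketch using fabricated draws (in the notebook the real samples live in `profiles` as Nx tensors):

```elixir
# Hypothetical helper: an equal-tailed credible interval from raw samples.
credible_interval = fn samples, level ->
  sorted = Enum.sort(samples)
  n = length(sorted)
  lo = Enum.at(sorted, round((1 - level) / 2 * (n - 1)))
  hi = Enum.at(sorted, round((1 + level) / 2 * (n - 1)))
  {lo, hi}
end

# Fake VPIP posterior: 500 draws around a true value of 0.22.
:rand.seed(:exsss, 1)
fake_vpip = for _ <- 1..500, do: 0.22 + 0.03 * :rand.normal()

{lo, hi} = credible_interval.(fake_vpip, 0.95)
```

With more hands the sample spread shrinks, and so does `hi - lo`: that width is the honest answer to "how sure are you about this read?"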
Make a Decision
You’re holding A♥ K♥. The board shows T♦ J♦ 2♣. You have an open-ended straight draw (any Q gives you the nuts). The pot is 100, and the TAG player bets 30.
What should you do? The decision engine uses your full posterior over the TAG’s parameters to compute expected value. Not a point estimate — an integral over all plausible opponent types, weighted by their posterior probability.
my_hole = [Cards.parse("Ah"), Cards.parse("Kh")]
board = [Cards.parse("Td"), Cards.parse("Jd"), Cards.parse("2c")]
tag_profile = Enum.at(profiles, 0)
decision = Decision.expected_value(my_hole, board, tag_profile, 100, 30)
IO.puts(Decision.format_decision(decision))
Example 2: Same Hand, Different Opponent
Same hand, same board, but now the calling station (Player 2) bet 30. Calling stations rarely bluff — when they bet, they usually have something. But they also call too much, making our raises more profitable when we hit.
The Bayesian advantage: the decision changes not because the cards changed, but because the posterior over the opponent changed. A TAG betting 30 into 100 might be bluffing 25% of the time. A calling station betting the same amount might be bluffing 5%. The EV calculation integrates over these different posterior beliefs.
station_profile = Enum.at(profiles, 2)
decision_vs_station = Decision.expected_value(my_hole, board, station_profile, 100, 30)
IO.puts("vs Calling Station:")
IO.puts(Decision.format_decision(decision_vs_station))
IO.puts("\nCompare: same hand, same board, different opponent →")
IO.puts(" vs TAG: #{decision.recommended}")
IO.puts(" vs Station: #{decision_vs_station.recommended}")
IO.puts("The posterior, not the cards, determines the decision.")
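The "integrate over posterior beliefs" step can be sketched in a few lines. Every number here is an assumption for illustration (the equities `p_win_if_bluff` and `p_win_if_value`, and the posterior draws, are made up); the real computation is `Decision.expected_value/5`:

```elixir
# Sketch: posterior-averaged EV of calling a 30 bet into a 100 pot.
pot = 100
to_call = 30
p_win_if_bluff = 0.85  # assumed equity when the bet is a bluff
p_win_if_value = 0.30  # assumed equity against a value bet

ev_call = fn bluff_freq ->
  p_win = bluff_freq * p_win_if_bluff + (1 - bluff_freq) * p_win_if_value
  p_win * (pot + to_call) - (1 - p_win) * to_call
end

mean = fn xs -> Enum.sum(xs) / length(xs) end

# Illustrative posterior draws of bluff frequency (not real sampler output):
tag_bluff_draws = [0.20, 0.25, 0.30, 0.22, 0.28]
station_bluff_draws = [0.03, 0.05, 0.04, 0.06, 0.05]

# EV is averaged over draws, not evaluated at a single point estimate.
ev_vs_tag = tag_bluff_draws |> Enum.map(ev_call) |> mean.()
ev_vs_station = station_bluff_draws |> Enum.map(ev_call) |> mean.()
```

Same hand, same pot, same bet size; only the posterior over bluff frequency differs, and the TAG's higher bluff mass makes the call worth more.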
OTP Mode: The Table as a Process
The poker table is a GenServer. Each call to `play/2` simulates more hands, accumulating observations. `profile/1` runs NUTS on the full history.
This is the OTP angle Python frameworks can’t match — the table maintains state across interactions, and multiple tables can run concurrently as independent processes with isolated heaps. A poker room with 20 tables running simultaneous NUTS inference: 20 GenServers, 20 posteriors, zero shared state.
{:ok, table} = Poker.Table.start_link(players)
# Play in batches (as you would at a real table)
Poker.Table.play(table, 40)
IO.inspect(Poker.Table.status(table), label: "After 40 hands")
Poker.Table.play(table, 40)
IO.inspect(Poker.Table.status(table), label: "After 80 hands")
# Profile with NUTS
{:ok, table_profiles, table_stats} = Poker.Table.profile(table, num_samples: 300, num_warmup: 300)
IO.puts("\nTable profiling complete (#{table_stats.divergences} divergences)")
# Decision via the table process
{:ok, table_decision} =
Poker.Table.decide(table,
[Cards.parse("Qh"), Cards.parse("Qs")],
[Cards.parse("7d"), Cards.parse("8d"), Cards.parse("3c")],
80, 20)
IO.puts("\nQ♥Q♠ on 7♦8♦3♣, pot=80, to_call=20:")
IO.puts(Decision.format_decision(table_decision))
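The stateful-table pattern itself fits in a few lines. `SketchTable` below is a hypothetical, stripped-down version that only counts hands; the real `Poker.Table` also stores observations and runs inference:

```elixir
# Minimal sketch of a table-as-process: state accumulates across calls,
# and each table is an isolated BEAM process with its own heap.
defmodule SketchTable do
  use GenServer

  def start_link(players), do: GenServer.start_link(__MODULE__, players)
  def play(pid, n), do: GenServer.cast(pid, {:play, n})
  def hand_count(pid), do: GenServer.call(pid, :hand_count)

  @impl true
  def init(players), do: {:ok, %{players: players, hands: 0}}

  @impl true
  def handle_cast({:play, n}, state),
    do: {:noreply, %{state | hands: state.hands + n}}

  @impl true
  def handle_call(:hand_count, _from, state),
    do: {:reply, state.hands, state}
end

{:ok, pid} = SketchTable.start_link([:tag, :lag])
SketchTable.play(pid, 40)
SketchTable.play(pid, 40)
count = SketchTable.hand_count(pid)
```

Twenty concurrent tables are just twenty `start_link` calls; the runtime handles the isolation.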
Population Hyperparameters
The hierarchical model learns about the population too. These hyperparams describe what “typical players at this stake” look like. With more players at the table, these tighten — giving better priors for a new unknown player.
This is the cold-start problem solved: when a new player sits down, you don’t start from a flat prior. You start from the population posterior — what you’ve learned about this stake level from all the players you’ve observed.
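A sketch of that cold start: a posterior-predictive VPIP for an unseen player, built from population hyperparameter draws. The `(mu, sigma)` draws below are fabricated for illustration; in the notebook they would come from `trace`:

```elixir
# For each hyperparameter draw (mu, sigma), sample one z ~ N(0,1) for the
# newcomer and push it through the inverse-logit: the resulting spread is
# the informative prior a new player inherits before acting once.
inv_logit = fn x -> 1.0 / (1.0 + :math.exp(-x)) end

:rand.seed(:exsss, 7)

hyper_draws =
  for _ <- 1..200 do
    {-1.2 + 0.1 * :rand.normal(), abs(0.5 + 0.05 * :rand.normal())}
  end

new_player_vpip =
  Enum.map(hyper_draws, fn {mu, sigma} ->
    inv_logit.(mu + sigma * :rand.normal())
  end)

mean_vpip = Enum.sum(new_player_vpip) / length(new_player_vpip)
```

The predictive mean sits near the population-typical VPIP rather than at an uninformative 0.5, which is exactly the cold-start advantage.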
hyper_data =
[
{"mu_vpip", "Mean VPIP (logit)", &Nx.to_flat_list/1},
{"mu_pfr", "Mean PFR (logit)", &Nx.to_flat_list/1},
{"mu_agg", "Mean AGG (log)", &Nx.to_flat_list/1},
{"sigma_vpip", "SD VPIP", &Nx.to_flat_list/1},
{"sigma_pfr", "SD PFR", &Nx.to_flat_list/1},
{"sigma_agg", "SD AGG", &Nx.to_flat_list/1}
]
|> Enum.flat_map(fn {key, label, to_list} ->
samples = to_list.(trace[key])
Enum.map(samples, fn v -> %{"Param" => label, "Value" => v} end)
end)
child_hist =
Vl.new(width: 200, height: 120)
|> Vl.mark(:bar, opacity: 0.7, color: "#4c78a8")
|> Vl.encode_field(:x, "Value", type: :quantitative, bin: %{maxbins: 25})
|> Vl.encode_field(:y, "Value", type: :quantitative, aggregate: :count, title: "Count")
Vl.new(title: "Population Hyperparameters")
|> Vl.data_from_values(hyper_data)
|> Vl.facet([column: [field: "Param", type: :nominal]], child_hist)
|> Vl.resolve(:scale, x: :independent)
Hand Evaluation Demo
The cards module handles full Texas Hold’em hand evaluation via Monte Carlo.
# Deal a random hand
deck = Cards.deck() |> Enum.shuffle()
{[c1, c2], rest} = Enum.split(deck, 2)
{board, _} = Enum.split(rest, 5)
IO.puts("Hole cards: #{Cards.card_name(c1)} #{Cards.card_name(c2)}")
IO.puts("Board: #{Enum.map(board, &Cards.card_name/1) |> Enum.join(" ")}")
{category, kickers} = Cards.evaluate_7([c1, c2 | board])
hand_names = ["High Card", "Pair", "Two Pair", "Trips", "Straight",
              "Flush", "Full House", "Quads", "Straight Flush"]
IO.puts("Best hand: #{Enum.at(hand_names, category)}")
flop = Enum.take(board, 3)
equity = Cards.hand_strength([c1, c2], flop, 1000)
IO.puts("Flop equity: #{Float.round(equity * 100, 1)}%")
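The Monte Carlo estimator behind `hand_strength/3` reduces to a win fraction over random rollouts. A sketch with a stubbed rollout (a Bernoulli draw at an assumed 60% equity stands in for "deal random opponent cards and runout, evaluate both hands") shows the estimator and how its noise shrinks with sample count:

```elixir
# Stubbed Monte Carlo equity: win fraction over n simulated showdowns.
:rand.seed(:exsss, 11)
rollout_win? = fn -> :rand.uniform() < 0.6 end  # assumed 60% true equity

estimate = fn n -> Enum.count(1..n, fn _ -> rollout_win?.() end) / n end

eq_100 = estimate.(100)       # noisy estimate
eq_10_000 = estimate.(10_000) # standard error ~ sqrt(p(1-p)/n) ≈ 0.005
```

The 1000-rollout call in the cell above sits between these two: accurate to roughly ±1.5 percentage points, which is plenty for a fold/call/raise decision.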
Study Guide
Key Concepts
- **Latent variable inference:** The parameters (VPIP, PFR, AGG, BLUFF) are hidden. We observe only the actions they produce. NUTS inverts this mapping.
- **Hierarchical modeling:** Population hyperparameters share information across players. A new player at the table inherits the population prior instead of starting from ignorance.
- **Non-centered parameterization (NCP):** The transformation `θ_i = μ + σ * z_i` where `z_i ~ N(0,1)` avoids the funnel geometry that traps NUTS in hierarchical models. Without NCP, the sampler gets stuck.
- **Softmax action model:** Maps continuous utilities to discrete action probabilities. The same architecture used in neural networks for classification — but here the inputs are interpretable parameters.
- **Decision under uncertainty:** The expected value computation integrates over the full posterior, not just the MAP estimate. When you’re uncertain about whether the opponent bluffs, your EV calculation accounts for both possibilities.
- **Cold-start via hierarchical priors:** When a new player sits down, the population posterior provides an informative starting point. After 20 hands, the individual posterior starts dominating. This is the Bayesian solution to the exploration-exploitation trade-off.
Exercises
- **Change the sample size:** Run with 40 hands instead of 80. How much wider are the posteriors? At what hand count do the 95% credible intervals exclude the true value?
- **Add a fourth player:** Create a “maniac” (`vpip: 0.65, pfr: 0.45, agg: 3.0, bluff: 0.55`). Does the population posterior shift?
- **Misspecified prior:** Change the population prior to be very tight (σ = 0.1). Does the model still recover the calling station’s extreme parameters?
- **Decision sensitivity:** For the A♥K♥ hand, plot EV as a function of pot size (20 to 200) against each opponent type. Where do the lines cross?
- **Warm-starting:** Run NUTS on the first 40 hands, then re-run with `warm_start: stats` on all 80. How much faster is the second run?
Further Reading
Bayesian inference for games:
- Chen, B. & Ankenman, J. (2006). The Mathematics of Poker. ConJelCo. The foundational text on game-theoretic poker with probabilistic reasoning.
- Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., Billings, D., & Rayner, C. (2005). Bayes’ Bluff: Opponent Modelling in Poker. Uncertainty in Artificial Intelligence (UAI), 550–558. Hierarchical Bayesian opponent modeling — the direct precursor to this notebook.
Hierarchical models and NCP:
- Gelman, A. et al. (2013). Bayesian Data Analysis (3rd ed.), Chapters 5 and 15. The standard reference for hierarchical modeling.
- Papaspiliopoulos, O., Roberts, G.O., & Sköld, M. (2007). A general framework for the parametrization of hierarchical models. Statistical Science, 22(1), 59–73. The theory behind NCP.
- Betancourt, M. & Girolami, M. (2015). Hamiltonian Monte Carlo for hierarchical models. Current Trends in Bayesian Methodology with Applications, 79–101. Why NUTS + NCP works for funnels.
Softmax and multinomial models:
- Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Section 4.3.4. Softmax regression as the multi-class generalization of logistic regression.
Expected value and decision theory:
- Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer. The formal framework for decision-making under uncertainty.
What’s Next
This demo shows the core loop: observe → infer → decide. Extensions:
- Position-aware model: add position (early/mid/late) as a covariate. Early-position raises are stronger than late-position raises.
- Street-specific params: different aggression on flop/turn/river. Many players are aggressive on the flop and passive on the river.
- Warm-starting: re-run NUTS with the previous posterior as `warm_start` after each orbit. The mass matrix carries over — 5.8x faster.
- Multi-table: concurrent GenServer tables sharing population hyperparams via ETS. The BEAM runs 20 simultaneous NUTS inferences with zero contention.
- Live visualization: pipe posteriors to ExmcViz via `sample_stream`. Watch the opponent profile update in real time as hands are played.