Notesclub

created by hec & contributors

terms privacy

Shape drift — PSI, Jensen–Shannon, and earth-mover distance

06_shape_distribution_distances.livemd

Lars Wikman

@lawik

mobius_smarts

Share to X

Share to Bluesky

More notebooks

Shape drift — PSI, Jensen–Shannon, and earth-mover distance

Mix.install([
  {:mobius_smarts, path: Path.expand("..", __DIR__)},
  {:kino, "~> 0.14"},
  {:kino_vega_lite, "~> 0.1"}
])

alias VegaLite, as: Vl
alias MobiusSmarts.Detect.Shape
alias Mobius.DDSketch

A device that “looks fine” and isn’t

Here is the situation every other detector in this stack misses.

A device reports request latency. For weeks it is one steady hum: most requests land around 20 ms, with some natural spread. Then a cooling fan starts to fail. The CPU begins thermal-throttle cycling — it runs cool and fast for a while, then overheats and crawls, then cools again, over and over. The single hum splits into two alternating regimes: a fast one near 13 ms and a slow, throttled one near 30 ms.

Users feel it. But here is the trap: we are about to engineer the two states so that the average and the spread barely move. A distribution is the full picture of which values happen and how often. The average (mean) and the spread (standard deviation, “std”) are just two numbers squeezed out of that picture — and infinitely many differently-shaped distributions share the same two numbers.

Let’s generate both states. We seed the random generator first so re-running this single cell always gives the same data.

:rand.seed(:exsss, {1, 2, 3})

# Baseline: one steady hum around 20 ms, moderate spread.
baseline = for _ <- 1..6000, do: 20.0 + 6.5 * :rand.normal()

# Current: thermal-throttle cycling. 60% of requests in a cool, tight
# regime near 13 ms; 40% in a slow, throttled regime near 30 ms.
current =
  for _ <- 1..6000 do
    if :rand.uniform() < 0.6 do
      13.0 + 2.5 * :rand.normal()
    else
      30.0 + 4.0 * :rand.normal()
    end
  end

{length(baseline), length(current)}

Now the summary table. A percentile is the value below which a given fraction of requests fall: the p50 (median) is the middle value, the p99 is the value only the slowest 1% exceed.

mean = fn xs -> Enum.sum(xs) / length(xs) end

std = fn xs ->
  m = mean.(xs)
  :math.sqrt(Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / length(xs))
end

pctile = fn xs, p ->
  s = Enum.sort(xs)
  Enum.at(s, round(p * (length(s) - 1)))
end

summary = fn label, xs ->
  %{
    "series" => label,
    "mean" => Float.round(mean.(xs), 1),
    "std_dev" => Float.round(std.(xs), 1),
    "p50" => Float.round(pctile.(xs, 0.50) * 1.0, 1),
    "p99" => Float.round(pctile.(xs, 0.99) * 1.0, 1)
  }
end

Kino.DataTable.new([summary.("baseline", baseline), summary.("current", current)])

Read that table slowly. The mean is nearly identical (20.0 vs 19.7) and the std is close (6.5 vs 9.0) — yet the median has fallen from 20 ms to about 15 ms, and the p99 crept up. The Jump, Shift, and Drift detectors watch exactly the mean and the std. They are not bad at seeing this; they are silent by construction, because the two numbers they read barely changed.

Now look at the actual shape.

# Bin both states onto the SAME axis: count how many requests land in
# each fixed-width slice. This mirrors what a Mobius DDSketch histogram
# stores — values pre-aggregated into bins.
lo = 0.0
hi = 45.0
nbins = 18
bin_width = (hi - lo) / nbins

bin_counts = fn xs ->
  Enum.reduce(xs, List.duplicate(0, nbins), fn x, acc ->
    i = min(nbins - 1, max(0, trunc((x - lo) / bin_width)))
    List.update_at(acc, i, &(&1 + 1))
  end)
end

bin_centers = for i <- 0..(nbins - 1), do: lo + (i + 0.5) * bin_width

baseline_counts = bin_counts.(baseline)
current_counts = bin_counts.(current)

hist_rows =
  for {c, label, counts} <- [
        {nil, "baseline", baseline_counts},
        {nil, "current", current_counts}
      ],
      {center, count} <- Enum.zip(bin_centers, counts),
      _ = c do
    %{"ms" => center, "count" => count, "series" => label}
  end

Vl.new(width: 700, height: 300, title: "Same mean and std, two completely different shapes")
|> Vl.data_from_values(hist_rows)
|> Vl.mark(:bar, opacity: 0.6)
|> Vl.encode_field(:x, "ms", type: :quantitative, bin: %{step: bin_width})
|> Vl.encode_field(:y, "count", type: :quantitative, stack: nil)
|> Vl.encode_field(:color, "series", type: :nominal)

What to look at: the orange baseline is one bump; the blue current is two bumps with an empty valley right where the old typical value lived. No request now experiences the “average” latency of 20 ms — yet the average is still 20 ms. That valley is what the moments cannot see, and it is exactly what we need three purpose-built distances to measure.

From counts to proportions

All three distances compare proportions, not raw counts: “what fraction of requests landed in each bin.” Dividing by the total makes two histograms with different sample counts comparable.

proportions = fn counts ->
  total = Enum.sum(counts)
  Enum.map(counts, &(&1 / total))
end

p = proportions.(baseline_counts)
q = proportions.(current_counts)

prop_rows =
  for {center, pi, qi} <- Enum.zip([bin_centers, p, q]),
      {label, val} <- [{"baseline", pi}, {"current", qi}] do
    %{"ms" => center, "proportion" => Float.round(val, 4), "series" => label}
  end

Vl.new(width: 700, height: 250, title: "Fraction of requests in each bin")
|> Vl.data_from_values(prop_rows)
|> Vl.mark(:bar, opacity: 0.6)
|> Vl.encode_field(:x, "ms", type: :quantitative, bin: %{step: bin_width})
|> Vl.encode_field(:y, "proportion", type: :quantitative, stack: nil)
|> Vl.encode_field(:color, "series", type: :nominal)

Same shape as before, now on a 0–1 scale. Every bin’s p (baseline fraction) and q (current fraction) is what the formulas below chew on.

PSI — the number you can cite

Population Stability Index asks: how much have the bin proportions reshuffled? Its formula sums one contribution per bin:

PSI = sum over bins of (q − p) · ln(q / p)

In words: for each bin, take how much its fraction changed (q − p), multiply by the log-ratio of the two fractions, and add them all up. Bins that swung hard in proportion contribute the most. Let’s compute the per-bin contributions by hand so we can see inside the score.

eps = 1.0e-4

# Empty bins would make ln(q/p) blow up to infinity, so PSI floors each
# proportion at a tiny eps. Shape.psi does this internally too.
psi_terms =
  for {pi, qi} <- Enum.zip(p, q) do
    pf = max(pi, eps)
    qf = max(qi, eps)
    (qf - pf) * :math.log(qf / pf)
  end

psi_rows =
  for {center, term} <- Enum.zip(bin_centers, psi_terms) do
    %{"ms" => center, "contribution" => Float.round(term, 4)}
  end

Vl.new(width: 700, height: 250, title: "Per-bin PSI contribution — which bins drive the score")
|> Vl.data_from_values(psi_rows)
|> Vl.mark(:bar)
|> Vl.encode_field(:x, "ms", type: :quantitative, bin: %{step: bin_width})
|> Vl.encode_field(:y, "contribution", type: :quantitative)

What to look at: the two new peaks (~13 ms and ~30 ms gained mass) and the drained middle (~20 ms lost mass) are the tall bars — those bins are the drift. The sum of every bar is the PSI, and it matches the library function exactly.

psi_by_hand = Enum.sum(psi_terms)
psi_lib = Shape.psi(baseline_counts, current_counts)

%{
  "PSI summed by hand" => Float.round(psi_by_hand, 4),
  "Shape.psi" => Float.round(psi_lib, 4)
}

Both give 1.255. PSI’s real superpower is social, not mathematical. Decades of use in credit-scoring mean its thresholds are conventions you can cite in a postmortem instead of a number you invented on the spot: below 0.1 is stable, 0.1–0.25 is drifting (watch it), above 0.25 has shifted (act). At 1.255 this device is far past “act.”

Jensen–Shannon — the universal ruler

PSI is unbounded, which makes it awkward to compare across metrics. JS divergence asks a related question — how distinguishable are these two distributions? — but answers on a fixed scale from 0 (identical) up to ln 2 ≈ 0.693 (no overlap at all). Two tiny examples pin the ends of the ruler:

%{
  "identical distributions" => Shape.js_divergence([10, 20, 10], [10, 20, 10]),
  "fully disjoint distributions" => Float.round(Shape.js_divergence([100, 0], [0, 100]), 4),
  "ln 2 (the ceiling)" => Float.round(:math.log(2), 4)
}

Identical gives exactly 0.0; non-overlapping gives 0.6931, which is ln 2. Because the scale never depends on the metric’s units, JS scores are directly comparable across metrics. That is its real job: ranking which of fifty metrics drifted most. Here are five synthetic metrics against their baselines, sorted into a leaderboard.

:rand.seed(:exsss, {4, 4, 4})

base_hist = bin_counts.(for _ <- 1..6000, do: 20.0 + 5.0 * :rand.normal())

metric_currents = [
  {"cpu.temp", for(_ <- 1..6000, do: 20.5 + 5.0 * :rand.normal())},
  {"disk.iowait", for(_ <- 1..6000, do: 23.0 + 5.0 * :rand.normal())},
  {"net.rtt", for(_ <- 1..6000, do: 20.0 + 8.0 * :rand.normal())},
  {"mem.pressure", for(_ <- 1..6000, do: 27.0 + 5.0 * :rand.normal())},
  {"req.latency",
   for _ <- 1..6000 do
     if(:rand.uniform() < 0.5, do: 12.0 + 2.5 * :rand.normal(), else: 29.0 + 3.0 * :rand.normal())
   end}
]

leaderboard =
  metric_currents
  |> Enum.map(fn {name, xs} ->
    js = Shape.js_divergence(base_hist, bin_counts.(xs))
    %{"metric" => name, "JS divergence" => Float.round(js, 3)}
  end)
  |> Enum.sort_by(& &1["JS divergence"], :desc)

Kino.DataTable.new(leaderboard)

What to look at: one sweep, one comparable column, sorted worst-first. req.latency (the bimodal split) and mem.pressure (a clean shift) top the board; the barely-moved cpu.temp sits near zero. You did not have to know each metric’s units to rank them.

Earth-mover — the distance you put in an alert

The most human of the three. Picture each histogram as piles of earth along the value axis. The earth-mover’s distance (formally the first Wasserstein distance, moved_by) asks: how much earth must move, and how far, to reshape the baseline pile into the current one? The answer comes out in the metric’s own units — milliseconds here. The trivial case makes that literal:

# All the mass sits at 10 ms, then all of it moves to 13 ms.
# It moved by exactly 3.
Shape.moved_by([100, 0, 0, 0], [0, 0, 0, 100], [10.0, 11.0, 12.0, 13.0])

Exactly 3.0. The way to compute it without moving imaginary dirt is through the CDF — the cumulative distribution function, “what fraction of requests are at or below this value.” The earth-mover distance is the area between the two CDF curves:

W1 = sum over bins of |CDF_baseline − CDF_current| · (next value − this value)

Let’s draw both CDFs and shade the gap.

cumsum = fn xs -> Enum.scan(xs, &(&1 + &2)) end

cdf_b = cumsum.(p)
cdf_c = cumsum.(q)

cdf_rows =
  for {center, cb, cc} <- Enum.zip([bin_centers, cdf_b, cdf_c]) do
    %{"ms" => center, "baseline" => cb, "current" => cc}
  end

gap_area =
  Vl.new()
  |> Vl.mark(:area, opacity: 0.25, color: "purple")
  |> Vl.encode_field(:x, "ms", type: :quantitative)
  |> Vl.encode_field(:y, "baseline", type: :quantitative, title: "fraction at or below")
  |> Vl.encode_field(:y2, "current")

line_b =
  Vl.new()
  |> Vl.mark(:line, color: "orange")
  |> Vl.encode_field(:x, "ms", type: :quantitative)
  |> Vl.encode_field(:y, "baseline", type: :quantitative)

line_c =
  Vl.new()
  |> Vl.mark(:line, color: "steelblue")
  |> Vl.encode_field(:x, "ms", type: :quantitative)
  |> Vl.encode_field(:y, "current", type: :quantitative)

Vl.new(width: 700, height: 300, title: "Shaded area between the two CDFs = earth-mover distance")
|> Vl.data_from_values(cdf_rows)
|> Vl.layers([gap_area, line_b, line_c])

What to look at: the blue current curve climbs early (lots of fast requests) then stalls across the empty valley before catching up — the purple area between the curves is the whole distance, in milliseconds.

moved_ms = Shape.moved_by(baseline_counts, current_counts, bin_centers)
"The latency distribution moved by about #{Float.round(moved_ms, 1)} ms."

That sentence — “moved right by about 3.4 ms” — is the one to drop straight into an alert message. No one needs to know what a divergence of 0.14 means; everyone understands milliseconds.

Three drifts, three lenses

So why keep all three? Because they disagree in useful ways. Here are three different things that can go wrong, built from one baseline: the distribution (a) shifts wholesale to the right, (b) keeps its center but widens, or (c) splits into a symmetric two-peaked shape. We score each with all three distances.

:rand.seed(:exsss, {7, 7, 7})

scen_base = for _ <- 1..6000, do: 20.0 + 5.0 * :rand.normal()
scen_base_counts = bin_counts.(scen_base)

scenarios = [
  {"(a) shift right +8 ms", for(_ <- 1..6000, do: 28.0 + 5.0 * :rand.normal())},
  {"(b) widen, same center", for(_ <- 1..6000, do: 20.0 + 9.0 * :rand.normal())},
  {"(c) split, same center",
   for _ <- 1..6000 do
     if(:rand.uniform() < 0.5, do: 11.0 + 2.5 * :rand.normal(), else: 29.0 + 2.5 * :rand.normal())
   end}
]

scenario_rows =
  for {name, xs} <- scenarios do
    cb = bin_counts.(xs)

    %{
      "scenario" => name,
      "mean" => Float.round(mean.(xs), 1),
      "std_dev" => Float.round(std.(xs), 1),
      "PSI" => Float.round(Shape.psi(scen_base_counts, cb), 2),
      "JS" => Float.round(Shape.js_divergence(scen_base_counts, cb), 3),
      "moved_by (ms)" => Float.round(Shape.moved_by(scen_base_counts, cb, bin_centers), 2)
    }
  end

Kino.DataTable.new(scenario_rows)

What to look at, row by row. The shift has the biggest moved_by (≈ 8 ms — earth literally travelled 8 ms to the right). The split has a moderate moved_by (≈ 5 ms) because mass moved both left and right and the trips partly cancel — but its PSI and JS are the largest of all three, because they count reshuffling regardless of direction. That is the whole reason there are three: moved_by measures net displacement in real units (great for “how bad, in ms”), while PSI and JS measure how scrambled the shape is (great for “did the shape change at all”).

Try it: how far apart are the two regimes?

Drag the slider to set the gap between the cool and throttled regimes, then re-run the two cells below. A bigger gap is a worse fan failure.

gap_input = Kino.Input.range("Regime gap (ms)", min: 4.0, max: 30.0, default: 17.0, step: 1.0)

gap = Kino.Input.read(gap_input)

:rand.seed(:exsss, {9, 9, 9})
slider_base = for _ <- 1..6000, do: 21.0 + 6.0 * :rand.normal()

center = 21.0
cool_peak = center - gap / 2
hot_peak = center + gap / 2

slider_current =
  for _ <- 1..6000 do
    if(:rand.uniform() < 0.6,
      do: cool_peak + 2.5 * :rand.normal(),
      else: hot_peak + 4.0 * :rand.normal()
    )
  end

sb_counts = bin_counts.(slider_base)
sc_counts = bin_counts.(slider_current)

%{
  "regime gap (ms)" => gap,
  "PSI" => Float.round(Shape.psi(sb_counts, sc_counts), 2),
  "JS" => Float.round(Shape.js_divergence(sb_counts, sc_counts), 3),
  "moved_by (ms)" => Float.round(Shape.moved_by(sb_counts, sc_counts, bin_centers), 2)
}

At the default gap of 17 ms you get roughly PSI 1.45 / JS 0.16 / moved_by 3.9 ms. Widen the gap and all three climb together; narrow it and they collapse toward zero as the two regimes merge back into one hum.

Where the histograms come from in production

We binned raw samples by hand to see the mechanics, but on a real device Mobius has already done this for you: it stores a Mobius.DDSketch histogram per metric — values pre-aggregated into bins at observation time. Shape.from_sketches/2 lines two sketches up on a shared bin axis so the three functions apply directly. Here is the latency story driven through real sketches instead of hand-rolled bins.

:rand.seed(:exsss, {1, 2, 3})

# Latency is positive; feed the same baseline/current shapes into sketches.
to_sketch = fn xs ->
  Enum.reduce(xs, DDSketch.new(relative_accuracy: 0.05), fn x, s ->
    DDSketch.insert(s, max(x, 0.1))
  end)
end

sketch_base = to_sketch.(for _ <- 1..6000, do: 20.0 + 6.5 * :rand.normal())

sketch_curr =
  to_sketch.(
    for _ <- 1..6000 do
      if(:rand.uniform() < 0.6, do: 13.0 + 2.5 * :rand.normal(), else: 30.0 + 4.0 * :rand.normal())
    end
  )

aligned = Shape.from_sketches(sketch_base, sketch_curr)

%{
  "bins on shared axis" => Nx.size(aligned.values),
  "PSI" => Float.round(Shape.psi(aligned.baseline, aligned.current), 3),
  "JS" => Float.round(Shape.js_divergence(aligned.baseline, aligned.current), 3),
  "moved_by (ms)" => Float.round(Shape.moved_by(aligned.baseline, aligned.current, aligned.values), 2),
  "baseline p99 (sketch)" => Float.round(DDSketch.quantile(sketch_base, 0.99), 1),
  "current p99 (sketch)" => Float.round(DDSketch.quantile(sketch_curr, 0.99), 1)
}

The sketch-driven scores (PSI ≈ 1.26, JS ≈ 0.14, moved_by ≈ 3.4 ms) land right on top of the hand-binned ones — same story, no manual binning. In production, a trailing baseline sketch is just bin addition upstream, since DDSketch bins are mergeable.

The blind spot

Shape distances tell you that the shape changed and how much — they do not tell you when it changed. A PSI of 1.3 over a one-week window could be a slow slide or a step that happened last Tuesday. For the when, pair these with the Changepoint detector (notebook 05). They also need enough samples per histogram to be stable: a sketch built from a handful of requests will rattle with noise and cry drift where there is none. Give each side a healthy count before you trust the number you put in the alert.

Other notebooks:

@TomBers

livebookNotes

Attractors

attractors.livemd

advanced data-science decimal vega_lite kino

2022-8-18
Kevin Pan
@feng19

spider_man

ElixirJobs

elixirjobs.livemd

tutorial advanced spider_man floki nimble_csv kino

2022-8-18
@TomBers

livebookNotes

Fun with Graphs

graphs.livemd

tutorial advanced intermediate vega_lite kino math

2022-8-18
@TomBers

livebookNotes

Epicycloid - draw Curves with Straight Lines

Epicycloid.livemd

tutorial intermediate vega_lite kino math

2022-8-18
@ohisama

ohi_Livebook

ecto

ecto.livemd

tutorial intermediate sql data-structures ecto_sql ecto kino ecto_sqlite3

2026-7-7
Andrew Dryga
@AndrewDryga

emisar

Executive Overview

01-executive-overview.livemd

tutorial advanced sql postgrex kino kino_vega_lite

2026-7-20
Yann VERY
@yannvery

engie_visualisator

Engie visualisator

engie_visualisator.livemd

tutorial advanced data-science vega_lite kino_vega_lite jason kino_explorer

2026-7-7

Back