Notesclub

created by hec & contributors

terms privacy

Outlier detection — Isolation Forest

08_outlier_isolation_forest.livemd

Lars Wikman

@lawik

mobius_smarts

Share to X

Share to Bluesky

More notebooks

Outlier detection — Isolation Forest

Mix.install([
  {:mobius_smarts, path: Path.expand("..", __DIR__)},
  {:kino, "~> 0.14"},
  {:kino_vega_lite, "~> 0.1"}
])

The question

Every other detector in this stack answers a question you could almost write a rule for: did it jump?, is it drifting?, did the shape change? This one answers the uncomfortable one: is this window weird in some way nobody wrote a rule for?

And “weird” here is not weird for this device — it’s weird against what the whole fleet’s data says normal looks like. A device that has only ever seen its own quiet corner of behaviour can still be told “the combination of numbers you just produced is one almost no healthy device in the fleet produces.”

The division of labour matters, so let’s be blunt about it up front:

MobiusSmarts.Detect.Outlier is inference only. It loads a trained forest and scores vectors. It does not train.
Training happens fleet-side, off the device — for example a sklearn.ensemble.IsolationForest fitted over per-device feature vectors pulled from your warehouse. The fitted forest is exported to a small JSON shape and shipped down to each device.

In this notebook we will train a tiny forest in plain Elixir, purely so we have something concrete to play with. That trainer is not part of the library and is not meant for production — but the forest it produces takes the exact same JSON path a real fleet model takes: export to JSON, ship, load!, score.

alias VegaLite, as: Vl
alias MobiusSmarts.Detect.Outlier
alias MobiusSmarts.Detect.Novelty

Few and different, seen on a chart

The whole idea rests on one observation: outliers are few and different. Because they are few and sit apart from the crowd, a random cut through the data — pick an axis, pick a place on it, slice — tends to corner an outlier almost immediately. It ends up alone in its own little region after only a cut or two. A normal point buried in the dense crowd survives many cuts before it finally sits alone.

Let’s make a 2D dataset to look at: a dense blob of normal (cpu, latency) points plus three obvious outliers off on their own.

:rand.seed(:exsss, {1, 2, 3})

normal_pts =
  for _ <- 1..200 do
    [40.0 + :rand.normal() * 6.0, 100.0 + :rand.normal() * 12.0]
  end

outlier_pts = [[85.0, 95.0], [38.0, 200.0], [80.0, 190.0]]

blob_rows = for [x, y] <- normal_pts, do: %{cpu: x, latency: y, kind: "normal"}
out_rows = for [x, y] <- outlier_pts, do: %{cpu: x, latency: y, kind: "outlier"}
all_rows = blob_rows ++ out_rows
length(all_rows)

Now watch a handful of cuts. These cuts are hand-picked for illustration (real cuts are random — that comes next); they are chosen to show an outlier getting cornered fast. Each rule is one cut. After cut 1 (a vertical line at cpu = 62) the lone high-cpu point on the right is already alone — one cut isolated it. The blob, meanwhile, is still one uncut crowd. The horizontal line at latency = 150 corners the two high-latency outliers together; one more cut would split them.

cuts = [
  %{orient: "x", at: 62.0, label: "cut 1: cpu <= 62"},
  %{orient: "y", at: 150.0, label: "cut 2: latency <= 150"},
  %{orient: "x", at: 60.0, label: "cut 3: cpu <= 60"}
]

vrules =
  Vl.new()
  |> Vl.data_from_values(Enum.filter(cuts, &(&1.orient == "x")))
  |> Vl.mark(:rule, color: "firebrick", strokeDash: [6, 4])
  |> Vl.encode_field(:x, "at", type: :quantitative)

hrules =
  Vl.new()
  |> Vl.data_from_values(Enum.filter(cuts, &(&1.orient == "y")))
  |> Vl.mark(:rule, color: "firebrick", strokeDash: [6, 4])
  |> Vl.encode_field(:y, "at", type: :quantitative)

points =
  Vl.new()
  |> Vl.data_from_values(all_rows)
  |> Vl.mark(:circle, size: 60, opacity: 0.7)
  |> Vl.encode_field(:x, "cpu", type: :quantitative, scale: [domain: [20, 95]])
  |> Vl.encode_field(:y, "latency", type: :quantitative, scale: [domain: [60, 215]])
  |> Vl.encode_field(:color, "kind", type: :nominal)

Vl.new(width: 700, height: 420, title: "A few random cuts corner the outliers fast")
|> Vl.layers([points, vrules, hrules])

Look at the red outliers: each sits alone in a region after just one or two cuts. The orange blob is still a single undivided crowd. That gap — cornered in 2 cuts vs. needs dozens — is the entire signal an isolation forest measures.

Build the trainer

A tree is grown by repeatedly making that random cut: pick a random feature, pick a random threshold somewhere between that feature’s min and max among the points currently in this node, send points with x[feature] <= threshold left and the rest right, and recurse. Stop when a node holds one point or fewer, or when we hit a depth cap. The depth cap is ceil(log2(psi)) — the depth where a balanced tree over psi points would bottom out, so we don’t waste effort splitting the crowd forever.

We record at every node how many training points reached it (its size). That count is what lets scoring later “charge” a leaf for the cuts it skipped.

The code below is about 30 lines and is the lesson — read it.

defmodule TinyForest do
  # Grow one tree. Returns a flat list of node maps in pre-order;
  # we renumber into columnar arrays at export time.
  def grow_tree(points, depth_cap) do
    {nodes, _next} = build(points, 0, depth_cap, 0)
    nodes
  end

  defp build(points, depth, depth_cap, id) do
    n = length(points)

    if n <= 1 or depth >= depth_cap do
      {[%{id: id, feature: -1, threshold: 0.0, left: -1, right: -1, size: n}], id + 1}
    else
      feature = :rand.uniform(2) - 1
      vals = Enum.map(points, &Enum.at(&1, feature))
      {lo, hi} = Enum.min_max(vals)

      if lo == hi do
        # No spread on this feature here — make it a leaf.
        {[%{id: id, feature: -1, threshold: 0.0, left: -1, right: -1, size: n}], id + 1}
      else
        threshold = lo + :rand.uniform() * (hi - lo)
        {left_pts, right_pts} = Enum.split_with(points, &(Enum.at(&1, feature) <= threshold))

        # Reserve id for this node, then lay out the left subtree, then right.
        {left_nodes, after_left} = build(left_pts, depth + 1, depth_cap, id + 1)
        {right_nodes, after_right} = build(right_pts, depth + 1, depth_cap, after_left)

        node = %{
          id: id,
          feature: feature,
          threshold: threshold,
          left: id + 1,
          right: after_left,
          size: n
        }

        {[node | left_nodes ++ right_nodes], after_right}
      end
    end
  end

  def grow_forest(points, n_trees, psi) do
    depth_cap = :math.log2(psi) |> :math.ceil() |> trunc()

    for _ <- 1..n_trees do
      sample = Enum.take_random(points, psi)
      grow_tree(sample, depth_cap)
    end
  end
end

Note the recursion lays nodes out in pre-order and threads an id counter through, so node 0 is always the root and children point at the right indices — exactly the columnar layout we export next.

Grow it

We grow 100 trees, each on a fresh random subsample of psi = 64 points. Subsampling is deliberate: small samples make the outliers even more alone, so they get isolated even faster. We seed first so the forest is reproducible across re-runs.

:rand.seed(:exsss, {7, 7, 7})

psi = 64
n_trees = 100
training = normal_pts ++ outlier_pts

forest_nodes = TinyForest.grow_forest(training, n_trees, psi)

depths = Enum.map(forest_nodes, &length/1)
{min_nodes, max_nodes} = Enum.min_max(depths)
IO.puts("grew #{length(forest_nodes)} trees; node counts #{min_nodes}..#{max_nodes}")

Export to JSON and load it

Now the ship-to-device step. We turn each tree into the documented columnar shape: parallel feature, threshold, left, right, size arrays where node 0 is the root, -1 marks a leaf, and you go left when x[feature] <= threshold. This is the same shape the sklearn exporter in the moduledoc produces (sklearn’s own arrays mark leaves with -2 in feature; load!/1 accepts and normalizes that) — the device cannot tell where the forest came from.

defmodule Export do
  def tree_to_columns(nodes) do
    sorted = Enum.sort_by(nodes, & &1.id)

    %{
      "feature" => Enum.map(sorted, & &1.feature),
      "threshold" => Enum.map(sorted, & &1.threshold),
      "left" => Enum.map(sorted, & &1.left),
      "right" => Enum.map(sorted, & &1.right),
      "size" => Enum.map(sorted, & &1.size)
    }
  end

  def forest_to_model(forest, psi) do
    %{
      "psi" => psi,
      "n_features" => 2,
      "trees" => Enum.map(forest, &tree_to_columns/1)
    }
  end
end

model_map = Export.forest_to_model(forest_nodes, psi)
Map.take(model_map, ["psi", "n_features"])

To prove the JSON path actually works end to end, we round-trip the model through :json.encode and :json.decode (OTP 27+) — bytes out, bytes in — before handing the decoded maps to Outlier.load!/1.

json_iodata = :json.encode(model_map)
json_string = IO.iodata_to_binary(json_iodata)
IO.puts("encoded model is #{byte_size(json_string)} bytes of JSON")

decoded = :json.decode(json_string)
model = Outlier.load!(decoded)

%{loaded_trees: length(model.trees), psi: model.psi, n_features: model.n_features}

load! validated every tree (ragged columns, bad indices, etc. would have raised) and converted them to a fast traversal form. From here on we only touch the public API: Outlier.score/2.

Path lengths, tree by tree

Before the score, let’s see the raw signal it’s built from: how deep does a point fall in each tree? We walk a handful of trees by hand for one clear outlier and one normal point and compare the depth each reaches.

defmodule Walk do
  # Depth at which `x` lands in a leaf of one columnar tree.
  def depth(tree, x) do
    walk(tree, x, 0, 0)
  end

  defp walk(tree, x, node, depth) do
    feat = Enum.at(tree["feature"], node)

    if feat == -1 do
      depth
    else
      thresh = Enum.at(tree["threshold"], node)

      next =
        if Enum.at(x, feat) <= thresh,
          do: Enum.at(tree["left"], node),
          else: Enum.at(tree["right"], node)

      walk(tree, x, next, depth + 1)
    end
  end
end

outlier_x = [38.0, 200.0]
normal_x = [40.0, 100.0]

rows =
  model_map["trees"]
  |> Enum.take(12)
  |> Enum.with_index(1)
  |> Enum.map(fn {tree, i} ->
    %{
      "tree #" => i,
      "depth(outlier)" => Walk.depth(tree, outlier_x),
      "depth(normal)" => Walk.depth(tree, normal_x)
    }
  end)

Kino.DataTable.new(rows)

Scan the two depth columns: the normal point bottoms out at the depth cap in every tree, while the outlier lands shallower in most of them and never deeper. No single tree is decisive — a few tie — but the pattern across independent random trees is what makes the signal trustworthy.

The c(n) curve and the score

There’s one subtlety. A leaf isn’t always a single point — at the depth cap a leaf can still hold a whole crowd of n training points we stopped splitting. We don’t pretend that crowd was isolated; instead we charge the path the cuts that would have been needed to keep splitting n points apart. That estimated extra cost is c(n), the average path length of an unsuccessful search in a binary tree over n points. It is public (though @doc false) as Outlier.c/1.

c_rows =
  for n <- [2, 4, 8, 16, 32, 64, 128, 256] do
    %{"n" => n, "c(n)" => Float.round(Outlier.c(n), 3)}
  end

c_chart_data = for n <- 2..256, do: %{n: n, c: Outlier.c(n)}

c_chart =
  Vl.new(width: 700, height: 300, title: "c(n): expected cuts to isolate one of n points")
  |> Vl.data_from_values(c_chart_data)
  |> Vl.mark(:line, color: "steelblue")
  |> Vl.encode_field(:x, "n", type: :quantitative)
  |> Vl.encode_field(:y, "c", type: :quantitative, title: "c(n)")

Kino.Layout.grid([Kino.DataTable.new(c_rows), c_chart], columns: 1)

The curve climbs but flattens — doubling the crowd adds only a little. The value at n = psi is the yardstick for the whole score. With that, the score formula is:

s = 2 ^ ( -mean_path / c(psi) )

In words: take a point’s mean path length over all trees, divide by the path length an ordinary point would be expected to have (c(psi)), and map it onto (0, 1]. A point whose paths are about as long as expected scores ~0.5; a point buried even deeper in a dense crowd than average scores below 0.5; a point cornered suspiciously fast — short paths — scores toward 1.0. The conventional reading treats ~0.5 as ordinary and pushes the alarm above ~0.6–0.7.

IO.puts("c(psi=#{psi}) = #{Float.round(Outlier.c(psi), 4)}")

for {label, vec} <- [normal: normal_x] ++ Enum.with_index(outlier_pts, &{:"outlier#{&2 + 1}", &1}) do
  {label, Float.round(Outlier.score(model, vec), 3)}
end

The normal point scores about 0.39 — below 0.5, because our blob is denser than a random psi-point subsample, so ordinary points fall even deeper than the baseline expects. Each outlier scores 0.65–0.76, clearly into anomalous territory. That is the whole detector working: nothing was told these three points were special — the random cuts found them.

The score landscape

To see the detector’s whole opinion at once, we score a 60×60 grid over the (cpu, latency) plane — Outlier.score/2 takes a batch (a list of [x, y] vectors) and returns a score per cell — and draw it as a heat map with the training points and outliers laid on top.

xs = for i <- 0..59, do: 20.0 + i * (95.0 - 20.0) / 59
ys = for j <- 0..59, do: 60.0 + j * (215.0 - 60.0) / 59

grid_vectors = for y <- ys, x <- xs, do: [x, y]
grid_scores = Outlier.score(model, grid_vectors)

heat_rows =
  Enum.zip(grid_vectors, grid_scores)
  |> Enum.map(fn {[x, y], s} -> %{cpu: x, latency: y, score: s} end)

length(heat_rows)

heat =
  Vl.new()
  |> Vl.data_from_values(heat_rows)
  |> Vl.mark(:rect)
  |> Vl.encode_field(:x, "cpu", type: :quantitative, bin: [maxbins: 60])
  |> Vl.encode_field(:y, "latency", type: :quantitative, bin: [maxbins: 60])
  |> Vl.encode_field(:color, "score",
    type: :quantitative,
    scale: [scheme: "inferno", domain: [0.39, 0.77]]
  )

overlay =
  Vl.new()
  |> Vl.data_from_values(all_rows)
  |> Vl.mark(:circle, size: 40, opacity: 0.8, stroke: "white", strokeWidth: 0.5)
  |> Vl.encode_field(:x, "cpu", type: :quantitative)
  |> Vl.encode_field(:y, "latency", type: :quantitative)
  |> Vl.encode_field(:color, "kind",
    type: :nominal,
    scale: [range: ["#4fc3f7", "#ffffff"]]
  )

Vl.new(width: 500, height: 500, title: "Anomaly score landscape")
|> Vl.layers([heat, overlay])

Read the colours: the blob of normal points sits in a cool (dark) basin (scores around 0.39), while it gets hotter toward the empty far corners and, especially, around the three outliers. Hot means “cornered fast” means anomalous. The outlier neighbourhoods push to ~0.65–0.77, well clear of the ordinary 0.5 line.

Watch the score settle as the forest grows

One forest is a bundle of random trees, so any single tree’s verdict is noisy — the averaging is what makes the score stable. Use the input to grow forests of different sizes (on the same seed) and watch the outlier’s score settle as you add trees.

trees_input =
  Kino.Input.select("trees in forest", [{10, "10"}, {30, "30"}, {100, "100"}],
    default: 30
  )

:rand.seed(:exsss, {7, 7, 7})

chosen_trees = Kino.Input.read(trees_input)
small_forest = TinyForest.grow_forest(training, chosen_trees, psi)
small_map = Export.forest_to_model(small_forest, psi)
small_model = Outlier.load!(:json.decode(IO.iodata_to_binary(:json.encode(small_map))))

%{
  "trees" => chosen_trees,
  "score(outlier)" => Float.round(Outlier.score(small_model, [38.0, 200.0]), 3),
  "score(normal)" => Float.round(Outlier.score(small_model, [40.0, 100.0]), 3)
}

With few trees the outlier’s score is jumpier (about 0.59 at 10 trees, climbing toward ~0.65 by 100); the normal point sits steadily around 0.39 throughout. More trees buy stability, not a different answer — the outlier is anomalous at every forest size.

Why both Outlier and Novelty exist

The forest and MobiusSmarts.Detect.Novelty (Mahalanobis, from notebook 07) both answer “is this combination of metrics weird?” — but they see different shapes. The forest carves the space with axis-aligned cuts and can wrap several separate normal regions. Mahalanobis fits a single ellipse: one centre, one correlation. When “normal” has two operating modes, the ellipse averages them and calls the empty space between them normal-ish — exactly where the forest correctly stays suspicious.

Let’s build a two-mode dataset and fit both on it.

:rand.seed(:exsss, {9, 9, 9})

mode_a = for _ <- 1..150, do: [25.0 + :rand.normal() * 4.0, 80.0 + :rand.normal() * 8.0]
mode_b = for _ <- 1..150, do: [70.0 + :rand.normal() * 4.0, 170.0 + :rand.normal() * 8.0]
two_mode = mode_a ++ mode_b

nov_model = Novelty.fit(two_mode)

:rand.seed(:exsss, {3, 1, 4})
forest2_nodes = TinyForest.grow_forest(two_mode, 100, 64)
forest2_map = Export.forest_to_model(forest2_nodes, 64)
forest2 = Outlier.load!(:json.decode(IO.iodata_to_binary(:json.encode(forest2_map))))

mode_rows = for [x, y] <- two_mode, do: %{cpu: x, latency: y}
length(mode_rows)

Now score the same grid two ways: the forest’s (0,1] anomaly score and Novelty’s distance (bigger = more novel). We draw them side by side.

gx = for i <- 0..49, do: 10.0 + i * (85.0 - 10.0) / 49
gy = for j <- 0..49, do: 50.0 + j * (200.0 - 50.0) / 49
grid2 = for y <- gy, x <- gx, do: [x, y]

forest2_scores = Outlier.score(forest2, grid2)
nov_scores = Novelty.score(nov_model, grid2) |> Nx.to_flat_list()

forest2_rows =
  Enum.zip(grid2, forest2_scores)
  |> Enum.map(fn {[x, y], s} -> %{cpu: x, latency: y, val: s} end)

nov_rows =
  Enum.zip(grid2, nov_scores)
  |> Enum.map(fn {[x, y], s} -> %{cpu: x, latency: y, val: s} end)

mid = [47.5, 125.0]

IO.puts("empty middle point #{inspect(mid)}:")
IO.puts("  forest score  = #{Float.round(Outlier.score(forest2, mid), 3)}")
IO.puts("  novelty score = #{Float.round(Novelty.score(nov_model, mid), 3)}")
:ok

build_heat = fn rows, title, scheme ->
  heat =
    Vl.new()
    |> Vl.data_from_values(rows)
    |> Vl.mark(:rect)
    |> Vl.encode_field(:x, "cpu", type: :quantitative, bin: [maxbins: 50])
    |> Vl.encode_field(:y, "latency", type: :quantitative, bin: [maxbins: 50])
    |> Vl.encode_field(:color, "val", type: :quantitative, scale: [scheme: scheme])

  pts =
    Vl.new()
    |> Vl.data_from_values(mode_rows)
    |> Vl.mark(:circle, size: 20, opacity: 0.5, color: "white")
    |> Vl.encode_field(:x, "cpu", type: :quantitative)
    |> Vl.encode_field(:y, "latency", type: :quantitative)

  Vl.new(width: 350, height: 350, title: title)
  |> Vl.layers([heat, pts])
end

Kino.Layout.grid(
  [
    build_heat.(forest2_rows, "Isolation Forest", "inferno"),
    build_heat.(nov_rows, "Novelty (Mahalanobis)", "viridis")
  ],
  columns: 2
)

Look at the empty gap between the two clouds. On the forest map it is hot — the forest never saw points there, so it’s suspicious. On the Novelty map that same gap sits near the centre of the one big ellipse, so it reads as low distance — “normal-ish” — which is wrong here.

The honest flip side, in prose: the forest cuts only along the axes and is scale-free per feature, so it is blind to a rotated correlation — a long tilted cloud where cpu and latency move together. Mahalanobis fits that tilt exactly. Neither sees everything. Run both; they cover each other’s blind spots.

Honest caveats

The forest is powerful precisely because it learned a specific notion of normal — which is also its fragility:

Scores only mean something relative to the training distribution. A 0.7 here is not comparable to a 0.7 from a model trained on different data. Read scores against the model that produced them.
Features must be built identically to training — same features, same order, same units. Swap two columns or change a unit and the cuts land on the wrong axis; the score becomes noise that still looks like a number.
Models go stale. When the fleet’s notion of normal drifts — new firmware, new hardware revision, seasonal load — yesterday’s outliers become today’s normal. Retrain fleet-side and re-ship the JSON. The device side never changes; only the model it loads does.

The device’s job is small and fixed: build the feature vector the same way every time, load! the current model, score/2. All the judgement lives in the model — and the model lives in the JSON you ship.

Other notebooks:

@TomBers

livebookNotes

Attractors

attractors.livemd

advanced data-science decimal vega_lite kino

2022-8-18
Kevin Pan
@feng19

spider_man

ElixirJobs

elixirjobs.livemd

tutorial advanced spider_man floki nimble_csv kino

2022-8-18
@TomBers

livebookNotes

Fun with Graphs

graphs.livemd

tutorial advanced intermediate vega_lite kino math

2022-8-18
@TomBers

livebookNotes

Epicycloid - draw Curves with Straight Lines

Epicycloid.livemd

tutorial intermediate vega_lite kino math

2022-8-18
@DeSchoel

Elixir_Curriculum

Concurrent Word Count

concurrent_word_count.livemd

advanced gen-server jason kino youtube hidden_cell

2026-7-12
Numerical Elixir (Nx)
@elixir-nx

axon

Classifying fraudulent transactions

credit_card_fraud.livemd

tutorial advanced data-science axon nx exla explorer kino

2026-7-7
@dgigafox

programming_machine_learn...

Chapter 14: The Zen of Testing

14_the_zen_of_testing.livemd

tutorial advanced data-science testing nx exla kino_vega_lite

2026-7-6

Back