ModernBERT sequence classification on Emily

modernbert_classification.livemd

@ausimian

emily

Mix.install(
  [
    {:emily, "~> 0.4"},
    {:bumblebee, "~> 0.7"},
    {:tokenizers, "~> 0.5"},
    {:nx, "~> 0.12"},
    {:kino, "~> 0.14"}
  ],
  config: [
    nx: [default_backend: Emily.Backend]
  ]
)

Overview

This notebook runs a ModernBERT sequence-classification fine-tune on Emily.Backend. ModernBERT is one of the three new model families that landed with Bumblebee 0.7: a long-context (8192-token) BERT successor with RoPE positional encoding, GeGLU activations, and alternating local/global attention. It’s the most interesting of the three for Emily because it’s the first encoder with rotary embeddings to land in the conformance suite, and it exercises both local and global attention paths in a single forward pass.

The integration with Emily is the Mix.install config line above; no further setup is required.

Loading the model

repo = "tasksource/ModernBERT-base-nli"

{:ok, model_info} = Bumblebee.load_model({:hf, repo})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, repo})

tasksource/ModernBERT-base-nli is an NLI fine-tune (entailment / neutral / contradiction across MNLI, ANLI, FEVER, …). Swap in any ModernBERT-based *ForSequenceClassification checkpoint — the Bumblebee 0.7 auto-detect resolves the architecture and the serving pipeline is identical. The base checkpoint is ~600 MB on first fetch.

Building a classification serving

serving =
  Bumblebee.Text.text_classification(model_info, tokenizer,
    top_k: 3,
    defn_options: [compiler: Emily.Compiler]
  )

Emily.Compiler pins the result backend to Emily.Backend and caps partition concurrency at 1. For an encoder this size, a single MLX command queue saturates the GPU on a single inference job — use Emily.Stream if you need parallel inferences.

Classifying a few examples

ModernBERT NLI fine-tunes expect the sentence-pair format premise [SEP] hypothesis. Bumblebee’s tokenizer accepts a {premise, hypothesis} tuple and emits the right framing.

examples = [
  {"Elixir runs on the BEAM virtual machine.", "Elixir runs on a virtual machine."},
  {"Elixir runs on the BEAM virtual machine.", "Elixir is a compiled language with no runtime."},
  {"Cats sleep most of the day.", "Cats are nocturnal predators."}
]

Nx.Serving.run(serving, examples)

Each result is a %{predictions: [%{label: _, score: _} | _]} map. With this fine-tune the top label will be one of "entailment", "neutral", or "contradiction".
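To make the result shape concrete, here is a minimal sketch of pulling the top label out of one result map. The `sample_result` value is a hypothetical placeholder with made-up scores, not output from the model, and `top_label` is an illustrative helper name.

```elixir
# Hypothetical placeholder shaped like one serving result; the scores are made up.
sample_result = %{
  predictions: [
    %{label: "entailment", score: 0.90},
    %{label: "neutral", score: 0.07},
    %{label: "contradiction", score: 0.03}
  ]
}

# Pick the highest-scoring prediction without assuming the list is sorted.
top_label = fn %{predictions: predictions} ->
  predictions
  |> Enum.max_by(& &1.score)
  |> Map.fetch!(:label)
end

top_label.(sample_result)
```

Assuming the serving returns one such map per input pair, mapping `top_label` over the output of `Nx.Serving.run(serving, examples)` should give one label per example.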

Local-vs-global attention path

ModernBERT alternates :sliding_attention (local window, default 128 tokens) with :full_attention (global, every token attends to every other token) across its 22 layers. Both paths run on Emily.Backend through the standard scaled-dot-product attention kernel; the local window is realised as an additive mask, so there’s no separate code path on the Emily side. Long-document inputs (~2k+ tokens) are where the saved compute on local layers actually shows up in throughput.
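The additive-mask idea can be illustrated with plain Elixir lists. This is a sketch under stated assumptions, not Emily's or Bumblebee's implementation: it assumes the window is split evenly on both sides of each token, uses a tiny window for readability, and stands in for negative infinity with a large negative constant. `LocalMaskSketch` and its functions are hypothetical names.

```elixir
defmodule LocalMaskSketch do
  # Large negative stand-in for -infinity; softmax drives these positions to ~0.
  @masked -1.0e9

  # Additive mask as a list of rows: 0.0 where |i - j| <= div(window, 2),
  # @masked everywhere else. (Assumes a symmetric window around each token.)
  def additive_mask(seq_len, window) do
    half = div(window, 2)

    for i <- 0..(seq_len - 1)//1 do
      for j <- 0..(seq_len - 1)//1 do
        if abs(i - j) <= half, do: 0.0, else: @masked
      end
    end
  end

  # Numerically stable softmax over one row of scores after adding its mask row.
  def masked_softmax(scores, mask_row) do
    shifted = Enum.zip_with(scores, mask_row, &(&1 + &2))
    max = Enum.max(shifted)
    exps = Enum.map(shifted, &:math.exp(&1 - max))
    sum = Enum.sum(exps)
    Enum.map(exps, &(&1 / sum))
  end
end

# Six tokens, window of 2: each token sees itself and one neighbour per side.
mask = LocalMaskSketch.additive_mask(6, 2)

# Uniform scores for token 0: all weight lands on the positions inside its window.
weights = LocalMaskSketch.masked_softmax(List.duplicate(0.0, 6), Enum.at(mask, 0))
```

Because the mask is simply added to the attention scores, the same softmax handles both layer types; a global layer corresponds to a mask of all zeros.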

Telemetry

Emily emits :telemetry events at the evaluation boundary. Attach a handler to sample timing for each forward pass:

:telemetry.attach(
  "modernbert-cls",
  [:emily, :eval, :stop],
  fn _event, %{duration: duration}, _meta, _config ->
    ms = System.convert_time_unit(duration, :native, :millisecond)
    IO.puts("eval #{ms} ms")
  end,
  nil
)

Nx.Serving.run(serving, hd(examples))

See Emily.Telemetry for the full event catalogue, including the [:emily, :block, :fallback] event that fires whenever an op routes through Nx.BinaryBackend.
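As a sketch of how the fallback event could be watched, the module below forwards every [:emily, :block, :fallback] event to a process. Only the event name comes from the text above; `FallbackWatcher`, the handler id, and the message tuple are hypothetical, and the measurement and metadata contents are passed through untouched because their shape is not assumed here.

```elixir
defmodule FallbackWatcher do
  # Event name taken from the text above.
  @event [:emily, :block, :fallback]

  # Attach the handler; `notify_pid` receives one message per fallback event.
  # Requires :telemetry, which the notebook's dependencies already pull in.
  def attach(notify_pid \\ self()) do
    :telemetry.attach("modernbert-fallback", @event, &__MODULE__.handle_event/4, notify_pid)
  end

  # Telemetry handlers receive (event, measurements, metadata, config).
  # The config here is the pid given to attach/1.
  def handle_event(@event, measurements, metadata, notify_pid) do
    send(notify_pid, {:emily_fallback, measurements, metadata})
    :ok
  end
end
```

After calling `FallbackWatcher.attach()` and running the serving, any `{:emily_fallback, measurements, metadata}` messages in the calling process's mailbox would indicate ops that left Emily.Backend. Using a module function capture instead of an anonymous function also avoids the performance warning that :telemetry logs for local handlers.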
