NomicBERT text embeddings on Emily
Mix.install(
  [
    {:emily, "~> 0.7"},
    {:bumblebee, "~> 0.7"},
    {:tokenizers, "~> 0.5"},
    {:nx, "~> 0.12"},
    {:kino, "~> 0.14"}
  ],
  config: [
    nx: [default_backend: Emily.Backend]
  ]
)
Overview
This notebook runs the nomic-ai/nomic-embed-text-v1 encoder on
Emily.Backend to produce sentence embeddings. NomicBERT is one of
the three new model families shipped with Bumblebee 0.7. It is a
long-context (8192-token) BERT variant with rotary position
embeddings and SwiGLU feed-forward layers, and it can serve as a
drop-in replacement for sentence-transformers-style embedders.
The only Emily-specific setup is the Mix.install config above, which
makes Emily.Backend the default Nx backend; no further setup is
required.
Loading the model
{:ok, model_info} =
  Bumblebee.load_model({:hf, "nomic-ai/nomic-embed-text-v1"},
    module: Bumblebee.Text.NomicBert,
    architecture: :base
  )

{:ok, tokenizer} =
  Bumblebee.load_tokenizer({:hf, "nomic-ai/nomic-embed-text-v1"})
The checkpoint is ~550 MB on first fetch. module: and
architecture: are passed explicitly because the upstream config
predates the Bumblebee 0.7 auto-detect mapping for this repo.
Building an embedding serving
serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    output_attribute: :hidden_state,
    output_pool: :mean_pooling,
    embedding_processor: :l2_norm,
    defn_options: [compiler: Emily.Compiler, native: true, native_fallback: :raise]
  )
NomicBERT’s :base graph returns :hidden_state, so the serving
applies mean pooling over the sequence axis and then L2-normalises
the result, matching the recipe the upstream sentence-transformers
adapter uses. Emily.Compiler pins the result backend to Emily.Backend
and caps partition concurrency at 1. native: true lowers the whole
forward pass through Emily’s native Expr compiler (one NIF replay per
call rather than op-by-op dispatch), and native_fallback: :raise fails
loudly instead of silently degrading to the evaluator. This encoder
lowers fully native.
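To make the pooling recipe concrete, here is a minimal sketch of masked mean pooling followed by L2 normalisation on plain Elixir lists. PoolSketch is a hypothetical module written for illustration, not part of Bumblebee or Emily, and it assumes padding tokens are excluded from the mean via an attention mask; the serving does the equivalent work on tensors.

```elixir
defmodule PoolSketch do
  # Hypothetical helper for illustration only.
  # hidden: list of per-token vectors (lists of floats) for one sequence.
  # mask: list of 0/1 flags, 1 for real tokens and 0 for padding.
  # Assumes at least one token is unmasked.
  def mean_pool(hidden, mask) do
    kept = for {vec, 1} <- Enum.zip(hidden, mask), do: vec
    count = length(kept)

    kept
    |> Enum.zip_with(&Enum.sum/1)
    |> Enum.map(&(&1 / count))
  end

  # Scales a vector to unit Euclidean length.
  def l2_normalise(vec) do
    norm = vec |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
    Enum.map(vec, &(&1 / norm))
  end
end

# Toy values: two real tokens and one padding token.
hidden = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

embedding =
  hidden
  |> PoolSketch.mean_pool(mask)
  |> PoolSketch.l2_normalise()
```

The padding vector never contributes to the mean, and the final vector has unit length, which is what later lets a plain dot product stand in for cosine similarity.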
Embedding a few texts
texts = [
  "search_document: Elixir is a functional language for the BEAM.",
  "search_document: Rust is a systems language with strict ownership rules.",
  "search_query: which language runs on the BEAM?"
]

[doc_a, doc_b, query] =
  for %{embedding: e} <- Nx.Serving.run(serving, texts), do: e
NomicBERT expects a task prefix such as search_document: or
search_query: on every input. Without it the query and document
embedding spaces don’t align and similarity scores stop separating
relevant from irrelevant text. See the model card for the full prefix
list (classification:, clustering:, etc.).
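Since a forgotten prefix fails silently, it can help to add prefixes in one place. The sketch below uses a hypothetical TaskPrefix module (not part of Bumblebee) whose task list only covers the four prefixes named in this notebook; check the model card before relying on it.

```elixir
defmodule TaskPrefix do
  # Hypothetical helper. The task names mirror the prefixes mentioned in
  # this notebook and may not be the complete set the model supports.
  @tasks [:search_document, :search_query, :classification, :clustering]

  # Raises FunctionClauseError for an unknown task instead of silently
  # producing an unprefixed or mistyped input.
  def add(text, task) when is_binary(text) and task in @tasks do
    "#{task}: #{text}"
  end
end

docs = ["Elixir is a functional language for the BEAM."]
prefixed = Enum.map(docs, &TaskPrefix.add(&1, :search_document))
```

The resulting strings can be passed to Nx.Serving.run/2 exactly like the hand-written texts list.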
Cosine similarity
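defmodule Cosine do
  # Dot product of two 1-D tensors. This equals cosine similarity only
  # because the serving already L2-normalises each embedding.
  def sim(a, b) do
    a
    |> Nx.multiply(b)
    |> Nx.sum()
    |> Nx.to_number()
  end
end

%{
  query_vs_elixir: Cosine.sim(query, doc_a),
  query_vs_rust: Cosine.sim(query, doc_b)
}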
Because the embeddings are L2-normalised, the dot product is the cosine similarity. The Elixir document should score noticeably higher than the Rust one.
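The same idea extends to ranking any number of documents. The following sketch works on plain lists with made-up unit vectors (they are not model output) and uses a hypothetical RankSketch module; it also includes the full cosine formula so the two scores can be compared.

```elixir
defmodule RankSketch do
  # Hypothetical helper for illustration; operates on lists of floats.
  def dot(a, b), do: a |> Enum.zip_with(b, &(&1 * &2)) |> Enum.sum()

  def norm(a), do: :math.sqrt(dot(a, a))

  # Full cosine similarity, valid for vectors of any length.
  def cosine(a, b), do: dot(a, b) / (norm(a) * norm(b))

  # Sorts {label, vector} pairs by dot product with the query, best first.
  def rank(query, docs) do
    docs
    |> Enum.map(fn {label, vec} -> {label, dot(query, vec)} end)
    |> Enum.sort_by(fn {_label, score} -> score end, :desc)
  end
end

# Toy unit vectors standing in for embeddings.
query = [0.6, 0.8]
docs = [rust: [1.0, 0.0], elixir: [0.0, 1.0]]

ranked = RankSketch.rank(query, docs)
```

For unit-length inputs, RankSketch.dot/2 and RankSketch.cosine/2 agree up to floating-point rounding, so the cheaper dot product is enough for ranking.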
Telemetry
Under native: true the forward is a single NIF replay, so the
op-by-op [:emily, :eval, :stop] span never fires — there’s no
per-op boundary to time. The native-compiler event to watch instead
is [:emily, :compiler, :fallback]: a tripwire that fires only if an
op can’t lower and routes through the evaluator. Attach it, run the
forward, then read Emily.Memory.stats/0 (which itself emits
[:emily, :memory, :stats]):
:telemetry.attach(
  "nomic-embed-fallback",
  [:emily, :compiler, :fallback],
  fn _event, %{count: count}, %{reason: reason}, _config ->
    IO.puts("native fallback (#{count}): #{reason}")
  end,
  nil
)

Emily.Memory.reset_peak()
Nx.Serving.run(serving, "search_query: how fast is Emily?")
%{active: active, peak: peak} = Emily.Memory.stats()

# Printed unconditionally: it only holds if no "native fallback" line
# appeared above.
IO.puts("no fallback above => forward lowered fully native")

IO.puts(
  "MLX memory — active #{div(active, 1024 * 1024)} MiB, " <>
    "peak #{div(peak, 1024 * 1024)} MiB"
)
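The handler above reports fallbacks by printing, so "no fallback" has to be judged by eye. A possible alternative, sketched below, is a named handler that forwards each event as a message so the check can be made in code. FallbackProbe is a hypothetical module; the count and reason keys are copied from the handler above and are otherwise unverified. The :telemetry library also recommends module function captures over anonymous functions for handlers. The attach call is left as a comment so the sketch runs without :telemetry loaded.

```elixir
defmodule FallbackProbe do
  # Hypothetical module. The measurement and metadata keys follow the
  # anonymous handler used earlier in this notebook.
  def handle(_event, %{count: count}, %{reason: reason}, %{notify: pid}) do
    send(pid, {:native_fallback, count, reason})
  end

  # Returns true if a fallback message is waiting in the caller's mailbox
  # (and consumes it), false otherwise. Never blocks.
  def fallback_seen? do
    receive do
      {:native_fallback, _count, _reason} -> true
    after
      0 -> false
    end
  end
end

# In the notebook, the handler would be attached roughly like this:
# :telemetry.attach("nomic-embed-probe", [:emily, :compiler, :fallback],
#   &FallbackProbe.handle/4, %{notify: self()})

# Calling the handler directly simulates one event for demonstration.
seen_before = FallbackProbe.fallback_seen?()

FallbackProbe.handle(
  [:emily, :compiler, :fallback],
  %{count: 1},
  %{reason: "demo"},
  %{notify: self()}
)

seen_after = FallbackProbe.fallback_seen?()
```

After a real forward pass, a false result from FallbackProbe.fallback_seen?/0 would indicate that no fallback event reached the notebook process.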
See Emily.Telemetry for the full event catalogue, including the
[:emily, :fallback, *] span that fires whenever an op routes
through Nx.BinaryBackend.