Cost Tracking: Cost, track/1, and inspect_history
Mix.install(
  [
    {:dsxir, path: Path.expand("../..", __DIR__)},
    {:sycophant, "~> 0.4"},
    {:kino, "~> 0.19"}
  ]
)
Overview
Every LM call dsxir makes carries a price: tokens in, tokens out, and —
when the provider reports it — a money cost. dsxir surfaces this through
one value type, Dsxir.Cost, and three ways to read it:
| You want | Use |
|---|---|
| the cost of a single predictor call | prediction.lm_usage |
| the total cost of a block of work | Dsxir.Cost.track/1 |
| an after-the-fact log of recent calls | Dsxir.History (inspect_history) |
| to feed your own dashboard / billing | telemetry on [:dsxir, :predictor, :stop] |
Dsxir.Cost is provider-independent: the mapping from a specific
provider’s usage struct into a Dsxir.Cost lives in the Dsxir.LM
implementation, not in the value type. That means the same aggregation
code works whether you are on OpenAI, Anthropic, or a local model.
When run from a checkout of dsxir, Mix.install/1 above resolves the
library from the checkout root, two directories above this notebook (the
"../.." passed to Path.expand/2). If you launch this livebook from
elsewhere, replace the path: option with a dsxir version requirement.
The Cost struct
A Dsxir.Cost is a flat breakdown of one call’s usage, or — after
aggregation — of many calls.
alias Dsxir.Cost
Cost.zero()
#Dsxir.Cost<%{calls: 0}>
The fields:
| Field | Meaning |
|---|---|
| :input_tokens, :output_tokens | prompt and completion token counts |
| :cache_read_tokens, :cache_write_tokens | prompt-cache hits and writes, when the provider reports them |
| :reasoning_tokens | hidden reasoning tokens (e.g. on reasoning models) |
| :input_cost, :output_cost, :cache_read_cost, :cache_write_cost, :reasoning_cost | per-bucket money cost |
| :total_cost | the sum the provider charged |
| :currency | currency string, e.g. "USD" |
| :calls | 1 for a single call, N after aggregation |
The key invariant: nil is distinct from 0. A nil token field
means the provider did not report that bucket; a 0 means it reported
zero. Aggregation preserves this — a field that is nil in every call
stays nil rather than collapsing to 0.
Dsxir.Cost.zero/0 is the additive identity: every numeric field nil,
calls: 0. It is what track/1 and sum/1 seed with.
The Inspect output is deliberately compact — it shows only the
populated token fields, the total cost, and the call count, hiding the
per-bucket costs and currency:
%Cost{input_tokens: 1200, output_tokens: 340, total_cost: 0.0021, calls: 1}
#Dsxir.Cost<%{in: 1200, out: 340, cost: 0.0021, calls: 1}>
Configuring the LM
api_key_input = Kino.Input.password("OPENAI_API_KEY")
lm_frame = fn ->
  api_key = Kino.Input.read(api_key_input)
  [lm: {Dsxir.LM.Sycophant,
    [model: "openai:gpt-4o-mini", api_key: api_key, temperature: 0.0]}]
end
#Function<43.113135111/0 in :erl_eval.expr/6>
A tiny program to spend tokens on. One ChainOfThought predictor that
answers a question.
defmodule CostDemo.QA do
  use Dsxir.Signature

  signature do
    instruction "Answer the question in a single, short sentence."
    input :question, :string
    output :answer, :string
  end
end
{:module, CostDemo.QA, <<70, 79, 82, 49, 0, 0, 105, ...>>, ...}
defmodule CostDemo.Program do
  use Dsxir.Module

  predictor :qa, Dsxir.Predictor.ChainOfThought, signature: CostDemo.QA

  def forward(prog, %{question: q}) do
    call(prog, :qa, %{question: q})
  end
end
{:module, CostDemo.Program, <<70, 79, 82, 49, 0, 0, 83, ...>>, ...}
Per-call cost: lm_usage
Every Dsxir.Prediction returned from a predictor call carries the cost
of that call in its :lm_usage field. Run this once you have entered an
API key above.
Dsxir.context(lm_frame.(), fn ->
  prog = Dsxir.Program.new(CostDemo.Program)
  {_prog, pred} = CostDemo.Program.forward(prog, %{question: "What is the capital of France?"})
  %{answer: pred[:answer], usage: pred.lm_usage}
end)
%{usage: #Dsxir.Cost<%{in: 118, out: 50, reasoning: 0, cost: 4.77e-5, calls: 1}>, answer: "Paris"}
pred.lm_usage is a %Dsxir.Cost{} for that single call (calls: 1).
Token counts are populated when the provider reported usage; total_cost
is populated only when the LM implementation has pricing data for the
model. If a field reads nil, that is “not reported”, not “free”.
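When printing usage, it helps to keep "not reported" visibly apart from zero. The sketch below is one way to do that. It works on a plain map shaped like the fields above rather than on %Dsxir.Cost{} itself, and the UsageReport module with its describe/1 and render/2 functions is hypothetical, not part of dsxir.

```elixir
defmodule UsageReport do
  # Hypothetical helper, not part of dsxir: renders one usage bucket,
  # keeping "provider did not report" (nil) apart from "reported zero" (0).
  def describe(nil), do: "not reported"
  def describe(n) when is_number(n), do: to_string(n)

  # Renders the chosen fields of a map (or struct) as "field: value" lines.
  def render(usage, fields) do
    fields
    |> Enum.map(fn field -> "#{field}: #{describe(Map.get(usage, field))}" end)
    |> Enum.join("\n")
  end
end

# Sample values only; a real call would read them from pred.lm_usage.
usage = %{input_tokens: 118, output_tokens: 50, reasoning_tokens: 0, cache_read_tokens: nil}

IO.puts(UsageReport.render(usage, [:input_tokens, :reasoning_tokens, :cache_read_tokens]))
```

Because Map.get/2 also works on structs, the same render/2 call should accept a %Dsxir.Cost{} directly, assuming its field names match the table above.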
Aggregating a whole run: track/1
Reading lm_usage works for one call, but a real program makes many — a
ChainOfThought step, a retrieval embedding, a fan-out across examples.
Dsxir.Cost.track/1 runs a function and hands you back both its result
and the summed cost of every LM call made inside the block:
Dsxir.context(lm_frame.(), fn ->
  {answers, total} =
    Cost.track(fn ->
      prog = Dsxir.Program.new(CostDemo.Program)

      ["What is 2+2?", "Who wrote Hamlet?", "What color is the sky?"]
      |> Enum.map(fn q ->
        {_prog, pred} = CostDemo.Program.forward(prog, %{question: q})
        pred[:answer]
      end)
    end)

  %{answers: answers, total_cost: total}
end)
%{
total_cost: #Dsxir.Cost<%{in: 350, out: 191, reasoning: 0, cost: 1.671e-4, calls: 3}>,
answers: ["4", "William Shakespeare", "The sky is typically blue during the day."]
}
total is a single Dsxir.Cost with calls equal to the number of
predictor (and embedding) calls in the block, and every token/cost field
summed. It captures both [:dsxir, :predictor, :stop] and
[:dsxir, :lm, :embed, :stop] events, so retrieval-augmented programs
account for their embedding spend too.
Tracking across fan-out workers
track/1 does not just watch the calling process. It pushes a scope id
onto the Dsxir.Settings stack, which propagates to workers that replay
the settings snapshot (Dsxir.Settings.run/2) — the same mechanism
Dsxir.Predictor.Parallel and Task.async_stream-based fan-out use. So
costs from concurrent workers are folded into the same total:
Dsxir.context(lm_frame.(), fn ->
  {_results, total} =
    Cost.track(fn ->
      snapshot = Dsxir.Settings.snapshot()
      prog = Dsxir.Program.new(CostDemo.Program)

      ["What is 10 * 10?", "Name a primary color.", "What is the boiling point of water in C?"]
      |> Task.async_stream(
        fn q ->
          Dsxir.Settings.run(snapshot, fn ->
            {_prog, pred} = CostDemo.Program.forward(prog, %{question: q})
            pred[:answer]
          end)
        end,
        timeout: :infinity
      )
      |> Enum.map(fn {:ok, answer} -> answer end)
    end)

  total
end)
#Dsxir.Cost<%{in: 356, out: 147, reasoning: 0, cost: 1.416e-4, calls: 3}>
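The snapshot-and-replay step is needed because BEAM processes do not inherit process-local state from the process that spawned them. The sketch below shows that with the process dictionary alone. It assumes, for illustration only, that settings live in process-local state; the source does not say how Dsxir.Settings stores them, and the :demo_scope key is made up.

```elixir
# Process-local state (here, the process dictionary) is not inherited by
# the processes that Task.async_stream spawns, so it has to be captured in
# the parent and replayed in each worker.
Process.put(:demo_scope, [:scope_1])

# Capture in the parent, before spawning workers.
snapshot = Process.get(:demo_scope)

results =
  [1, 2]
  |> Task.async_stream(fn n ->
    # Without a replay the worker sees nothing under this key.
    before = Process.get(:demo_scope)
    # Replay the snapshot, analogous in spirit to Dsxir.Settings.run/2.
    Process.put(:demo_scope, snapshot)
    {n, before, Process.get(:demo_scope)}
  end)
  |> Enum.map(fn {:ok, result} -> result end)

IO.inspect(results)
# prints [{1, nil, [:scope_1]}, {2, nil, [:scope_1]}]
```

Each worker starts with nil and only sees the scope after the replay, which is why a worker that skips Dsxir.Settings.run/2 would presumably fall outside the tracked total.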
track/1 blocks are safe to nest: an inner block captures only its own
calls, and the outer total still includes them. It also tears down its
telemetry handler and ETS table even if the block raises — you get the
partial cost back via the exception path, never a leaked handler.
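The cleanup guarantee described above is the kind of thing try/after provides. The sketch below is a hypothetical stand-in, not dsxir's implementation: MiniTrack, its named ETS table, and the record callback are all invented for illustration. It shows only that the table is removed on both the normal and the raising path, and does not reproduce the partial-cost behaviour.

```elixir
defmodule MiniTrack do
  # Hypothetical stand-in for a tracking block, not dsxir's implementation.
  @table :mini_track_demo

  # Creates a named ETS table, hands the block a `record` function, and
  # always deletes the table, even when the block raises.
  def track(fun) when is_function(fun, 1) do
    # :duplicate_bag keeps repeated identical entries; :bag would drop them.
    :ets.new(@table, [:named_table, :duplicate_bag, :public])
    record = fn cost -> :ets.insert(@table, {:cost, cost}) end

    try do
      result = fun.(record)
      total = @table |> :ets.lookup(:cost) |> Enum.map(fn {:cost, c} -> c end) |> Enum.sum()
      {result, total}
    after
      # Runs on normal return and on raise, throw, or exit alike.
      :ets.delete(@table)
    end
  end
end

{result, total} =
  MiniTrack.track(fn record ->
    record.(2)
    record.(3)
    :done
  end)

# The block raises, yet the table is gone afterwards.
outcome =
  try do
    MiniTrack.track(fn record ->
      record.(1)
      raise "boom"
    end)
  rescue
    error in RuntimeError -> {:raised, error.message}
  end

IO.inspect({result, total, outcome, :ets.whereis(:mini_track_demo)})
# prints {:done, 5, {:raised, "boom"}, :undefined}
```

A named table keeps the sketch short but rules out nesting and concurrent use, both of which the real track/1 is described as supporting.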
Combining costs by hand
track/1 is built on three pure helpers you can use directly when you
have Dsxir.Cost values from elsewhere (stored, returned from
lm_usage, etc.):
a = %Cost{input_tokens: 10, output_tokens: 5, total_cost: 0.001, currency: "USD", calls: 1}
b = %Cost{input_tokens: 20, output_tokens: 7, total_cost: 0.002, currency: "USD", calls: 1}
Cost.merge(a, b)
#Dsxir.Cost<%{in: 30, out: 12, cost: 0.003, calls: 2}>
merge/2 is field-wise addition with nil as the identity per field;
calls always adds, and the first non-nil currency wins. sum/1
folds a list the same way, seeded with zero/0:
Cost.sum([
  %Cost{input_tokens: 10, total_cost: 0.001, calls: 1},
  %Cost{input_tokens: 20, total_cost: 0.002, calls: 1},
  %Cost{input_tokens: 5, total_cost: 0.0005, calls: 1}
])
#Dsxir.Cost<%{in: 35, cost: 0.0035, calls: 3}>
Because nil is the identity, mixing reported and unreported buckets
does the sane thing: a bucket reported by some calls and not others sums
the reported ones; a bucket reported by none stays nil.
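The nil-as-identity rule can be written down in a few lines. The sketch below illustrates the rule on plain maps. It is not dsxir's code, and the NilSum module with its add/2 and merge/3 functions is hypothetical.

```elixir
defmodule NilSum do
  # Hypothetical helper illustrating the rule, not dsxir's code.
  # nil is the identity: nil + x = x, and nil + nil stays nil.
  def add(nil, b), do: b
  def add(a, nil), do: a
  def add(a, b), do: a + b

  # Field-wise merge of two plain maps over the given keys.
  def merge(a, b, keys) do
    Map.new(keys, fn key -> {key, add(Map.get(a, key), Map.get(b, key))} end)
  end
end

first = %{input_tokens: 10, reasoning_tokens: nil, cache_read_tokens: nil}
second = %{input_tokens: 20, reasoning_tokens: 0, cache_read_tokens: nil}

merged = NilSum.merge(first, second, [:input_tokens, :reasoning_tokens, :cache_read_tokens])
IO.inspect(merged)
```

Here input_tokens sums to 30, reasoning_tokens becomes 0 because one side reported it, and cache_read_tokens stays nil because neither side did.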
Inspecting recent history
Dsxir.History is the developer tool behind a DSPy-style
inspect_history. It is a supervised owner of an ETS table that records
one row per LM call, trimmed to a configurable window
(:max_history_size, default 10_000). It is off by default — turn
it on with enable/0:
Dsxir.History.enable()
:ok
Now make a few calls, then read them back newest-first:
Dsxir.context(lm_frame.(), fn ->
  prog = Dsxir.Program.new(CostDemo.Program)

  for q <- ["Define entropy.", "What is a monad?"] do
    {_prog, _pred} = CostDemo.Program.forward(prog, %{question: q})
  end
end)
Dsxir.History.last(2)
[
%{
signature: #Dsxir.Signature.Compiled reasoning, answer>,
metadata: %{},
source: :predictor,
adapter: Dsxir.Adapter.Chat,
cache_read_tokens: nil,
cache_write_tokens: nil,
reasoning_tokens: 0,
cost: 8.835e-5,
tokens_in: 117,
tokens_out: 118,
predictor: Dsxir.Predictor.Predict,
duration: 1941790333,
prediction: #Dsxir.Prediction<
reasoning: "A monad is a design pattern used in functional programming to handle computations as a series of steps, encapsulating values and providing a way to chain operations while managing side effects. It consists of three main components: a type constructor, a unit function (or return), and a bind function (or flatMap), which together allow for the sequencing of operations in a consistent manner.",
answer: "A monad is a design pattern in functional programming that encapsulates computations and manages side effects through a type constructor, a unit function, and a bind function."
>,
model: nil,
cost_breakdown: #Dsxir.Cost<%{in: 117, out: 118, reasoning: 0, cost: 8.835e-5, calls: 1}>,
occurred_at: 1779876843077191
},
%{
signature: #Dsxir.Signature.Compiled reasoning, answer>,
metadata: %{},
source: :predictor,
adapter: Dsxir.Adapter.Chat,
cache_read_tokens: nil,
cache_write_tokens: nil,
reasoning_tokens: 0,
cost: 8.91e-5,
tokens_in: 114,
tokens_out: 120,
predictor: Dsxir.Predictor.Predict,
duration: 2129844875,
prediction: #Dsxir.Prediction<
reasoning: "Entropy is a measure of the disorder or randomness in a system, often associated with the amount of information that is missing from our knowledge of the complete microstate of the system. In thermodynamics, it quantifies the energy in a physical system that is not available to do work. In information theory, it represents the average amount of information produced by a stochastic source of data. Thus, entropy can be understood in both physical and informational contexts.",
answer: "Entropy is a measure of disorder or randomness in a system, reflecting the amount of unavailable energy or information."
>,
model: nil,
cost_breakdown: #Dsxir.Cost<%{in: 114, out: 120, reasoning: 0, cost: 8.91e-5, calls: 1}>,
occurred_at: 1779876841134987
}
]
Each row carries the flat token measurements (tokens_in, tokens_out,
the cache/reasoning breakdown), a cost (the total_cost float), and a
cost_breakdown (the full %Dsxir.Cost{} struct) — plus source
(:predictor or :embed), the predictor, signature, adapter, the
prediction, the model (for embeds), duration, and any extra
telemetry metadata.
Pull just the cost-relevant columns:
Dsxir.History.last(5)
|> Enum.map(fn row ->
  %{source: row.source, tokens_in: row.tokens_in, tokens_out: row.tokens_out, cost: row.cost}
end)
[
%{source: :predictor, cost: 8.835e-5, tokens_in: 117, tokens_out: 118},
%{source: :predictor, cost: 8.91e-5, tokens_in: 114, tokens_out: 120}
]
To total the cost across the recorded window, fold the cost_breakdown
structs through sum/1:
Dsxir.History.last(100)
|> Enum.map(& &1.cost_breakdown)
|> Enum.reject(&is_nil/1)
|> Cost.sum()
#Dsxir.Cost<%{in: 231, out: 238, reasoning: 0, cost: 1.7745e-4, calls: 2}>
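History rows are plain maps, so ordinary Enum code can slice them further. The sketch below groups rows by source and totals tokens and wall time. It uses literal rows copied from the output above, trimmed to the fields it needs, so it runs without an API key. Treating duration as native time units is an assumption based on the usual :telemetry convention, not something the source states.

```elixir
# Rows shaped like the Dsxir.History.last/1 output above, trimmed to the
# fields used here; the values are copied from that output.
rows = [
  %{source: :predictor, cost: 8.835e-5, tokens_in: 117, tokens_out: 118, duration: 1_941_790_333},
  %{source: :predictor, cost: 8.91e-5, tokens_in: 114, tokens_out: 120, duration: 2_129_844_875}
]

summary =
  rows
  |> Enum.group_by(& &1.source)
  |> Map.new(fn {source, group} ->
    {source,
     %{
       calls: length(group),
       # Assumes token counts were reported (non-nil) for every row.
       tokens: group |> Enum.map(&(&1.tokens_in + &1.tokens_out)) |> Enum.sum(),
       # Assumption: duration is in native time units.
       total_ms:
         group
         |> Enum.map(&System.convert_time_unit(&1.duration, :native, :millisecond))
         |> Enum.sum()
     }}
  end)

IO.inspect(summary)
```

The millisecond figure depends on the platform's native time unit, so no expected value is shown for it. The call and token counts do not.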
You can also dump rows to disk for offline inspection by passing file:
to last/2, which writes one JSON-encoded entry per line and still
returns the rows. The prediction field is rendered through inspect/1 so
structs containing functions or PIDs do not break encoding:
Dsxir.History.last(50, file: Path.join(System.tmp_dir!(), "dsxir_history.jsonl"))
Turn it back off when you are done — the handler runs in the calling process on every LM call, so leave it disabled in production unless you want the recording:
Dsxir.History.disable()
:ok
Rolling your own: telemetry
lm_usage, track/1, and Dsxir.History are all built on the same
telemetry event. For a billing pipeline or a live dashboard, attach your
own handler to [:dsxir, :predictor, :stop] (and
[:dsxir, :lm, :embed, :stop] for embeddings):
:telemetry.attach(
  "cost-tracking-livebook",
  [:dsxir, :predictor, :stop],
  fn _event, _measurements, metadata, _config ->
    # measurements (unused here): %{tokens_in, tokens_out, cache_read_tokens, cache_write_tokens,
    #   reasoning_tokens, cost, duration} (token values nil when unreported)
    # metadata.cost: the full %Dsxir.Cost{} struct
    IO.inspect(metadata.cost, label: "predictor cost")
  end,
  nil
)
12:14:03.103 [info] The function passed as a handler with ID "cost-tracking-livebook" is a local function.
This means that it is either an anonymous function or a capture of a function without a module specified. That may cause a performance penalty when calling that handler. For more details see the note in `telemetry:attach/4` documentation.
https://hexdocs.pm/telemetry/telemetry.html#attach/4
:ok
Dsxir.context(lm_frame.(), fn ->
  prog = Dsxir.Program.new(CostDemo.Program)
  {_prog, _pred} = CostDemo.Program.forward(prog, %{question: "What is 7 * 6?"})
  :ok
end)
predictor cost: #Dsxir.Cost<%{
in: 119,
out: 41,
reasoning: 0,
cost: 4.2449999999999995e-5,
calls: 1
}>
:ok
:telemetry.detach("cost-tracking-livebook")
:ok
The metadata also carries _cost_scope (the list of active track/1
scope ids, [] outside any block), so a single handler can branch on
whether it is inside a tracked run. Dsxir.Cost.to_measurements/1 is the
function that flattens a Dsxir.Cost onto those measurement keys, should
you need to re-emit one.
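For anything longer-lived than a notebook cell, the handler is usually a named module function that accumulates into some state. The sketch below is one possible shape, not a dsxir API: CostMeterSketch is a hypothetical module that keeps a running total in an Agent and reads the cost key from the measurements map described above. The events are simulated by calling the handler directly, so the block runs without :telemetry or an API key.

```elixir
defmodule CostMeterSketch do
  # Hypothetical module, not part of dsxir. Keeps a running total in an
  # Agent and exposes a handler with the 4-arity shape :telemetry expects.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{calls: 0, cost: nil} end, name: __MODULE__)
  end

  def handle_event(_event, measurements, _metadata, _config) do
    Agent.update(__MODULE__, fn state ->
      %{calls: state.calls + 1, cost: add(state.cost, Map.get(measurements, :cost))}
    end)
  end

  def totals, do: Agent.get(__MODULE__, & &1)

  # nil means "not reported" and acts as the identity.
  defp add(nil, b), do: b
  defp add(a, nil), do: a
  defp add(a, b), do: a + b
end

{:ok, _pid} = CostMeterSketch.start_link()

# Simulate two events by calling the handler directly; in a live session
# :telemetry would invoke it instead.
CostMeterSketch.handle_event([:dsxir, :predictor, :stop], %{cost: 0.25}, %{}, nil)
CostMeterSketch.handle_event([:dsxir, :predictor, :stop], %{cost: nil}, %{}, nil)

IO.inspect(CostMeterSketch.totals())
```

To wire it up, pass the capture &CostMeterSketch.handle_event/4 as the third argument of :telemetry.attach/4 in place of the anonymous function. An external capture of this form is what the telemetry documentation recommends and avoids the local-function notice logged above.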
Choosing between them
- prediction.lm_usage: you already hold the prediction and want the cost of that one call. Zero ceremony.
- Dsxir.Cost.track/1: you want the total for a unit of work (a request, a compilation, an eval batch), including fan-out. The go-to for “what did this run cost?”.
- Dsxir.History: interactive debugging, as in “what were the last few calls and what did they cost?”. A dev tool, not a production meter.
- Telemetry: anything durable, such as metrics, dashboards, or per-tenant billing. The other three are conveniences over this.