# Image Vision
> #### Livebook Desktop users {: .info}
>
> Livebook Desktop launches as a GUI app and does not inherit your terminal’s PATH. If `Mix.install` below fails with “cargo: command not found” or similar, your Rust toolchain isn’t visible to Livebook Desktop. Fix by creating `~/.livebookdesktop.sh` with at minimum:
>
>     export PATH="$HOME/.cargo/bin:$PATH"
>     # If you use mise/asdf, also activate them here.
>
> Restart Livebook Desktop after editing the file. See the project README for details on which toolchains are needed.
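
To confirm whether the toolchain is visible from inside Livebook itself, a quick check along these lines can help. This is a minimal sketch: the `ToolchainCheck` module and its default tool list are hypothetical names chosen for illustration, and only `System.find_executable/1` from the Elixir standard library is used.

```elixir
defmodule ToolchainCheck do
  # Hypothetical helper, not part of any library used in this notebook.
  # Maps each tool name to its resolved path, or nil when it is not on PATH.
  def report(tools \\ ["cargo", "rustc"]) do
    Map.new(tools, fn tool -> {tool, System.find_executable(tool)} end)
  end

  # Returns the names of the tools whose path could not be resolved.
  def missing(report) do
    for {tool, nil} <- report do
      tool
    end
  end
end

ToolchainCheck.report() |> ToolchainCheck.missing()
```

An empty list means every listed tool was found on the PATH that Livebook sees; any names returned are the ones to add in `~/.livebookdesktop.sh`.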
Mix.install(
  [
    {:image_vision, "~> 0.2"},
    {:kino, "~> 0.14"},
    # Required for classification (Bumblebee servings).
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"},
    # Required for detection and segmentation (ONNX runtime).
    {:ortex, "~> 0.1"}
  ],
  config: [
    # Tell ImageVision.Application to start the classification serving
    # under its own supervisor. Classifier weights (~110 MB) download
    # from HuggingFace on first run.
    image_vision: [classifier: [autostart: true]]
  ]
)
# Use EXLA as the Nx backend for any tensor work in this notebook.
Nx.global_default_backend(EXLA.Backend)
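
The `config:` option passed to `Mix.install` ends up in the application environment, so the autostart flag can be read back with `Application.get_env/3`. The sketch below assumes the key layout shown in the setup cell (`:image_vision` app, `:classifier` key, `:autostart` flag); the `ConfigCheck` module is a hypothetical helper, not part of the library.

```elixir
defmodule ConfigCheck do
  # Hypothetical helper: reads the flag set by the `config:` option above.
  # Falls back to false when the key is absent.
  def classifier_autostart? do
    :image_vision
    |> Application.get_env(:classifier, [])
    |> Keyword.get(:autostart, false)
  end
end

ConfigCheck.classifier_autostart?()
```

If this returns `false` after the setup cell has run, the configuration was not applied, which would explain a classification serving that never starts.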
## Classification
Upload any image and the model returns a ranked list of ImageNet labels with confidence scores. The default model is facebook/convnext-tiny-224 (Apache 2.0, ~110 MB).
classification_input = Kino.Input.image("Image to classify")
image =
  classification_input
  |> Kino.Input.read()
  |> Image.from_kino!()

predictions =
  image
  |> Image.Classification.classify()
  |> Map.get(:predictions)
  |> Enum.take(5)
  |> Enum.map(fn %{label: label, score: score} ->
    %{"Label" => label, "Confidence" => "#{Float.round(score * 100, 1)}%"}
  end)

Kino.DataTable.new(predictions)
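
Taking the top five rows can include labels with negligible confidence. One option is to filter by a minimum score first. The sketch below works on plain maps with the `:label` and `:score` keys used in the cell above; the `PredictionFilter` module, its default threshold, and the sample data are made up for illustration.

```elixir
defmodule PredictionFilter do
  # Hypothetical helper: keeps predictions scoring at least `min_score`,
  # best first, and returns at most `limit` of them.
  def top(predictions, min_score \\ 0.05, limit \\ 5) do
    predictions
    |> Enum.filter(&(&1.score >= min_score))
    |> Enum.sort_by(& &1.score, :desc)
    |> Enum.take(limit)
  end
end

# Invented sample data in the same shape as the predictions above.
sample = [
  %{label: "tabby cat", score: 0.81},
  %{label: "tiger cat", score: 0.12},
  %{label: "lynx", score: 0.02}
]

PredictionFilter.top(sample)
```

With the default threshold of 0.05, the "lynx" row is dropped and the other two are returned in descending score order.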
## Object Detection
Upload an image and the model returns bounding boxes with labels for every recognised object. The default model is onnx-community/rtdetr_r50vd (RT-DETR, Apache 2.0, ~175 MB), which detects 80 COCO object classes.
detection_input = Kino.Input.image("Image for object detection")
image =
  detection_input
  |> Kino.Input.read()
  |> Image.from_kino!()

detections = Image.Detection.detect(image)

# Draw bounding boxes and labels using the library helper.
# See Image.Detection.draw_bbox_with_labels/3 for tunable options
# (`:opacity`, `:stroke_width`, `:font_size`, `:palette`).
Image.Detection.draw_bbox_with_labels(detections, image)

# Summary table of what was detected.
detections
|> Enum.map(fn %{label: label, score: score, box: {x, y, w, h}} ->
  %{
    "Label" => label,
    "Confidence" => "#{Float.round(score * 100, 1)}%",
    "Box (x, y, w, h)" => "#{trunc(x)}, #{trunc(y)}, #{trunc(w)}, #{trunc(h)}"
  }
end)
|> Kino.DataTable.new()
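
Because each detection is a plain map, the usual `Enum` functions are enough for follow-up processing. The sketch below counts detections per label and converts an `{x, y, w, h}` box to corner coordinates. It assumes `x` and `y` are the top-left corner, which the table above does not state; the `DetectionSummary` module and the sample data are hypothetical.

```elixir
defmodule DetectionSummary do
  # Hypothetical helper: number of detections for each label.
  def count_by_label(detections) do
    Enum.frequencies_by(detections, & &1.label)
  end

  # Converts {x, y, w, h} to {left, top, right, bottom}.
  # Assumption: x and y are the top-left corner of the box.
  def corners({x, y, w, h}), do: {x, y, x + w, y + h}
end

# Invented sample data in the same shape as the detections above.
sample = [
  %{label: "person", score: 0.97, box: {10, 20, 100, 50}},
  %{label: "dog", score: 0.88, box: {200, 40, 60, 80}},
  %{label: "person", score: 0.75, box: {300, 25, 90, 45}}
]

DetectionSummary.count_by_label(sample)
```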
## Segmentation
Segmentation answers “exactly which pixels belong to this object?”. We show two modes: panoptic (every pixel automatically labelled by class) and promptable (you specify a point inside the object you want).
### Panoptic segmentation
The default model is Xenova/detr-resnet-50-panoptic (Apache 2.0, ~175 MB). It labels every pixel across 133 COCO panoptic categories — both things (cat, car) and stuff (sky, grass).
panoptic_input = Kino.Input.image("Image for panoptic segmentation")
image =
  panoptic_input
  |> Kino.Input.read()
  |> Image.from_kino!()

segments = Image.Segmentation.segment_panoptic(image)

# Render a coloured overlay — one colour per segment label.
overlay = Image.Segmentation.compose_overlay(image, segments)

Kino.Layout.grid(
  [
    Kino.Layout.grid([image, Kino.Markdown.new("**Original**")], boxed: true),
    Kino.Layout.grid([overlay, Kino.Markdown.new("**Panoptic segments**")], boxed: true)
  ],
  columns: 2
)

# Segment summary table.
segments
|> Enum.sort_by(& &1.score, :desc)
|> Enum.map(fn %{label: label, score: score} ->
  %{"Label" => label, "Score" => "#{Float.round(score * 100, 1)}%"}
end)
|> Kino.DataTable.new()
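
The idea of "one colour per segment label" can be illustrated with a small, deterministic label-to-colour mapping. This is only a sketch of the concept and not how `Image.Segmentation.compose_overlay` chooses its colours; the `LabelPalette` module and its four-colour palette are invented for the example.

```elixir
defmodule LabelPalette do
  # Hypothetical palette of RGB triples; cycles when labels outnumber colours.
  @palette [{230, 25, 75}, {60, 180, 75}, {0, 130, 200}, {245, 130, 48}]

  # Assigns each distinct label a colour, in order of first appearance.
  def assign(labels) do
    labels
    |> Enum.uniq()
    |> Enum.with_index()
    |> Map.new(fn {label, index} ->
      {label, Enum.at(@palette, rem(index, length(@palette)))}
    end)
  end
end

LabelPalette.assign(["sky", "cat", "sky", "grass"])
```

Repeated labels share a colour, so two segments labelled "sky" would be drawn identically under this scheme.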
### Promptable segmentation (SAM 2)
The default model is SharpAI/sam2-hiera-tiny-onnx (Apache 2.0, ~150 MB total — encoder + decoder). Click a point inside the object you want to isolate and SAM 2 returns a precise pixel mask for it.
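
A prompt point is just a pair of pixel coordinates, so it can be derived from a relative position such as "centre" or "upper-left third". The sketch below assumes zero-based pixel coordinates with the origin at the top-left, which this notebook does not confirm; the `PromptPoint` module is a hypothetical helper.

```elixir
defmodule PromptPoint do
  # Hypothetical helper: converts a relative position ({0.0..1.0, 0.0..1.0})
  # into integer pixel coordinates clamped to the image bounds.
  # Assumption: pixels are zero-based with the origin at the top-left.
  def from_relative(width, height, {rx, ry}) do
    {
      clamp(trunc(rx * width), 0, width - 1),
      clamp(trunc(ry * height), 0, height - 1)
    }
  end

  defp clamp(value, low, high), do: value |> max(low) |> min(high)
end

PromptPoint.from_relative(640, 480, {0.5, 0.5})
# => {320, 240}
```

Clamping keeps a position of `1.0` inside the image: for a 640×480 image it maps to `{639, 479}` rather than a point just outside the last row and column.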
sam_input = Kino.Input.image("Image for SAM 2 segmentation")
image =
  sam_input
  |> Kino.Input.read()
  |> Image.from_kino!()
width = Image.width(image)
height = Image.height(image)
# The prompt point defaults to the image centre. To target a different object,
# replace cx and cy with its pixel coordinates (hard-coded or from a Kino point picker).
cx = div(width, 2)
cy = div(height, 2)
%{mask: mask, score: score} =
  Image.Segmentation.segment(image, prompt: {:point, cx, cy})
IO.puts("Mask score: #{Float.round(score, 3)}")
# Apply the mask as an alpha channel so only the segmented object is visible.
{:ok, cutout} = Image.Segmentation.apply_mask(image, mask)
# Build an SVG to mark the prompt point on the original image.
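# Assumed minimal marker: a red dot with a white outline at {cx, cy}.
# The radius and colours are illustrative choices.
point_svg = """
<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">
  <circle cx="#{cx}" cy="#{cy}" r="8" fill="red" stroke="white" stroke-width="2"/>
</svg>
"""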
{:ok, point_overlay} = Image.open(point_svg, access: :sequential)
{:ok, annotated} = Image.compose(image, point_overlay)
Kino.Layout.grid(
  [
    Kino.Layout.grid([annotated, Kino.Markdown.new("**Prompt point**")], boxed: true),
    Kino.Layout.grid([mask, Kino.Markdown.new("**Predicted mask**")], boxed: true),
    Kino.Layout.grid([cutout, Kino.Markdown.new("**Cutout**")], boxed: true)
  ],
  columns: 3
)
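
What "applying the mask as an alpha channel" means can be shown on a toy image made of nested lists. This is a conceptual sketch only and says nothing about how `Image.Segmentation.apply_mask` is implemented; the `ToyMask` module and the 2×2 data are invented for illustration.

```elixir
defmodule ToyMask do
  # pixels: rows of {r, g, b}; mask: rows of 0 or 1 with the same shape.
  # Masked-in pixels become fully opaque (alpha 255), the rest transparent (alpha 0).
  def apply_alpha(pixels, mask) do
    Enum.zip_with(pixels, mask, fn pixel_row, mask_row ->
      Enum.zip_with(pixel_row, mask_row, fn {r, g, b}, m -> {r, g, b, m * 255} end)
    end)
  end

  # Fraction of pixels that belong to the object.
  def coverage(mask) do
    flat = List.flatten(mask)
    Enum.sum(flat) / length(flat)
  end
end

pixels = [[{255, 0, 0}, {0, 255, 0}], [{0, 0, 255}, {9, 9, 9}]]
mask = [[1, 0], [0, 1]]

ToyMask.apply_alpha(pixels, mask)
```

Here the two pixels on the diagonal stay visible and the other two become transparent, and `ToyMask.coverage(mask)` reports that the object covers half of the image.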