Powered by AppSignal & Oban Pro

Image Vision

livebooks/image_vision.livemd

Image Vision

> #### Livebook Desktop users {: .info} > > Livebook Desktop launches as a GUI app and does not inherit your terminal’s PATH. If Mix.install below fails with “cargo: command not found” or similar, your Rust toolchain isn’t visible to Livebook Desktop. Fix by creating ~/.livebookdesktop.sh with at minimum: > > sh > export PATH="$HOME/.cargo/bin:$PATH" > # If you use mise/asdf, also activate them here. > > > Restart Livebook Desktop after editing the file. See the project README for details on which toolchains are needed.

Mix.install(
  [
    {:image_vision, "~> 0.2"},
    {:kino, "~> 0.14"},
    # Required for classification (Bumblebee servings).
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"},
    # Required for detection and segmentation (ONNX runtime).
    {:ortex, "~> 0.1"}
  ],
  config: [
    # Tell ImageVision.Application to start the classification serving
    # under its own supervisor. Classifier weights (~110 MB) download
    # from HuggingFace on first run.
    image_vision: [classifier: [autostart: true]]
  ]
)
# Use EXLA as the Nx backend for any tensor work in this notebook.
Nx.global_default_backend(EXLA.Backend)

Classification

Upload any image and the model returns a ranked list of ImageNet labels with confidence scores. The default model is facebook/convnext-tiny-224 (Apache 2.0, ~110 MB).

classification_input = Kino.Input.image("Image to classify")
image =
  classification_input
  |> Kino.Input.read()
  |> Image.from_kino!()

predictions =
  image
  |> Image.Classification.classify()
  |> Map.get(:predictions)
  |> Enum.take(5)
  |> Enum.map(fn %{label: label, score: score} ->
    %{"Label" => label, "Confidence" => "#{Float.round(score * 100, 1)}%"}
  end)

Kino.DataTable.new(predictions)

Object Detection

Upload an image and the model returns bounding boxes with labels for every recognised object. The default model is onnx-community/rtdetr_r50vd (RT-DETR, Apache 2.0, ~175 MB), which detects 80 COCO object classes.

detection_input = Kino.Input.image("Image for object detection")
image =
  detection_input
  |> Kino.Input.read()
  |> Image.from_kino!()

detections = Image.Detection.detect(image)
# Draw bounding boxes and labels using the library helper.
# See Image.Detection.draw_bbox_with_labels/3 for tunable options
# (`:opacity`, `:stroke_width`, `:font_size`, `:palette`).
Image.Detection.draw_bbox_with_labels(detections, image)
# Summary table of what was detected.
detections
|> Enum.map(fn %{label: label, score: score, box: {x, y, w, h}} ->
  %{
    "Label" => label,
    "Confidence" => "#{Float.round(score * 100, 1)}%",
    "Box (x, y, w, h)" => "#{trunc(x)}, #{trunc(y)}, #{trunc(w)}, #{trunc(h)}"
  }
end)
|> Kino.DataTable.new()

Segmentation

Segmentation answers “exactly which pixels belong to this object?”. We show two modes: panoptic (every pixel automatically labelled by class) and promptable (you specify a point inside the object you want).

Panoptic segmentation

The default model is Xenova/detr-resnet-50-panoptic (Apache 2.0, ~175 MB). It labels every pixel across 133 COCO panoptic categories — both things (cat, car) and stuff (sky, grass).

panoptic_input = Kino.Input.image("Image for panoptic segmentation")
image =
  panoptic_input
  |> Kino.Input.read()
  |> Image.from_kino!()

segments = Image.Segmentation.segment_panoptic(image)

# Render a coloured overlay — one colour per segment label.
overlay = Image.Segmentation.compose_overlay(image, segments)

Kino.Layout.grid(
  [
    Kino.Layout.grid([image, Kino.Markdown.new("**Original**")], boxed: true),
    Kino.Layout.grid([overlay, Kino.Markdown.new("**Panoptic segments**")], boxed: true)
  ],
  columns: 2
)
# Segment summary table.
segments
|> Enum.sort_by(& &1.score, :desc)
|> Enum.map(fn %{label: label, score: score} ->
  %{"Label" => label, "Score" => "#{Float.round(score * 100, 1)}%"}
end)
|> Kino.DataTable.new()

Promptable segmentation (SAM 2)

The default model is SharpAI/sam2-hiera-tiny-onnx (Apache 2.0, ~150 MB total — encoder + decoder). Click a point inside the object you want to isolate and SAM 2 returns a precise pixel mask for it.

sam_input = Kino.Input.image("Image for SAM 2 segmentation")
image =
  sam_input
  |> Kino.Input.read()
  |> Image.from_kino!()

width = Image.width(image)
height = Image.height(image)

# Use a Kino point picker or hard-code a prompt point.
# Replace {cx, cy} with the pixel coordinates of the object you want to isolate.
cx = div(width, 2)
cy = div(height, 2)

%{mask: mask, score: score} =
  Image.Segmentation.segment(image, prompt: {:point, cx, cy})

IO.puts("Mask score: #{Float.round(score, 3)}")

# Apply the mask as an alpha channel so only the segmented object is visible.
{:ok, cutout} = Image.Segmentation.apply_mask(image, mask)

# Build an SVG to mark the prompt point on the original image.
point_svg = """

  
  

"""

{:ok, point_overlay} = Image.open(point_svg, access: :sequential)
{:ok, annotated} = Image.compose(image, point_overlay)

Kino.Layout.grid(
  [
    Kino.Layout.grid([annotated, Kino.Markdown.new("**Prompt point**")], boxed: true),
    Kino.Layout.grid([mask, Kino.Markdown.new("**Predicted mask**")], boxed: true),
    Kino.Layout.grid([cutout, Kino.Markdown.new("**Cutout**")], boxed: true)
  ],
  columns: 3
)