Computer Vision - Extracting Data from Images

pages/cookbooks/vision.livemd

Alex Martsinovich

@martosaur

instructor_lite

Mix.install(
  [
    {:instructor_lite, "~> 1.0"},
    {:req, "~> 0.5"},
    {:kino, "~> 0.12.3"}
  ]
)

Motivation

Many recent LLMs are multimodal: they can interpret images as well as text. One example is Anthropic's Claude 3.5 Sonnet. With no extra work, you can include images in your prompts with InstructorLite and still perform the structured extraction you're used to.

In the following example, we will extract product details from a screenshot of a Shopify store.

Setup

To run the code in this notebook, add your Anthropic API key as an ANTHROPIC_KEY Livebook secret. Livebook exposes secrets as environment variables prefixed with LB_, so the key is then available as LB_ANTHROPIC_KEY.

secret_key = System.fetch_env!("LB_ANTHROPIC_KEY")
:ok

:ok
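
System.fetch_env!/1 raises if the secret has not been added to the notebook session. If you would rather get a readable hint than an exception, a minimal sketch could look like the following. The LivebookSecret module and its fetch/1 function are hypothetical names made up for this illustration. They are not part of Livebook, Kino, or InstructorLite.

```elixir
defmodule LivebookSecret do
  # Hypothetical helper: looks up a Livebook secret by its short name.
  # Livebook exposes secrets as environment variables with an "LB_" prefix.
  def fetch(name) when is_binary(name) do
    env_name = "LB_" <> name

    case System.fetch_env(env_name) do
      # Treat an empty value the same as a missing one.
      {:ok, value} when value != "" ->
        {:ok, value}

      _missing_or_empty ->
        {:error, "#{env_name} is not set; add #{name} as a Livebook secret"}
    end
  end
end
```

With this in place, the setup cell could match on LivebookSecret.fetch("ANTHROPIC_KEY") instead of calling System.fetch_env!/1 directly.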

Example

image = Kino.FS.file_path("shopify-screenshot.png") |> File.read!()
base64_image = Base.encode64(image)

defmodule Product do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field(:name, :string)
    field(:price, :decimal)
    field(:currency, Ecto.Enum, values: [:usd, :gbp, :eur, :cny])
    field(:color, :string)
  end
end

{:ok, result} =
  InstructorLite.instruct(%{
      messages: [
        %{
          role: "user",
          content: [
            %{type: "text", text: "What are the product details in the following image?"},
            %{type: "image", source: %{data: base64_image, type: "base64", media_type: "image/png"}}
          ]
        }
      ]
    },
    adapter: InstructorLite.Adapters.Anthropic,
    response_model: Product,
    adapter_context: [api_key: secret_key]
  )

result

%Product{
  name: "Thomas Wooden Railway Thomas The Tank Engine",
  price: Decimal.new("33.0"),
  currency: :usd,
  color: "blue"
}
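
The request above hard-codes media_type: "image/png", which only fits PNG files. As a sketch of how the image block could be built for other formats, the hypothetical ImageBlock module below derives the media type from the file extension and returns a map in the same shape as the one used in the example. The module and function names are assumptions for illustration only. The extension table covers the image formats Anthropic documents as supported (JPEG, PNG, GIF, WebP).

```elixir
defmodule ImageBlock do
  # Hypothetical helper, not part of InstructorLite.
  # Media types recognised by this sketch, keyed by lowercase file extension.
  @media_types %{
    ".png" => "image/png",
    ".jpg" => "image/jpeg",
    ".jpeg" => "image/jpeg",
    ".gif" => "image/gif",
    ".webp" => "image/webp"
  }

  # Returns {:ok, media_type} or :error for an unrecognised extension.
  def media_type(path) do
    ext = path |> Path.extname() |> String.downcase()
    Map.fetch(@media_types, ext)
  end

  # Builds the image content block from raw bytes and a known media type.
  def from_binary(binary, media_type) when is_binary(binary) do
    %{
      type: "image",
      source: %{type: "base64", media_type: media_type, data: Base.encode64(binary)}
    }
  end

  # Reads the file and builds the block.
  # Returns {:ok, block}, :error (unknown extension) or {:error, posix_reason}.
  def from_file(path) do
    with {:ok, media_type} <- media_type(path),
         {:ok, binary} <- File.read(path) do
      {:ok, from_binary(binary, media_type)}
    end
  end
end
```

In the notebook, ImageBlock.from_file(Kino.FS.file_path("shopify-screenshot.png")) would then yield a block that can be placed in the content list next to the text block.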
