Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

GPT-4o-vision - Extracting Data from Images

pages/cookbook/gpt4o-vision.livemd

GPT-4o-vision - Extracting Data from Images

Mix.install(
  [
    {:instructor, path: Path.expand("../../", __DIR__)},
    {:kino, "~> 0.12.3"}
  ],
  config: [
    instructor: [
      adapter: Instructor.Adapters.OpenAI,
      openai: [
        api_key: System.fetch_env!("LB_OPENAI_API_KEY"),
      ]
    ]
  ]
)

Motivation

The latest models support vision capabilities as well. This, with no extra work, is a feature of Instructor. All you have to do is pass a URL or Base64 encoded image as one of the messages, and everything should just work seamlessly.

In the following example, we will extract product details from a screenshot of a Shopify store.

image = Kino.FS.file_path("shopify-screenshot.png") |> File.read!()
base64_image = "data:image/png;base64," <> Base.encode64(image)

defmodule Product do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field(:name, :string)
    field(:price, :decimal)
    field(:currency, Ecto.Enum, values: [:usd, :gbp, :eur, :cny])
    field(:color, :string)
  end
end

{:ok, result} =
  Instructor.chat_completion(
    model: "gpt-4o",
    response_model: Product,
    messages: [
      %{
        role: "user",
        content: [
          %{type: "text", text: "What is the product details of the following image?"},
          %{type: "image_url", image_url: %{url: base64_image, detail: "high"}}
        ]
      }
    ]
  )

result
%Product{
  name: "Thomas Wooden Railway Thomas The Tank Engine",
  price: Decimal.new("33.0"),
  currency: :usd,
  color: "blue"
}