GPT-4o-vision - Extracting Data from Images
Mix.install(
  [
    {:instructor, path: Path.expand("../../", __DIR__)},
    {:kino, "~> 0.12.3"}
  ],
  config: [
    instructor: [
      adapter: Instructor.Adapters.OpenAI,
      openai: [
        # Read the key from the Livebook secret; no trailing comma, which Elixir rejects.
        api_key: System.fetch_env!("LB_OPENAI_API_KEY")
      ]
    ]
  ]
)
Motivation
The latest OpenAI models, such as GPT-4o, also accept images as input. Instructor supports this with no extra work: pass an image URL or a Base64-encoded image as part of a message's content, and the response is extracted into your schema as usual.
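As a rough illustration of that message shape, here is a minimal sketch of a helper that pairs a text prompt with an image reference. The `ImageMessage` module, its `build/3` function, and the example URL are hypothetical names introduced only for this sketch; they are not part of Instructor. The `"auto"` default for `detail` is an assumption about a sensible default, not something the library requires.

```elixir
defmodule ImageMessage do
  # Hypothetical helper, not part of Instructor: builds a "user" message
  # that pairs a text prompt with one image reference.
  # `image_ref` may be an https URL or a "data:image/...;base64,..." data URL.
  def build(prompt, image_ref, detail \\ "auto")
      when is_binary(prompt) and is_binary(image_ref) do
    %{
      role: "user",
      content: [
        %{type: "text", text: prompt},
        %{type: "image_url", image_url: %{url: image_ref, detail: detail}}
      ]
    }
  end
end

# Placeholder URL, used only for illustration.
message = ImageMessage.build("Describe this product.", "https://example.com/product.png")
```

The resulting map can be placed in the `messages:` list of a chat completion call. Because the URL and the data URL travel in the same `url` field, the same helper would cover both cases.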
In the following example, we will extract product details from a screenshot of a Shopify store.
image = Kino.FS.file_path("shopify-screenshot.png") |> File.read!()
base64_image = "data:image/png;base64," <> Base.encode64(image)
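The snippet above hard-codes the `image/png` prefix, which matches the PNG screenshot used here. If the input might be a JPEG or another format, the prefix should probably match the file type. The following is a minimal sketch under that assumption; the `DataUrl` module and its extension-to-MIME table are hypothetical and cover only a few common formats.

```elixir
defmodule DataUrl do
  # Hypothetical helper: maps a few common image extensions to MIME types.
  @mime_types %{
    ".png" => "image/png",
    ".jpg" => "image/jpeg",
    ".jpeg" => "image/jpeg",
    ".gif" => "image/gif",
    ".webp" => "image/webp"
  }

  # Builds a data URL from raw image bytes and the original file name.
  # Returns {:ok, url} or {:error, {:unsupported_extension, ext}}.
  def encode(bytes, filename) when is_binary(bytes) and is_binary(filename) do
    ext = filename |> Path.extname() |> String.downcase()

    case Map.fetch(@mime_types, ext) do
      {:ok, mime} -> {:ok, "data:#{mime};base64," <> Base.encode64(bytes)}
      :error -> {:error, {:unsupported_extension, ext}}
    end
  end
end

# Three dummy bytes stand in for real image data.
{:ok, url} = DataUrl.encode(<<1, 2, 3>>, "photo.JPG")
```

Returning a tagged tuple rather than raising lets the caller decide how to handle an unexpected file type before any request is sent.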
defmodule Product do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field(:name, :string)
    field(:price, :decimal)
    field(:currency, Ecto.Enum, values: [:usd, :gbp, :eur, :cny])
    field(:color, :string)
  end
end
{:ok, result} =
  Instructor.chat_completion(
    model: "gpt-4o",
    response_model: Product,
    messages: [
      %{
        role: "user",
        # The content list combines a text prompt with the Base64-encoded image.
        content: [
          %{type: "text", text: "What are the product details in the following image?"},
          %{type: "image_url", image_url: %{url: base64_image, detail: "high"}}
        ]
      }
    ]
  )
result
%Product{
  name: "Thomas Wooden Railway Thomas The Tank Engine",
  price: Decimal.new("33.0"),
  currency: :usd,
  color: "blue"
}