Extracting data from images of receipts

Mix.install(
  [
    {:instructor, path: Path.expand("../../", __DIR__)},
    {:kino, "~> 0.12.3"}
  ],
  config: [
    instructor: [
      adapter: Instructor.Adapters.OpenAI,
      openai: [
        api_key: System.fetch_env!("LB_OPENAI_API_KEY"),
        http_options: [receive_timeout: 60_000, connect_options: [protocols: [:http2]]]
      ]
    ]
  ]
)

Our Object Model

# Read the receipt image attached to this notebook (Livebook "Files" panel)
image = Kino.FS.file_path("receipt.jpg") |> File.read!()

Looking at the receipt image, we can outline the fields we want to extract as an Ecto embedded schema, as shown below.

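defmodule Receipt do
  use Ecto.Schema
  use Instructor.Validator

  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field(:total, :decimal)
    field(:subtotal, :decimal)

    # One entry per line item printed on the receipt
    embeds_many :items, Item do
      field(:name, :string)
      field(:price, :decimal)
      field(:quantity, :integer)
    end
  end

  # Called by Instructor on each response; any error added here makes the result invalid
  @impl true
  def validate_changeset(changeset) do
    changeset
    |> validate_required([:total, :subtotal])
    |> validate_items_total()
  end

  # Checks that the sum of price * quantity over all items equals the subtotal
  defp validate_items_total(changeset) do
    items = get_field(changeset, :items) || []
    subtotal = get_field(changeset, :subtotal)

    cond do
      # A missing subtotal is already reported by validate_required/2
      is_nil(subtotal) ->
        changeset

      # The Decimal arithmetic below would raise on nil prices or quantities
      Enum.any?(items, &(is_nil(&1.price) or is_nil(&1.quantity))) ->
        add_error(changeset, :items, "Every item must have a price and a quantity.")

      true ->
        items_total =
          Enum.reduce(items, Decimal.new(0), fn item, acc ->
            item_total = Decimal.mult(item.price, Decimal.new(item.quantity))
            Decimal.add(acc, item_total)
          end)

        if Decimal.equal?(items_total, subtotal) do
          changeset
        else
          add_error(changeset, :subtotal, """
          Subtotal does not match sum of item prices.
          Subtotal is #{subtotal} and items total is #{items_total}.
          """)
        end
    end
  end
end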

Kino.nothing()

Validating the Subtotal

Notice how we used the validate_changeset callback to check that the extracted items (price times quantity) add up to the subtotal printed on the receipt. If the model misreads a price or skips a line, the two amounts disagree and the changeset is marked invalid, which gives us confidence that the OCR step is working correctly. We can go further and use Instructor's re-ask feature, which sends the validation errors back to the LLM so that it can correct its own answer.
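This notebook does not show re-asking in action, so here is a minimal sketch. It assumes the max_retries option of Instructor.chat_completion (as described in the Instructor README) and assumes that a response still failing validation after all retries comes back as {:error, %Ecto.Changeset{}}. The ReceiptOCR module, its extract/2 function, and the default of two retries are hypothetical choices for illustration. It also swaps the {:ok, receipt} = ... match used in the next section, which would raise a MatchError on failure, for a case expression that returns the validation messages.

```elixir
defmodule ReceiptOCR do
  # Hypothetical wrapper around the Instructor call in this notebook.
  # Assumes the Receipt module defined above and the Instructor config from Mix.install.

  @prompt "Analyze the image and return the items in the receipt and the total amount."

  def extract(image, max_retries \\ 2) when is_binary(image) do
    data_url = "data:image/jpeg;base64," <> Base.encode64(image)

    result =
      Instructor.chat_completion(
        model: "gpt-4o",
        response_model: Receipt,
        # When validate_changeset/1 fails, Instructor re-asks the LLM with the
        # validation errors, up to max_retries times (assumed semantics).
        max_retries: max_retries,
        messages: [
          %{
            role: "user",
            content: [
              %{type: "image_url", image_url: %{url: data_url}},
              %{type: "text", text: @prompt}
            ]
          }
        ]
      )

    case result do
      {:ok, receipt} ->
        {:ok, receipt}

      # Still invalid after every retry: return the messages keyed by field
      {:error, %Ecto.Changeset{} = changeset} ->
        {:error, Ecto.Changeset.traverse_errors(changeset, fn {msg, _opts} -> msg end)}

      # Any other error (network, API) is passed through unchanged
      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

In the notebook this would be called as ReceiptOCR.extract(image), reusing the image binary read earlier. Returning errors instead of crashing makes it easier to log or queue receipts the model could not read consistently.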

Calling the LLM

base64_image = "data:image/jpeg;base64," <> Base.encode64(image)

{:ok, receipt} = Instructor.chat_completion(
  model: "gpt-4o",
  response_model: Receipt,
  messages: [
    %{
      role: "user",
      content: [
        %{
          type: "image_url",
          image_url: %{url: base64_image}
        },
        %{
          type: "text",
          text: "Analyze the image and return the items in the receipt and the total amount."
        }
      ]
    }
  ]
)
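The call above hardcodes image/jpeg in the data URL, which is right for receipt.jpg but would mislabel other formats. A sketch of a hypothetical helper (ImageDataURL is not part of Instructor or Kino) that picks the MIME type from the file's leading signature bytes, covering only JPEG and PNG:

```elixir
defmodule ImageDataURL do
  # Hypothetical helper: builds a base64 data URL, choosing the MIME type
  # from the file's magic bytes instead of hardcoding image/jpeg.

  # JPEG files start with the bytes FF D8 FF (decimal 255, 216, 255).
  def mime_type(<<0xFF, 0xD8, 0xFF, _::binary>>), do: {:ok, "image/jpeg"}

  # PNG files start with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
  def mime_type(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _::binary>>), do: {:ok, "image/png"}

  # Anything else is rejected rather than sent with a wrong label.
  def mime_type(_other), do: {:error, :unsupported_image}

  def build(image) when is_binary(image) do
    with {:ok, mime} <- mime_type(image) do
      {:ok, "data:#{mime};base64," <> Base.encode64(image)}
    end
  end
end
```

With this helper, {:ok, base64_image} = ImageDataURL.build(image) could replace the first line of the call cell. Unknown formats return {:error, :unsupported_image} before any request is made.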

Kino.Layout.grid([Kino.Image.new(image, :jpeg), receipt], columns: 2, boxed: true)
%Receipt{
  total: Decimal.new("107.6"),
  subtotal: Decimal.new("107.6"),
  items: [
    %Receipt.Item{id: "1", name: "Lorem ipsum", price: Decimal.new("9.2"), quantity: 1},
    %Receipt.Item{id: "2", name: "Lorem ipsum dolor sit", price: Decimal.new("19.2"), quantity: 1},
    %Receipt.Item{
      id: "3",
      name: "Lorem ipsum dolor sit amet",
      price: Decimal.new("15"),
      quantity: 1
    },
    %Receipt.Item{id: "4", name: "Lorem ipsum", price: Decimal.new("15"), quantity: 1},
    %Receipt.Item{id: "5", name: "Lorem ipsum", price: Decimal.new("15"), quantity: 1},
    %Receipt.Item{id: "6", name: "Lorem ipsum dolor sit", price: Decimal.new("15"), quantity: 1},
    %Receipt.Item{id: "7", name: "Lorem ipsum", price: Decimal.new("19.2"), quantity: 1}
  ]
}

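Now we simply use Instructor to call gpt-4o with the base64-encoded image and the response model, and we get back a Receipt struct. Because Instructor runs our validate_changeset callback on the response, an {:ok, receipt} result means the extracted items already add up to the subtotal (here, 107.6). This reduces the effect of hallucinations, although it cannot catch a misreading that happens to keep the totals consistent.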
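As a sanity check, the same arithmetic the validator performs can be run by hand on the values in the output above. The SubtotalCheck module is a hypothetical helper for illustration; it relies on the decimal package, which the setup cell already pulls in through Instructor and Ecto.

```elixir
defmodule SubtotalCheck do
  # Hypothetical helper: sums price * quantity over items. Works with any maps
  # that have :price (Decimal) and :quantity (integer) keys, e.g. Receipt.Item structs.
  def items_total(items) do
    Enum.reduce(items, Decimal.new(0), fn %{price: price, quantity: quantity}, acc ->
      Decimal.add(acc, Decimal.mult(price, Decimal.new(quantity)))
    end)
  end

  # Decimal.equal?/2 compares values exactly, so 107.6 and 107.60 are equal.
  def matches_subtotal?(items, subtotal), do: Decimal.equal?(items_total(items), subtotal)
end

# Prices copied from the receipt output above; every quantity is 1.
items =
  for price <- ["9.2", "19.2", "15", "15", "15", "15", "19.2"] do
    %{price: Decimal.new(price), quantity: 1}
  end

SubtotalCheck.matches_subtotal?(items, Decimal.new("107.6"))
# => true
```

Against a live result, the equivalent call is SubtotalCheck.matches_subtotal?(receipt.items, receipt.subtotal). Using :decimal fields rather than :float is what makes this exact equality check safe: with floats, 0.1 + 0.2 == 0.3 evaluates to false.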