Extracting data from images of Receipts

ocr-receipts-with-gpt4o.livemd

Thomas Millar

@thmsmlr

instructor_ex


Mix.install(
  [
    {:instructor, path: Path.expand("../../", __DIR__)},
    {:kino, "~> 0.12.3"}
  ],
  config: [
    instructor: [
      adapter: Instructor.Adapters.OpenAI,
      openai: [
        api_key: System.fetch_env!("LB_OPENAI_API_KEY")
      ]
    ]
  ]
)

Our Object Model

image = Kino.FS.file_path("receipt.jpg") |> File.read!()

<<255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 255, 226, 2, 40, 73, 67,
  67, 95, 80, 82, 79, 70, 73, 76, 69, 0, 1, 1, 0, 0, 2, 24, 0, 0, 0, 0, 2, 16, 0, 0, ...>>
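The binary above starts with the bytes 255, 216, 255, which is the JPEG signature. As a minimal sketch, that signature can be used to choose the MIME type of the data URL sent to the model, instead of hard-coding it. `ImageDataUrl` and its functions are hypothetical helpers written for this illustration; they are not part of Instructor or Kino.

```elixir
defmodule ImageDataUrl do
  # Hypothetical helper for illustration only.
  # Detects the MIME type from the leading signature bytes of the binary.
  def mime_type(<<0xFF, 0xD8, 0xFF, _rest::binary>>), do: {:ok, "image/jpeg"}
  def mime_type(<<0x89, "PNG\r\n", 0x1A, "\n", _rest::binary>>), do: {:ok, "image/png"}
  def mime_type(_other), do: :error

  # Builds a base64 data URL, or returns :error for an unrecognized format.
  def build(image) when is_binary(image) do
    with {:ok, mime} <- mime_type(image) do
      {:ok, "data:" <> mime <> ";base64," <> Base.encode64(image)}
    end
  end
end

# Only the three signature bytes are used here; a real call would pass the whole file.
jpeg_url = ImageDataUrl.build(<<255, 216, 255>>)
not_an_image = ImageDataUrl.build("not an image")
```

Returning `:error` for unknown formats means a wrong upload fails before any request is made to the LLM.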

We can look at the image of the receipt and outline the fields that we want to extract using an Ecto schema, as shown below.

defmodule Receipt do
  use Ecto.Schema
  use Instructor.Validator
  
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field(:total, :decimal)
    field(:subtotal, :decimal)

    embeds_many :items, Item do
      field(:name, :string)
      field(:price, :decimal)
      field(:quantity, :integer)
    end
  end

  @impl true
  def validate_changeset(changeset) do
    changeset
    |> validate_required([:total, :subtotal])
    |> validate_items_total()
  end

  defp validate_items_total(changeset) do
    items = get_field(changeset, :items) || []
    subtotal = get_field(changeset, :subtotal)

    items_total = Enum.reduce(items, Decimal.new(0), fn item, acc ->
      item_total = Decimal.mult(item.price, Decimal.new(item.quantity))
      Decimal.add(acc, item_total)
    end)

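    # A missing subtotal is already reported by validate_required/2, so skip the comparison.
    if is_nil(subtotal) or Decimal.equal?(items_total, subtotal) do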
      changeset
    else
      add_error(changeset, :subtotal, """
        Subtotal does not match sum of item prices.
        Subtotal is #{subtotal} and items total is #{items_total}
      """)
    end
  end
end

Kino.nothing()

Validating the Subtotal

Notice how we used the validate_changeset callback to check that the items we extract sum up to the subtotal on the receipt. This check gives us confidence that the OCR extraction is correct: if the model misreads a price or a quantity, the totals no longer agree and the validation fails. Later on, we can even use Instructor's re-ask feature to have the LLM correct such errors itself.
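To make the arithmetic behind this check concrete, here is a minimal dependency-free sketch. It assumes amounts are stored as integer cents rather than `Decimal` values, and `SubtotalCheck` is a hypothetical module used only for this illustration. The prices are the ones from the extraction result shown in this notebook.

```elixir
defmodule SubtotalCheck do
  # Illustrative only: amounts are integer cents so the sketch runs
  # without the Decimal and Ecto dependencies the notebook uses.
  def items_total(items) do
    Enum.reduce(items, 0, fn %{price: price, quantity: quantity}, acc ->
      acc + price * quantity
    end)
  end

  # Returns :ok when the line items add up to the subtotal,
  # otherwise an error tuple describing the mismatch.
  def check(items, subtotal) do
    case items_total(items) do
      ^subtotal -> :ok
      total -> {:error, "Subtotal is #{subtotal} and items total is #{total}"}
    end
  end
end

items = [
  %{price: 920, quantity: 1},
  %{price: 1920, quantity: 1},
  %{price: 1500, quantity: 1},
  %{price: 1500, quantity: 1},
  %{price: 1500, quantity: 1},
  %{price: 1500, quantity: 1},
  %{price: 1920, quantity: 1}
]

matching = SubtotalCheck.check(items, 10_760)
mismatched = SubtotalCheck.check(items, 10_000)
```

Here `matching` is `:ok`, while `mismatched` carries an error message of the kind the changeset validation would report.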

Calling the LLM

base64_image = "data:image/jpeg;base64," <> Base.encode64(image)

{:ok, receipt} = Instructor.chat_completion(
  model: "gpt-4o",
  response_model: Receipt,
  messages: [
    %{
      role: "user",
      content: [
        %{
          type: "image_url",
          image_url: %{url: base64_image}
        },
        %{
          type: "text",
          text: "Analyze the image and return the items in the receipt and the total amount."
        }
      ]
    }
  ]
)

Kino.Layout.grid([Kino.Image.new(image, :jpeg), receipt], columns: 2, boxed: true)

%Receipt{
  total: Decimal.new("107.6"),
  subtotal: Decimal.new("107.6"),
  items: [
    %Receipt.Item{id: "1", name: "Lorem ipsum", price: Decimal.new("9.2"), quantity: 1},
    %Receipt.Item{id: "2", name: "Lorem ipsum dolor sit", price: Decimal.new("19.2"), quantity: 1},
    %Receipt.Item{
      id: "3",
      name: "Lorem ipsum dolor sit amet",
      price: Decimal.new("15.0"),
      quantity: 1
    },
    %Receipt.Item{id: "4", name: "Lorem ipsum", price: Decimal.new("15.0"), quantity: 1},
    %Receipt.Item{id: "5", name: "Lorem ipsum", price: Decimal.new("15.0"), quantity: 1},
    %Receipt.Item{id: "6", name: "Lorem ipsum dolor sit", price: Decimal.new("15.0"), quantity: 1},
    %Receipt.Item{id: "7", name: "Lorem ipsum", price: Decimal.new("19.2"), quantity: 1}
  ]
}

Here we simply use Instructor to call gpt-4o with the base64-encoded image and the response model, and we get back the structured result. Because the result has passed all of our validations, including the subtotal check, we can be more confident in it, and we've reduced the effect of any hallucinations.
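The re-ask idea mentioned earlier can be sketched as a small retry loop: when validation fails, the error message is fed back to the model for another attempt. This is a hypothetical illustration, not Instructor's implementation. `ReaskSketch`, `ask`, and `validate` are names invented here, and the "model" is a stub function. In Instructor itself, re-asking is, to our understanding, enabled by passing a `max_retries` option to `Instructor.chat_completion`.

```elixir
defmodule ReaskSketch do
  # Hypothetical illustration of a re-ask loop; not Instructor's code.
  # `ask` stands in for the LLM call and receives the previous
  # validation error (or nil on the first attempt).
  def run(ask, validate, max_retries, feedback \\ nil) do
    answer = ask.(feedback)

    case validate.(answer) do
      :ok ->
        {:ok, answer}

      {:error, reason} when max_retries > 0 ->
        # Feed the validation error back so the next attempt can correct it.
        run(ask, validate, max_retries - 1, reason)

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Stub "model": wrong on the first attempt, corrected once it sees feedback.
ask = fn
  nil -> %{subtotal: 10_000, items_total: 10_760}
  _feedback -> %{subtotal: 10_760, items_total: 10_760}
end

# Stub validation: the subtotal must equal the items total.
validate = fn
  %{subtotal: same, items_total: same} -> :ok
  _mismatch -> {:error, "Subtotal does not match sum of item prices"}
end

with_retry = ReaskSketch.run(ask, validate, 1)
without_retry = ReaskSketch.run(ask, validate, 0)
```

With one retry allowed, the stub recovers and `with_retry` is an `{:ok, _}` tuple; with no retries, the first validation error is returned as is.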
