PII Data Sanitization

Section

Mix.install(
  [
    {:instructor, "~> 0.0.0"}
  ],
  config: [
    instructor: [
      adapter: Instructor.Adapters.OpenAI,
      openai: [api_key: System.fetch_env!("LB_OPENAI_API_KEY")]
    ]
  ]
)
:ok

Overview

This example demonstrates how to use OpenAI's ChatCompletion model to extract and scrub Personally Identifiable Information (PII) from an input string. The code defines an Ecto schema to hold the PII data and provides functions for both extraction and sanitization.

Defining the Structures

First, Ecto schemas are defined to represent the PII data and the overall structure for PII data extraction.

defmodule PII do
  use Ecto.Schema
  use Instructor.Validator

  @doc """
  ## Field Descriptions:
  - index: an auto incrementing integer starting at zero
  - type: the type of personal identifiable information
  - value: the PII value
  """
  @primary_key false
  embedded_schema do
    embeds_many :data, Datum, primary_key: false do
      field(:index, :integer)
      field(:type, :string)
      field(:value, :string)
    end
  end

  @doc """
  Iterates over the private data and replaces the value with a placeholder in the
  form of <{data_type}_{i}>
  """
  def scrub({:ok, pii}, input) do
    Enum.reduce(pii.data, input, fn datum, acc ->
      String.replace(acc, datum.value, "<#{datum.type}_#{datum.index}>")
    end)
  end

  def scrub({:error, reason}, _input) do
    # Extraction failed: print the reason for debugging. dbg/1 returns it unchanged.
    dbg(reason)
  end

  def extract(input) do
    Instructor.chat_completion(
      model: "gpt-3.5-turbo",
      response_model: PII,
      max_retries: 3,
      messages: [
        %{
          role: "system",
          content:
            "You are a world-class PII scrubbing model. Extract the PII data from the following document."
        },
        %{
          role: "system",
          content: """
          Examples of PII: names, addresses, phone numbers, email addresses, financial information
          """
        },
        %{
          role: "system",
          content: """
          Instructions:
          - any spaces in the type should be converted to underscores and all letters should be lower case
          - use abbreviations when choosing the type
          """
        },
        %{
          role: "user",
          content: input
        }
      ]
    )
  end
end
{:module, PII, <<70, 79, 82, 49, 0, 0, 26, ...>>, {:extract, 1}}
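
The naming rule given to the model in the prompt (spaces become underscores, all letters lower case) can also be mirrored locally, for example to tidy a type before it is used in a placeholder. The following is a minimal sketch under that assumption; `TypeNameSketch` and `normalize/1` are hypothetical names and are not part of the notebook or of Instructor.

```elixir
defmodule TypeNameSketch do
  # Hypothetical helper: applies the same naming rule the prompt asks for.
  def normalize(type) when is_binary(type) do
    type
    |> String.trim()
    |> String.downcase()
    # Collapse each run of whitespace into a single underscore.
    |> String.replace(~r/\s+/, "_")
  end
end

TypeNameSketch.normalize("Credit Card Number")
# => "credit_card_number"
```

This does not cover the "use abbreviations" instruction, which is left to the model.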

Extracting PII Data

PII.extract/1 sends the input to the OpenAI API and returns the extracted PII as an {:ok, %PII{}} tuple, or {:error, reason} on failure.

input =
  "Hello John Smith, I am Jill. Your GitBoat, LLC credit card account 1111-0000-1111-8765 has a minimum payment of $33.32 that is due by July 24th."

pii_data = PII.extract(input)
{:ok,
 %PII{
   data: [
     %PII.Datum{index: 0, type: "credit_card_number", value: "1111-0000-1111-8765"},
     %PII.Datum{index: 1, type: "currency", value: "$33.32"},
     %PII.Datum{index: 2, type: "date", value: "July 24th"},
     %PII.Datum{index: 3, type: "person_name", value: "John Smith"},
     %PII.Datum{index: 4, type: "person_name", value: "Jill"},
     %PII.Datum{index: 5, type: "organization_name", value: "GitBoat, LLC"}
   ]
 }}
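
Because the result is a tagged tuple, it can be pattern matched before further processing. As a rough sketch, the hypothetical `PIISummarySketch.by_type/1` below groups the extracted values by type. Plain maps stand in for the `%PII.Datum{}` structs so the snippet runs without calling the API; the function name and the sample data are assumptions for illustration.

```elixir
defmodule PIISummarySketch do
  # Hypothetical helper: groups extracted values by their type.
  # Accepts anything shaped like the result of PII.extract/1.
  def by_type({:ok, %{data: data}}) do
    Enum.group_by(data, & &1.type, & &1.value)
  end

  # Pass failures through so the caller decides how to handle them.
  def by_type({:error, _reason} = error), do: error
end

# Plain maps stand in for %PII.Datum{} structs.
result =
  {:ok,
   %{
     data: [
       %{index: 0, type: "credit_card_number", value: "1111-0000-1111-8765"},
       %{index: 3, type: "person_name", value: "John Smith"},
       %{index: 4, type: "person_name", value: "Jill"}
     ]
   }}

PIISummarySketch.by_type(result)
# => %{
#      "credit_card_number" => ["1111-0000-1111-8765"],
#      "person_name" => ["John Smith", "Jill"]
#    }
```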

Scrubbing PII Data

After extracting the PII data, the PII.scrub/2 function is used to sanitize the input by replacing each extracted value with its placeholder.

PII.scrub(pii_data, input)
"Hello <person_name_3>, I am <person_name_4>. Your <organization_name_5> credit card account <credit_card_number_0> has a minimum payment of <currency_1> that is due by <date_2>."
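
The replacement step can be tried without an API call. The sketch below repeats the same `Enum.reduce/3` over plain maps and adds one variation that is not in the notebook: values are replaced longest first, on the assumption that a short value (such as a first name) might otherwise overwrite part of a longer one (a full name). `ScrubSketch` and the sample data are hypothetical.

```elixir
defmodule ScrubSketch do
  # Same reduce as PII.scrub/2, but over plain maps so it runs without Instructor.
  def scrub(data, input) do
    data
    # Assumption: replacing longer values first keeps a short value
    # from clobbering part of a longer one.
    |> Enum.sort_by(&String.length(&1.value), :desc)
    |> Enum.reduce(input, fn datum, acc ->
      String.replace(acc, datum.value, "<#{datum.type}_#{datum.index}>")
    end)
  end
end

data = [
  %{index: 0, type: "person_name", value: "John"},
  %{index: 1, type: "person_name", value: "John Smith"}
]

ScrubSketch.scrub(data, "Hello John Smith, this is John.")
# => "Hello <person_name_1>, this is <person_name_0>."
```

Without the sort, "John" would be replaced first and "John Smith" would no longer match as a whole.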