DataSchema From Scratch

livebooks/from_scratch.livemd

Prerequisites

This livebook dives into how we chose to implement DataSchema, touching on some of the design decisions along the way.

Mix.install([:data_schema, :decimal, :sweet_xml])

DataSchemas From Scratch.

At Duffel we make requests to external services that return data, and we want to be able to turn the responses into something structured quickly and easily. A simple way to do that is to use a struct.

A struct means that later on in the system we can pattern match on it and know which fields to expect, and we know which module to go to for docs.

So let’s take the following example:

response = """
<Flight leaving="LHR" arriving="JFK" departing_at="14:00:00" price="100.00" currency="GBP" />
"""

We want to turn it into one of these:

defmodule Money do
  defstruct [:amount, :currency]
end

defmodule Order do
  defstruct [:origin, :destination, :departing_at, :total_price]
end

The struct that we want would look something like this:

%Order{
  origin: "LHR",
  destination: "JFK",
  departing_at: ~T[14:00:00],
  total_price: %Money{amount: Decimal.new("100.00"), currency: "GBP"}
}

You will notice that this transformation involves a few things. First we must traverse “in” to the input data to extract values from within it. Next we must transform those values somehow - sometimes combining multiple values from the input data together. Finally we must put those values under a key of some kind in a struct.
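Done by hand, those three steps might look like the sketch below. It uses a plain map instead of the XML response so that it runs without any dependencies; the `raw` map and its keys are assumptions made purely for illustration.

```elixir
# Hypothetical input: a plain map standing in for the parsed response.
raw = %{"flight" => %{"leaving" => "LHR", "departing_at" => "14:00:00"}}

# 1. Traverse into the input to extract the raw values.
leaving = get_in(raw, ["flight", "leaving"])
time_string = get_in(raw, ["flight", "departing_at"])

# 2. Transform the values that need it.
departing_at = Time.from_iso8601!(time_string)

# 3. Put each value under the key we want.
result = %{origin: leaving, departing_at: departing_at}
```

Writing this out for every response quickly becomes repetitive, which is the motivation for describing the steps as data.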

To implement this, let’s first create a description of the data we want from the input. This will just be a list of paths to the values we want. Because the input is XML, we will use XPaths to describe where to get the values:

[
  "/Flight/@leaving",
  "/Flight/@arriving",
  "/Flight/@departing_at",
  "/Flight/@price",
  "/Flight/@currency"
]

Next we want to be able to transform those values in some way. For example, the time string should become a Time struct, and we want to combine the money data into a Money struct. For now we will keep the money data separate and come back later to how we can combine it.

Let’s add a “casting” function for each value. The easiest way to do this is with a tuple:

[
  {"/Flight/@leaving", fn leaving -> leaving end},
  {"/Flight/@arriving", fn arriving -> arriving end},
  {"/Flight/@departing_at", fn time -> Time.from_iso8601!(time) end},
  {"/Flight/@price", fn price -> price end},
  {"/Flight/@currency", fn currency -> currency end}
]

This does the simplest thing for now: each path is paired with a function that receives the value found at that path and gets the chance to transform it before it is put into the struct.

That leaves us with the last missing bit: we need to specify the key we want each value to live under. Again, for now we will ignore that we need to combine currency and price to make a Money struct.

fields = [
  {:origin, "/Flight/@leaving", fn leaving -> leaving end},
  {:destination, "/Flight/@arriving", fn arriving -> arriving end},
  {:departing_at, "/Flight/@departing_at", fn time -> Time.from_iso8601!(time) end},
  {:price, "/Flight/@price", fn price -> price end},
  {:currency, "/Flight/@currency", fn currency -> currency end}
]

This gives us a simple description of the data we want, how we can (optionally) transform that data, and the keys we want the transformed data to live under.

But how do we do that transformation?

The simplest way is a reduce. We just need one extra thing: something that can interpret the XPath for us. I will use SweetXml for this demo.

import SweetXml, only: [sigil_x: 2]

input = """
<Flight leaving="LHR" arriving="JFK" departing_at="14:00:00" price="100.00" currency="GBP" />
"""

Enum.reduce(fields, %{}, fn {key, path, cast_fn}, acc ->
  input_data = SweetXml.xpath(input, ~x"#{path}"s)
  Map.put(acc, key, cast_fn.(input_data))
end)

Nice, this is close! We have transformed the time as desired but now we want to create the money correctly. To do that we should recognise that there are really a few kinds of things at play in the description of the fields we provided.

There are some kinds of fields that are just “take the value transform and put it under a key” and there are some that are more nuanced. They may want to return a nested struct, or a list of them, or aggregate some data in some way.

DataSchema currently defines 5 kinds of fields you can use.

DataSchema Field Types

The 5 kinds of fields are:

  1. field - The value will be a cast value from the source data.
  2. list_of - The value will be a list of cast values created from the source data.
  3. has_one - The value will be created from a nested data schema (so it will be a struct).
  4. has_many - The value will be created by casting a list of values into a data schema, so you end up with a list of structs defined by the provided schema. Similar to has_many in Ecto.
  5. aggregate - The value will be a cast value formed from multiple bits of data in the source.

Using has_one

Let’s use has_one to create the money type we desire. This says “I will create a nested struct”. To do that we can first define the fields needed to create a Money, then put them in the parent schema.

money_fields = [
  field: {:amount, "./@price", fn price -> Decimal.new(price) end},
  field: {:currency, "./@currency", fn currency -> currency end}
]

fields = [
  field: {:origin, "/Flight/@leaving", fn leaving -> leaving end},
  field: {:destination, "/Flight/@arriving", fn arriving -> arriving end},
  field: {:departing_at, "/Flight/@departing_at", fn time -> Time.from_iso8601!(time) end},
  has_one: {:total_price, "/Flight", {%Money{}, money_fields}}
]

You will see our total_price has this at the end:

{%Money{}, money_fields}

This just says “take the money fields and create a Money struct from what they describe”.

Let’s update our reduce function. We will put it into a module so we can use recursion.

input = """
<Flight leaving="LHR" arriving="JFK" departing_at="14:00:00" price="100.00" currency="GBP" />
"""

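defmodule Schema do
  # Walks the field descriptions and fills the accumulator one key at a time.
  def to_struct(input, fields, accumulator) do
    Enum.reduce(fields, accumulator, fn
      # A plain field: extract the value as a string (the `s` modifier) and cast it.
      {:field, {key, path, cast_fn}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}"s)
        Map.put(acc, key, cast_fn.(input_data))

      # A nested schema: keep the matched XML element (no `s` modifier) and
      # recurse into it with the nested fields and the nested accumulator.
      {:has_one, {key, path, {nested_acc, nested_fields}}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}")
        value = to_struct(input_data, nested_fields, nested_acc)
        Map.put(acc, key, value)
    end)
  end
end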

Schema.to_struct(input, fields, %{})

Using aggregate

If we were instead to use an aggregate field, we could do so by changing our fields slightly:

money_fields = [
  field: {:amount, "/Flight/@price", fn price -> Decimal.new(price) end},
  field: {:currency, "/Flight/@currency", fn currency -> currency end}
]

to_money = fn %{amount: amount, currency: currency} ->
  %Money{amount: amount, currency: currency}
end

fields = [
  field: {:origin, "/Flight/@leaving", fn leaving -> leaving end},
  field: {:destination, "/Flight/@arriving", fn arriving -> arriving end},
  field: {:departing_at, "/Flight/@departing_at", fn time -> Time.from_iso8601!(time) end},
  aggregate: {:total_price, money_fields, to_money}
]

Now we update the reduce a bit:

defmodule SchemaV2 do
  def to_struct(input, fields, accumulator) do
    Enum.reduce(fields, accumulator, fn
      {:field, {key, path, cast_fn}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}"s)
        Map.put(acc, key, cast_fn.(input_data))

      {:has_one, {key, path, {nested_acc, nested_fields}}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}")
        value = to_struct(input_data, nested_fields, nested_acc)
        Map.put(acc, key, value)

      {:aggregate, {key, nested_fields, cast_fn}}, acc ->
        data_map = to_struct(input, nested_fields, %{})
        Map.put(acc, key, cast_fn.(data_map))
    end)
  end
end

SchemaV2.to_struct(input, fields, %{})
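The reduce above handles field, has_one and aggregate. The two remaining kinds from the list, list_of and has_many, could be handled in the same style. The following is only a sketch of that idea, not DataSchema's implementation: `FieldKindsSketch`, the itinerary data and the field names are all hypothetical, and paths are plain map keys so the example runs without SweetXml.

```elixir
defmodule FieldKindsSketch do
  # Hypothetical module, not part of DataSchema. Paths are plain map keys.
  def to_struct(input, fields, accumulator) do
    Enum.reduce(fields, accumulator, fn
      # field: cast the single value found at the path.
      {:field, {key, path, cast_fn}}, acc ->
        Map.put(acc, key, cast_fn.(Map.get(input, path)))

      # list_of: the path points at a list; cast every element.
      {:list_of, {key, path, cast_fn}}, acc ->
        values = input |> Map.get(path, []) |> Enum.map(cast_fn)
        Map.put(acc, key, values)

      # has_many: the path points at a list; build one nested result per element.
      {:has_many, {key, path, {nested_acc, nested_fields}}}, acc ->
        values =
          input
          |> Map.get(path, [])
          |> Enum.map(&to_struct(&1, nested_fields, nested_acc))

        Map.put(acc, key, values)
    end)
  end
end

segment_fields = [
  field: {:origin, "leaving", fn leaving -> leaving end},
  field: {:destination, "arriving", fn arriving -> arriving end}
]

itinerary_fields = [
  list_of: {:passenger_names, "passengers", &String.upcase/1},
  has_many: {:segments, "segments", {%{}, segment_fields}}
]

# Made-up input data for the sketch.
itinerary_input = %{
  "passengers" => ["ada", "grace"],
  "segments" => [
    %{"leaving" => "LHR", "arriving" => "JFK"},
    %{"leaving" => "JFK", "arriving" => "SFO"}
  ]
}

itinerary = FieldKindsSketch.to_struct(itinerary_input, itinerary_fields, %{})
```

The only difference from has_one is that the nested schema is applied once per element of the list rather than once for the whole value.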

Using Compiled Schemas

What we have seen so far gives us a lot of flexibility in what exactly we create when we call to_struct. Our fields specify a key that we will put values under, but they do not care which specific struct we create. To demo that, we could do this:

SchemaV2.to_struct(input, fields, %Order{})

Or even:

defmodule Ticket do
  defstruct [:origin, :destination, :departing_at, :total_price]
end

SchemaV2.to_struct(input, fields, %Ticket{})

We also have the option to forgo that flexibility and create structs at the same time that we define our fields. We could define what we have already in the following way:

defmodule MoneyV2 do
  import DataSchema

  data_schema(
    field: {:amount, "./@price", fn price -> Decimal.new(price) end},
    field: {:currency, "./@currency", fn currency -> currency end}
  )
end

defmodule OrderV2 do
  import DataSchema

  data_schema(
    field: {:origin, "/Flight/@leaving", fn leaving -> leaving end},
    field: {:destination, "/Flight/@arriving", fn arriving -> arriving end},
    field: {:departing_at, "/Flight/@departing_at", fn time -> Time.from_iso8601!(time) end},
    has_one: {:total_price, "/Flight", MoneyV2}
  )
end

Behind the scenes, data_schema defines a struct for us and creates a function called __data_schema_fields/0 that returns the field descriptions. That means we can define a simpler to_struct function:

defmodule SchemaV3 do
  def to_struct(input, module) do
    to_struct(input, module.__data_schema_fields(), struct(module, %{}))
  end

  def to_struct(input, fields, accumulator) do
    Enum.reduce(fields, accumulator, fn
      {:field, {key, path, cast_fn}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}"s)
        Map.put(acc, key, cast_fn.(input_data))

      {:has_one, {key, path, {nested_acc, nested_fields}}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}")
        value = to_struct(input_data, nested_fields, nested_acc)
        Map.put(acc, key, value)

      {:has_one, {key, path, nested_module}}, acc ->
        input_data = SweetXml.xpath(input, ~x"#{path}")
        value = to_struct(input_data, nested_module)
        Map.put(acc, key, value)

      {:aggregate, {key, nested_fields, cast_fn}}, acc ->
        data_map = to_struct(input, nested_fields, %{})
        Map.put(acc, key, cast_fn.(data_map))
    end)
  end
end

SchemaV3.to_struct(input, OrderV2)
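To make the "behind the scenes" description more concrete, here is a hand-written module with the same shape. It is a sketch of the idea only and should not be read as the macro's actual expansion: `HandRolledMoney` and `build` are hypothetical names, the paths are plain map keys, and the amount stays a string so the example needs no dependencies.

```elixir
defmodule HandRolledMoney do
  # Hypothetical module: roughly what we could write by hand without the macro.
  defstruct [:amount, :currency]

  # The field descriptions live in a function so generic code can look them up.
  def __data_schema_fields do
    [
      field: {:amount, "price", fn price -> price end},
      field: {:currency, "currency", fn currency -> currency end}
    ]
  end
end

# A generic builder only needs the module name: it asks the module for its
# fields and starts from an empty struct of that module.
build = fn data, module ->
  Enum.reduce(module.__data_schema_fields(), struct(module, %{}), fn
    {:field, {key, path, cast_fn}}, acc ->
      Map.put(acc, key, cast_fn.(Map.get(data, path)))
  end)
end

money = build.(%{"price" => "100.00", "currency" => "GBP"}, HandRolledMoney)
```

Keeping the fields and the struct definition in one module is what lets the caller pass just a module name, at the cost of tying the fields to one specific struct.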

Different Input data

Finally, all of the demonstrations so far have used XML as the input data, but the schemas can handle any kind of input data. To see how, let’s look at where we currently call the SweetXml functions and parameterize that instead:

defmodule SchemaV4 do
  def to_struct(input, module) do
    accessor = module.__data_accessor()
    fields = module.__data_schema_fields()
    to_struct(input, fields, accessor, struct(module, %{}))
  end

  def to_struct(input, fields, data_accessor, accumulator) do
    Enum.reduce(fields, accumulator, fn
      {:field, {key, path, cast_fn}}, acc ->
        input_data = data_accessor.field(input, path)
        Map.put(acc, key, cast_fn.(input_data))

      {:has_one, {key, path, {nested_acc, nested_fields}}}, acc ->
        input_data = data_accessor.has_one(input, path)
        value = to_struct(input_data, nested_fields, data_accessor, nested_acc)
        Map.put(acc, key, value)

      {:has_one, {key, path, nested_module}}, acc ->
        input_data = data_accessor.has_one(input, path)
        value = to_struct(input_data, nested_module)
        Map.put(acc, key, value)

      {:aggregate, {key, nested_fields, cast_fn}}, acc ->
        data_map = to_struct(input, nested_fields, data_accessor, %{})
        Map.put(acc, key, cast_fn.(data_map))
    end)
  end
end

defmodule XpathAccessor do
  def has_one(input, path) do
    SweetXml.xpath(input, ~x"#{path}")
  end

  def field(input, path) do
    SweetXml.xpath(input, ~x"#{path}"s)
  end
end

SchemaV4.to_struct(input, fields, XpathAccessor, %{})

Notice that the functions we just parameterized could be anything. That means our path can be anything too, as long as the data accessor knows how to use that path on the input data to extract some value.

So we can define different schemas and have them work on different input data. We specify a different accessor by using a module attribute on the schema.

input = %{
  "leaving" => "LHR",
  "arriving" => "JFK",
  "price" => "100.00",
  "currency" => "GBP",
  "departing_at" => "14:00:00"
}

defmodule AccessAccessor do
  def has_one(input, []), do: input

  def has_one(input, path) do
    get_in(input, path)
  end

  def field(input, path) do
    get_in(input, path)
  end
end

defmodule MapAccessor do
  def has_one(input, ""), do: input

  def has_one(input, path) do
    Map.get(input, path)
  end

  def field(input, path) do
    Map.get(input, path)
  end
end

defmodule MoneyV3 do
  import DataSchema

  @data_accessor AccessAccessor
  data_schema(
    field: {:amount, ["price"], fn price -> Decimal.new(price) end},
    field: {:currency, ["currency"], fn currency -> currency end}
  )
end

defmodule OrderV3 do
  import DataSchema

  @data_accessor MapAccessor
  data_schema(
    field: {:origin, "leaving", fn leaving -> leaving end},
    field: {:destination, "arriving", fn arriving -> arriving end},
    field: {:departing_at, "departing_at", fn time -> Time.from_iso8601!(time) end},
    has_one: {:total_price, "", MoneyV3}
  )
end

SchemaV4.to_struct(input, OrderV3)
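To illustrate that a path really can be anything the accessor understands, the sketch below uses integer positions in a tuple as paths. Everything in it is hypothetical (`TupleAccessor`, `AccessorSketch`, and the row layout are not part of DataSchema), and the reduce is a cut-down copy that only knows field and has_one.

```elixir
defmodule TupleAccessor do
  # Hypothetical accessor: a path is a zero-based index into a tuple.
  def field(input, index), do: elem(input, index)
  def has_one(input, index), do: elem(input, index)
end

defmodule AccessorSketch do
  # Cut-down version of the reduce above, supporting only field and has_one.
  def to_struct(input, fields, accessor, accumulator) do
    Enum.reduce(fields, accumulator, fn
      {:field, {key, path, cast_fn}}, acc ->
        Map.put(acc, key, cast_fn.(accessor.field(input, path)))

      {:has_one, {key, path, {nested_acc, nested_fields}}}, acc ->
        nested_input = accessor.has_one(input, path)
        value = to_struct(nested_input, nested_fields, accessor, nested_acc)
        Map.put(acc, key, value)
    end)
  end
end

# Made-up input: the same flight data laid out positionally.
row = {"LHR", "JFK", "14:00:00", {"100.00", "GBP"}}

price_fields = [
  field: {:amount, 0, fn amount -> amount end},
  field: {:currency, 1, fn currency -> currency end}
]

row_fields = [
  field: {:origin, 0, fn leaving -> leaving end},
  field: {:destination, 1, fn arriving -> arriving end},
  field: {:departing_at, 2, &Time.from_iso8601!/1},
  has_one: {:total_price, 3, {%{}, price_fields}}
]

order = AccessorSketch.to_struct(row, row_fields, TupleAccessor, %{})
```

The field descriptions keep exactly the same shape as before; only the meaning of the path and the accessor module change.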