Selecting Objects From Tables

docs/guides/usage/tables.livemd

Mix.install(
  [
    {:matcha, github: "christhekeele/matcha", tag: "stable"},
    {:jason, ">= 0.0.1"},
    {:finch, ">= 0.0.1"}
  ],
  force: true
)

Finch.start_link(name: Pokemon.Data)

IO.puts("Installed matcha version: #{Application.spec(:matcha, :vsn)}")

What Are Tables?

Seeding Some Data

Before we can really get cooking using Matcha to extract data from an :ets table, we’re going to need to put some data in a table!

For fun example datasets, I like to use data about Pokémon. There are a lot of them, and they’re fairly well-known, Pokémon being one of the highest-grossing media franchises in history.

In our setup, we’ve already started a Finch connection pool named Pokemon.Data to make HTTP requests, and installed Jason to parse JSON datasets into Elixir data structures.

Now we just need a dataset: we’ll use @fanzeyi’s data here.

{:ok, response} =
  Finch.build(:get, "https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/pokedex.json")
  |> Finch.request(Pokemon.Data)

data =
  response
  |> Map.fetch!(:body)
  |> Jason.decode!()

:ets objects take the form of an n-tuple, with (usually) the first item in the tuple being the “key” we recognize the object by. We’ll massage our dataset to fit this structure. In our case that means {id, pokemon_data}, where each Pokémon’s numeric “id” is pulled out of pokemon_data to serve as the key we index it by:

pokedex_objects =
  data
  |> Enum.map(fn pokemon_data -> {Map.fetch!(pokemon_data, "id"), pokemon_data} end)
  |> Enum.sort_by(&elem(&1, 0))
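
To make the tuple-with-a-key shape concrete, here is a minimal sketch on a throwaway table. The table name `:example_objects` and the sample data are made up for illustration and are not part of the Pokédex dataset.

```elixir
# Hypothetical table and data, used only for illustration.
table = :ets.new(:example_objects, [:set])

# Every object is a tuple; by default the first element is the key.
:ets.insert(table, {1, %{"name" => "first"}})
# Objects can carry any number of further elements after the key.
:ets.insert(table, {2, "any term", :can, :follow})

# Lookups go through the key and return a list of matching objects.
[{1, %{"name" => name}}] = :ets.lookup(table, 1)
IO.inspect(name)
# prints "first"
```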

Now we’re ready to insert these records into an :ets table we’ll call the Pokedex (a Pokédex is the in-game term for a registry of known Pokémon). We’ll make a new table of type :set, to indicate that each object is unique by key:

pokedex = :ets.new(Pokedex, [:set])

:ets.insert(pokedex, pokedex_objects)
IO.puts("Loaded #{:ets.info(pokedex, :size)} pokémon into the `Pokedex` table.")
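
As a quick sketch of what “unique by key” means for a :set table, inserting a second object under an existing key replaces the first one. The table name `:example_set` and its contents are assumptions made up for this example.

```elixir
# Hypothetical table, separate from the Pokedex, used only for illustration.
unique = :ets.new(:example_set, [:set])

:ets.insert(unique, {25, "Pikachu"})
# Same key again: in a :set table this replaces the earlier object.
:ets.insert(unique, {25, "Raichu"})

after_overwrite = :ets.lookup(unique, 25)
IO.inspect(after_overwrite)
# prints [{25, "Raichu"}]

# :ets.insert_new/2 refuses to overwrite and returns false instead.
inserted? = :ets.insert_new(unique, {25, "Pichu"})
IO.inspect(inserted?)
# prints false
```

This is why loading the dataset twice would not double the table’s size: each id can only appear once.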

Mapping And Filtering Without Matcha

Let’s start off using the basic :ets APIs to retrieve data from our table.

The :ets.lookup/2 function lets us get our objects by their key — the first element of our tuples. For example, we can check out what the 500th Pokémon is:

:ets.lookup(pokedex, 500)
|> List.first()
|> elem(1)
|> get_in(["name", "english"])

The 500th Pokémon appears to be some creature called Emboar. This scares and confuses me, as I am an elder millennial who never moved on from the 1st generation of 151 Pokémon, and refuses to believe anything has changed since my childhood.
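
One detail worth knowing about the lookup above: :ets.lookup/2 always returns a list, which is empty when the key is absent, so piping straight into List.first/1 and elem/2 would raise for an unknown id. Here is a small sketch of a more defensive version; the `english_name` helper and the `:lookup_example` table are hypothetical stand-ins, not part of the guide’s setup.

```elixir
# Hypothetical stand-in for the Pokedex table, with a single object.
table = :ets.new(:lookup_example, [:set])
:ets.insert(table, {500, %{"name" => %{"english" => "Emboar"}}})

# Hypothetical helper: returns the English name, or nil for unknown ids.
english_name = fn table, id ->
  case :ets.lookup(table, id) do
    # A :set table holds at most one object per key.
    [{^id, pokemon}] -> get_in(pokemon, ["name", "english"])
    # Missing keys come back as an empty list, not an error.
    [] -> nil
  end
end

IO.inspect(english_name.(table, 500))
# prints "Emboar"
IO.inspect(english_name.(table, 9999))
# prints nil
```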

How might we go about getting only Pokémon from the first generation, to soothe my anxiety? While we can use simple :ets APIs to get specific objects by key, there isn’t an easy way to ask it to give us all objects with a key less than 152…

Filtering

We could, of course, load all of our table’s data into memory, and filter out later-generation Pokémon there:

first_gen_pokemon =
  :ets.tab2list(pokedex)
  |> Enum.filter(fn {id, _pokemon} -> id >= 1 and id <= 151 end)
  |> Enum.sort_by(&elem(&1, 0))
  |> Enum.map(&elem(&1, 1))

Here, we’re using :ets.tab2list/1 to load the entire :ets dataset into our process’s memory.

This filtering succeeds in giving us only the 151 Pokémon I am personally comfortable admitting exist:

first_gen_pokemon |> length()

Mapping

Let’s also try extracting just the names of the Pokémon we know about:

pokemon_names =
  :ets.tab2list(pokedex)
  |> Enum.map(fn {_id, %{"name" => %{"english" => name}}} -> name end)

The Problem

Of course, this is all terribly inefficient. :ets has to copy all of its data into our process’s memory, when we only want a known handful of it. For large tables, this would greatly harm the performance of our filtering, and even risk crashing the process.

Mapping is even more problematic: if we don’t do a filter beforehand, we have to map over every object, even the ones we are not interested in.

This is generally how :ets is designed to be used, though: storing data we want globally accessible, and looking up known, specific objects by key where and when we need them. :ets in its normal operation can be thought of as a global key/value store for arbitrary terms… That is, until you start using match specs to query it!

Filtering and Mapping With Matcha

Filtering and mapping :ets data efficiently is exactly what match specs were invented to accomplish. With Matcha, we can trivially reproduce the in-memory filtering of our 1st generation of Pokémon. The match spec pushes the actual work of filtering into :ets itself, which has a much more efficient querying mechanism, and copies just the data we want into our process.
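
For a sense of what a match spec looks like without Matcha, here is a hand-written raw spec passed straight to :ets.select/2. This is only a sketch on a made-up three-object table (`:raw_spec_example`), and it is not necessarily the exact spec Matcha would generate for the queries in this guide.

```elixir
# Hypothetical miniature table, used only for illustration.
table = :ets.new(:raw_spec_example, [:set])
:ets.insert(table, [{1, "Bulbasaur"}, {151, "Mew"}, {152, "Chikorita"}])

raw_spec = [
  {
    # Match head: bind the key to $1 and the value to $2.
    {:"$1", :"$2"},
    # Guards: both must hold, so this reads as 1 <= $1 <= 151.
    [{:>=, :"$1", 1}, {:"=<", :"$1", 151}],
    # Body: return only the value.
    [:"$2"]
  }
]

# A :set table has no guaranteed order, so sort for a stable result.
names = :ets.select(table, raw_spec) |> Enum.sort()
IO.inspect(names)
# prints ["Bulbasaur", "Mew"]
```

The filtering happens inside :ets, and only the two matching names are copied into the calling process. The raw format is terse and easy to get wrong, which is the gap Matcha aims to fill.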

Filtering

A Matcha.Spec used in :ets filtering looks like a case statement, where any object that does not match our patterns is never returned from the table:

require Matcha.Table.ETS

Matcha.Table.ETS.select pokedex do
  {id, _pokemon} = object when id in 1..151 -> object
end
|> length()

Mapping

Of course, we don’t have to just return the full :ets object; we can also destructure matched objects, and extract just the data from them we are interested in:

alias Matcha.Table.ETS
require ETS

ETS.select pokedex do
  {id, %{"name" => %{"english" => name}}} when id in 1..151 -> name
end

This is what the “mapping” aspect of match specs refers to: that we can not only filter out objects that do not match our pattern, but transform the data we return from the match into what we are interested in, as if passing it through an Enum.map/2, except at a much lower level, greatly improving the efficiency of our querying.

Multiple Clauses

Just as a list in Elixir need not be homogeneous and can contain any term, objects in the same :ets table can have different shapes, so long as they are all tuples and all have a key entry at the same position in the tuple:

hetrogenous_table = :ets.new(HetrogenousTable, [:set])

:ets.insert(hetrogenous_table, [
  {3, "three", "tuple"},
  {4, "four", "tuple", "object"}
])
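
The “key entry at the same position” requirement can be seen directly with the :keypos table option, which defaults to 1. The sketch below uses a hypothetical table named `:keypos_example` that keys objects by their second element instead.

```elixir
# Hypothetical table keyed on the second tuple element.
by_name = :ets.new(:keypos_example, [:set, keypos: 2])

:ets.insert(by_name, {3, "three", "tuple"})
:ets.insert(by_name, {4, "four", "tuple", "object"})

# Lookups now use the second element as the key.
found = :ets.lookup(by_name, "four")
IO.inspect(found)
# prints [{4, "four", "tuple", "object"}]

# A tuple too short to contain the key position is rejected.
result =
  try do
    :ets.insert(by_name, {:too_short})
  rescue
    ArgumentError -> :rejected
  end

IO.inspect(result)
# prints :rejected
```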

Our match specs can use multiple clauses to match on these different object shapes:

alias Matcha.Table.ETS
require ETS

ETS.select hetrogenous_table do
  {_key, length, _} -> length
  {_key, length, _, _} -> length
end
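
In raw match spec terms, each clause corresponds to one `{head, guards, body}` entry in the spec list, and objects matching none of the entries are simply left out. The following is a hand-written sketch on a hypothetical table (`:multi_clause_example`), not necessarily what Matcha compiles the query above into.

```elixir
# Hypothetical table with three differently shaped objects.
table = :ets.new(:multi_clause_example, [:set])

:ets.insert(table, [
  {3, "three", "tuple"},
  {4, "four", "tuple", "object"},
  {5, "no match"}
])

raw_spec = [
  # Clause 1: any 3-tuple, returning its second element.
  {{:_, :"$1", :_}, [], [:"$1"]},
  # Clause 2: any 4-tuple, returning its second element.
  {{:_, :"$1", :_, :_}, [], [:"$1"]}
]

# The 2-tuple matches neither clause and is skipped.
results = :ets.select(table, raw_spec) |> Enum.sort()
IO.inspect(results)
# prints ["four", "three"]
```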

We can also return heterogeneous shapes from :ets; our mapping operation need not always shape query results the same way:

alias Matcha.Table.ETS
require ETS

ETS.select hetrogenous_table do
  {_key, _, _} -> [{:three, :object, :shape}]
  {_key, _, _, _} -> [{:four, :object, :shape, :tuple}]
end

Combining this with the specificity and expressivity of pattern matching, we can do some very powerful filter/mapping in a very efficient way:

alias Matcha.Table.ETS
require ETS

ETS.select pokedex do
  {id, %{"type" => types, "name" => %{"english" => name}}}
  when id in 1..151 ->
    {name, generation: 1, types: types}

  {id, %{"type" => types, "name" => %{"english" => name}}}
  when id in 152..251 ->
    {name, generation: 2, types: types}

    # Other generations were a mistake
end