Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Caching: ETS and Agent

caching_agent_ets.livemd

Caching: ETS and Agent

Mix.install([
  {:kino, github: "livebook-dev/kino", override: true},
  {:kino_lab, "~> 0.1.0-dev", github: "jonatanklosko/kino_lab"},
  {:vega_lite, "~> 0.1.4"},
  {:kino_vega_lite, "~> 0.1.1"},
  {:benchee, "~> 0.1"},
  {:ecto, "~> 3.7"},
  {:math, "~> 0.7.0"},
  {:faker, "~> 0.17.0"},
  {:utils, path: "#{__DIR__}/../utils"}
])

Navigation

Return Home Report An Issue

Setup

Ensure you type the ea keyboard shortcut to evaluate all Elixir cells before starting. Alternatively you can evaluate the Elixir cells as you read.

Cache

A cache allows us to store data to avoid computing values.

We commonly use caches to avoid performance-demanding functions or re-retrieving external resources.

flowchart LR
input --> e[expensive computation] --> output

A cache can store the expected output for a given input to a function for quicker access.

flowchart
input --> cache --cached value--> output
cache --no cached value --> e[expensive computation] --> output

Some caches are smart and save newly computed values in the cache to avoid recomputing the same value.

Other caches may be static or infrequently changing data that occasionally updates in the background.

For example, if we had a slow fibonacci $fib(n) = fib(n - 1) + fib(n-2)$ function, we could speed it up by saving computed values in a cache.

defmodule Fib do
  def of(0), do: 0
  def of(1), do: 1
  def of(n) do
    of(n - 1) + of(n - 2)
  end
end

cache = %{
  0 => 0,
  1 => 1,
  2 => 1,
  3 => 2,
  4 => 3,
  5 => 5,
  6 => 8,
  7 => 13,
  8 => 21,
  9 => 34,
  10 => 55,
  11 => 89,
  12 => 144,
  13 => 233,
  14 => 377,
  ...
  150 => 9969216677189303386214405760200
}

That cache retrieves inputs up to 150 for fib and speeds things up a lot! You can think of this cache as a table of inputs and outputs.

Utils.table(:fib_cache)

However, be aware that caching should be well considered. The cache permanently takes up memory in your program, so there’s a considerable memory cost. Often it’s best to consider how you can improve the performance of your application rather than immediately reaching for a cache.

For example, a slow Fibonacci function with a cache is no replacement for a fast Fibonacci function!

There are many ways to implement a cache. Using tools already covered in this course, you could use a simple map with keys for the input and values for the output. Alternatively, you could use a GenServer with changing state.

In this lesson, we’ll also cover how you can use Agent and :ets tables to build in-memory caches.

Your Turn

Create a Cache GenServer that can hold some cached state. We should be able to set a value and retrieve a value like so.

cache = GenServer.start_link(Cache, %{})
GenServer.call(cache, {:set, {:key1, "value"}})

GenServer.call(cache, {:get, :key1})
{:ok, "value"}
defmodule Cache do
  use GenServer
end

Agent

We don’t necessarily need a GenServer when creating a simple cache.

For a more lightweight solution, we can use an Agent. An Agent is a simple wrapper around state.

We can start an Agent process with start_link/2.

{:ok, agent} = Agent.start_link(fn -> "initial state" end)

We retrieve the state of an Agent process with get/2.

Agent.get(agent, fn state -> state end)

We can update the state of an Agent process with update/2. After update/2 is called, get/2 will return the newly updated state.

:ok = Agent.update(agent, fn _state -> "new state" end)

Agent.get(agent, fn state -> state end)

Notice that state can be any elixir term. Above we’ve made it a string, but it could be a map.

:ok = Agent.update(agent, fn _state -> %{new: "state"} end)

Agent.get(agent, fn state -> state end)

While nothing prevents you from doing so, it’s generally not advised to change the data structure of state. Changing the structure of state makes it difficult to reason about your code, and could result in a crash or unwanted behavior.

To improve the consistency of your program, you might consider using a struct or otherwise enforcing the shape of your state.

Named Agents

We can name an agent process so that any other process can access it without needing to know it’s PID.

Agent.start_link(fn -> "initial state" end, name: :my_agent)

Agent.get(:my_agent, fn state -> state end)

You can store state in an Agent process and access it from any module. This is useful for sharing data across multiple processes.

flowchart
  A[Process]
  B[Process]
  C[Process]
  State[Agent Shared State]

  A --> State
  B --> State
  C --> State

Agent Module

When working with Agents, it’s common to put functionality into a single module. For example, we’ll create a simple Cache module.

defmodule Cache do
  use Agent

  def start_link(initial_state) do
    Agent.start_link(fn -> initial_state end, name: __MODULE__)
  end

  def get(key) do
    Agent.get(__MODULE__, fn state -> Map.get(state, key) end)
  end

  def set(key, value) do
    Agent.update(__MODULE__, fn state -> Map.put(state, key, value) end)
  end
end

Now we can use the Cache module directly.

Cache.start_link(%{})

Because we’ve used __MODULE__ for the name of the Agent process, any process can access the Cache‘s state using the Cache module.

Cache.set(:key, "value")

Cache.get(:key)

Your Turn

In the Elixir cell below, create a Counter agent, It should store an integer in state, which you can get/1 and increment/1 like so:

{:ok, counter} = Counter.start_link(0)

0 = Counter.get(counter)
Counter.increment(counter)
1 = Counter.get(counter)
defmodule Counter do
  use Agent
end

Further Reading

For more on Agents, consider reading:

Erlang Term Storage (ETS)

We can use ETS tables for a convenient in-memory cache. :ets is a library provided by Erlang through the :ets module.

While an Agent is a simple solution for an in-memory cache best for small loads or non-concurrent operations, an :ets table generally performs better and can support concurrent read and write operations.

Generally, We can use :ets for a more performant in-memory key-value store.

:ets tables are much like a GenServer except designed explicitly for in-memory key-value storage.

flowchart
subgraph ETS
  Process[Process] --> State
  State --> KV[Key Value Storage]
end

Livebook allows us to view the :ets process as a table. Let’s start a new :ets table called :example_table.

table = :ets.new(:example_table, [])

We can then insert values into the table with :ets.insert/2. We insert values as {key, value} tuples.

The key and the value can be nearly any value, but we’ll often use an atom for the key.

Your Turn

Try changing the :key and value to see that the :ets table will store any key-value pair. However, the :ets table will only store one value for a given key.

:ets.insert(table, {:key, "value"})

Named Tables

:ets tables can be created with a :named_table option. This allows us to create the table and refer to them without the variable bound to their pid.

:ets.new(:my_table, [:named_table])

Now we can insert values to the table using the atom name :my_table instead of the PID.

:ets.insert(:my_table, {:key, "value"})

The same goes for looking up values.

:ets.lookup(:my_table, :key)

Your Turn

In the Elixir cell below, create a :super_heros :ets table. You should be able to insert super hero’s and their secret identities.

:ets.insert(:super_heros, {"Spider Man", "Peter Parker"})

ETS Configuration

:ets Tables are configured with a Table Type and an Access Control.

Table Types

  • :set (default). One value per unique key.
  • :ordered_set One value per unique key, ordered by Elixir terms.
  • :bag Many values per key, but only one instance of each value per key.
  • :duplicate_bag Many values per key with duplicates allowed.

Access Control

  • :protected (default) Read from all process. Write allowed only for the parent process.
  • :public Read/Write available from all processes.
  • :private Read/Write allowed only for the parent process.

By default, :ets tables use the :set and :protected configuration values. So we may include or exclude them when starting an :ets table, and it does not have any effect.

default_table = :ets.new(:default_table, [:set, :protected])
:ets.insert(default_table, {:key, "value"})

# We return the default table to display the table in livebook.
default_table

With :protected access control, the :ets raises an ArgumentError if another process attempts to write to it.

Uncomment and execute this code. Notice that the child process crashes.

# Task.start(fn ->
#   :ets.insert(default_table, {:new_key, "new value"})
# end)

However, reading from other processes is allowed.

Task.start(fn ->
  :ets.lookup(default_table, :key) |> IO.inspect(label: "lookup result")
end)

:public

A public table can be read and written from any process, so the following no longer crashes.

public = :ets.new(:public_example, [:public])

Task.start(fn ->
  :ets.insert(public, {:key, "value"})
end)

# We return the public table to display it in livebook.
public

:bag

A :bag table type allows for multiple keys but not the same value under the same key.

bag = :ets.new(:bag_example, [:bag])

:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "non-duplicate value"})

# We return the bag table to display it in livebook.
bag

:duplicate_bag

:duplicate_bag allows for duplicate keys with the same value.

bag = :ets.new(:bag_example, [:duplicate_bag])

:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "non-duplicate value"})

# We return the bag table to display it in livebook.
bag

Your Turn

In the Elixir cell below, use the :ordered_set and :private configuration values to make an ordered and private :ets table.

Demonstrate that the table will not allow read/write operations to the :ets table from another process.

Race Conditions

Concurrency is a powerful tool, but it introduces a new class of bugs. For example, race conditions are a common problem.

A race condition occurs when events of operations occur in an unexpected order.

Below we have an :ets table, which stores a count. Two tasks read the current count and then increment it. We use Process.sleep(100) to simulate a time-consuming computation, then insert a new count.

If these tasks were synchronous, we would expect the count to increment twice and return two. However, when both of these operations are concurrent, both tasks read the count when it’s zero, then increment the count to one. They then save the resulting count in the :ets table. As a result, the count is one rather than the expected value of two. This is a classic issue you’ll often run into when working with concurrency.

sequenceDiagram
ETS Table->>ETS Table: sets count = 0
    ETS Table->>Task 1: reads count (0)
    ETS Table->>Task 2: reads count (0)
    Task 1->>ETS Table: sets count = 1
    Task 2->>ETS Table: sets count = 1
activate Child Process
Child Process-->>Parent Process: pid
deactivate Child Process
table = :ets.new(:concurrent, [:public])

:ets.insert(table, {:count, 0})

task1 =
  Task.async(fn ->
    [count: count] = :ets.lookup(table, :count)
    Process.sleep(100)
    :ets.insert(table, {:count, count + 1})
  end)

task2 =
  Task.async(fn ->
    [count: count] = :ets.lookup(table, :count)
    Process.sleep(100)
    :ets.insert(table, {:count, count + 1})
  end)

Task.await(task1)
Task.await(task2)

table

For an overview of race conditions and how large systems deal with many concurrent operations, there is an excellent video by Tom Scott.

Kino.YouTube.new("https://www.youtube.com/watch?v=RY_2gElt3SA")

Further Reading

For more information, consider reading:

Commit Your Progress

Run the following in your command line from the project folder to track and save your progress in a Git commit.

$ git add .
$ git commit -m "finish caching agent ets section"