Caching: ETS and Agent
Mix.install([
{:kino, github: "livebook-dev/kino", override: true},
{:kino_lab, "~> 0.1.0-dev", github: "jonatanklosko/kino_lab"},
{:vega_lite, "~> 0.1.4"},
{:kino_vega_lite, "~> 0.1.1"},
{:benchee, "~> 0.1"},
{:ecto, "~> 3.7"},
{:math, "~> 0.7.0"},
{:faker, "~> 0.17.0"},
{:utils, path: "#{__DIR__}/../utils"}
])
Navigation
Setup
Ensure you type the ea
keyboard shortcut to evaluate all Elixir cells before starting. Alternatively you can evaluate the Elixir cells as you read.
Cache
A cache allows us to store data to avoid computing values.
We commonly use caches to avoid performance-demanding functions or re-retrieving external resources.
flowchart LR
input --> e[expensive computation] --> output
A cache can store the expected output for a given input to a function for quicker access.
flowchart
input --> cache --cached value--> output
cache --no cached value --> e[expensive computation] --> output
Some caches are smart and save newly computed values in the cache to avoid recomputing the same value.
Other caches may be static or infrequently changing data that occasionally updates in the background.
For example, if we had a slow fibonacci $fib(n) = fib(n - 1) + fib(n-2)$ function, we could speed it up by saving computed values in a cache.
defmodule Fib do
def of(0), do: 0
def of(1), do: 1
def of(n) do
of(n - 1) + of(n - 2)
end
end
cache = %{
0 => 0,
1 => 1,
2 => 1,
3 => 2,
4 => 3,
5 => 5,
6 => 8,
7 => 13,
8 => 21,
9 => 34,
10 => 55,
11 => 89,
12 => 144,
13 => 233,
14 => 377,
...
150 => 9969216677189303386214405760200
}
That cache retrieves inputs up to 150
for fib and speeds things up a lot!
You can think of this cache as a table of inputs and outputs.
Utils.table(:fib_cache)
However, be aware that caching should be well considered. The cache permanently takes up memory in your program, so there’s a considerable memory cost. Often it’s best to consider how you can improve the performance of your application rather than immediately reaching for a cache.
For example, a slow Fibonacci function with a cache is no replacement for a fast Fibonacci function!
There are many ways to implement a cache. Using tools already covered in this course, you could use a simple map with keys for the input and values for the output. Alternatively, you could
use a GenServer
with changing state.
In this lesson, we’ll also cover how you can use Agent
and :ets
tables
to build in-memory caches.
Your Turn
Create a Cache
GenServer
that
can hold some cached state. We should be able to set a value
and retrieve a value like so.
cache = GenServer.start_link(Cache, %{})
GenServer.call(cache, {:set, {:key1, "value"}})
GenServer.call(cache, {:get, :key1})
{:ok, "value"}
defmodule Cache do
use GenServer
end
Agent
We don’t necessarily need a GenServer
when creating a simple cache.
For a more lightweight solution, we can use an Agent
.
An Agent
is a simple wrapper around state.
We can start an Agent
process with start_link/2
.
{:ok, agent} = Agent.start_link(fn -> "initial state" end)
We retrieve the state of an Agent
process with get/2
.
Agent.get(agent, fn state -> state end)
We can update the state of an Agent
process with update/2
. After update/2
is called, get/2
will return the newly updated state.
:ok = Agent.update(agent, fn _state -> "new state" end)
Agent.get(agent, fn state -> state end)
Notice that state can be any elixir term. Above we’ve made it a string, but it could be a map.
:ok = Agent.update(agent, fn _state -> %{new: "state"} end)
Agent.get(agent, fn state -> state end)
While nothing prevents you from doing so, it’s generally not advised to change the data structure of state. Changing the structure of state makes it difficult to reason about your code, and could result in a crash or unwanted behavior.
To improve the consistency of your program, you might consider using a struct or otherwise enforcing the shape of your state.
Named Agents
We can name an agent process so that any other process can access it without needing to know it’s PID.
Agent.start_link(fn -> "initial state" end, name: :my_agent)
Agent.get(:my_agent, fn state -> state end)
You can store state in an Agent
process and access it from any module. This is useful for sharing data across multiple processes.
flowchart
A[Process]
B[Process]
C[Process]
State[Agent Shared State]
A --> State
B --> State
C --> State
Agent Module
When working with Agents
, it’s common to put functionality into a single module. For example, we’ll create a
simple Cache
module.
defmodule Cache do
use Agent
def start_link(initial_state) do
Agent.start_link(fn -> initial_state end, name: __MODULE__)
end
def get(key) do
Agent.get(__MODULE__, fn state -> Map.get(state, key) end)
end
def set(key, value) do
Agent.update(__MODULE__, fn state -> Map.put(state, key, value) end)
end
end
Now we can use the Cache
module directly.
Cache.start_link(%{})
Because we’ve used __MODULE__
for the name of
the Agent
process, any process can access the Cache
‘s state using the Cache
module.
Cache.set(:key, "value")
Cache.get(:key)
Your Turn
In the Elixir cell below, create a Counter
agent, It should store an integer in state, which you
can get/1
and increment/1
like so:
{:ok, counter} = Counter.start_link(0)
0 = Counter.get(counter)
Counter.increment(counter)
1 = Counter.get(counter)
defmodule Counter do
use Agent
end
Further Reading
For more on Agents, consider reading:
- The Elixir Lang Guide
- The Agent HexDocs
Erlang Term Storage (ETS)
We can use ETS
tables for a convenient in-memory cache. :ets
is a library provided by Erlang
through the :ets
module.
While an Agent
is a simple solution for an in-memory cache best for small loads or non-concurrent operations, an :ets
table generally
performs better and can support concurrent read and write operations.
Generally, We can use :ets
for a more performant in-memory key-value store.
:ets
tables are much like a GenServer
except designed explicitly for in-memory key-value storage.
flowchart
subgraph ETS
Process[Process] --> State
State --> KV[Key Value Storage]
end
Livebook allows us to view the :ets
process as a table. Let’s start a new :ets
table called
:example_table
.
table = :ets.new(:example_table, [])
We can then insert values into the table with :ets.insert/2
. We insert
values as {key, value}
tuples.
The key and the value can be nearly any value, but we’ll often use an atom for the key.
Your Turn
Try changing the :key
and value
to see that the :ets
table will store any key-value pair.
However, the :ets
table will only store one value for a given key.
:ets.insert(table, {:key, "value"})
Named Tables
:ets
tables can be created with a :named_table
option. This allows us to create
the table and refer to them without the variable bound to their pid.
:ets.new(:my_table, [:named_table])
Now we can insert values to the table using the atom name
:my_table
instead of the PID.
:ets.insert(:my_table, {:key, "value"})
The same goes for looking up values.
:ets.lookup(:my_table, :key)
Your Turn
In the Elixir cell below, create a :super_heros
:ets
table.
You should be able to insert super hero’s and their
secret identities.
:ets.insert(:super_heros, {"Spider Man", "Peter Parker"})
ETS Configuration
:ets
Tables are configured with a Table Type and an Access Control.
Table Types
-
:set
(default). One value per unique key. -
:ordered_set
One value per unique key, ordered by Elixir terms. -
:bag
Many values per key, but only one instance of each value per key. -
:duplicate_bag
Many values per key with duplicates allowed.
Access Control
-
:protected
(default) Read from all process. Write allowed only for the parent process. -
:public
Read/Write available from all processes. -
:private
Read/Write allowed only for the parent process.
By default, :ets
tables use the :set
and :protected
configuration values. So we may include or exclude them when starting an :ets
table, and it does not have any effect.
default_table = :ets.new(:default_table, [:set, :protected])
:ets.insert(default_table, {:key, "value"})
# We return the default table to display the table in livebook.
default_table
With :protected
access control, the :ets
raises an ArgumentError
if another process attempts to write to it.
Uncomment and execute this code. Notice that the child process crashes.
# Task.start(fn ->
# :ets.insert(default_table, {:new_key, "new value"})
# end)
However, reading from other processes is allowed.
Task.start(fn ->
:ets.lookup(default_table, :key) |> IO.inspect(label: "lookup result")
end)
:public
A public table can be read and written from any process, so the following no longer crashes.
public = :ets.new(:public_example, [:public])
Task.start(fn ->
:ets.insert(public, {:key, "value"})
end)
# We return the public table to display it in livebook.
public
:bag
A :bag
table type allows for multiple keys but not the same value under the same key.
bag = :ets.new(:bag_example, [:bag])
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "non-duplicate value"})
# We return the bag table to display it in livebook.
bag
:duplicate_bag
:duplicate_bag
allows for duplicate keys with the same value.
bag = :ets.new(:bag_example, [:duplicate_bag])
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "non-duplicate value"})
# We return the bag table to display it in livebook.
bag
Your Turn
In the Elixir cell below, use the :ordered_set
and :private
configuration values to make
an ordered and private :ets
table.
Demonstrate that the table will not allow read/write operations to the :ets
table
from another process.
Race Conditions
Concurrency is a powerful tool, but it introduces a new class of bugs. For example, race conditions are a common problem.
A race condition occurs when events of operations occur in an unexpected order.
Below we have an :ets
table, which stores a count. Two tasks read the current count and then increment it. We use Process.sleep(100)
to simulate a time-consuming computation, then insert a new count.
If these tasks were synchronous, we would expect the count to increment twice and return two. However, when both of these operations are concurrent, both tasks read the count when it’s zero, then increment the count to one. They then save the resulting count in the :ets
table. As a result, the count is one rather than the expected value of two. This is a classic issue you’ll often run into when working with concurrency.
sequenceDiagram
ETS Table->>ETS Table: sets count = 0
ETS Table->>Task 1: reads count (0)
ETS Table->>Task 2: reads count (0)
Task 1->>ETS Table: sets count = 1
Task 2->>ETS Table: sets count = 1
activate Child Process
Child Process-->>Parent Process: pid
deactivate Child Process
table = :ets.new(:concurrent, [:public])
:ets.insert(table, {:count, 0})
task1 =
Task.async(fn ->
[count: count] = :ets.lookup(table, :count)
Process.sleep(100)
:ets.insert(table, {:count, count + 1})
end)
task2 =
Task.async(fn ->
[count: count] = :ets.lookup(table, :count)
Process.sleep(100)
:ets.insert(table, {:count, count + 1})
end)
Task.await(task1)
Task.await(task2)
table
For an overview of race conditions and how large systems deal with many concurrent operations, there is an excellent video by Tom Scott.
Kino.YouTube.new("https://www.youtube.com/watch?v=RY_2gElt3SA")
Further Reading
For more information, consider reading:
- The ETS lesson by Elixir Schools.
- The ETS Elixir Lang Guide.
Commit Your Progress
Run the following in your command line from the project folder to track and save your progress in a Git commit.
$ git add .
$ git commit -m "finish caching agent ets section"