Tutorial - Elixir
Mix.install([
{:erlfdb, "~> 0.3"}
])
Class Scheduling
This tutorial provides a walkthrough of designing and building a simple application in Elixir using FoundationDB. In this tutorial, we use a few simple data modeling techniques. For a more in-depth discussion of data modeling in FoundationDB, see Data Modeling.
The tutorial is a copy of the official Class Scheduling Tutorial tailored for Elixir.
> erlfdb requires FoundationDB to be installed on your system. If you received an error on the Mix.install setup, please make sure you have both the foundationdb-server and foundationdb-clients packages installed on your system. Also, ensure that your Livebook PATH environment variable includes the directory containing the fdbcli executable.
The concepts in this tutorial apply to all the languages officially supported by FoundationDB, and to Erlang and Elixir via :erlfdb. If you prefer, you can see a version of this tutorial in:
First Steps
Let’s begin with “Hello world.”
If you have not yet installed FoundationDB, see Getting Started on macOS or Getting Started on Linux.
You can execute the commands in this tutorial within iex or from Livebook.
The Python tutorial indicates that you must select the API version. Because the erlfdb library is compiled on your system, the library automatically detects the correct API version and applies it behind the scenes, so you don’t have to.
First, we open a FoundationDB database. :erlfdb_sandbox is provided to start a single fdbserver process with data stored in a temporary directory.
db = :erlfdb_sandbox.open()
If you wish to connect to a real database, use :erlfdb.open/0 to connect via the default cluster file, or :erlfdb.open/1 to connect via a cluster file at a path you specify.
# Connect to the default cluster file
db = :erlfdb.open()
# Connect to a database specified in the file
db = :erlfdb.open("/etc/foundationdb/fdb.cluster")
We are ready to use the database. First, let’s simply write a key-value pair:
:erlfdb.set(db, "hello", "world")
When this command returns without exception, the modification is durably stored in FoundationDB! Under the covers, this function creates a transaction with a single modification. We’ll see later how to do multiple operations in a single transaction. For now, let’s read back the data:
IO.puts("hello " <> :erlfdb.get(db, "hello"))
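Note that the read above only works because the key exists. The final module in this tutorial matches on :not_found for an absent key, so a defensive read might look like the sketch below. It assumes db is the handle opened earlier, and the key "missing" is a made-up example.

```elixir
# Assumes `db` was opened earlier with :erlfdb_sandbox.open/0 or :erlfdb.open/0.
greeting =
  case :erlfdb.get(db, "missing") do
    # An absent key comes back as the atom :not_found rather than a binary.
    :not_found -> "nothing stored under that key"
    value when is_binary(value) -> "hello " <> value
  end

IO.puts(greeting)
```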
If this is all working, it looks like we are ready to start building a real application. For reference, here’s the full code for “hello world”:
# In mix.exs deps, add: `{:erlfdb, "~> 0.3"}`
db = :erlfdb.open()
:erlfdb.set(db, "hello", "world")
IO.puts("hello " <> :erlfdb.get(db, "hello"))
Class scheduling application
Let’s say we’ve been asked to build a class scheduling system for students and administrators. We’ll walk through the design and implementation of this application. Instead of typing everything in as you follow along, consider executing this file in Livebook!
Requirements
We’ll need to let users list available classes and track which students have signed up for which classes. Here’s a first cut at the functions we’ll need to implement:
available_classes() # returns list of classes
signup(student_id, class) # signs up a student for a class
drop(student_id, class) # drops a student from a class
Data model
First, we need to design a data model. A data model is just a method for storing our application data using keys and values in FoundationDB. We seem to have two main types of data: (1) a list of classes and (2) a record of which students will attend which classes. Let’s keep attending data like this:
{"attends", student, class} = ""
We’ll just store the key with a blank value to indicate that a student is signed up for a particular class. For this application, we’re going to think about a key-value pair’s key as a tuple. Encoding a tuple of data elements into a key is a very common pattern for an ordered key-value store.
We’ll keep data about classes like this:
{"class", class} = seats_available
Similarly, each such key will represent an available class. We’ll use seats_available to record the number of seats available.
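To see why tuple-shaped keys suit an ordered store, here is a small sketch that uses plain Elixir tuples and Enum.sort as a stand-in for FoundationDB's sorted keyspace. No database is involved, and the student and class names are invented for illustration. FoundationDB orders the encoded bytes rather than Elixir terms, but for string elements like these the grouping comes out the same.

```elixir
# Plain Elixir tuples stand in for encoded keys here; no database is involved.
records = [
  {"attends", "bob", "9:00 chem 101"},
  {"attends", "alice", "9:00 chem 101"},
  {"attends", "carol", "2:00 art lab"},
  {"attends", "alice", "10:00 bio 301"}
]

# Same-sized tuples compare element by element, so records that share a
# student end up next to each other after sorting.
sorted = Enum.sort(records)

# All of alice's classes now form one contiguous run in the sorted list.
alice_classes = for {"attends", "alice", class} <- sorted, do: class

IO.inspect(sorted)
IO.inspect(alice_classes)
```

Because one student's records are adjacent, a single range read can fetch them all.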
Directories and Subspaces
FoundationDB includes a few tools that make it easy to model data using this approach. Let’s begin by opening a directory in the database:
root = :erlfdb_directory.root()
scheduling = :erlfdb_directory.create_or_open(db, root, ["scheduling"])
The create_or_open/3 function returns a map that includes a subspace where we’ll store our application data. Each subspace has a fixed prefix it uses when defining keys. The prefix corresponds to the first element of a tuple. We decided that we wanted "attends" and "class" as our prefixes, so we’ll create new subspaces for them within the scheduling subspace:
course =
scheduling
|> :erlfdb_directory.get_subspace()
|> :erlfdb_subspace.create({"class"})
attends =
scheduling
|> :erlfdb_directory.get_subspace()
|> :erlfdb_subspace.create({"attends"})
Subspaces have a pack/2 function for defining keys. To store the records for our data model, we can use :erlfdb_subspace.pack(attends, {student, class}) and :erlfdb_subspace.pack(course, {class}).
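As a quick check of how packing behaves, the sketch below packs one attends record and unpacks it again. It assumes the attends subspace created above. The student and class values are made up.

```elixir
# Assumes the `attends` subspace created above; the values are invented.
key = :erlfdb_subspace.pack(attends, {"alice", "9:00 chem 101"})

# The key is an ordinary binary that starts with the subspace's prefix.
IO.inspect(key, binaries: :as_binaries)

# Unpacking with the same subspace strips the prefix and returns the tuple.
{student, class} = :erlfdb_subspace.unpack(attends, key)
IO.inspect({student, class})
```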
Transactions
We’re going to rely on the powerful guarantees of transactions to help keep all of our modifications straight, so let’s look at how erlfdb lets you write a transactional function. Let’s write the very simple add_class function we will use to populate the database’s class list:
defmodule Scheduling1 do
@total_seats_available 100
def add_class(db_or_tr, course, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
key = :erlfdb_subspace.pack(course, {class})
value = :erlfdb_tuple.pack({@total_seats_available})
:erlfdb.set(tr, key, value)
end)
end
end
In the add_class/3 function, the variable tr represents the transactional context that the function executes within. Nearly all :erlfdb functions accept this as the first argument.
For a FoundationDB database db:
Scheduling1.add_class(db, course, "class1")
is equivalent to something like:
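defmodule ManualAddClassTransaction do
# Entry point: create a transaction, then run the retry loop.
def commit(db, course, class) do
tr = :erlfdb.create_transaction(db)
do_commit(tr, course, class)
end
# A separate name is needed here: a second commit/3 clause would never match.
defp do_commit(tr, course, class) do
try do
Scheduling1.add_class(tr, course, class)
tr
|> :erlfdb.commit()
|> :erlfdb.wait()
catch
:error, {:erlfdb_error, code} ->
# Wait for on_error to decide the error is retryable, then try again.
tr
|> :erlfdb.on_error(code)
|> :erlfdb.wait()
do_commit(tr, course, class)
end
end
end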
If instead you pass a transaction for the db_or_tr argument, the transaction will be used directly, and it is assumed that the caller implements appropriate retry logic for errors. This permits transactional functions to be composed into larger transactions.
For example:
:erlfdb.transactional(db, fn tr ->
Scheduling1.add_class(tr, course, "class1")
# ... and more
end)
Note that by default, the operation will be retried an infinite number of times and the transaction will never time out. It is therefore recommended that the client choose a default timeout value that is suitable for their application. This can be set either at the transaction level using the timeout transaction option or at the database level with the timeout database option. For example, one can set a one-minute timeout on each transaction by calling:
# 60,000 ms = 1 minute
:erlfdb.set_option(db, :timeout, <<60000::little-integer-size(64)>>)
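The binary in that call is just the number 60000 written as an 8-byte little-endian integer. The following sketch uses only Elixir's bitstring syntax, with no database call, to show the encoding and confirm it round-trips. The variable names are illustrative.

```elixir
timeout_ms = 60_000

# Encode as an 8-byte little-endian integer, matching the call above.
encoded = <<timeout_ms::little-integer-size(64)>>
IO.inspect(byte_size(encoded))
IO.inspect(encoded, binaries: :as_binaries)

# Decoding with the same specifier recovers the original number.
<<decoded::little-integer-size(64)>> = encoded
IO.inspect(decoded)
```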
Making some sample classes
Let’s make some sample classes and put them in the class_names variable.
# Generate 1,710 classes (19 times x 10 types x 9 levels) like '9:00 chem for dummies'
levels = ["intro", "for dummies", "remedial", "101", "201", "301", "mastery", "lab", "seminar"]
types = ["chem", "bio", "cs", "geometry", "calc", "alg", "film", "music", "art", "dance"]
times = for h <- 2..20, do: "#{h}:00"
class_names = for i <- times, t <- types, l <- levels, do: "#{i} #{t} #{l}"
Initializing the database
We initialize the database with our class list:
defmodule Scheduling2 do
@total_seats_available 100
def init(db_or_tr, scheduling, course, class_names) do
{range_start, range_end} = :erlfdb_directory.range(scheduling) |> dbg()
:erlfdb.transactional(db_or_tr, fn tr ->
# Clear the directory
:erlfdb.clear_range(tr, range_start, range_end)
for class_name <- class_names,
do: add_class(tr, course, class_name)
end)
end
# Same as Scheduling1
def add_class(db_or_tr, course, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
key = :erlfdb_subspace.pack(course, {class})
value = :erlfdb_tuple.pack({@total_seats_available})
:erlfdb.set(tr, key, value)
end)
end
end
Scheduling2.init(db, scheduling, course, class_names)
How FoundationDB Actually Stores Your Data
You may be wondering what these keys and values actually look like. Let’s take a brief moment to inspect the encodings of the keys. To use :erlfdb you are not required to know these encodings, so feel free to come back to this section later on if you’d like.
Internally, the data stored from the code above would look like:
\x15)\x01class\x00\x019:00 music seminar\x00
\xfe\x01\xfe\x00\x14\x02scheduling\x00
When we created our directory with :erlfdb_directory.create_or_open(db, root, ["scheduling"]), the directory layer assigned the compact two-byte identifier \x15) (or <<21, 41>> in Elixir binary syntax) to represent the scheduling directory in all subsequent entries. You can see exactly what prefix your directory received from the dbg() output in our init function; it might show something like {<<21, 41, 0>>, <<21, 41, 255>>}, although your prefix may differ. Here <<21, 41>> (the same two bytes as \x15) in escaped notation) is the prefix allocated for your scheduling directory.
This compaction makes a significant difference at scale. Storing the word “scheduling” in a million entries would use 10 million bytes just on that prefix alone, compared with 2 million bytes with a 2-byte prefix.
The next segment, \x01class\x00, identifies the “class” subspace, and the \x01 that follows it begins the class name. The \x01 and \x00 bytes come from the tuple encoding that the subspace uses: \x01 marks the start of a byte string and \x00 terminates it. This ensures our keys sort correctly and can’t accidentally overlap with each other. Having all class data share the same prefix also keeps it physically grouped together on disk, which is why the range reads in available_classes() are so efficient.
The second key (starting with \xfe) is FoundationDB’s internal metadata that maps the compact prefix \x15) (or <<21, 41>>) back to the human-readable name “scheduling”.
This is how FoundationDB translates between what we write in our code and what it stores on disk.
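To make the layout concrete, the sketch below takes the first key shown above apart using nothing but Elixir binary pattern matching. It hard-codes the example prefix <<21, 41>>, which may differ on your system. It also assumes the strings contain no zero bytes, so it is an illustration rather than a real decoder.

```elixir
# The first key shown above, written out byte by byte.
key = <<0x15, 0x29, 0x01, "class", 0x00, 0x01, "9:00 music seminar", 0x00>>

# The first two bytes are the directory prefix.
<<prefix::binary-size(2), rest::binary>> = key
IO.inspect(prefix, binaries: :as_binaries)

# Each remaining element is 0x01, the bytes, then a 0x00 terminator.
# Splitting on 0x00 is a shortcut that only works because these strings
# contain no zero bytes; a real decoder must handle escaping.
elements =
  rest
  |> :binary.split(<<0>>, [:global, :trim])
  |> Enum.map(fn <<0x01, bytes::binary>> -> bytes end)

IO.inspect(elements)

# Binaries compare byte by byte, so the key falls inside the directory range.
in_range? = (<<21, 41, 0>> < key) and (key < <<21, 41, 255>>)
IO.inspect(in_range?)
```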
Listing available classes
Before students can do anything else, they need to be able to retrieve a list of available classes from the database. Because FoundationDB sorts its data by key and therefore has efficient range-read capability, we can retrieve all of the classes in a single database call. We find this range of keys with :erlfdb_subspace.range(course):
defmodule Scheduling3 do
# ** New **
def available_classes(db_or_tr, course) do
{range_start, range_end} = :erlfdb_subspace.range(course)
:erlfdb.transactional(db_or_tr, fn tr ->
tr
|> :erlfdb.get_range(range_start, range_end)
|> Stream.map(fn {encoded_class, _v} ->
{class} = :erlfdb_subspace.unpack(course, encoded_class)
class
end)
|> Enum.to_list()
end)
end
def init(_db_or_tr, _scheduling, _course, _class_names) do
# Omitted for this step
end
def add_class(_db_or_tr, _course, _c) do
# Omitted for this step
end
end
Scheduling3.available_classes(db, course)
The :erlfdb_subspace.range/1 function returns a 2-tuple containing binaries that represent the start and end (exclusive) of the subspace key range:
:erlfdb_subspace.range(course) #=> {<<21, 41, 1, 99, 108, 97, 115, 115, 0, 0>>, <<21, 41, 1, 99, 108, 97, 115, 115, 0, 255>>}
We retrieve all key-value pairs and unpack the key to extract the class name. The technique used here will hold all key-value pairs in system memory. :erlfdb.fold_range/5 can be used for a memory-safe reduce operation.
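As a rough sketch of the fold-based approach, the module below counts the classes in the course subspace without building the full list. RangeFoldSketch and count_classes/2 are hypothetical names. The reducer shape, a function receiving a {key, value} pair and the accumulator, is my understanding of :erlfdb.fold_range/5 and is worth checking against the erlfdb documentation for your version.

```elixir
defmodule RangeFoldSketch do
  # Hypothetical helper, not part of the tutorial's Scheduling modules.
  def count_classes(db_or_tr, course) do
    {range_start, range_end} = :erlfdb_subspace.range(course)

    :erlfdb.transactional(db_or_tr, fn tr ->
      # Assumed reducer shape: each {key, value} pair plus the accumulator.
      reducer = fn {_key, _value}, count -> count + 1 end
      :erlfdb.fold_range(tr, range_start, range_end, reducer, 0)
    end)
  end
end

# Assumes `db` and `course` from the earlier cells.
RangeFoldSketch.count_classes(db, course)
```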
> Note: :erlfdb.get_range/4 with the option wait: false can be used to control at what point in your transaction the wait occurs. The default is wait: true.
>
> f = :erlfdb.get_range(tr, s, e, wait: false)
> # ... other code ...
> kvs = :erlfdb.wait(f)
>
> Deciding when to wait is an important part of designing your FDB transactions and keeping good performance characteristics. :erlfdb provides other wait-related functions that are worth exploring.
Signing up for a class and Dropping a class
We finally get to the crucial function. A student has decided on a class (by name) and wants to sign up. The signup function will take a student and a class:
defmodule Scheduling4 do
# ** New **
def signup(db_or_tr, attends, student, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
rec = :erlfdb_subspace.pack(attends, {student, class})
:erlfdb.set(tr, rec, "")
end)
end
# ** New **
def drop(db_or_tr, attends, student, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
rec = :erlfdb_subspace.pack(attends, {student, class})
:erlfdb.clear(tr, rec)
end)
end
def available_classes(_db_or_tr, _course) do
# Omitted for this step
end
def init(_db_or_tr, _scheduling, _course, _class_names) do
# Omitted for this step
end
def add_class(_db_or_tr, _course, _c) do
# Omitted for this step
end
end
For signup, we simply insert the appropriate record (with a blank value).
For drop, we need to be able to delete a record from the database. We do this with the clear/2 function.
Scheduling4.signup(db, attends, "Alice", "10:00 bio 301")
Scheduling4.drop(db, attends, "Alice", "10:00 bio 301")
Done?
We report back to the project leader that our application is done—students can sign up for, drop, and list classes. Unfortunately, we learn that a new problem has been discovered: popular classes are being over-subscribed. Our application now needs to enforce the class size constraint as students add and drop classes.
Seats are limited!
Let’s go back to the data model. Remember that we stored the number of seats in the class in the value of the key-value entry in the class list. Let’s refine that a bit to track the remaining number of seats in the class. The initialization can work the same way (in our example, all classes initially have 100 seats), but the available_classes, signup, and drop functions are going to have to change. We’re going to define the whole final module. New sections of code are marked with comments. Below the module definition, we’ll have some additional discussion about the changes.
defmodule Scheduling do
@total_seats_available 100
def signup(db_or_tr, attends, course, student, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
rec = :erlfdb_subspace.pack(attends, {student, class})
# ** New **
with :not_found <- :erlfdb.wait(:erlfdb.get(tr, rec)) do
{seats_left} =
tr
|> :erlfdb.get(:erlfdb_subspace.pack(course, {class}))
|> :erlfdb.wait()
|> :erlfdb_tuple.unpack()
if seats_left == 0, do: raise("No remaining seats")
:erlfdb.set(
tr,
:erlfdb_subspace.pack(course, {class}),
:erlfdb_tuple.pack({seats_left - 1})
)
:erlfdb.set(tr, rec, "")
else
_ ->
# already signed up
:ok
end
end)
end
def drop(db_or_tr, attends, course, student, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
rec = :erlfdb_subspace.pack(attends, {student, class})
# ** New **
with :not_found <- :erlfdb.wait(:erlfdb.get(tr, rec)) do
# not taking this class
:ok
else
_ ->
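# Read the stored seat count; unpacking the key itself would not match.
{seats_left} =
tr
|> :erlfdb.get(:erlfdb_subspace.pack(course, {class}))
|> :erlfdb.wait()
|> :erlfdb_tuple.unpack()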
:erlfdb.set(
tr,
:erlfdb_subspace.pack(course, {class}),
:erlfdb_tuple.pack({seats_left + 1})
)
:erlfdb.clear(tr, rec)
end
end)
end
def available_classes(db_or_tr, course) do
{range_start, range_end} = :erlfdb_subspace.range(course)
:erlfdb.transactional(db_or_tr, fn tr ->
# ** New **
tr
|> :erlfdb.get_range(range_start, range_end)
|> Stream.map(fn {packed_class, packed_seats} ->
{class} = :erlfdb_subspace.unpack(course, packed_class)
{availability} = :erlfdb_tuple.unpack(packed_seats)
{class, availability}
end)
|> Stream.filter(fn {_class, availability} -> availability > 0 end)
|> Stream.map(fn {class, _availability} -> class end)
|> Enum.to_list()
end)
end
def init(db_or_tr, scheduling, course, class_names) do
{range_start, range_end} = :erlfdb_directory.range(scheduling)
:erlfdb.transactional(db_or_tr, fn tr ->
# Clear the directory
:erlfdb.clear_range(tr, range_start, range_end)
for class_name <- class_names,
do: add_class(tr, course, class_name)
end)
end
def add_class(db_or_tr, course, class) do
:erlfdb.transactional(db_or_tr, fn tr ->
key = :erlfdb_subspace.pack(course, {class})
value = :erlfdb_tuple.pack({@total_seats_available})
:erlfdb.set(tr, key, value)
end)
end
end
- available_classes: This is easy – we simply add a condition to check that the value is non-zero.
- signup: We now have to check that we aren’t already signed up, since we don’t want a double sign up to decrease the number of seats twice. Then we look up how many seats are left to make sure there is a seat remaining so we don’t push the counter into the negative. If there is a seat remaining, we decrement the counter.
- drop: Once again we check to see if the student is signed up and if not, we can just return as we don’t want to incorrectly increase the number of seats. We then adjust the number of seats by one by taking the current value, incrementing it by one, and then storing back.
- Also notice that signup and drop now require both the attends and course subspaces, whereas previously they only required attends.
Let’s try it out. We choose a class and sign up 100 students named “Bob”. Then, when Charlie attempts to sign up, we get the exception.
popular_class =
Scheduling.available_classes(db, course)
|> hd()
the_bobs = for i <- 1..100, do: "Bob #{i}"
Enum.each(
the_bobs,
&Scheduling.signup(db, attends, course, &1, popular_class)
)
# RuntimeError: "No remaining seats" is expected
Scheduling.signup(db, attends, course, "Charlie", popular_class)
Concurrency and consistency
The signup function is starting to get a bit complex; it now reads and writes a few different key-value pairs in the database. One of the tricky issues in this situation is what happens as multiple clients/students read and modify the database at the same time. Couldn’t two students both see one remaining seat and sign up at the same time?
These are tricky issues without simple answers—unless you have transactions! Because these functions are defined as FoundationDB transactions, we can have a simple answer: Each transactional function behaves as if it is the only one modifying the database. There is no way for a transaction to ‘see’ another transaction change the database, and each transaction ensures that either all of its modifications occur or none of them do.
Looking deeper, it is, of course, possible for two transactions to conflict. For example, if two people both see a class with one seat and sign up at the same time, FoundationDB must allow only one to succeed. This causes one of the transactions to fail to commit (which can also be caused by network outages, crashes, etc.). To ensure correct operation, applications need to handle this situation, usually via retrying the transaction. In this case, the conflicting transaction will be retried automatically by the :erlfdb.transactional/2 function and will eventually lead to the correct result, a "No remaining seats" exception.
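The conflict-and-retry idea can be sketched without a database. The toy module below is not how FoundationDB detects conflicts internally. It uses an Agent holding a version number and a seat count as a stand-in: a commit succeeds only if the version is unchanged since the read, and a failed commit simply retries. ToySeats and all of its functions are hypothetical names.

```elixir
defmodule ToySeats do
  # Toy stand-in for optimistic concurrency: state is {version, seats_left}.
  def start(seats), do: Agent.start_link(fn -> {0, seats} end)

  def read(store), do: Agent.get(store, & &1)

  # The commit succeeds only if nobody committed since `read_version`.
  def commit(store, read_version, new_seats) do
    Agent.get_and_update(store, fn
      {^read_version, _old_seats} -> {:ok, {read_version + 1, new_seats}}
      current -> {:conflict, current}
    end)
  end

  # Read, decide, try to commit; on conflict start over with a fresh read.
  def signup(store) do
    {version, seats} = read(store)

    cond do
      seats == 0 -> {:error, :no_remaining_seats}
      commit(store, version, seats - 1) == :ok -> :ok
      true -> signup(store)
    end
  end
end

{:ok, store} = ToySeats.start(100)

results =
  1..150
  |> Task.async_stream(fn _ -> ToySeats.signup(store) end, max_concurrency: 50)
  |> Enum.map(fn {:ok, result} -> result end)

successes = Enum.count(results, &(&1 == :ok))
IO.inspect(successes)
IO.inspect(ToySeats.read(store))
```

However the 150 attempts interleave, exactly 100 succeed and the seat count never goes negative. The transactional signup provides the same guarantee.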
Let’s try it out with some aggressive concurrency:
next_popular_class =
Scheduling.available_classes(db, course)
|> hd()
the_dans = for i <- 1..100, do: "Dan #{i}"
the_dans
|> Task.async_stream(&Scheduling.signup(db, attends, course, &1, next_popular_class))
|> Stream.run()
# RuntimeError: "No remaining seats" is expected
Scheduling.signup(db, attends, course, "Charlie", next_popular_class)
As expected, next_popular_class is no longer returned by available_classes.
Scheduling.available_classes(db, course)
|> hd()
Idempotence
Occasionally, a transaction might be retried even after it succeeds (for example, if the client loses contact with the cluster at just the wrong moment). This can cause problems if transactions are not written to be idempotent, i.e. to have the same effect if committed twice as if committed once. There are generic design patterns for making any transaction idempotent, but many transactions are naturally idempotent. For example, all of the transactions in this tutorial are idempotent.
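A small in-memory sketch can make the property concrete. It models the class as a plain map rather than using the database, and IdempotenceSketch with its functions is a hypothetical name. The signup function mirrors the tutorial's "already signed up" check, while naive_signup omits it.

```elixir
defmodule IdempotenceSketch do
  # State is a plain map: who attends, and how many seats remain.
  def new(seats), do: %{attends: MapSet.new(), seats: seats}

  # Mirrors the tutorial's signup: do nothing if already signed up.
  def signup(state, student) do
    if MapSet.member?(state.attends, student) do
      state
    else
      %{state | attends: MapSet.put(state.attends, student), seats: state.seats - 1}
    end
  end

  # A version without the check: every repeat takes another seat.
  def naive_signup(state, student) do
    %{state | attends: MapSet.put(state.attends, student), seats: state.seats - 1}
  end
end

once = IdempotenceSketch.new(100) |> IdempotenceSketch.signup("alice")
twice = IdempotenceSketch.signup(once, "alice")
IO.inspect(once == twice)

naive_twice =
  IdempotenceSketch.new(100)
  |> IdempotenceSketch.naive_signup("alice")
  |> IdempotenceSketch.naive_signup("alice")

IO.inspect(naive_twice.seats)
```

Applying the checked signup twice leaves the same state as applying it once, whereas the naive version loses a second seat on the repeat.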
More features?!
Of course, as soon as our new version of the system goes live, we hear of a trick that certain students are using. They are signing up for all classes immediately, and only later dropping those that they don’t want to take. This has led to an unusable system, and we have been asked to fix it. We decide to limit students to five classes:
def signup(db_or_tr, attends, course, student, class) do
# ... snipped ...
if seats_left == 0, do:
raise "No remaining seats"
{sk, ek} = :erlfdb_subspace.range(attends, {student})
if length(:erlfdb.get_range(tr, sk, ek)) == 5, do:
raise "Too many classes"
# ... snipped ...
end
end)
end
Fortunately, we decided on a data model that keeps all of the attending records for a single student together. With this approach, we can use a single range read in the attends subspace to retrieve all the classes that a student is signed up for. We simply raise an exception if the number of classes has reached the limit of five.
Feel free to add this new logic to the Scheduling module above.
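The same range read can also list a student's classes. The helper below is a sketch built only from calls already used in this tutorial. StudentClassesSketch and classes_for/3 are hypothetical names, and it assumes the db and attends values from the earlier cells.

```elixir
defmodule StudentClassesSketch do
  # Hypothetical helper: returns the class names one student attends.
  def classes_for(db_or_tr, attends, student) do
    # The range covers every key of the form {student, class} in `attends`.
    {range_start, range_end} = :erlfdb_subspace.range(attends, {student})

    :erlfdb.transactional(db_or_tr, fn tr ->
      tr
      |> :erlfdb.get_range(range_start, range_end)
      |> Enum.map(fn {key, _value} ->
        {^student, class} = :erlfdb_subspace.unpack(attends, key)
        class
      end)
    end)
  end
end

# Assumes `db` and `attends` from the earlier cells.
StudentClassesSketch.classes_for(db, attends, "Alice")
```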
Composing transactions
Oh, just one last feature, we’re told. We have students that are trying to switch from one popular class to another. By the time they drop one class to free up a slot for themselves, the open slot in the other class is gone. By the time they see this and try to re-add their old class, that slot is gone too! So, can we make it so that a student can switch from one class to another without this worry?
Fortunately, we have FoundationDB, and this sounds an awful lot like the transactional property of atomicity, the all-or-nothing behavior that we already rely on. All we need to do is to compose the drop and signup functions into a new switch function. This makes the switch function exceptionally easy:
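def switch(db_or_tr, attends, course, student, old_class, new_class) do
:erlfdb.transactional(db_or_tr, fn tr ->
# Both calls reuse tr, so they commit or fail together.
drop(tr, attends, course, student, old_class)
signup(tr, attends, course, student, new_class)
end)
end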
The simplicity of this implementation belies the sophistication of what FoundationDB is taking care of for us.
By dropping the old class and signing up for the new one inside a single transaction, we ensure that either both steps happen, or that neither happens. The first notable thing about the switch function is that it wraps its body in :erlfdb.transactional/2, but it also calls the functions signup and drop, which each call :erlfdb.transactional/2 themselves. Because these functions can accept either a database or an existing transaction as the db_or_tr argument, the switch function can be called with a database by a simple client, and a new transaction will be automatically created. However, once this transaction is created and passed in as tr, the calls to drop and signup both share the same tr. This ensures that they see each other’s modifications to the database, and all of the changes that both of them make in sequence are committed together when the switch function returns. This compositional capability is very powerful.
Also note that if an exception is raised, for example in signup, the exception is not rescued by switch and so propagates to the calling function. In this case, the transaction is never committed, which rolls back all database modifications and leaves the database completely unchanged by the half-executed function.
Are we done?
Yep, we’re done and ready to deploy. If you want to see this entire application from the Python tutorial in one place plus some multithreaded testing code to simulate concurrency, look at the Appendix: SchedulingTutorial.py.
Deploying and scaling
Since we store all state for this application in FoundationDB, deploying and scaling this solution up is impressively painless. Just run a web server, the UI, this back end, and point the whole thing at FoundationDB. We can run as many computers with this setup as we want, and they can all hit the database at the same time because of the transactional integrity of FoundationDB. Also, since all of the state in the system is stored in the database, any of these computers can fail without any lasting consequences.
Next steps
- See Data Modeling for guidance on using tuples and subspaces to enable effective storage and retrieval of data.
- See Developer Guide for general guidance on development using FoundationDB.
- See the API References for detailed API documentation.
- See the KV Queue Livebook for a sample implementation of a queue data structure built using :erlfdb.
Reminder
This tutorial is a copy of FDB | Class Scheduling, hosted on the official FoundationDB website.