Distributed portals with Elixir
Introduction
This notebook is a fast-paced introduction to the Elixir programming language. We will explore both basic and advanced concepts to implement our own version of the Portal game to transfer data across notebooks using Elixir’s distribution capabilities.
For a more structured introduction to the language, see Elixir’s Getting Started guide and the many learning resources available.
The plan ahead
The Portal game consists of a series of puzzles that must be solved by teleporting the player’s character and simple objects from one place to another.
In order to teleport, the player uses the Portal gun to shoot doors onto flat planes, like a floor or a wall. Entering one of those doors teleports you to the other:
Our version of the Portal game will use Elixir to shoot doors of different colors and transfer data between them! We will even learn how we can distribute doors across different machines in our network:
Here is what we will learn:
- Elixir’s basic data structures
- Pattern matching
- Using agents for state
- Using structs for custom data structures
- Extending the language with protocols
- Supervision trees and applications
- Distributed Elixir nodes
At the end of this notebook, we will make the following code work:
# Shoot two doors: one orange, another blue
Portal.shoot(:orange)
Portal.shoot(:blue)
# Start transferring the list [1, 2, 3, 4] from orange to blue
portal = Portal.transfer(:orange, :blue, [1, 2, 3, 4])
# This will output:
#
# #Portal<
# :orange <=> :blue
# [1, 2, 3, 4] <=> []
# >
# Now every time we call push_right, data goes to blue
Portal.push_right(portal)
# This will output:
#
# #Portal<
# :orange <=> :blue
# [1, 2, 3] <=> [4]
# >
Intrigued? Let’s get started!
Basic data structures
Elixir has numbers, strings, and variables. Code comments start with #
:
# Numbers
IO.inspect(40 + 2)
# Strings
variable = "hello" <> " world"
IO.inspect(variable)
Executing the cell above prints the number 42
and the string
"hello world"
. To do so, we called the function inspect
in
the IO
module, using the IO.inspect(...)
notation. This function
prints the given data structure to your terminal - in this case,
our notebook - and returns the value given to it.
Elixir also has three special values, true
, false
, and nil
.
Everything in Elixir is considered to be a truthy value, except for
false
and nil
:
# && is the logical and operator
IO.inspect(true && true)
IO.inspect(13 && 42)
# || is the logical or operator
IO.inspect(true || false)
IO.inspect(nil || 42)
For working with collections of data, Elixir has three data types:
# Lists (typically hold a dynamic amount of items)
IO.inspect([1, 2, "three"])
# Tuples (typically hold a fixed amount of items)
IO.inspect({:ok, "value"})
# Maps (key-value data structures)
IO.inspect(%{"key" => "value"})
In the snippet above, we also used a new data structure represented
as :ok
. All values starting with a leading :
in Elixir are called
atoms. Atoms are used as identifiers across the language. Common
atoms are :ok
and :error
. Which brings us to the next topic: pattern
matching.
Pattern matching
The =
operator in Elixir is a bit different from the ones we see
in other languages:
x = 1
x
So far so good, but what happens if we invert the operands?
1 = x
It worked! That’s because Elixir tries to match the right side
against the left side. Since both are set to 1
, it works. Let’s
try something else:
2 = x
Now the sides did not match, so we got an error. We use pattern
matching in Elixir to match on collection too. For example, we
can use [head | tail]
to extract the head (the first element)
and tail (the remaining ones) from a list:
[head | tail] = [1, 2, 3]
IO.inspect(head)
IO.inspect(tail)
Matching an empty list against [head | tail]
causes a match error:
[head | tail] = []
Finally, we can also use the [head | tail]
expression to add
elements to the head of a list:
list = [1, 2, 3]
[0 | list]
We can also pattern match on tuples. This is often used to match
on the return types of function calls. For example, take the function
Date.from_iso8601(string)
, which returns {:ok, date}
if the
string represents a valid date, in the format YYYY-MM-DD
, otherwise
it returns {:error, reason}
:
# A valid date
Date.from_iso8601("2020-02-29")
# An invalid date
Date.from_iso8601("2020-02-30")
Now, what happens if we want our code to behave differently depending
if the date is valid or not? We can use case
to pattern match on
the different tuples:
# Give an invalid date as input
input = "2020-02-30"
# And then match on the return value
case Date.from_iso8601(input) do
{:ok, date} ->
"We got a valid date: #{inspect(date)}"
{:error, reason} ->
"Oh no, the date is invalid. Reason: #{inspect(reason)}"
end
In this example, we are using case
to pattern match on the different
outcomes of the Date.from_iso8601
. We say the case
above has two
clauses, one matching on {:ok, date}
and another on {:error, reason}
.
Now try changing the input
variable above and reevaluate the cell
accordingly. What happens when you give it a valid date?
Finally, we can also pattern match on maps. This is used to extract the values for the given keys:
map = %{:elixir => :functional, :python => :object_oriented}
%{:elixir => type} = map
type
If the key does not exist on the map, it raises:
%{:c => type} = map
With pattern matching out of the way, we are ready to start our Portal implementation!
Modeling portal doors with Agents
Elixir data structures are immutable. In the examples above, we never mutated the list. We can break a list apart or add new elements to the head, but the original list is never modified.
That said, when we need to keep some sort of state, like the data transferring through a portal, we must use an abstraction that stores this state for us. One such abstraction in Elixir is called an agent. Before we use agents, we need to briefly talk about anonymous functions.
Anonymous functions are a mechanism to represent pieces of code that we can pass around and execute later on:
adder = fn a, b -> a + b end
An anonymous function is delimited by the words fn
and end
and an
arrow ->
is used to separate the arguments from the anonymous function
body. We can now call the anonymous function above multiple times by
providing two numbers as inputs:
adder.(1, 2)
adder.(3, 5)
In Elixir, we also use anonymous functions to initialize, get, and update the agent state:
{:ok, agent} = Agent.start_link(fn -> [] end)
In the example above, we created a new agent, passing a function that
returns the initial state of an empty list. The agent returns
{:ok, #PID<...>}
, where PID stands for a process identifier, which
uniquely identifies the agent. Elixir has many abstractions for concurrency,
such as agents, tasks, generic servers, but at the end of the day they all
boil down to processes. When we say processes in Elixir, we don’t mean
Operating System processes, but rather Elixir Processes, which are lightweight
and isolated, allowing us to run hundreds of thousands of them on the same
machine.
We store the agent’s PID in the agent
variable, which allows us to
send messages to get the agent’s state:
Agent.get(agent, fn list -> list end)
As well as update it before reading again:
Agent.update(agent, fn list -> [0 | list] end)
Agent.get(agent, fn list -> list end)
We will use agents to implement our portal doors.
Whenever we need to encapsulate logic in Elixir, we create modules,
which are essentially a collection of functions. We define modules
with defmodule
and functions with def
. Our functions will
encapsulate the logic to interact with the agent, using the API we
learned in the cells above:
defmodule Portal.Door do
use Agent
def start_link(color) when is_atom(color) do
Agent.start_link(fn -> [] end, name: color)
end
def get(door) do
Agent.get(door, fn list -> list end)
end
def push(door, value) do
Agent.update(door, fn list -> [value | list] end)
end
def pop(door) do
Agent.get_and_update(door, fn list ->
case list do
[h | t] -> {{:ok, h}, t}
[] -> {:error, []}
end
end)
end
def stop(door) do
Agent.stop(door)
end
end
We declare a module by giving it a name, in this case, Portal.Door
.
At the top of the module, we say use Agent
, which brings some
Agent
-related functionality into the module.
The first function is start_link
, which we often refer to as
start_link/1
, where the number 1 is called the “arity” of the
function and it indicates the number of arguments it receives.
Then we check that the argument is an atom and proceed to call
Agent.start_link/2
, as we did earlier in this section, except we are
now passing name: color
as an argument. By giving a name to the
Agent, we can refer to it anywhere by its name, instead of using its
PID.
The next two functions, get/1
and push/2
perform simple operation
to the agent, reading its state and adding a new element respectively.
Let’s take a look at them:
Portal.Door.start_link(:pink)
Portal.Door.get(:pink)
Note how we didn’t need to store the PID anywhere and we can use the
atom :pink
to refer to the door and read its state. If the door
already exists, and we try to start another one with the same name,
it returns an {:error, reason}
tuple instead of {:ok, pid}
:
Portal.Door.start_link(:pink)
Next, let’s push some events:
Portal.Door.push(:pink, 1)
Portal.Door.push(:pink, 2)
Portal.Door.get(:pink)
We pushed some events and they show up in our state. Although, note they appear in reverse order. That’s because we are always adding new entries to the top of the list.
The fourth function we defined is called pop/1
. If there is any
item in the agent, it takes the head of the list and returns it
wrapped in a {:ok, value}
tuple. However, if the list is empty,
it returns :error
.
IO.inspect(Portal.Door.pop(:pink))
IO.inspect(Portal.Door.pop(:pink))
IO.inspect(Portal.Door.pop(:pink))
Portal.Door.get(:pink)
Finally, the last function, stop/1
, simply terminates the agent,
effectively closing the door. Let’s try it:
Portal.Door.stop(:pink)
Now, if we try to do anything with it, it will raise:
Portal.Door.get(:pink)
Note the error message points out why the operation did not work, great!
Portal transfers
Our portal doors are ready so it is time to start working on portal
transfers! In order to store the portal data, we are going to create a
struct named Portal
. Let’s first learn what structs are about.
Structs define data structures with pre-defined keys. The keys are verified
at compilation time, so if you make a typo in the key name, you get an error
early on. Structs are defined inside modules, by calling the defstruct
with
a list of atom keys. Let’s define a User
struct with the fields :name
and :age
:
defmodule User do
defstruct [:name, :age]
end
Now, we can create structs using the %User{...}
notation:
user = %User{name: "john doe", age: 27}
We can access struct fields using the struct.field
syntax:
user.name
We can pattern match on structs too:
%User{age: age} = user
age
Finally, let’s see what happens if we do a typo in a field:
%User{agee: age} = user
Now we are ready to define our Portal
struct. It will have
two fields, :left
and :right
, which point respectively to
the portal door on the left and the door on the right. Our goal
is to transfer data from the left door to the right one. The
Portal
module, where we define our struct, will also have
four other functions:
-
shoot(color)
- shoots a door of the given color. This is a wrapper aroundPortal.Door.start_link/1
-
transfer(left_door, right_door, data)
- starts a transfer by loading the givendata
toleft_door
and returns aPortal
struct -
push_right(portal)
- receives a portal and continues the transfer by pushing data from the left to the right -
close(portal)
- closes the portal by explicitly stopping both doors
Let’s implement them:
defmodule Portal do
defstruct [:left, :right]
def shoot(color) do
Portal.Door.start_link(color)
end
def transfer(left_door, right_door, data) do
# First add all data to the portal on the left
for item <- data do
Portal.Door.push(left_door, item)
end
# Returns a portal struct with the doors
%Portal{left: left_door, right: right_door}
end
def push_right(portal) do
# See if we can pop data from left. If so, push the
# popped data to the right. Otherwise, do nothing.
case Portal.Door.pop(portal.left) do
:error -> :ok
{:ok, h} -> Portal.Door.push(portal.right, h)
end
# Let's return the portal itself
portal
end
def close(portal) do
Portal.Door.stop(portal.left)
Portal.Door.stop(portal.right)
:ok
end
end
The Portal
modules defines a struct followed by a shoot/1
function.
The function is just a wrapper around Portal.Door.start_link/1
. Then
we define the transfer/3
function, which loads the given data into the
left door and returns a Portal struct. Finally, push_right/1
gets data
from the door on the left and puts it on the right door. Let’s give it a try:
Portal.shoot(:orange)
Portal.shoot(:blue)
portal = Portal.transfer(:orange, :blue, [1, 2, 3])
The above returns the %Portal{}
struct. We can check the data has been
loaded into the left door:
Portal.Door.get(:orange)
Note the list is reversed - and we knew that! - as we always add items on the top. But we will use that to our advantage soon. Let’s start pushing data to the right:
Portal.push_right(portal)
Portal.Door.get(:blue)
Since the list is reversed, we can see that we pushed the number 3
to the right, which is exactly what we expected. If you reevaluate the
cell above, you will see data moving to the right, as our portal doors
are stateful.
Our portal transfer seems to work as expected! Now let’s clean up and close the transfer:
Portal.close(portal)
We have made some good progress in our implementation, so now let’s work
a bit on the presentation. Currently, the Portal is printed as a struct:
%Portal{left: :orange, right: :blue}
. It would be nice if we actually
had a printed representation of the portal transfer, allowing us to see
the portal processes as we push data.
Inspecting portals with Protocols
We already know that Elixir data structures data can be printed by Livebook.
After all, when we type 1 + 2
, we get 3
back. However, can we customize
how our own data structures are printed?
Yes, we can! Elixir provides protocols, which allows behaviour to be extended
and implemented for any data type, like our Portal
struct, at any time.
For example, every time something is printed in Livebook, or in Elixir’s
terminal, Elixir uses the Inspect
protocol. Since protocols can be extended
at any time, by any data type, it means we can implement it for Portal
too.
We do so by calling defimpl/2
, passing the protocol name and the data
structure we want to implement the protocol for. Let’s do it:
defimpl Inspect, for: Portal do
def inspect(%Portal{left: left, right: right}, _) do
left_door = inspect(left)
right_door = inspect(right)
left_data = inspect(Enum.reverse(Portal.Door.get(left)))
right_data = inspect(Portal.Door.get(right))
max = max(String.length(left_door), String.length(left_data))
"""
#Portal<
#{String.pad_leading(left_door, max)} <=> #{right_door}
#{String.pad_leading(left_data, max)} <=> #{right_data}
>\
"""
end
end
In the snippet above, we have implemented the Inspect
protocol for the
Portal
struct. The protocol expects one function named inspect
to be
implemented. The function expects two arguments, the first is the Portal
struct itself and the second is a set of options, which we don’t care about
for now.
Then we call inspect
multiple times, to get a text representation of both
left
and right
doors, as well as to get a representation of the data
inside the doors. Finally, we return a string containing the portal presentation
properly aligned.
That’s all we need! Let’s start a new transfer and see how it goes:
Portal.shoot(:red)
Portal.shoot(:purple)
portal = Portal.transfer(:red, :purple, [1, 2, 3])
Sweet! Look how Livebook automatically picked up the new representation.
Now feel free to call push_right
and see what happens:
Portal.push_right(portal)
Feel free to reevaluate the cell above a couple times. Once you are done, run the cell below to clean it all up and close the portal:
Portal.close(portal)
There is just one topic left…
Distributed transfers
With our portals working, we are ready to give distributed transfers a try. However, before we start, there is one big disclaimer:
> The feature we are going to implement will allow us to share data across > two separate notebooks using the Erlang Distribution. This section is a great > experiment to understand how things work behind the scenes but sharing data > across notebooks as done here is a bad idea in practice. If this topic > interests you, we recommend picking up one of the many available learning > resources available for Elixir, which will provide a more solid ground to > leverage the Erlang VM and its distributed features.
Distribution 101
When Livebook executes the code in a notebook, it starts a separate Elixir runtime to do so. Since Livebook itself is also written in Elixir, it uses the Erlang Distribution to communicate with this Elixir runtime. We can get the name of the Elixir node our notebook is running on like this:
node()
By executing the code above, we can see node names are atoms. By default, Livebook only connects to nodes running on the same machine, but you can also configure it to connect to runtimes across machines.
We can also get a list of all nodes our runtime is connected to by calling
Node.list()
, let’s give it a try:
Node.list()
If you execute the code above, you get… an empty list!?
The reason why we get an empty list is because, by default, Erlang Distribution is a fully mesh. This means all nodes can see all nodes in the network. However, because we want the notebook runtimes to be isolated from each other, we start each runtime as a hidden node. We can ask Elixir to give us all hidden nodes instead:
Node.list(:hidden)
Much better! We see one node, which is the Livebook server itself.
Now there is one last piece of the puzzle: in order for nodes to connect to each other, they need to have the same cookie. The cookie serves as a very simple authentication mechanism. We can read the cookie of our current notebook runtime like this:
Node.get_cookie()
Now we have everything we need to connect across notebooks.
Notebook connections
In order to connect across notebooks, open up a new empty notebook in a separate tab, copy and paste the code below to this new notebook, and execute it:
IO.inspect node()
IO.inspect Node.get_cookie()
:ok
Now paste the result of the other node name and its cookie in the variables below:
other_node = :"name-of-the@other-node"
other_cookie = :"value-of-the-other-cookie"
With both variables defined, let’s connect to the other node with the given cookie:
Node.set_cookie(other_node, other_cookie)
Node.connect(other_node)
If it returns true, it means it connected as expected and we are ready to start a distributed transfer. The first step is to go to the other notebook and shoot a door. You can try calling the following there:
Portal.shoot(:blue)
However, if you try the above, it will fail! This happens because the
Portal code has been defined in this notebook but it is not available
in the other notebook. If we were working on an actual Elixir project,
this issue wouldn’t exist, because we would start multiple nodes on top
of the same codebase with the same modules, but we can’t do so here.
To work around this, copy the cell that defines the Portal.Door
module
from this notebook into the other notebook and execute it. Now in the other
node you should be able to start a door:
Portal.Door.start_link(:blue)
Cross-node references
Now that we have spawned a door on the other notebook, we can directly read
its content from this notebook. So far, we have been using atoms to represent
doors, such as :blue
, but we can also use the {name, node}
notation to refer
to a process in another node. Let’s give it a try:
blue = {:blue, other_node}
Portal.Door.get(blue)
It works! We could successfully read something from the other node. Now, let’s shoot a yellow portal on this node and start a transfer between them:
Portal.shoot(:yellow)
yellow = {:yellow, node()}
portal = Portal.transfer(yellow, blue, [1, 2, 3, 4])
Our distributed transfer was started. Now let’s push right:
Portal.push_right(portal)
If you go back to the other notebook and run Portal.Door.get(:blue)
,
you should see it has been updated with entries from this notebook!
The best part of all is that we enabled distributed transfers without
changing a single line of code!
Our distributed portal transfer works because the doors are just processes and accessing/pushing the data through doors is done by sending messages to those processes via the Agent API. We say sending a message in Elixir is location transparent: we can send messages to any PID regardless if it is in the same node as the sender or in different nodes of the same network.
Wrapping up
So we have reached the end of this notebook with a fast paced introduction to Elixir! It was a fun ride and we went from manually starting doors to distributed portal transfers.
To learn more about Elixir, we welcome you to explore our website, read our Getting Started guide, and many of the available learning resources.
Finally, huge thanks to Augie De Blieck Jr. for the drawings in this tutorial.
See you around!