Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Supervisors

supervisors.livemd

Supervisors

Mix.install([
  {:jason, "~> 1.4"},
  {:kino, "~> 0.9", override: true},
  {:youtube, github: "brooklinjazz/youtube"},
  {:hidden_cell, github: "brooklinjazz/hidden_cell"}
])

Navigation

Home Report An Issue Traffic Light ServerSupervisor Drills

Review Questions

  • What kinds of problems do supervisors solve?
  • How do we use supervisors to make our application fault-tolerant?
  • Why might we use each supervisor restart strategy (:one_for_one, :one_for_all, :rest_for_one)?

Fault Tolerance

Elixir and Open Telecom Platform (OTP) are fantastic for building highly concurrent and fault-tolerant systems which can handle runtime errors.

Errors happen. We often implement specific error handling for the errors we expect. However, it’s impossible to anticipate every possible error. For unexpected errors, Elixir follows a let it crash philosophy.

To achieve a fault-tolerant system in Elixir, we leverage concurrency and isolate our system into different processes. These processes share no memory, and a crash in one process cannot crash another.

Therefore, we can let a process crash, and it will not affect the rest of our system.

> it’s somewhat surprising that the core tool for error handling is concurrency. > In the BEAM world, two concurrent processes are > completely separated; they share no memory, and a crash in one process can’t by default compromise the execution flow of another. > Process isolation allows you to confine the negative effects of an error to a single process or a small group of related processes, > which keeps most of the system functioning normally. > > * Sasa Juric, Elixir in Action.

For example, We can spawn a process and raise an error. Unless we explicitly link the process with spawn_link/1 The process will die and leave the parent process unaffected.

pid =
  spawn(fn ->
    raise "error"
  end)

# Allow The Process Time To Crash
Process.sleep(100)

"I still run."

When a linked process crashes, it will also crashed the process that spawned it. Uncomment the code below, and you’ll see it crashes the current Livebook process. Re-comment the code when finished.

# spawn_link(fn -> Raise "error" End)

That means if we start a GenServer (or other process) unsupervised it will raise an error if it crashes.

defmodule UnsupervisedWorker do
  use GenServer

  def init(state) do
    {:ok, state}
  end

  def handle_info(:crash, _state) do
    raise "crash!"
  end
end

Uncomment and run the following code to see it crashes the current Livebook process, then re-comment the code.

# {:ok, pid} = GenServer.start_link(UnsupervisedWorker, [])
# Process.send(pid, :crash, [])

The same is true if we manually exit a process using Process.exit/2 with the :exit reason. Uncomment and run the following code to see it crashes the current Livebook process, then re-comment the code.

# {:ok, pid} = GenServer.start_link(UnsupervisedWorker, [])
# Process.exit(pid, :exit)

If you would like to learn more about fault tolerance and error handling, Joe Armstrong created an incredible video on the subject.

YouTube.new("https://www.youtube.com/watch?v=ZOsvgwOR6G0&ab_channel=KentComputing")

Supervisors

Instead of crashing the parent process, we can use Supervisors to monitor processes and restart them when they die.

The term restart is a bit misleading. To be clear, we cannot restart a process. Restart in this case means that we kill the process and start a new one in its place. However, it’s conceptually easier to think of it as restarting.

Why would we want to restart the process? Well, most technical issues are the result of state. Our system gets into a bad state and cannot recover. Turning a system on and off again clears the state and often resolves the issue. Using a Supervisor is like adding an automatic on/off switch to your application.

A Supervisor monitors one or more child processes. We generally refer to these processes as workers. The Supervisor can use different strategies for restarting its child workers. The :one_for_one strategy individually restarts any worker that dies.

flowchart
S[Supervisor]
W1[Worker]
W2[Worker]
W3[Worker]

S --> W1
S --> W2
S --> W3

Here, we define a simple Worker process which we’ll start under a supervisor.

defmodule Worker do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, [], [])
  end

  def init(state) do
    {:ok, state}
  end
end

Every supervisor is also a process. The Supervisor.start_link/2 function accepts a list of child processes and starts the supervisor process. We provide each child as a map with an :id and a :start signature.

children = [
  %{
    id: :worker1,
    start: {Worker, :start_link, [1]}
  },
  %{
    id: :worker2,
    start: {Worker, :start_link, [2]}
  },
  %{
    id: :worker3,
    start: {Worker, :start_link, [3]}
  }
]

{:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :one_for_one)

Your Turn

Here, we’ve used the livebook-exclusive Kino.Process.sup_tree/2 function to create a visualization of the above supervision tree.

Kino.Process.sup_tree(supervisor_pid)

When we kill one of the processes under the supervision tree, it will be restarted. For demonstration purposes only, you can use the :c.pid/3 function from erlang to manually create the pid from the diagram above where 727 should be a pid from the diagram above.

pid = :c.pid(0, 727, 0)

Then, kill the process using Process.exit/2.

Process.exit(pid, :exit)

Re-evaluate the diagram above and you should see a new Worker process started with a different pid. Repeat this process to kill each process under the supervisor and see them restarted.

Restart Strategies

Supervisors can have different restart strategies that change the behavior of how we handle restarting child processes when one of them dies.

We’ve seen the above :one_for_one strategy. This is the most common strategy, where we restart the child process that died under the supervisor.

However, sometimes we want different restart strategies such as :one_for_all and :rest_for_one.

One-For-One

Restart only the worker that crashed.

flowchart TD
    Supervisor
    Supervisor --> P1
    Supervisor --> P2
    Supervisor --> P3
    Supervisor --> ..
    Supervisor --> Pn

    classDef crashed fill:#fe8888;
    classDef restarted stroke:#0cac08,stroke-width:4px

    class P2 crashed
    class P2 restarted

In the diagram above, only P2 crashed, so only P2 will be restarted by the supervisor.

One-For-All

Restart all of the child workers.

flowchart TD
    Supervisor
    Supervisor --> P1
    Supervisor --> P2
    Supervisor --> P3
    Supervisor --> ..
    Supervisor --> Pn

    classDef crashed fill:#fe8888;
    classDef terminated fill:#fbab04;
    classDef restarted stroke:#0cac08,stroke-width:4px

    class P2 crashed
    class P1,P3,..,Pn terminated
    class P1,P2,P3,..,Pn restarted

P2 crashed, which led to all the other child processes to be terminated, then all the child processes are restarted.

Rest-For-One

Restart child workers in order after the crashed process.

flowchart TD
    Supervisor
    Supervisor --> P1
    Supervisor --> P2
    Supervisor --> P3
    Supervisor --> ..
    Supervisor --> Pn

    classDef crashed fill:#fe8888;
    classDef terminated fill:#fbab04;
    classDef restarted stroke:#0cac08,stroke-width:4px

    class P2 crashed
    class P3,..,Pn terminated
    class P2,P3,..,Pn restarted

P2 crashed, the child processes after it in the start order are terminated, then P2 and the other terminated child processes are restarted.

Restart Strategy Examples

To better understand how we can use a supervisor with different restart strategies, we’re going to supervise multiple Bomb processes that will crash after a specified amount of time.

Here, we’ve created a Bomb process that will send itself a message to cause it to crash after a specified :bomb_time.

defmodule Bomb do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(opts) do
    bomb_time = Keyword.get(opts, :bomb_time, 2000)
    name = Keyword.get(opts, :name, 2000)

    Process.send_after(self(), :explode, bomb_time)
    {:ok, name}
  end

  @impl true
  def handle_info(:explode, name) do
    raise "#{name} Exploded!"
  end
end

Each Bomb will exist for a specified time and then crash (explode).

  • Bomb 1 will explode after 2 seconds.
  • Bomb 2 will explode after 3 seconds.
  • Bomb 3 will explode after 5 seconds.

We’ll reuse these same children in all of our examples.

children = [
  %{
    id: :bomb1,
    start: {Bomb, :start_link, [[name: "Bomb 1", bomb_time: 2000]]}
  },
  %{
    id: :bomb2,
    start: {Bomb, :start_link, [[name: "Bomb 2", bomb_time: 3000]]}
  },
  %{
    id: :bomb3,
    start: {Bomb, :start_link, [[name: "Bomb 3", bomb_time: 5000]]}
  }
]

One For One

We’ll start the three bombs using the :one_for_one strategy.

For the sake of example, we’ve increased :max_restarts to increase the limit of three crashes per five seconds to five crashes per five seconds.

You can go to Supervisor Options on HexDocs for a full list of configuration options.

Each bomb restarts individually without affecting the others, so the timeline of crashes should be:

  • 2 seconds: Bomb 1 crashes.
  • 3 seconds: Bomb 2 crashes.
  • 4 seconds: Bomb 1 crashes.
  • 5 seconds: Bomb 3 crashes.

Uncomment and run the cell below to see the timeline of crashes, then re-comment it.

# {:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :one_for_one, max_restarts: 5)

One For All

If we change the restart strategy to :one_for_all, then all processes will be terminated and restarted when a single process crashes.

So the timeline of crashes will be:

  • 2 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 4 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 6 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 8 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 10 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)

Notice that Bomb2 and Bomb3 will never explode because their timers restart.

Uncomment the line below to observe the timeline of crashes, then re-comment it.

# {:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :one_for_all, max_restarts: 5)

Rest For One

Rest for one works like a row of dominoes, where only children defined after the crashed process will be restarted.

If we change the restart strategy to :rest_for_one, the processes will restart any processes ordered after them in the supervisor.

At first, we won’t notice any change from the :one_for_all strategy. The timeline will still be:

  • 2 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 4 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 6 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 8 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)
  • 10 seconds: Bomb 1 crashes (Bomb 2 and Bomb 3 restarted)

Uncomment the line below to observe the timeline of crashes, then re-comment it.

# {:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :rest_for_one, max_restarts: 5)

However, if we change the order of children in the supervisor, only children after the crashed process will restart.

children = [
  %{
    id: :bomb3,
    start: {Bomb, :start_link, [[name: "Bomb 3", bomb_time: 5000]]}
  },
  %{
    id: :bomb1,
    start: {Bomb, :start_link, [[name: "Bomb 1", bomb_time: 2000]]}
  },
  %{
    id: :bomb2,
    start: {Bomb, :start_link, [[name: "Bomb 2", bomb_time: 3000]]}
  }
]

So the timeline of crashes would be:

  • 2 seconds: Bomb1 crashes (Bomb2 restarted)
  • 4 seconds: Bomb1 crashes (Bomb2 restarted)
  • 5 seconds: Bomb3 crashes (Bomb1 and Bomb2 restarted)
  • 7 seconds: Bomb1 crashes (Bomb2 restarted)
  • 9 seconds: Bomb1 crashes (Bomb2 restarted)
# {:ok, Supervisor_pid} = Supervisor.start_link(children, Strategy: :rest_for_one, Max_restarts: 5)

Your Turn

Start four Bomb processes under a supervisor with different bomb times. Try each restart strategy :one_for_one, one_for_all, and :rest_for_one. Predict the crash timeline then evaluate your code. Did the result match your prediction?

Syntax Sugar

Instead of providing a map with the :id and :start keys, we can instead provide the child as a tuple. The name of the module will also be the :id, and the second value will be the argument passed to start_link/1.

children = [
  {Bomb, [name: "Syntax Sugar Bomb", bomb_time: 1000]}
]

Uncomment the following code to see the crash timeline, then re-comment it.

# Supervisor.start_link(children, strategy: :one_for_one)

Further Reading

Supervised Mix Projects

Supervised mix projects use an Application module that defines application callbacks.

The application module is configured in mix.exs in the application/0 function.

def application do
  [
    extra_applications: [:logger],
    mod: {MyApp.Application, []}
  ]
end

The application module typically lives in the project folder in a my_app/application.ex file and contains the logic to start the application. In a supervised application, this module starts a supervision tree.

defmodule MyApp.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Starts a worker by calling: RockPaperScissors.Worker.start_link(arg)
      # {RockPaperScissors.Worker, arg}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

We can generate a mix project with all of the above scaffolding using the --sup flag, or add it manually to an existing project.

mix new my_app --sup

Commit Your Progress

DockYard Academy now recommends you use the latest Release rather than forking or cloning our repository.

Run git status to ensure there are no undesirable changes. Then run the following in your command line from the curriculum folder to commit your progress.

$ git add .
$ git commit -m "finish Supervisors reading"
$ git push

We’re proud to offer our open-source curriculum free of charge for anyone to learn from at their own pace.

We also offer a paid course where you can learn from an instructor alongside a cohort of your peers. We will accept applications for the June-August 2023 cohort soon.

Navigation

Home Report An Issue Traffic Light ServerSupervisor Drills