
Supervision strategies


Mix.install([
  {:kino, "~> 0.9.0"}
])


Supervision strategies

When starting a supervisor, we can specify a supervision strategy. The strategy determines what the supervisor does when one of its child processes crashes.

In the previous chapter, we started the supervisor for our Stack GenServer process using the Supervisor.start_link(children, strategy: :one_for_one) function call.

Here, the :strategy option tells the supervisor which supervision strategy to use.

Now, let’s explore each of the supervision strategies in detail.

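To illustrate the different strategies, we’ll consider a simple GenServer that crashes when it receives a :boom message. Its state is a unique positive integer, generated with System.unique_integer/1 every time the process is started, so a restarted process can be recognized by its new state.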

defmodule CrashDummyServer do
  use GenServer

  def start_link(name) do
    # A new unique integer is generated on every (re)start, so a restarted
    # process can be told apart from the original by its state
    initial_state = System.unique_integer([:positive])
    GenServer.start_link(__MODULE__, {initial_state, name}, name: name)
  end

  ## Callbacks

  @impl true
  def init({initial_state, name}) do
    IO.puts("#{name} starting up!")
    {:ok, initial_state}
  end

  @impl true
  def handle_cast(:boom, _state) do
    # Raising inside a callback crashes the process; the supervisor is
    # notified through the link and applies its supervision strategy
    raise "BOOM! CrashDummyServer process #{inspect(self())} crashed!"
  end
end

In the examples so far, we started a supervisor by calling the Supervisor.start_link/2 function directly with the required options. However, we can also define the supervisor as a module.

To do so, we have to use the Supervisor behaviour (an OTP behaviour) in our module.

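defmodule CrashDummySupervisor do
  # `use Supervisor` adopts the behaviour and also defines a child_spec/1
  # function, so this supervisor could itself run under another supervisor
  use Supervisor

  def start_link(strategy) do
    Supervisor.start_link(__MODULE__, strategy, name: __MODULE__)
  end

  # The `init/1` callback is required by the Supervisor behaviour
  @impl true
  def init(strategy) do
    # Supervision tree: children are started in this order
    children = [
      dummy_child_spec(:dummy1),
      dummy_child_spec(:dummy2),
      dummy_child_spec(:dummy3)
    ]

    # Notice the supervision strategy
    Supervisor.init(children, strategy: strategy)
  end

  # Named differently from the child_spec/1 defined by `use Supervisor`,
  # so the generated function is not overridden by this private helper
  defp dummy_child_spec(name) do
    # Give every child its own :id (the default would be CrashDummyServer)
    Supervisor.child_spec({CrashDummyServer, name}, id: name)
  end
end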

In the code above, the supervision tree contains three instances of our CrashDummyServer GenServer: when the supervisor is started, it starts three CrashDummyServer processes named :dummy1, :dummy2, and :dummy3.

Since we want to start three processes of the same GenServer, we cannot use the plain {CrashDummyServer, name} child specification: it uses the module name as the :id, so all three children would get the same :id, while a supervisor requires every child to have a unique :id. To avoid this, we use the Supervisor.child_spec/2 function to give each child its own :id.
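To make the :id problem concrete, here is a small sketch comparing the specification generated by `use GenServer` with the one returned by Supervisor.child_spec/2. ExampleWorker is a hypothetical stand-in for CrashDummyServer, not part of the notebook.

```elixir
defmodule ExampleWorker do
  # Hypothetical worker, used only to inspect its child specification
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, nil}
end

# The child_spec/1 generated by `use GenServer` uses the module as the :id
ExampleWorker.child_spec(:dummy1) |> IO.inspect(label: "Default spec")
# Default spec: %{id: ExampleWorker, start: {ExampleWorker, :start_link, [:dummy1]}}

# Supervisor.child_spec/2 builds the same spec but overrides the :id
Supervisor.child_spec({ExampleWorker, :dummy1}, id: :dummy1)
|> IO.inspect(label: "Overridden spec")
# Overridden spec: %{id: :dummy1, start: {ExampleWorker, :start_link, [:dummy1]}}
```

Only the :id changes; the start function and its argument stay the same, which is exactly what the three CrashDummyServer children need.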

The supervision strategy is passed as an argument to the start_link/1 function and on to the init/1 callback, so that we can stop the supervisor and start it again with a different supervision strategy.
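This works because Supervisor.init/2 only packages the options into the supervisor flags handed to Erlang's :supervisor; the strategy is plain data. A minimal sketch, using an Agent child purely as a placeholder (the flags map may contain additional keys depending on the Elixir version):

```elixir
# Supervisor.init/2 returns {:ok, {flags, children}}, the value an init/1
# callback such as CrashDummySupervisor.init/1 must return
{:ok, {flags, children}} =
  Supervisor.init([{Agent, fn -> 0 end}], strategy: :rest_for_one)

IO.inspect(flags.strategy, label: "Strategy")
# Strategy: :rest_for_one

# :max_restarts and :max_seconds were not given, so their defaults apply
IO.inspect({flags.intensity, flags.period}, label: "Intensity/period")
# Intensity/period: {3, 5}

# Children are normalized into child specification maps
IO.inspect(length(children), label: "Number of children")
# Number of children: 1
```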

:one_for_one

With the :one_for_one supervision strategy, if a child process terminates, only that process is restarted. If our supervisor has several children and one of them crashes, only the crashed process is restarted; the other children keep running unaffected.

To observe the behavior of this strategy, we can start the supervisor and then intentionally crash one of the supervised processes to see the restart in action.

We will use Kino to draw the supervision tree before and after the crash.

{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:one_for_one)

Process.info(supervisor_pid, :links) |> IO.inspect(label: "Supervisor links")

Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor children")

:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")

Kino.Process.render_sup_tree(supervisor_pid)
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)

Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor children")

:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")

Kino.Process.render_sup_tree(supervisor_pid)

Based on the example, we can confirm that when using the :one_for_one supervision strategy, only the :dummy2 GenServer process crashed and was subsequently restarted. As a result, the restarted process obtained a new process ID and its state was reset. On the other hand, the :dummy1 and :dummy3 processes continued to run without any interruption, maintaining their respective process IDs and states unchanged.

:one_for_all

With the :one_for_all supervision strategy, if any child process terminates, the supervisor terminates all the other child processes as well and then restarts all of them, including the one that crashed.

Let’s proceed with restarting the CrashDummySupervisor using the :one_for_all strategy.

# Stop the existing supervisor process.
# The supervisor is registered under its module name, so we can use the
# module name to stop it.
# This also terminates the whole supervision tree, i.e. all processes
# running under our supervisor.
Supervisor.stop(CrashDummySupervisor)

{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:one_for_all)

Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor children")

:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")

Kino.Process.render_sup_tree(supervisor_pid)
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)

Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor children")

:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")

Kino.Process.render_sup_tree(supervisor_pid)

This time, when the :dummy2 process crashed, the supervisor restarted all of its child processes, so all three processes now have new pids and new states.

:rest_for_one

With the :rest_for_one strategy, if a child process terminates, the supervisor also terminates the children that were started after it, and then restarts the crashed child together with those children.

This strategy is useful when you want to restart only part of your supervision tree, typically when later children depend on earlier ones: when a process crashes, only the processes that depend on it (those started after it) are restarted, while the processes started before it keep running.
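The three strategies differ only in which children get restarted. The sketch below is a hypothetical, process-free model of that rule (RestartModel is not an OTP module): given the children in start order and the crashed child, it returns the ids that would be restarted. It deliberately ignores restart values such as :temporary and the :max_restarts limit.

```elixir
defmodule RestartModel do
  # Hypothetical model: which child ids a supervisor restarts, in start
  # order, when the child `crashed` terminates abnormally
  def restarted(:one_for_one, _children, crashed), do: [crashed]
  def restarted(:one_for_all, children, _crashed), do: children

  def restarted(:rest_for_one, children, crashed) do
    # The crashed child plus every child listed after it
    Enum.drop_while(children, &(&1 != crashed))
  end
end

children = [:dummy1, :dummy2, :dummy3]

for strategy <- [:one_for_one, :one_for_all, :rest_for_one] do
  IO.inspect(RestartModel.restarted(strategy, children, :dummy2), label: inspect(strategy))
end
# :one_for_one: [:dummy2]
# :one_for_all: [:dummy1, :dummy2, :dummy3]
# :rest_for_one: [:dummy2, :dummy3]
```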

Note:

The order in which child processes are specified in a supervision tree is crucial. A supervisor will attempt to start the child processes in the exact order specified in the supervisor child specification. Similarly, when a process crashes, the supervisor will restart the child processes in the same order.

When a supervisor shuts down, it terminates all children in the reverse order in which they are listed.
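The start and shutdown order can be observed with a small probe. OrderProbe is a hypothetical GenServer, not part of the notebook: it sends a message to the calling process when it starts and when it is shut down, and it traps exits so that terminate/2 runs when the supervisor stops it.

```elixir
defmodule OrderProbe do
  # Hypothetical probe that reports its own start and shutdown
  use GenServer

  def start_link({id, report_to}) do
    GenServer.start_link(__MODULE__, {id, report_to})
  end

  @impl true
  def init({id, report_to}) do
    # Trapping exits makes the supervisor's shutdown call terminate/2
    Process.flag(:trap_exit, true)
    send(report_to, {:started, id})
    {:ok, {id, report_to}}
  end

  @impl true
  def terminate(_reason, {id, report_to}) do
    send(report_to, {:stopped, id})
  end
end

children =
  for id <- [:first, :second, :third] do
    Supervisor.child_spec({OrderProbe, {id, self()}}, id: id)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
:ok = Supervisor.stop(sup)

# Read the six reports in the order they arrived
events =
  for _ <- 1..6 do
    receive do
      {event, id} when event in [:started, :stopped] -> {event, id}
    after
      1_000 -> :timeout
    end
  end

IO.inspect(events, label: "Events")
# Events: [started: :first, started: :second, started: :third,
#          stopped: :third, stopped: :second, stopped: :first]
```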

Let’s see this strategy in action with our example. When the :dummy2 process crashes, only the :dummy2 and :dummy3 processes will be restarted, while the :dummy1 process will continue running.

Supervisor.stop(CrashDummySupervisor)

{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:rest_for_one)

Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor children")

:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")

Kino.Process.render_sup_tree(supervisor_pid)
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)

Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor children")

:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")

Kino.Process.render_sup_tree(supervisor_pid)
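The three experiments in this chapter can be condensed into one self-contained check. StrategyCheck is a hypothetical helper, not part of the notebook: it uses Agents instead of CrashDummyServer, kills the middle child with Process.exit/2 and, instead of a fixed Process.sleep/1, polls Supervisor.which_children/1 until that child has a new pid. It returns, for each child, whether its pid changed.

```elixir
defmodule StrategyCheck do
  # Hypothetical helper: start three Agents under `strategy`, kill the
  # middle one and report which children were restarted (new pid)
  def run(strategy) do
    children =
      for id <- [:first, :second, :third] do
        # Each child needs its own :id, as in CrashDummySupervisor
        Supervisor.child_spec({Agent, fn -> id end}, id: id)
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before_crash = pids(sup)

    # Kill the middle child (the role :dummy2 plays in the notebook)
    Process.exit(before_crash.second, :kill)
    wait_until_replaced(sup, :second, before_crash.second)
    after_crash = pids(sup)

    :ok = Supervisor.stop(sup)

    # true means "this child now has a different pid"
    Map.new(before_crash, fn {id, pid} -> {id, after_crash[id] != pid} end)
  end

  # Map of child id => current pid (or :restarting / :undefined)
  defp pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{} do
      {id, pid}
    end
  end

  # Poll until the child has been replaced by a new process
  defp wait_until_replaced(sup, id, old_pid) do
    case Map.get(pids(sup), id) do
      pid when is_pid(pid) and pid != old_pid ->
        :ok

      _ ->
        Process.sleep(10)
        wait_until_replaced(sup, id, old_pid)
    end
  end
end

StrategyCheck.run(:one_for_one) |> IO.inspect(label: ":one_for_one")
# :one_for_one: %{first: false, second: true, third: false}
StrategyCheck.run(:one_for_all) |> IO.inspect(label: ":one_for_all")
# :one_for_all: %{first: true, second: true, third: true}
StrategyCheck.run(:rest_for_one) |> IO.inspect(label: ":rest_for_one")
# :rest_for_one: %{first: false, second: true, third: true}
```

Polling is used because the supervisor handles the crash asynchronously; since it restarts the whole affected group while handling a single exit signal, once the middle child has a new pid the other restarted children have new pids as well.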
