Supervision strategies
Mix.install([
{:kino, "~> 0.9.0"}
])
Navigation
Home Supervisors introductionRestart strategiesSupervision strategies
When starting a supervisor, we have the ability to specify a supervision strategy. This strategy determines the actions taken by the supervisor when one of its child processes crashes.
In the previous chapter, we started the supervisor for our Stack GenServer process using the Supervisor.start_link(children, strategy: :one_for_one)
function call.
Here, the :strategy
option passed to the supervisor refers to the supervision strategy being used.
Now, let’s explore each of the supervision strategies in detail.
To illustrate the different strategies, we’ll consider a simple GenServer that crashes if we send it a :boom
message. This GenServer stores a random positive integer in its state.
defmodule CrashDummyServer do
use GenServer
def start_link(name) do
random_state = System.unique_integer([:positive])
GenServer.start_link(__MODULE__, {random_state, name}, name: name)
end
## Callbacks
@impl true
def init({random_value, name}) do
IO.inspect("#{name} starting up!")
{:ok, random_value}
end
@impl true
def handle_cast(:boom, state) do
process_pid = self() |> inspect()
raise "BOOM! CrashDummyServer process: #{process_pid} crashed!"
{:noreply, state}
end
end
In the examples so far, we started a supervisor by directly calling the Supervisor.start_link/2
function with the required options. However we can also define the supervisor as a module instead.
To do so we have to use the Supervisor
otp behavior in our module.
defmodule CrashDummySupervisor do
# Using this behaviour we will automatically define a child_spec/1 function
use Supervisor
def start_link(strategy) do
Supervisor.start_link(__MODULE__, strategy, name: __MODULE__)
end
# We have to implement this `init/1` callback when using the "Supervisor" behaviour
@impl true
def init(strategy) do
# Supervision tree
children = [
child_spec(:dummy1),
child_spec(:dummy2),
child_spec(:dummy3)
]
# Notice the supervision strategy
Supervisor.init(children, strategy: strategy)
end
defp child_spec(name) do
Supervisor.child_spec({CrashDummyServer, name}, id: name)
end
end
In the above code snippet, we define multiple instances of our “CrashDummyServer” GenServer within the supervision tree. When the supervisor is started, it automatically starts three instances (processes) of the CrashDummyServer with the names :dummy1
, :dummy2
, and :dummy3
.
Since we want to start three processes of the same GenServer, we cannot use the {CrashDummyServer, name}
child specification because it would assign the module name as the :id
, resulting in the same :id
being given to all three processes. To avoid this, we use the Supervisor.child_spec/2
function and explicitly pass a separate :id
to each process.
The supervision strategy is passed as an argument to the start_link/1
function and init/1
callback so that we can restart the same supervisor with a different supervision strategy.
:one_for_one
With the “one_for_one” supervision strategy, if a child process terminates, only that specific process is restarted. In other words, if there are multiple child processes supervised by our supervisor and one of them crashes, only the crashed process is restarted while the other supervised processes continue running unaffected.
To observe the behavior of this strategy, we can start the supervisor and then intentionally crash one of the supervised processes to see the restart in action.
We will use Kino to draw the supervision tree before and after the crash.
{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:one_for_one)
Process.info(supervisor_pid, :links) |> IO.inspect(label: "Supervisors links")
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
Based on the example, we can confirm that when using the :one_for_one
supervision strategy, only the :dummy2
GenServer process crashed and was subsequently restarted. As a result, the restarted process obtained a new process ID and its state was reset. On the other hand, the :dummy1
and :dummy3
processes continued to run without any interruption, maintaining their respective process IDs and states unchanged.
:one_for_all
Upon restarting the CrashDummySupervisor with the :one_for_all
restart strategy, if any child process terminates, all other child processes will be terminated as well. Following that, all child processes, including the terminated one, will be restarted.
Let’s proceed with restarting the CrashDummySupervisor
using the :one_for_all
strategy.
# Stop the existing supervisor process
# We used the module name as the Supervisor process name so we can use the module name to stop
# the supervisor process.
# This will also terminate the supervision tree and all process running under our supervisor
Supervisor.stop(CrashDummySupervisor)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:one_for_all)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
This time we can see that when the :dummy_2
process crashed the supervisor restarted all the child processes. So the all processes now have a different pid.
:rest_for_one
With the :rest_for_one
strategy, if a child process terminates, not only the terminated child process but also the subsequent child processes that were started after it will be terminated and restarted.
This strategy is useful when you want to restart only a portion of your supervision tree. In this case, when a process crashes, only the processes dependent on the crashed process will be restarted.
Note:
The order in which child processes are specified in a supervision tree is crucial. A supervisor will attempt to start the child processes in the exact order specified in the supervisor child specification. Similarly, when a process crashes, the supervisor will restart the child processes in the same order.
When a supervisor shuts down, it terminates all children in the reverse order in which they are listed.
Let’s see this strategy in action with our example. When the :dummy2
process crashes, only the :dummy2
and :dummy3
processes will be restarted, while the :dummy1
process will continue running.
Supervisor.stop(CrashDummySupervisor)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:rest_for_one)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
Resources
- The images for the different restart stragies are taken from the erlang documentation