Supervision

In the chapter about links, we learned that processes are isolated: when one process crashes, the others keep running just fine. But what should we do with the crashed process? We can use a Supervisor to have it restarted automatically!

To demonstrate how supervisors work, we’ll create two GenServers: Executor and Throttler. The executor is a simple GenServer that prints the data sent to it:

defmodule Executor do
  use GenServer

  def start_link(options \\ []) do
    # Here we pass a new option to GenServer.start_link/3: name.
    # It registers the process under the given name, much like
    # `Process.register/2`, which we learned about in the chapter
    # about message passing. This lets us communicate with the
    # executor using the `:executor` name instead of a PID.
    GenServer.start_link(__MODULE__, options, name: :executor)
  end

  def request(data) do
    # We use the name to communicate with the executor
    GenServer.cast(:executor, {:request, data})
  end

  @impl true
  def init(_options) do
    # We print that the executor is starting,
    # so we can see when it's restarted
    IO.puts("Starting Executor, PID: #{inspect(self())}")
    {:ok, %{}}
  end

  @impl true
  def handle_cast({:request, data}, state) do
    IO.puts(data)
    {:noreply, state}
  end
end

Throttler, on the other hand, limits the number of requests sent to the executor:

defmodule Throttler do
  use GenServer

  def start_link(options \\ []) do
    GenServer.start_link(__MODULE__, options, name: :throttler)
  end

  def request(data) do
    GenServer.call(:throttler, {:request, data})
  end

  @impl true
  def init(options) do
    IO.puts("Starting Throttler, PID: #{inspect(self())}")
    # throttle_time is the smallest time between two requests
    throttle_time = Keyword.get(options, :throttle_time, 1000)

    current_time = System.monotonic_time(:millisecond)

    # Cleverly set the init time, so we don't throttle the first request
    init_time = current_time - throttle_time

    {:ok, %{last_request_time: init_time, throttle_time: throttle_time}}
  end

  @impl true
  def handle_call({:request, data}, _from, state) do
    %{last_request_time: last_request_time, throttle_time: throttle_time} = state
    current_time = System.monotonic_time(:millisecond)

    if current_time - last_request_time >= throttle_time do
      # If the previous request was at least `throttle_time`
      # earlier, pass the current one to the executor and reply `:ok`.
      # Because we named the executor earlier, we don't
      # need to know its PID.
      Executor.request(data)
      state = %{state | last_request_time: current_time}
      {:reply, :ok, state}
    else
      # Otherwise, reply with an error
      {:reply, {:error, :rate_limit_exceeded}, state}
    end
  end
end

We could spawn our GenServers by calling their start_link/1 functions. This time, we’ll start a supervisor and let it spawn our GenServers. To do that, we’ll create child specs.

A child spec is a map that contains at least the id and start keys:

  • id - identifies the process in the supervisor
  • start - a tuple with a module, a function and a list of arguments, a so-called MFA tuple. The supervisor calls this function to start the process. Here, we point it at the start_link function of each module.

throttler_options = [throttle_time: 1000]

throttler_spec = %{
  id: Throttler,
  start: {Throttler, :start_link, [throttler_options]}
}

executor_spec = %{
  id: Executor,
  start: {Executor, :start_link, []}
}

Now we can call Supervisor.start_link/2 to spawn a supervisor, passing our child specs to it. Each child spec spawns a process - we call it a child process of the supervisor. We must also pass the strategy option - we’ll explain it in a moment. Finally, for convenience, we register our supervisor under the name :my_app_supervisor.

# Check if the supervisor is already running and if so, stop it.
# This makes it possible to avoid a name conflict when you rerun this cell.
if Process.whereis(:my_app_supervisor) do
  Supervisor.stop(:my_app_supervisor)
end

{:ok, pid} = Supervisor.start_link(
  [throttler_spec, executor_spec],
  strategy: :one_for_one,
  name: :my_app_supervisor
)
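To check what a supervisor has actually started, we can ask it for its children. The snippet below is a minimal, self-contained sketch: InspectDemo.Worker and the :inspect_demo_a / :inspect_demo_b names are hypothetical and exist only for this illustration. It uses Supervisor.which_children/1 and Supervisor.count_children/1.

```elixir
defmodule InspectDemo.Worker do
  use GenServer

  # Hypothetical worker used only for this illustration
  def start_link(name) do
    GenServer.start_link(__MODULE__, name, name: name)
  end

  @impl true
  def init(name), do: {:ok, name}
end

children = [
  %{id: :worker_a, start: {InspectDemo.Worker, :start_link, [:inspect_demo_a]}},
  %{id: :worker_b, start: {InspectDemo.Worker, :start_link, [:inspect_demo_b]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Each entry is a tuple: {id, pid, type, modules}
children_info = Supervisor.which_children(sup)
IO.inspect(children_info, label: "children")

# A map with the number of specs, active children, supervisors and workers
counts = Supervisor.count_children(sup)
IO.inspect(counts, label: "counts")
```

The same calls should work with a registered name, for example Supervisor.which_children(:my_app_supervisor).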

Let’s check if it works!

Throttler.request("hello")

💡 Call the above cell quickly a few times. What are the results?

Let’s see what happens when the executor crashes. Let’s pass it something that can’t be printed with IO.puts/1 - a tuple:

Throttler.request({:hello, :world})

# Wait for the executor to crash
Process.sleep(100)

As you can see, the executor process crashed. But it was restarted! Let’s try sending a request to the throttler again:

Throttler.request("hello, world!")

As you can see, the executor works - we recovered from the failure!

Note that since the executor crashed and started again, it now has a new PID. However, since it’s registered as :executor, the throttler can still send requests to it. That’s why relying on PIDs is complex and error-prone when communicating with processes that can fail, and name registration is the way to go.
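The PID change can be observed directly. The following is a minimal sketch, assuming a hypothetical NameDemo.Echo server registered as :name_demo_echo (neither is part of this chapter's code). We simulate the crash with Process.exit/2 and compare the PIDs before and after the restart.

```elixir
defmodule NameDemo.Echo do
  use GenServer

  # Hypothetical server used only for this illustration
  def start_link(_options) do
    GenServer.start_link(__MODULE__, nil, name: :name_demo_echo)
  end

  def ping, do: GenServer.call(:name_demo_echo, :ping)

  @impl true
  def init(_), do: {:ok, nil}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

echo_spec = %{id: NameDemo.Echo, start: {NameDemo.Echo, :start_link, [[]]}}
{:ok, _sup} = Supervisor.start_link([echo_spec], strategy: :one_for_one)

old_pid = Process.whereis(:name_demo_echo)

# Simulate a crash by killing the process
Process.exit(old_pid, :kill)

# Give the supervisor a moment to restart the child
Process.sleep(100)

new_pid = Process.whereis(:name_demo_echo)

IO.inspect(old_pid == new_pid, label: "same PID?")
IO.inspect(Process.alive?(old_pid), label: "old PID alive?")

# The name still works, even though the PID behind it changed
IO.inspect(NameDemo.Echo.ping(), label: "call by name")
```

A caller that had stored old_pid would now be talking to a dead process, while a caller using the name is unaffected.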

Strategy

In some situations, processes are more tightly coupled than in our case, and if one process crashes, we need to restart others too. That’s when the supervisor’s strategy comes into play. So far, we used :one_for_one, which means that only the crashed process is restarted. There’s also :one_for_all, which restarts all processes if any of them crashes:

# If the supervisor is still running, stop it
if Process.whereis(:my_app_supervisor) do
  Supervisor.stop(:my_app_supervisor)
end

{:ok, pid} = Supervisor.start_link(
  [throttler_spec, executor_spec],
  strategy: :one_for_all,
  name: :my_app_supervisor
)

Let’s send the malformed request again. You should see both processes restarted:

Throttler.request({:hello, :world})

# Wait for the executor to crash
Process.sleep(100)

There’s one more strategy available: :rest_for_one. When a child crashes, it restarts that child together with the children that come after it in the spec list passed to Supervisor.start_link/2.

💡 Change the strategy to :rest_for_one. How does the behavior change? Reverse the order of [throttler_spec, executor_spec] and try again.
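To see the effect of child order in isolation, here is a minimal sketch with three children under a :rest_for_one supervisor. RestDemo.Worker and the :rest_demo_* names are hypothetical, made up for this illustration. We kill the middle child and check which PIDs changed.

```elixir
defmodule RestDemo.Worker do
  use GenServer

  # Hypothetical worker used only for this illustration
  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}
end

names = [:rest_demo_a, :rest_demo_b, :rest_demo_c]

# One child spec per name; children are started in list order
children =
  for name <- names do
    %{id: name, start: {RestDemo.Worker, :start_link, [name]}}
  end

{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

pids_before = Map.new(names, &{&1, Process.whereis(&1)})

# Kill the middle child and give the supervisor a moment to react
Process.exit(pids_before.rest_demo_b, :kill)
Process.sleep(100)

pids_after = Map.new(names, &{&1, Process.whereis(&1)})

# A child counts as restarted if its PID changed
restarted = for name <- names, pids_before[name] != pids_after[name], do: name
IO.inspect(restarted, label: "restarted")
```

Only :rest_demo_b and :rest_demo_c should be reported as restarted; :rest_demo_a was started before the crashed child, so it keeps its PID.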

Restart mode

Restart mode defines when the supervisor should restart a given process. There are three options:

  • :permanent - the child process is always restarted (default).
  • :transient - the child process is restarted only if it terminates abnormally, i.e., with an exit reason other than :normal, :shutdown, or {:shutdown, term}.
  • :temporary - the child process is never restarted, regardless of the supervision strategy: any termination (even abnormal) is considered successful.

Restart mode is configured with the :restart key in the child spec. Let’s try changing the child spec for Executor:

executor_spec = %{
  id: Executor,
  start: {Executor, :start_link, []},
  restart: :temporary
}

Let’s restart our supervisor and try sending the malformed request again:

if Process.whereis(:my_app_supervisor) do
  Supervisor.stop(:my_app_supervisor)
end

{:ok, pid} = Supervisor.start_link(
  [throttler_spec, executor_spec],
  strategy: :one_for_all,
  name: :my_app_supervisor
)

Throttler.request({:hello, :world})

# Wait for the executor to crash
Process.sleep(100)

As you can see, the Executor wasn’t restarted and doesn’t work anymore:

Throttler.request("hello, world!")

💡 Try setting restart mode to :transient. Does it differ from :permanent in our case? Why?
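The difference between :transient and :permanent only shows up when a child exits normally. The sketch below, which uses a hypothetical RestartDemo.Worker registered as :restart_demo_worker, first kills a :transient child (an abnormal exit) and then stops it with reason :normal.

```elixir
defmodule RestartDemo.Worker do
  use GenServer

  # Hypothetical worker used only for this illustration
  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}
end

child = %{
  id: :transient_worker,
  start: {RestartDemo.Worker, :start_link, [:restart_demo_worker]},
  restart: :transient
}

{:ok, _sup} = Supervisor.start_link([child], strategy: :one_for_one)

# Abnormal exit: a :transient child is restarted
Process.exit(Process.whereis(:restart_demo_worker), :kill)
Process.sleep(100)
after_crash = Process.whereis(:restart_demo_worker)
IO.inspect(is_pid(after_crash), label: "restarted after crash?")

# Normal exit: a :transient child is not restarted
GenServer.stop(:restart_demo_worker, :normal)
Process.sleep(100)
after_normal_stop = Process.whereis(:restart_demo_worker)
IO.inspect(after_normal_stop, label: "registered after normal stop")
```

With restart: :permanent, the second check would find a fresh process instead of nil.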

The child_spec/1 function

So far, we constructed the child specs by hand. A common pattern is to implement a child_spec/1 function instead. It is called by the supervisor and returns the child spec. Let’s create a simple GenServer to demonstrate it:

defmodule MyServer do
  use GenServer

  def child_spec(options) do
    %{
      id: MyServer,
      start: {MyServer, :start_link, [options]}
    }
  end

  def start_link(options) do
    GenServer.start_link(MyServer, options)
  end

  @impl true
  def init(options) do
    message = Keyword.get(options, :message, "Hello from MyServer!")
    IO.puts(message)
    {:ok, %{}}
  end
end

Now we can pass the MyServer module to the Supervisor:

Supervisor.start_link([MyServer], strategy: :one_for_one)

We can also pass a tuple with module and options:

Supervisor.start_link([{MyServer, message: "Hi there!"}], strategy: :one_for_one)

child_spec/1 made using MyServer less verbose, but more importantly, it moved the responsibility of creating a child spec to the module implementation.

It is still possible to adjust the child spec using Supervisor.child_spec/2. It accepts what you would have passed to the supervisor and a keyword list of entries to be replaced in the generated child spec:

my_server_spec = Supervisor.child_spec({MyServer, message: "Hi there!"}, restart: :transient)
Supervisor.start_link([my_server_spec], strategy: :one_for_one)
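Since a child spec is plain data, we can inspect what Supervisor.child_spec/2 returns before handing it to a supervisor. This is a minimal sketch with a hypothetical SpecDemo.Server module that mirrors MyServer; the :spec_demo_custom id is also made up for the example.

```elixir
defmodule SpecDemo.Server do
  use GenServer

  # Hand-written child spec, mirroring MyServer
  def child_spec(options) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, [options]}}
  end

  def start_link(options), do: GenServer.start_link(__MODULE__, options)

  @impl true
  def init(options), do: {:ok, options}
end

# No overrides: we get back what SpecDemo.Server.child_spec/1 returned
default_spec = Supervisor.child_spec({SpecDemo.Server, message: "Hi there!"}, [])
IO.inspect(default_spec, label: "default")

# Override :id and :restart; the :start tuple stays untouched
custom_spec =
  Supervisor.child_spec({SpecDemo.Server, message: "Hi there!"},
    id: :spec_demo_custom,
    restart: :transient
  )

IO.inspect(custom_spec, label: "custom")
```

Overriding :id is handy when the same module should run more than once under one supervisor, because the ids of its children must be unique.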

Finally, it turns out we didn’t actually need to implement child_spec/1 in MyServer by hand - it’s automatically generated by use GenServer! This means that we can already use it for spawning Throttler and Executor, as they’re GenServers too:

if Process.whereis(:my_app_supervisor) do
  Supervisor.stop(:my_app_supervisor)
end

{:ok, pid} = Supervisor.start_link(
  [{Throttler, throttle_time: 1000}, Executor],
  strategy: :one_for_all,
  name: :my_app_supervisor
)

Let’s confirm it works:

Throttler.request("hello world")

For details about the generated child_spec/1 function, consult GenServer’s documentation, the part about use GenServer.
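One such detail is that options given to use GenServer end up in the generated child spec. A minimal sketch, with a hypothetical UseDemo.Server module:

```elixir
defmodule UseDemo.Server do
  # Options passed to `use GenServer` customize the generated child_spec/1
  use GenServer, restart: :transient

  def start_link(options), do: GenServer.start_link(__MODULE__, options)

  @impl true
  def init(options), do: {:ok, options}
end

spec = UseDemo.Server.child_spec([])
IO.inspect(spec, label: "generated spec")
```

The printed map should contain restart: :transient next to the usual id and start keys, so the module itself decides its default restart mode.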

To learn more about supervision, check out the documentation for the Supervisor module.