Production Code Upgrades
Introduction
This document collects my learning notes for the Production Code Upgrades in Elixir series.
Part 1: Hot Code Reloading in Elixir
From Hot Code Reloading in Elixir
1.1 The Erlang Code Server
Upgrading Modules
This part shows that Erlang’s code server can run multiple versions of a module simultaneously.
defmodule Counter do
def count(n) do
:timer.sleep(1000)
IO.puts("- #{inspect(self())}: #{n}")
count(n + 1)
end
end
# Run the counter in a spawn process to not block terminal.
spawn(Counter, :count, [0])
Now, we update the counter module to increment the number by 2 instead of 1:
- Reevaluate the changed module.
- Rerun the above code block to spawn another process.
We can now see two counters:
- the old one is still running the old version and increments by 1
- the new one uses the new version and increments by 2.
The Erlang Code Server
- The server can keep two versions of a module in memory: one new (current) and one old.
- When a module is loaded, it becomes the current version.
- Exported functions from the old version are replaced by the ones from the new version.
- If a process is already running when a new version of a module is loaded, it stays on the old version until it makes a fully qualified call into the module.
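The rules above can be observed outside Livebook too. The following is only a minimal sketch: the module name CodeServerDemo is hypothetical, and Code.compile_string/1 stands in for re-evaluating a notebook cell. It relies on :erlang.check_process_code/2, which reports whether a process still executes old code of a module.

```elixir
# Silence the "redefining module" warning for this demo.
Code.compiler_options(ignore_module_conflict: true)

# Builds the source of a hypothetical module that reports a version number.
source = fn version ->
  """
  defmodule CodeServerDemo do
    def loop do
      receive do
        {:version, from} ->
          send(from, {:version, #{version}})
          # Local call: the process never leaves the version it started in.
          loop()
      end
    end
  end
  """
end

ask = fn pid ->
  send(pid, {:version, self()})

  receive do
    {:version, version} -> version
  after
    1000 -> :timeout
  end
end

# Load version 1 and start a process that loops inside it.
Code.compile_string(source.(1))
old_pid = spawn(CodeServerDemo, :loop, [])
# Asking once also guarantees the process is already looping in version 1.
1 = ask.(old_pid)

# Load version 2: it becomes current, version 1 becomes old.
Code.compile_string(source.(2))
new_pid = spawn(CodeServerDemo, :loop, [])

old_version = ask.(old_pid)
new_version = ask.(new_pid)
# true while the process still executes old code of the module
lingering? = :erlang.check_process_code(old_pid, CodeServerDemo)

IO.inspect({old_version, new_version, lingering?})
```

Replacing the local call loop() with the fully qualified CodeServerDemo.loop() would let the old process move to version 2 after handling its next message.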
1.2 Hot Reloading GenServers
defmodule CountServer do
use GenServer
def start_link do
GenServer.start_link(__MODULE__, 0)
end
def init(state) do
Process.send_after(self(), :increment, 1000)
{:ok, state}
end
def handle_info(:increment, n) do
incremented = n + 1
IO.puts("- #{inspect(self())}: #{incremented}")
Process.send_after(self(), :increment, 1000)
{:noreply, incremented}
end
end
Start the counter
{:ok, pid} = CountServer.start_link()
Process.exit(pid, :kill)
Observe
- Change the increment in the counter (for example, from 1 to 2).
- Recompile the module.
- The already running process starts using the new value immediately.
- We do not need to start a new GenServer.
Why can the GenServer start using the new value immediately?
- The GenServer’s callback module and the spawned process that holds the state are separate things.
- The state, which is kept in the GenServer process, is updated by calling out to the CountServer module.
- External function calls, like the GenServer process calling out to the CountServer module, are always done on the current version of the module.
See Deconstructing Elixir’s GenServers for more details.
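To make the reasoning above concrete, here is a heavily simplified stand-in for the generic server loop. It is a sketch under assumptions, not how :gen_server is actually implemented: MiniServer and MiniCounter are hypothetical names, and Code.compile_string/1 stands in for recompiling the callback module.

```elixir
Code.compiler_options(ignore_module_conflict: true)

defmodule MiniServer do
  # Hypothetical generic loop: it owns the state, the callback module owns the logic.
  def start(callback_module, state) do
    spawn(fn -> loop(callback_module, state) end)
  end

  def get(pid) do
    send(pid, {:get, self()})

    receive do
      {:state, state} -> state
    after
      1000 -> :timeout
    end
  end

  defp loop(callback_module, state) do
    receive do
      {:get, from} ->
        send(from, {:state, state})
        loop(callback_module, state)

      message ->
        # Fully qualified call: always runs the current version of the callback module.
        {:noreply, new_state} = callback_module.handle_info(message, state)
        loop(callback_module, new_state)
    end
  end
end

callback_source = fn step ->
  """
  defmodule MiniCounter do
    def handle_info(:increment, n), do: {:noreply, n + #{step}}
  end
  """
end

Code.compile_string(callback_source.(1))
pid = MiniServer.start(MiniCounter, 0)

send(pid, :increment)
before_reload = MiniServer.get(pid)

# Reload the callback module; the running process is not restarted.
Code.compile_string(callback_source.(2))
send(pid, :increment)
after_reload = MiniServer.get(pid)

IO.inspect({before_reload, after_reload})
```

The process never runs code from MiniCounter between messages, so it has nothing to stay stuck in: each message is handled by whatever version is current at that moment.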
1.3 Transforming State
Although the state in the GenServer example got transformed correctly by the reloaded version of the CountServer module, there’s one more scenario to look at.
What happens when the new version of the implementation requires a different state?
We need to update the state when we upgrade the module to the new version.
defmodule CountServer do
use GenServer
def start_link do
GenServer.start_link(__MODULE__, 0)
end
def init(state) do
Process.send_after(self(), :increment, 1000)
{:ok, state}
end
def handle_info(:increment, n) do
incremented = n + 2
IO.puts("- #{inspect(self())}: #{incremented}")
Process.send_after(self(), :increment, 1000)
{:noreply, incremented}
end
# ===> added
def code_change(_old_vsn, state, _extra) when rem(state, 2) == 1 do
{:ok, state - 1}
end
def code_change(_old_vsn, state, _extra) do
{:ok, state}
end
end
{:ok, pid} = CountServer.start_link()
# If we test the module from iex, we may need these commands to trigger the code change and show the result.
# In Livebook, this is not needed since we could reevaluate each block.
:sys.suspend(pid)
:sys.change_code(pid, CountServer, nil, [])
:sys.resume(pid)
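The same suspend, change_code, resume sequence also works when the new version needs a differently shaped state. The sketch below is an illustration only: StateUpgradeDemo is a hypothetical module whose old state is assumed to be a bare integer and whose new state is assumed to be a map.

```elixir
defmodule StateUpgradeDemo do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}

  # Old-shaped state (a bare integer) is converted to the new shape (a map).
  @impl true
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count}}
  end

  # State that already has the new shape is left untouched.
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end

{:ok, pid} = StateUpgradeDemo.start_link(5)
state_before = :sys.get_state(pid)

# The process must be suspended before its code_change/3 callback is triggered.
:ok = :sys.suspend(pid)
:ok = :sys.change_code(pid, StateUpgradeDemo, nil, [])
:ok = :sys.resume(pid)

state_after = :sys.get_state(pid)
IO.inspect({state_before, state_after})
```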
About Backward Compatibility
- When we update a GenServer, we need to keep clauses that accept the old messages, so that the previous version stays compatible and the upgrade is clean.
- For instance, if we change the handle_info callback clause without keeping the previous clause, our GenServer will crash and report “no function clause matching”.
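As an illustration of keeping the previous clause, here is a small sketch. CompatCounter and the {:increment, by} message format are assumptions made up for this example; they are not part of the original series.

```elixir
defmodule CompatCounter do
  use GenServer

  @impl true
  def init(n), do: {:ok, n}

  # New message format, assumed to be introduced by the upgrade.
  @impl true
  def handle_info({:increment, by}, n), do: {:noreply, n + by}

  # Old message format, kept so that messages sent before the upgrade still match.
  def handle_info(:increment, n), do: {:noreply, n + 1}
end

{:ok, pid} = GenServer.start_link(CompatCounter, 0)

# A message in the old format, followed by one in the new format.
send(pid, :increment)
send(pid, {:increment, 2})

final_state = :sys.get_state(pid)
IO.inspect(final_state)
```

Once no old-format messages can still be in flight, the old clause can be dropped in a later upgrade.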
1.4 Code Purge in Elixir
Testing the example from: Should You Code Purge in Elixir?
CodePurge.pi()
We could recompile from Elixir code itself in Livebook.
# This uses a simple module from `elixir_horizion`.
# So we use an attached node started from mix.
# Recompile the project in a separate shell
ExecCmd.run("mix compile")
# In our iex shell, we reload the module code:
:code.load_file(CodePurge)
Rerun the function and we can see that the pi value changed.
CodePurge.pi()
If we reload this module once more, it returns an error: {:error, :not_purged}.
:code.load_file(CodePurge)
To solve the problem, we could use :code.purge and :code.soft_purge.
- They are used to handle processes that are still running old code.
- :code.purge/1 kills processes running old code.
- :code.soft_purge/1 fails if there are any processes running old code.
:code.purge(CodePurge)
:code.load_file(CodePurge)
:code.soft_purge(CodePurge)
:code.load_file(CodePurge)
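The difference between the two purge functions can be seen with a process that lingers in old code. This is a sketch under assumptions: PurgeDemo is a hypothetical module, and Code.compile_string/1 replaces the mix compile plus :code.load_file steps used above.

```elixir
Code.compiler_options(ignore_module_conflict: true)

source = """
defmodule PurgeDemo do
  def wait(parent) do
    # Tell the parent that this process is now executing inside the module.
    send(parent, :ready)

    receive do
      :stop -> :ok
    end
  end
end
"""

Code.compile_string(source)

# spawn/3 rather than spawn_link/3, so killing this process does not take the caller down.
pid = spawn(PurgeDemo, :wait, [self()])
ref = Process.monitor(pid)

receive do
  :ready -> :ok
after
  1000 -> raise "process did not start"
end

# Loading the module again turns the version that pid runs into old code.
Code.compile_string(source)

# soft_purge refuses to purge while a process lingers in the old code.
soft_result = :code.soft_purge(PurgeDemo)
alive_after_soft? = Process.alive?(pid)

# purge kills the lingering process and then removes the old code.
hard_result = :code.purge(PurgeDemo)

killed? =
  receive do
    {:DOWN, ^ref, :process, ^pid, _reason} -> true
  after
    1000 -> false
  end

IO.inspect({soft_result, alive_after_soft?, hard_result, killed?})
```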
1.5 How Not to Do a Code Upgrade
# lib/code_purge/pi.ex
defmodule CodePurgeV2.Pi do
def start_link do
spawn_link(&server/0)
end
def server do
receive do
{:get, from} ->
send(from, {:ok, 3.14})
CodePurgeV2.Pi.server()
end
end
def get(pid) do
send(pid, {:get, self()})
receive do
{:ok, value} ->
{:ok, value}
after
1000 ->
:error
end
end
end
pid = CodePurgeV2.Pi.start_link()
CodePurgeV2.Pi.get(pid)
# Now, reload the module once (without any actual changes to functions) and
# try to purge it so that you can do the next 'upgrade':
:code.load_file(CodePurgeV2.Pi)
# This kills the server process and, because it is linked, the shell process too (the Livebook code block will show "Aborted")
:code.purge(CodePurgeV2.Pi)
This happens because the server didn’t receive any messages and so didn’t transition to the new code after the first upgrade.
Your code needs to follow OTP behaviours to be safely upgraded.
That’s why you should generally avoid spawn or spawn_link for home-brewed servers or other long-running processes: they don’t use OTP.
(see Demystifying processes in Elixir)
1.6 How To Do a Code Upgrade Using GenServer
defmodule CodePurgeV3 do
use GenServer
def start_link(value \\ 3.14) do
GenServer.start_link(__MODULE__, value)
end
def init(value) do
{:ok, value}
end
def handle_call(:get, _from, value) do
{:reply, value, value}
end
def get(pid) do
GenServer.call(pid, :get)
end
end
Try to upgrade/purge the code of a running process several times.
{:ok, pid} = CodePurgeV3.start_link()
CodePurgeV3.get(pid)
:code.load_file(CodePurgeV3)
:code.purge(CodePurgeV3)
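The reload and purge cycle can also be repeated programmatically. The sketch below assumes a hypothetical module PurgeSafeServer and reloads the very same compiled binary with :code.load_binary/3 instead of recompiling from disk, so no actual code change is involved.

```elixir
# defmodule returns the compiled binary, which we keep to reload it later.
{:module, PurgeSafeServer, beam, _} =
  defmodule PurgeSafeServer do
    use GenServer

    def start_link(value), do: GenServer.start_link(__MODULE__, value)
    def get(pid), do: GenServer.call(pid, :get)

    @impl true
    def init(value), do: {:ok, value}

    @impl true
    def handle_call(:get, _from, value), do: {:reply, value, value}
  end

{:ok, pid} = PurgeSafeServer.start_link(3.14)

purge_results =
  for _round <- 1..3 do
    # Reload the same binary, then purge the version that just became old.
    {:module, PurgeSafeServer} = :code.load_binary(PurgeSafeServer, ~c"nofile", beam)
    # false means that no process had to be killed.
    :code.purge(PurgeSafeServer)
  end

still_alive? = Process.alive?(pid)
value = PurgeSafeServer.get(pid)
IO.inspect({purge_results, still_alive?, value})
```

Between messages the process waits inside the generic :gen_server loop rather than inside PurgeSafeServer, which is why purging never needs to kill it.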
1.7 Keep updating the states of GenServer processes
When we update a GenServer, we may also need to update its state. This is covered in the section above: “1.3 Transforming State”.
Part 2: Using Supervisors to Organize Your Elixir Application
From Using Supervisors to Organize Your Elixir Application.
In the previous chapter of this series, we looked at hot code reloading in Elixir and why we should use GenServer to implement long-running processes.
But to organize a whole application, we need one more building block — supervisors. Let’s take a look at supervisors in detail.
defmodule CounterV4 do
use GenServer
require Logger
@interval 100
def start_link(start_from, opts \\ []) do
GenServer.start_link(__MODULE__, start_from, opts)
end
def get(pid) do
GenServer.call(pid, :get)
end
def init(start_from) do
st = %{
current: start_from,
timer: :erlang.start_timer(@interval, self(), :tick)
}
{:ok, st}
end
def handle_call(:get, _from, st) do
{:reply, st.current, st}
end
def handle_info({:timeout, _timer_ref, :tick}, st) do
new_timer = :erlang.start_timer(@interval, self(), :tick)
:erlang.cancel_timer(st.timer)
{:noreply, %{st | current: st.current + 1, timer: new_timer}}
end
end
children = [
{CounterV4, 10000}
]
opts = [strategy: :one_for_one, name: MyApp.Supervisor]
Supervisor.start_link(children, opts)
CounterV4 is now started by the created supervisor.
[{_, pid, _, _}] = Supervisor.which_children(MyApp.Supervisor)
CounterV4.get(pid)
Kill the child, which is CounterV4.
Process.exit(pid, :shutdown)
We can see that CounterV4 is restarted by the supervisor.
Supervisor.which_children(MyApp.Supervisor)
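The restart can also be checked without reading the output by eye. This is a minimal sketch with a hypothetical worker module, RestartDemoWorker, and an unnamed supervisor; the short polling loop is an assumption made here because the restart happens asynchronously.

```elixir
defmodule RestartDemoWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

{:ok, sup} = Supervisor.start_link([{RestartDemoWorker, 0}], strategy: :one_for_one)

# Returns whatever the supervisor currently reports in the pid slot of its only child.
child_pid = fn ->
  [{_id, pid, _type, _modules}] = Supervisor.which_children(sup)
  pid
end

old_pid = child_pid.()
ref = Process.monitor(old_pid)
Process.exit(old_pid, :shutdown)

# Wait until the old child is really gone.
receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
after
  1000 -> raise "child did not exit"
end

# Poll briefly until the supervisor reports a replacement pid.
new_pid =
  Enum.find_value(1..50, fn _attempt ->
    case child_pid.() do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _other ->
        Process.sleep(10)
        nil
    end
  end)

IO.inspect({old_pid, new_pid})
```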
Let’s see the supervision tree.
Kino.Process.render_sup_tree(MyApp.Supervisor)
2.1 Adding GenServers to Custom Supervisors
First, let us create the callback module for a custom supervisor, CounterSup.
defmodule CounterSup do
use Supervisor
def start_link(start_numbers) do
Supervisor.start_link(__MODULE__, start_numbers, name: __MODULE__)
end
@impl true
def init(start_numbers) do
children =
for start_number <- start_numbers do
# We can't just use `{CounterV4, start_number}`
# because we need different ids for children
Supervisor.child_spec({CounterV4, start_number}, id: start_number)
end
Supervisor.init(children, strategy: :one_for_one)
end
end
Update how we start MyApp.Supervisor (named MyApp.SupervisorV2 below).
It doesn’t start CounterV4 directly. Instead, it starts CounterSup.
children = [
{CounterSup, [10000, 20000]}
]
opts = [strategy: :one_for_one, name: MyApp.SupervisorV2]
Supervisor.start_link(children, opts)
Supervisor.which_children(MyApp.SupervisorV2)
Supervisor.which_children(CounterSup)
Let’s visualize the supervision tree.
Kino.Process.render_sup_tree(MyApp.SupervisorV2)
Now, let’s add the 3rd child and visualize the supervision tree.
new_children_spec = Supervisor.child_spec({CounterV4, 30000}, id: 30000)
Supervisor.start_child(CounterSup, new_children_spec)
Supervisor.which_children(CounterSup)
Kino.Process.render_sup_tree(MyApp.SupervisorV2)
We could also remove a child from the supervision tree.
How do I programmatically get a pid from a string? (see: How to convert a pid string from logs into pid values?)
problem_child = IEx.Helpers.pid("0.859.0")
Process.alive?(problem_child)
However, this is not the child_id.
The child_id is the one we specified using Supervisor.child_spec.
# We have to first terminate a running child then delete it
Supervisor.terminate_child(CounterSup, 10000)
Supervisor.delete_child(CounterSup, 10000)
# Visualize again
Kino.Process.render_sup_tree(MyApp.SupervisorV2)
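The add, terminate, delete cycle above can be condensed into a standalone sketch. The worker module DynamicDemoWorker and the child id :counter_30000 are hypothetical names chosen for this example.

```elixir
defmodule DynamicDemoWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)

# The :id set here is the child_id used by terminate_child/2 and delete_child/2.
spec = Supervisor.child_spec({DynamicDemoWorker, 30_000}, id: :counter_30000)
{:ok, pid} = Supervisor.start_child(sup, spec)

# Deleting a child that is still running is refused.
delete_while_running = Supervisor.delete_child(sup, :counter_30000)

# Terminate first, then delete the child specification.
:ok = Supervisor.terminate_child(sup, :counter_30000)
:ok = Supervisor.delete_child(sup, :counter_30000)

remaining = Supervisor.which_children(sup)
IO.inspect({delete_while_running, Process.alive?(pid), remaining})
```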
Let’s add another supervision tree under MyApp.SupervisorV2.
children_specs = for n <- [10000, 20000, 30000], do: Supervisor.child_spec({CounterV4, n}, id: n)
second_sup_spec = %{
id: CraftSub,
start: {Supervisor, :start_link, [children_specs, [strategy: :one_for_one]]},
type: :supervisor,
restart: :permanent,
shutdown: 5000
}
Supervisor.start_child(MyApp.SupervisorV2, second_sup_spec)
Supervisor.which_children(MyApp.SupervisorV2)
Kino.Process.render_sup_tree(MyApp.SupervisorV2)
2.2 Examples of Custom Supervisor Usage
Stop all Counter instances by stopping their supervisor
Consider a situation in which we want to stop all running Counter instances.
We could do this by stopping their supervisor.
defmodule CounterV5 do
use GenServer
require Logger
@interval 100
def start_link(start_from, opts \\ []) do
GenServer.start_link(__MODULE__, start_from, opts)
end
def get(pid) do
GenServer.call(pid, :get)
end
def init(start_from) do
# Updated here
Process.flag(:trap_exit, true)
st = %{
current: start_from,
timer: :erlang.start_timer(@interval, self(), :tick)
}
{:ok, st}
end
def handle_call(:get, _from, st) do
{:reply, st.current, st}
end
def handle_info({:timeout, _timer_ref, :tick}, st) do
new_timer = :erlang.start_timer(@interval, self(), :tick)
:erlang.cancel_timer(st.timer)
{:noreply, %{st | current: st.current + 1, timer: new_timer}}
end
# Updated here
def terminate(reason, st) do
Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
end
end
defmodule CounterSupV2 do
use Supervisor
def start_link(start_numbers) do
Supervisor.start_link(__MODULE__, start_numbers, name: __MODULE__)
end
@impl true
def init(start_numbers) do
children =
for start_number <- start_numbers do
# We can't just use `{CounterV5, start_number}`
# because we need different ids for children
Supervisor.child_spec({CounterV5, start_number}, id: start_number)
end
Supervisor.init(children, strategy: :one_for_one)
end
end
children = [
{CounterSupV2, [10000, 20000]}
]
opts = [strategy: :one_for_one, name: MyApp.SupervisorV3]
Supervisor.start_link(children, opts)
Kino.Process.render_sup_tree(MyApp.SupervisorV3)
Supervisor.stop(MyApp.SupervisorV3, :normal)
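Instead of reading the log output, the shutdown reason can be captured directly. The following sketch uses a hypothetical worker, ShutdownDemoWorker, that reports from terminate/2 to the process that started the supervisor; the message shape {:terminated, pid, reason} is made up for this example.

```elixir
defmodule ShutdownDemoWorker do
  use GenServer

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    # Without trapping exits, terminate/2 is not called when the supervisor shuts the child down.
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(reason, report_to) do
    send(report_to, {:terminated, self(), reason})
  end
end

{:ok, sup} = Supervisor.start_link([{ShutdownDemoWorker, self()}], strategy: :one_for_one)
[{_id, child, _type, _modules}] = Supervisor.which_children(sup)

# Stopping the supervisor shuts down its children first.
:ok = Supervisor.stop(sup, :normal)

reported_reason =
  receive do
    {:terminated, ^child, reason} -> reason
  after
    1000 -> :timeout
  end

IO.inspect(reported_reason)
```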
How to stop all Counters gracefully.
The condition of gracefulness is to count up until we reach numbers divisible by 10 (10, 20, 30, etc) before shutdown.
Of course, in our simple example, we may just send ticks to count to the nearest number divisible by 10 in terminate.
Instead, imagine that these events are external and emulate some metrics that we would prefer to aggregate consistently.
To achieve that, we need to make the following modifications.
defmodule CounterV6 do
use GenServer
require Logger
@interval 100
def start_link(start_from) do
GenServer.start_link(__MODULE__, start_from)
end
def get(pid) do
GenServer.call(pid, :get)
end
def stop_gracefully(pid) do
GenServer.call(pid, :stop_gracefully)
end
def init(start_from) do
Process.flag(:trap_exit, true)
st = %{
current: start_from,
timer: :erlang.start_timer(@interval, self(), :tick),
terminator: nil
}
{:ok, st}
end
def handle_call(:get, _from, st) do
{:reply, st.current, st}
end
def handle_call(:stop_gracefully, from, st) do
if st.terminator do
{:reply, :already_stopping, st}
else
{:noreply, %{st | terminator: from}}
end
end
def handle_info({:timeout, _timer_ref, :tick}, st) do
:erlang.cancel_timer(st.timer)
new_current = st.current + 1
if st.terminator && rem(new_current, 10) == 0 do
# we are terminating
GenServer.reply(st.terminator, :ok)
{:stop, :normal, %{st | current: new_current, timer: nil}}
else
new_timer = :erlang.start_timer(@interval, self(), :tick)
{:noreply, %{st | current: new_current, timer: new_timer}}
end
end
def terminate(reason, st) do
Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
end
end
Let’s test if it works for a single process.
{:ok, pid} = CounterV6.start_link(10000)
CounterV6.stop_gracefully(pid)
After confirming that the Counter module can stop gracefully, let’s modify the supervisor to make it work with our Counter’s new feature.
In Elixir, an application has the prep_stop callback. What is the similar callback for a supervisor?
The prep_stop callback is used in applications to perform graceful termination before the application shuts down. Supervisors, on the other hand, don’t have a built-in callback specifically for this purpose.
When a supervisor is stopped, either due to an error or a clean shutdown, it will stop all its child processes as well. This behavior is based on the linked supervision tree where a supervisor is linked to its child processes, and if a supervisor terminates, its children will also be terminated.
If you need to perform some cleanup or graceful termination logic for a supervised process before it is stopped, you can define your own termination function within the supervised process itself. When the process receives a termination signal (e.g., via Process.exit/2), it can handle the termination and perform any necessary cleanup before exiting, provided it traps exits as CounterV5 does.
For how to define a custom termination function, see the usage of terminate/2 above.
Please note that if you have child supervisors within your main supervisor, they will also follow the same linked supervision model, and their children will be terminated in a similar way. Therefore, you can implement custom termination logic within individual processes or child supervisors as needed.
This means we have to create a real application for testing.
The supervisor module is no different from the previous example, except for its new name and the restart: :transient option on its children.
defmodule CounterSupV4 do
use Supervisor
def start_link(start_numbers) do
Supervisor.start_link(__MODULE__, start_numbers, name: __MODULE__)
end
@impl true
def init(start_numbers) do
children =
for start_number <- start_numbers do
# We can't just use `{CounterV6, start_number}`
# because we need different ids for children
Supervisor.child_spec(
{CounterV6, start_number},
id: start_number,
restart: :transient
)
end
Supervisor.init(children, strategy: :one_for_one)
end
end
However, the application callback module below cannot be tested by evaluating it in Livebook.
We need to either modify our existing application or create a new one to test it.
defmodule App do
use Application
@impl true
def start(_type, _args) do
# Define the children we need to start. Here, when App starts, we want to start CounterSupV4.
children = [
{CounterSupV4, [10000, 20000]}
]
opts = [strategy: :one_for_one, name: MyCounterAppSupervisor]
Supervisor.start_link(children, opts)
end
@impl true
def prep_stop(st) do
stop_tasks =
for {_, pid, _, _} <- Supervisor.which_children(CounterSupV4) do
Task.async(fn ->
:ok = CounterV6.stop_gracefully(pid)
end)
end
Task.await_many(stop_tasks)
st
end
end
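The core of prep_stop/1 above does not depend on the Application behaviour, so it can be tried on any supervisor. The sketch below makes several assumptions: GracefulStop and GracefulDemoWorker are hypothetical modules, and the worker stops immediately instead of counting up to a multiple of 10 as CounterV6 does.

```elixir
defmodule GracefulDemoWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)
  def stop_gracefully(pid), do: GenServer.call(pid, :stop_gracefully)

  @impl true
  def init(arg), do: {:ok, arg}

  # Stand-in for CounterV6: replies :ok and stops with reason :normal.
  @impl true
  def handle_call(:stop_gracefully, _from, state), do: {:stop, :normal, :ok, state}
end

defmodule GracefulStop do
  # Same idea as prep_stop/1 above, but callable on any supervisor.
  def stop_children(supervisor, stop_fun, timeout \\ 5_000) do
    supervisor
    |> Supervisor.which_children()
    # Skip children that are not running (:undefined or :restarting).
    |> Enum.filter(fn {_id, pid, _type, _modules} -> is_pid(pid) end)
    |> Enum.map(fn {_id, pid, _type, _modules} -> Task.async(fn -> stop_fun.(pid) end) end)
    |> Task.await_many(timeout)
  end
end

children =
  for n <- [10_000, 20_000] do
    # :transient children are not restarted after exiting with reason :normal.
    Supervisor.child_spec({GracefulDemoWorker, n}, id: n, restart: :transient)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
replies = GracefulStop.stop_children(sup, &GracefulDemoWorker.stop_gracefully/1)
IO.inspect(replies)
```

The timeout passed to Task.await_many/2 bounds how long the whole graceful stop may take; with counters that need several ticks to reach a round number, it has to be larger than the longest expected wait.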
Recap: steps to add a supervisor to an application.
- In your application’s mix.exs file, define the callback module to use, such as:
  def application do
    [
      extra_applications: [:logger],
      mod: {App, []}
    ]
  end
- In your application callback module, the application callback’s job is to start a supervision tree.
  - Define use Application.
  - Define which supervisor you want to start by implementing start/2.
- Make sure the supervisors are configured properly. We could start a supervisor in two ways:
  - Use Supervisor.start_link, with children and opts.
  - Call the supervisor’s start_link directly.
For more details, check the application callback.
Part 3: Application Code Upgrades in Elixir
From Application Code Upgrades in Elixir
Must also read through this: Learn You Some Erlang – Leveling Up in The Process Quest