Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Runtime introspection with Mermaid and VegaLite

vm_introspection.livemd

Runtime introspection with Mermaid and VegaLite

Mix.install([
  {:kino, "~> 0.14.0"},
  {:kino_vega_lite, "~> 0.1.11"}
])

Introduction

In this notebook, we will explore how to use Kino and VegaLite to learn how Elixir works and plot how our system behaves over time. If you are not familiar with VegaLite, read its introductory notebook.

Both Kino and VegaLite have been listed as dependencies in the setup cell. We will also import the Kino.Shorts module, which provides conveniences around the Kino API:

import Kino.Shorts

Processes all the way down

All Elixir code runs inside lightweight processes. You can get the identifier of the current process by calling self():

self()

We can create literally millions of those processes. They all run at the same time and they communicate via message passing:

parent = self()

child =
  spawn(fn ->
    receive do
      :ping -> send(parent, :pong)
    end
  end)

send(child, :ping)

receive do
  :pong -> :ponged!
end

The code above starts a child process. This processes waits for a :ping message and replies to the parent with :pong. The parent then returns :ponged! once it receives the pong.

Wouldn’t it be nice if we could also visualize how this communication happens? We can do so by using Kino.Process!

With Kino.Process.render_seq_trace/1, we can ask Kino to draw a Mermaid diagram with all messages to and from the current process:

parent = self()

Kino.Process.render_seq_trace(fn ->
  child =
    spawn(fn ->
      receive do
        :ping -> send(parent, :pong)
      end
    end)

  send(child, :ping)

  receive do
    :pong -> :ponged!
  end
end)

You can use render_seq_trace any time you want to peek under the hood and learn more about any Elixir code. For instance, Elixir has module called Task. This module contains functions that starts processes to perform temporary computations and return their results. What would happen if we were to spawn four tasks, sleep by a random amount, and collect their results?

Kino.Process.render_seq_trace(fn ->
  1..4
  |> Task.async_stream(fn i ->
    Process.sleep(Enum.random(100..300))
    i
  end)
  |> Stream.run()
end)

Every time you execute the above, you will get a different order, and you can visualize how Task.async_stream/2 is starting different processes which return at different times.

Supervisors: a special type of process

In the previous section we learned about one special type of processes, called tasks. There are other types of processes, such as agents and servers, but there is one particular category that stands out, which are the supervisors.

Supervisors are processes that supervisor other processes and restart them in case something goes wrong. We can start our own supervisor like this:

{:ok, supervisor} =
  Supervisor.start_link(
    [
      {Task, fn -> Process.sleep(:infinity) end},
      {Agent, fn -> [] end}
    ],
    strategy: :one_for_one
  )

The above started a supervision tree. Now let’s visualize it:

Kino.Process.render_sup_tree(supervisor)

In fact, you don’t even need to call render_sup_tree. If you simply return a supervisor, Livebook will return a tabbed output where one of the possible visualizations is the supervision tree:

supervisor

And we can go even further!

Sometimes a supervisor may be supervised by another supervisor, which may have be supervised by another supervisor, and so on. This defines a supervision tree. We then package those supervision trees into applications. Kino itself is an application and we can visualize its tree:

Kino.Process.render_app_tree(:kino)

Even Elixir itself is an application:

Kino.Process.render_app_tree(:elixir)

You can get the list of all currently running applications with:

Application.started_applications()

Feel free to dig deeper and visualize those applications at your own pace!

Connecting to a remote node

Processes have one more trick up their sleeve: processes can communicate with each other even if they run on different machines! However, to do so, we must first connect the nodes together.

In order to connect two nodes, we need to know their node name and their cookie. We can get this information for the current Livebook like this:

IO.puts node()
IO.puts Node.get_cookie()

Where does this information come from? Inside notebooks, Livebook takes care of setting those for you. In development, you would specify those directly in the command line:

iex --name my_app@IP -S mix TASK

In production, those are often part of your mix release configuration.

Now let’s connect the nodes together. We will capture the node name and cookie using Kino.Shorts.read_text/2. For convenience, we will also use the node and cookie of the current notebook as default values. This means that, if you don’t have a separate node, the runtime will connect and introspect itself. Let’s render the inputs and connect to the other node:

node =
  "Node"
  |> read_text(default: node())
  |> String.to_atom()

cookie =
  "Cookie"
  |> read_text(default: Node.get_cookie())
  |> String.to_atom()

Node.set_cookie(node, cookie)
true = Node.connect(node)

Having successfully connected, let’s try spawning a process on the remote node!

Node.spawn(node, fn ->
  IO.inspect(node())
end)

Inspecting remote processes

Now we are going to extract some information from the running node on our own!

Let’s get the list of all processes in the system:

remote_pids = :erpc.call(node, Process, :list, [])

Wait, what is this :erpc.call/4 thing? 🤔

Previously we used Node.spawn/2 to run a process on the other node and we used the IO module to get some output. However, now we actually care about the resulting value of Process.list/0!

We could still use Node.spawn/2 to send us the results, which we would receive, but doing that over and over can be quite tedious. Fortunately, :erpc.call/4 does essentially that - evaluates the given function on the remote node and returns its result.

Now, let’s gather more information about each process: 🕵️

processes =
  Enum.map(remote_pids, fn pid ->
    # Extract interesting process information
    info = :erpc.call(node, Process, :info, [pid, [:reductions, :memory, :status]])
    # The result of inspect(pid) is relative to the node
    # where it was called, that's why we call it on the remote node
    pid_inspect = :erpc.call(node, Kernel, :inspect, [pid])

    %{
      pid: pid_inspect,
      reductions: info[:reductions],
      memory: info[:memory],
      status: info[:status]
    }
  end)

Having all that data, we can now visualize it on a scatter plot using VegaLite:

alias VegaLite, as: Vl

Vl.new(width: 600, height: 400)
|> Vl.data_from_values(processes)
|> Vl.mark(:point, tooltip: true)
|> Vl.encode_field(:x, "reductions", type: :quantitative, scale: [type: "log", base: 10])
|> Vl.encode_field(:y, "memory", type: :quantitative, scale: [type: "log", base: 10])
|> Vl.encode_field(:color, "status", type: :nominal)
|> Vl.encode_field(:tooltip, "pid", type: :nominal)

From the plot we can easily see which processes do the most work and take the most memory.

Tracking memory usage

So far we have used VegaLite to draw static plots. However, we can use Kino to dynamically push data to VegaLite. Let’s use them together to plot the runtime memory usage over time.

There’s a very simple way to determine current memory usage in the VM:

:erlang.memory()

Now let’s build a dynamic VegaLite graph. Instead of returning the VegaLite specification as is, we will wrap it in Kino.VegaLite.new/1 to make it dynamic:

memory_plot =
  Vl.new(width: 600, height: 400, padding: 20)
  |> Vl.repeat(
    [layer: ["total", "processes", "atom", "binary", "code", "ets"]],
    Vl.new()
    |> Vl.mark(:line)
    |> Vl.encode_field(:x, "iter", type: :quantitative, title: "Measurement")
    |> Vl.encode_repeat(:y, :layer, type: :quantitative, title: "Memory usage (MB)")
    |> Vl.encode(:color, datum: [repeat: :layer], type: :nominal)
  )
  |> Kino.VegaLite.new()

Now we can use Kino.listen/2 to create a self-updating plot of memory usage over time on the remote node:

Kino.listen(200, fn i ->
  point =
    :erpc.call(node, :erlang, :memory, [])
    |> Enum.map(fn {type, bytes} -> {type, bytes / 1_000_000} end)
    |> Map.new()
    |> Map.put(:iter, i)

  Kino.VegaLite.push(memory_plot, point, window: 1000)
end)

Unless you connected to a production node, the memory usage most likely doesn’t change, so to emulate some spikes within the current notebook, you can run the following code:

Binary usage

for i <- 1..10_000 do
  String.duplicate("cat", i)
end

ETS usage

tid = :ets.new(:users, [:set, :public])

for i <- 1..1_000_000 do
  :ets.insert(tid, {i, "User #{i}"})
end

In this notebook, we learned how powerful Kino and Livebook are together. With them, we can augment existing Elixir constructs, such as supervisors, with rich information, but also to create new visualizations such as VegaLite charts.

In the next notebook, we will learn how to create our custom Kinos with Elixir and JavaScript!