Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Meta-programmable functional notebooks

talks/2023/06-lambda-days.livemd

Meta-programmable functional notebooks

Mix.install(
  [
    {:kino, "~> 0.9.0"},
    {:evision, "~> 0.1.21"},
    {:req, "~> 0.3"},
    {:kino_vega_lite, "~> 0.1.7"},
    {:kino_bumblebee, "~> 0.3.0"},
    {:exla, "~> 0.5.1"},
    {:kino_explorer, "~> 0.1.4"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

Intro to Elixir

Elixir is a functional and dynamic programming language that runs on the Erlang VM:

list = ["hello", 123, :banana]

Elixir data structures are immutable by default:

List.delete(list, 123)
list

Elixir supports pattern-matching, polymorphism via protocols, meta-programming, and more. But today, we will focus on its concurrency features. In the Erlang VM, all code runs inside lightweight threads called processes. We can literally create millions of them:

for _ <- 1..1_000_000 do
  spawn(fn -> :ok end)
end

Process communicate by sending messages between them:

child =
  spawn(fn ->
    receive do
      {:ping, caller} -> send(caller, :pong)
    end
  end)

send(child, {:ping, self()})

receive do
  :pong -> :it_worked!
end

And Livebook can helps us see how processes communicate between them:

Kino.Process.render_seq_trace(fn ->
  child =
    spawn(fn ->
      receive do
        {:ping, caller} -> send(caller, :pong)
      end
    end)

  send(child, {:ping, self()})

  receive do
    :pong -> :it_worked!
  end
end)

Maybe you want to see how Elixir can perform multiple tasks at once, scaling on both CPU and IO?

Kino.Process.render_seq_trace(fn ->
  1..4
  |> Task.async_stream(
    fn _ -> Process.sleep(Enum.random(100..300)) end,
    max_concurrency: 4
  )
  |> Stream.run()
end)

Messages can also be distributed across nodes. Let’s try to execute something in the Distributed module:

Distributed.hello_world()

But what if Distributed is defined on another notebook?

Every notebook gets its own Elixir runtime. Each runtime has its own node and cookie:

{node(), Node.get_cookie()}

To talk to other nodes, we need their node and cookie to connect to them:

node =
  Kino.Input.text("Node")
  |> Kino.render()
  |> Kino.Input.read()
  |> String.to_atom()

cookie =
  Kino.Input.text("Cookie")
  |> Kino.render()
  |> Kino.Input.read()
  |> String.to_atom()

Node.set_cookie(node, cookie)
:erpc.call(node, fn -> Distributed.hello_world() end)

Enough about Elixir, let’s talk Livebook!

Livebook: truly reproducible workflows

What makes notebooks hard to reproduce?

flowchart TD;
    root[Sources of irreproducibility];
    ooo[Out of order execution];
    gms[Global mutable state];
    root-->ooo;
    root-->gms;

For example, in Jupyter notebooks, the execution flow is:

flowchart LR;
  state((State));
  c1[Cell A];
  c2[Cell B];
  c3[Cell C];
  state--read #2 -->c1--write #2-->state;
  state--read #3 -->c2--write #3-->state;
  state--read #1 -->c3--write #1-->state;

Notebooks may linearize cells via static and dynamic analysis:

flowchart LR;
  state((State));
  c3[Cell C];
  c1[Cell A];
  c2[Cell B];

  state--read #1 -->c3--write #1-->state;
  state--read #2 -->c1;
  c2-- write #2 -->state;
  c1-->c2;

However, even if ordering is employed, they are still stateful. The following code, in most notebooks, will increment x:

x = 1
x = x + 1

Livebook execution model is fully sequential and there is no global mutable state. It looks like this:

flowchart TD;
  c1[Cell A];
  c2[Cell B];
  c3[Cell C];
  c1--Binding + Environment-->c2--Binding + Environment-->c3;

Now we can track inputs and outputs and, with a pinch of static analysis, we can cache evaluation results and notify when cells become stale.

Still… maybe a single execution flow is too limiting?

Livebook: branched sections

Branched sections allow you to run experiments from your main Livebook branch and also allow for concurrent execution within Livebooks:

flowchart TD;
  subgraph Section #1
    c1[Cell A];
    c2[Cell B];
    c1-->c2;
  end
  subgraph Branch from #1
    c5[Cell E];
    c6[Cell F];
    c2-->c5;
    c5-->c6;
  end
  subgraph Section #2
    c3[Cell C];
    c4[Cell D];
    c2-->c3;
    c3-->c4;
  end
frame = Kino.Frame.new() |> Kino.render()

for _ <- Stream.interval(1000) do
  Kino.Frame.render(frame, :erlang.memory())
end

Livebook: multiplayer runtime

Your code runs in a separate Erlang VM instance using the distribution channels we learned earlier:

flowchart LR;
  subgraph Clients
    b1((Browser #1));
    b2((Browser #2));
  end

  subgraph Livebook
    l[Session];
    b1--WebSockets-->l;
    b2--WebSockets-->l;
  end

  subgraph Runtime
    c[Code];
    bc[Branched code];
    l--Erlang distribution-->c;
    l--Erlang distribution-->bc;
  end

This brings a separation of concern where your code actually knows nothing about Livebook. The Kino library is the one responsible for connecting both sides and supporting additional features such as the rendering of outputs.

Smart cells: meta-programmable notebooks

The Erlang VM provides a great set of tools for observability. Let’s gather information about all processes:

processes =
  for pid <- Process.list() do
    info = Process.info(pid, [:reductions, :memory, :status])

    %{
      pid: inspect(pid),
      reductions: info[:reductions],
      memory: info[:memory],
      status: info[:status]
    }
  end

But how to plot it?

Inspired by the work on “mage” by Mary Beth Kery and co, Livebook has smart cells:

Or what if we want to run a machine learning model?

Smart cells run as part of your code and you can create any Smart cell that you want. They build on top of live outputs and share the same building blocks:

flowchart LR;
  subgraph Clients
    b1((Browser #1));
    b2((Browser #2));
  end

  subgraph Livebook
    l[Session];
    b1--WebSockets-->l;
    b2--WebSockets-->l;
  end

  subgraph Runtime
    c[Code];
    bc[Branched code];
    sc[Smart cell];
    l--Erlang distribution-->c;
    l--Erlang distribution-->bc;
    l--Erlang distribution-->sc;
  end

Erlang VM processes all the way down!

You can also publish Smart cells as packages to Hex.pm or installed them any other Elixir package.

PS: “no code” is a misnomer: it shouldn’t matter if you used a graphical interface or a text editor, as long as it solves the problem at hand, it is code.

Live programming: debugging

We have started exploring Live programming ideas only recently and we’ve already seen how Livebook interacts with your code and the runtime to generate sequential traces.

There is another feature we want to show, which is how can use Elixir pipelines and its dbg macro to manipulate code, as shown in “Unravel: A Fluent Code Explorer for Data Wrangling” by Nischal Shrestha and co:

"Elixir is cool!"
|> String.trim_trailing("!")
|> String.split()
|> Enum.reverse()
|> List.first()
|> dbg()

We have been very excited to see that our generalization scales to different use cases. Here is a community example. Let’s start with an image:

%{body: image} =
  Req.get!("https://raw.githubusercontent.com/pjreddie/darknet/master/data/dog.jpg")

Kino.Image.new(image, :jpeg)

And now let’s transform this image:

alias Evision, as: OpenCV
rotation = OpenCV.getRotationMatrix2D({512 / 2, 512 / 2}, 90, 1)

image
|> OpenCV.imdecode(OpenCV.Constant.cv_IMREAD_ANYCOLOR())
|> OpenCV.blur({9, 9})
|> OpenCV.warpAffine(rotation, {512, 512})
|> OpenCV.rectangle({50, 10}, {125, 60}, {255, 0, 0})
|> OpenCV.ellipse({300, 300}, {100, 200}, 30, 0, 360, {255, 255, 0}, thickness: 3)
|> dbg()

:ok

For more examples, see this Livebook by Ryo Wakabayashi.

Live programming: doctests

Doctests sit at the intersection of documentation and testing. Can we promote it as best-practice?

defmodule HelloWorld do
  @doc """

      iex> HelloWorld.my_addition(1, 2)
      3

      iex> HelloWorld.my_addition(1, 2)
      4

      iex> HelloWorld.my_addition(1, "2")
      3

  """
  def my_addition(a, b) do
    a + b
  end
end