Liquid Neural Networks
Setup
Choose one of the two cells below depending on how you started Livebook.
Standalone (default)
Use this if you started Livebook normally (livebook server).
Uncomment the EXLA lines for GPU acceleration.
edifice_dep =
if File.dir?(Path.expand("~/edifice")) do
{:edifice, path: Path.expand("~/edifice")}
else
{:edifice, "~> 0.2.0"}
end
Mix.install([
edifice_dep,
# {:exla, "~> 0.10"},
{:kino_vega_lite, "~> 0.1"},
{:kino, "~> 0.14"}
])
# Nx.global_default_backend(EXLA.Backend)
alias VegaLite, as: Vl
Attached to project (recommended for Nix/CUDA)
Use this if you started Livebook via ./scripts/livebook.sh.
See the Architecture Zoo notebook for full setup instructions.
Nx.global_default_backend(EXLA.Backend)
alias VegaLite, as: Vl
IO.puts("Attached mode — using EXLA backend from project node")
Introduction
Most neural networks are like fixed circuits — once trained, every input follows the same rigid path through the same static weights. A liquid neural network is different. It’s more like water flowing through a pipe that reshapes itself based on what’s flowing through it.
The key idea: instead of discrete state updates (like an LSTM or GRU), a liquid network models its hidden state as a continuous dynamical system governed by an ordinary differential equation (ODE):
dx/dt = (-x + f(x, input)) / tau
The state x doesn’t jump from one value to the next — it flows
continuously toward a target, with tau controlling how fast it gets
there. This makes liquid networks naturally suited to data that arrives
at irregular intervals or varies smoothly in time.
Liquid Neural Networks were invented at MIT by Ramin Hasani et al. and published as “Liquid Time-constant Networks” at AAAI 2021. The research group later founded Liquid AI, which raised $250M in a funding round led by AMD to commercialize the technology.
What you’ll learn:
- Why continuous-time matters — When discrete step-by-step processing breaks down, and why ODEs are the natural fix
- The tau parameter — How a learnable time constant controls whether a neuron responds quickly (reacting to fast changes) or slowly (smoothing over noise)
- ODE solvers — How to numerically integrate the liquid ODE, and the tradeoffs between speed and accuracy (Euler vs. RK4 vs. exact)
- Building a liquid network — Using `Edifice.build(:liquid, ...)` to create one, and what each parameter does
- When to use liquid vs. transformer/LSTM — The practical tradeoffs and the types of problems where liquid networks shine
Why Liquid?
When would you reach for a liquid network instead of a transformer, LSTM, or Mamba? Here are the scenarios where liquid architectures have a genuine advantage:
1. Irregular Time Series
Most sequence models assume data arrives at regular intervals — one token per step, one frame per tick. But real-world sensor data doesn’t work that way. A heart rate monitor might sample every 0.5 seconds, then every 2 seconds, then miss 10 seconds entirely. An event stream might fire 100 events in one second and zero in the next.
A liquid network’s continuous-time dynamics naturally handle this: the ODE integrates over whatever time interval the data provides. No resampling, no padding, no pretending irregular data is regular.
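To make this concrete, here is a minimal pure-Elixir sketch (not Edifice's API — the module and function names are hypothetical, for illustration only) of integrating the LTC equation over whatever gap the data provides. Each observation carries its own `dt`, and the exact update absorbs it directly:

```elixir
# Hypothetical sketch: exact LTC update over irregular intervals.
# x(t + dt) = target + (x - target) * exp(-dt / tau)
# No resampling or padding: each observation brings its own dt.
defmodule IrregularLTC do
  # One exact update across an arbitrary time gap dt.
  def step(x, target, dt, tau) do
    target + (x - target) * :math.exp(-dt / tau)
  end

  # Fold over {target, dt} pairs with whatever spacing the data has.
  def integrate(observations, tau, x0 \\ 0.0) do
    Enum.reduce(observations, x0, fn {target, dt}, x ->
      step(x, target, dt, tau)
    end)
  end
end

# Samples arriving after 0.5s, then 2.0s, then a 10-second gap:
irregular = [{1.0, 0.5}, {1.0, 2.0}, {0.0, 10.0}]
x = IrregularLTC.integrate(irregular, 1.0)
IO.puts("State after irregular sequence: #{x}")
```

The long 10-second gap lets the state decay almost completely toward the final target of 0 — exactly the behavior a discrete-step RNN cannot express without knowing the spacing.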
2. Compact Models
Liquid networks achieve surprisingly good temporal modeling with far fewer parameters than transformers or LSTMs. A 19-neuron liquid network famously learned to steer a self-driving car — a task that typically requires models with thousands of neurons.
The efficiency comes from the ODE dynamics: each neuron is a continuous dynamical system with its own time constant, giving it richer behavior than a simple multiply-and-add neuron. One liquid neuron does the work of many static ones.
3. Interpretability
Each neuron in a liquid network has a time constant tau that you can inspect. Fast tau neurons respond to rapid changes (think: edge detection in a signal). Slow tau neurons integrate over time (think: trend detection). You can literally read the network’s temporal attention by looking at the tau values.
4. Smooth Continuous Signals
Liquid networks excel at signals that vary smoothly in time — sensor readings, control signals, physical measurements. They struggle more with discrete, token-like sequences (text, categorical events) where the continuous-time assumption doesn’t help.
Quick Comparison
| Feature | Transformer | LSTM/GRU | Mamba | Liquid |
|---|---|---|---|---|
| Irregular time series | Poor | Poor | Poor | Excellent |
| Long-range dependencies | Excellent | Moderate | Good | Moderate |
| Parameter efficiency | Low | Medium | Good | Excellent |
| Interpretability | Low | Low | Low | High |
| Discrete sequences (text) | Excellent | Good | Excellent | Moderate |
| Continuous signals | Moderate | Good | Good | Excellent |
| Training speed | Fast (parallel) | Slow (sequential) | Fast | Moderate |
Build a Liquid Network
Let’s build our first liquid network using Edifice. We’ll start small and examine the structure.
# Build a small liquid network
# embed_dim: size of each input frame (e.g., 8 sensor readings per timestep)
# hidden_size: internal state dimension (how much "memory" each layer has)
# num_layers: how many LTC layers to stack (more = more processing power)
# window_size: expected sequence length (how many timesteps to process)
# solver: which ODE integration method to use
liquid_model = Edifice.build(:liquid,
embed_dim: 8, # 8 input features per timestep
hidden_size: 32, # 32-dimensional hidden state
num_layers: 2, # 2 LTC layers stacked
window_size: 30, # 30 timesteps per sequence
dropout: 0.1, # 10% dropout for regularization
solver: :exact # analytical solution (fastest, most stable)
)
IO.puts("Liquid model built!")
IO.puts("Input shape: {batch, 30, 8} — 30 timesteps, 8 features each")
IO.puts("Output shape: {batch, 32} — 32-dimensional hidden state from last timestep")
IO.puts("")
IO.puts("Parameters to know:")
IO.puts(" embed_dim — size of input features per timestep")
IO.puts(" hidden_size — width of the continuous state (each neuron in this space")
IO.puts(" has its own time constant tau)")
IO.puts(" num_layers — stacked LTC blocks (each applies the ODE independently)")
IO.puts(" solver — ODE integration method (:exact, :euler, :rk4, :dopri5)")
IO.puts(" window_size — expected sequence length")
What to look for
The model takes {batch, 30, 8} tensors (30 timesteps of 8 features)
and outputs {batch, 32} (the hidden state at the final timestep).
Unlike a transformer which gives you an output at every position, a
liquid network naturally produces a single summary vector — the
state the ODE has converged to after processing the whole sequence.
Let’s also build the FFN variant, which interleaves feed-forward networks between LTC layers for more expressive power:
# The FFN variant adds SwiGLU feed-forward layers between LTC blocks.
# Think of it as: LTC processes the temporal dynamics, FFN processes
# the feature interactions. Similar to how transformers alternate
# attention + FFN.
liquid_ffn_model = Edifice.Liquid.build_with_ffn(
embed_dim: 8,
hidden_size: 32,
num_layers: 2,
window_size: 30,
dropout: 0.1,
solver: :exact
)
IO.puts("Liquid + FFN model built!")
IO.puts("Same input/output shapes, but more parameters from the FFN layers.")
IO.puts("Use this when you need more feature-processing power between")
IO.puts("temporal integration steps.")
The ODE Inside
The heart of a liquid network is the ODE that governs each neuron’s state:
dx/dt = (-x + f(x, input)) / tau
Let’s break this down piece by piece:
- x is the neuron’s current state (a number that changes over time)
- f(x, input) is the “target” — where the state wants to go, computed by a small neural network from the current state and the input
- tau is the time constant — how quickly the state moves toward the target
- -x + f(x, input) is the “error” — how far the state is from its target
- Dividing by tau scales the speed of approach
This is a leaky integrator — without input, the state decays exponentially toward zero. With input, it tracks the target f(x, input) with a lag controlled by tau.
Let’s visualize how tau affects the dynamics.
What to look for
In the plot below, a small tau (like 0.5) means the neuron reacts almost instantly — it snaps to each new input value. A large tau (like 5.0) means the neuron is sluggish — it smooths over rapid changes, responding only to the overall trend. This is what makes liquid networks interpretable: you can inspect tau values to understand what timescale each neuron operates on.
# Simulate the LTC ODE: dx/dt = (-x + activation) / tau
# for different tau values, with a step-function input.
#
# The input jumps from 0 to 1 at t=5, then back to 0 at t=15.
# Watch how different tau values respond to this step change.
dt = 0.05
t_max = 25.0
steps = round(t_max / dt)
times = for i <- 0..steps, do: i * dt
# Step function input: 0 → 1 at t=5, 1 → 0 at t=15
activation_fn = fn t ->
cond do
t >= 5.0 and t < 15.0 -> 1.0
true -> 0.0
end
end
# Simulate for different tau values using Euler integration
# (We use plain Elixir here, not Nx, so we can easily plot each step)
tau_values = [0.5, 1.0, 2.0, 5.0]
sim_data =
Enum.flat_map(tau_values, fn tau ->
# Simulate the ODE: dx/dt = (-x + activation) / tau
{trajectory, _} =
Enum.map_reduce(times, 0.0, fn t, x ->
activation = activation_fn.(t)
# Euler step: x_new = x + dt * dx/dt
dx_dt = (-x + activation) / tau
x_new = x + dt * dx_dt
{%{"time" => t, "state" => x, "tau" => "tau=#{tau}", "input" => activation}, x_new}
end)
trajectory
end)
# Add the input signal to the plot
input_data =
Enum.map(times, fn t ->
%{"time" => t, "state" => activation_fn.(t), "tau" => "input signal", "input" => activation_fn.(t)}
end)
all_data = input_data ++ sim_data
Vl.new(width: 700, height: 400, title: "LTC Neuron Response to Step Input — Effect of Tau")
|> Vl.data_from_values(all_data)
|> Vl.mark(:line, stroke_width: 2)
|> Vl.encode_field(:x, "time", type: :quantitative, title: "Time")
|> Vl.encode_field(:y, "state", type: :quantitative, title: "Neuron State")
|> Vl.encode_field(:color, "tau", type: :nominal, title: "")
|> Vl.encode_field(:stroke_dash, "tau", type: :nominal)
ODE Solvers: Speed vs. Accuracy
The LTC equation has an exact analytical solution (because it’s linear in x). But more complex ODEs — and real neural networks where f(x, input) is nonlinear — require numerical solvers. Edifice provides five:
| Solver | How it works | Speed | When to use |
|---|---|---|---|
| `:exact` | Analytical formula: `x(t+dt) = f + (x - f) * exp(-dt/tau)` | Fastest | Default. Always stable. Works because the LTC ODE is linear in x. |
| `:euler` | Take one step in the direction of dx/dt | Very fast | Quick prototyping. Can go unstable if dt/tau > 2. |
| `:midpoint` | Evaluate dx/dt at the midpoint for better accuracy | Fast | Slightly more accurate than Euler, same stability limits. |
| `:rk4` | Classic 4th-order Runge-Kutta — four evaluations per step | Medium | Good accuracy. The go-to for non-trivial ODEs. |
| `:dopri5` | Dormand-Prince adaptive stepper — adjusts step size automatically | Slowest | Best accuracy. Useful when you don’t know the right step size. |
Let’s compare them on the LTC equation itself — where we know the exact analytical answer — to see the accuracy differences.
# Compare ODE solver accuracy on the LTC equation.
# We use Nx here to match how Edifice actually computes things.
#
# Setup: a single neuron with tau=1.0, activation=1.0, starting at x=0.
# The exact solution is: x(t) = 1 - exp(-t)
# We'll compare each solver's answer at t=1.0.
alias Edifice.Utils.ODESolver
# Initial state and parameters
x0 = Nx.tensor([0.0])
activation = Nx.tensor([1.0])
tau = Nx.tensor([1.0])
# Exact analytical answer at t=1.0
exact_answer = 1.0 - :math.exp(-1.0)
# Test each solver (1 integration step, dt=1.0 — a large step to show differences)
solvers = [:exact, :euler, :midpoint, :rk4, :dopri5]
results =
Enum.map(solvers, fn solver ->
result = ODESolver.solve_ltc(x0, activation, tau, solver: solver, steps: 1)
value = Nx.to_number(result[0])
error = abs(value - exact_answer)
%{
"solver" => Atom.to_string(solver),
"result" => Float.round(value, 6),
"exact" => Float.round(exact_answer, 6),
"error" => Float.round(error, 8)
}
end)
# Also test with more sub-steps for Euler (to show convergence)
euler_steps = [1, 2, 5, 10, 50]
step_data =
Enum.map(euler_steps, fn steps ->
result = ODESolver.solve_ltc(x0, activation, tau, solver: :euler, steps: steps)
value = Nx.to_number(result[0])
error = abs(value - exact_answer)
%{"steps" => steps, "error" => error, "solver" => "euler"}
end)
IO.puts("ODE Solver Comparison (1 step, dt=1.0)")
IO.puts("Exact answer: #{Float.round(exact_answer, 6)}")
IO.puts(String.duplicate("-", 55))
for r <- results do
IO.puts(" #{String.pad_trailing(r["solver"], 10)} → #{r["result"]} (error: #{r["error"]})")
end
IO.puts("\nEuler convergence (more steps → better accuracy):")
for s <- step_data do
IO.puts(" #{s["steps"]} steps → error: #{Float.round(s["error"], 8)}")
end
What to look for
The exact solver should get the answer perfectly (or near-perfectly, up to floating point precision). Euler with 1 step will have visible error, but adding more sub-steps improves it. RK4 with 1 step is already very accurate. DOPRI5 adapts its step size to hit a target tolerance.
Practical advice: Use :exact (the default) unless you’re experimenting
with solver effects. It’s the fastest and most accurate for the LTC equation.
Switch to :rk4 or :dopri5 if you modify the ODE to be nonlinear.
Time Series Task: Sine Wave Prediction
Let’s train a liquid network on a real task: predicting the next value in a damped sine wave. This is a classic benchmark for temporal models because it requires capturing both the oscillation frequency and the exponential decay — exactly the kind of smooth, continuous signal that liquid networks are designed for.
We’ll also train an LSTM and GRU on the same data for comparison.
# Generate training data: damped sine waves with varying frequency and decay.
#
# Each sample is a damped sine: y(t) = exp(-decay * t) * sin(freq * t)
# We generate many samples with random frequency and decay parameters,
# so the model must learn the *general pattern* of damped oscillation,
# not just one specific curve.
IO.puts("Generating damped sine wave dataset...")
n_samples = 500
seq_len = 30 # 30 timesteps of input
n_features = 1 # univariate time series
# Generate samples with random parameters
key = Nx.Random.key(42)
{all_sequences, _key} =
Enum.map_reduce(1..n_samples, key, fn _i, rng ->
# Random frequency in [1.0, 4.0] and decay in [0.05, 0.3]
{freq_t, rng} = Nx.Random.uniform(rng, 1.0, 4.0)
{decay_t, rng} = Nx.Random.uniform(rng, 0.05, 0.3)
freq = Nx.to_number(freq_t)
decay = Nx.to_number(decay_t)
# Generate seq_len + 1 points (last one is the target)
points =
for t <- 0..seq_len do
time = t * 0.1
:math.exp(-decay * time) * :math.sin(freq * time)
end
# Input: first seq_len points, Target: last point
input = Enum.take(points, seq_len)
target = List.last(points)
{{input, target}, rng}
end)
{inputs, targets} = Enum.unzip(all_sequences)
# Convert to tensors
# x shape: {n_samples, seq_len, 1} — univariate time series
x_data =
inputs
|> Enum.map(fn seq -> Enum.map(seq, &[&1]) end)
|> Nx.tensor()
# y shape: {n_samples, 1} — single prediction target
y_data =
targets
|> Enum.map(&[&1])
|> Nx.tensor()
# Train/test split (80/20)
n_train = round(n_samples * 0.8)
train_x = x_data[0..(n_train - 1)]
train_y = y_data[0..(n_train - 1)]
test_x = x_data[n_train..-1//1]
test_y = y_data[n_train..-1//1]
# Batch the training data
batch_size = 32
train_data =
Enum.zip(
Nx.to_batched(train_x, batch_size) |> Enum.to_list(),
Nx.to_batched(train_y, batch_size) |> Enum.to_list()
)
IO.puts("Dataset ready!")
IO.puts(" Train: #{n_train} samples, #{length(train_data)} batches")
IO.puts(" Test: #{n_samples - n_train} samples")
IO.puts(" Input shape: #{inspect(Nx.shape(train_x))} (samples, timesteps, features)")
IO.puts(" Target shape: #{inspect(Nx.shape(train_y))} (samples, 1)")
Let’s visualize a few examples to see what the model needs to learn.
# Plot a few example sequences from the dataset.
# The dot at the end of each line is the target value the model must predict.
sample_indices = [0, 50, 100, 150]
plot_data =
Enum.flat_map(sample_indices, fn idx ->
# Input sequence points
input_points =
for t <- 0..(seq_len - 1) do
val = x_data[idx][t][0] |> Nx.to_number()
%{"time" => t * 0.1, "value" => val, "sample" => "sample_#{idx}", "type" => "input"}
end
# Target point
target_val = y_data[idx][0] |> Nx.to_number()
target_point = %{"time" => seq_len * 0.1, "value" => target_val, "sample" => "sample_#{idx}", "type" => "target"}
input_points ++ [target_point]
end)
line_data = Enum.filter(plot_data, &(&1["type"] == "input"))
point_data = Enum.filter(plot_data, &(&1["type"] == "target"))
line_layer =
Vl.new()
|> Vl.data_from_values(line_data)
|> Vl.mark(:line)
|> Vl.encode_field(:x, "time", type: :quantitative, title: "Time")
|> Vl.encode_field(:y, "value", type: :quantitative, title: "Amplitude")
|> Vl.encode_field(:color, "sample", type: :nominal, title: "")
point_layer =
Vl.new()
|> Vl.data_from_values(point_data)
|> Vl.mark(:circle, size: 80)
|> Vl.encode_field(:x, "time", type: :quantitative)
|> Vl.encode_field(:y, "value", type: :quantitative)
|> Vl.encode_field(:color, "sample", type: :nominal)
Vl.new(width: 700, height: 350, title: "Damped Sine Waves — Lines are Input, Dots are Prediction Targets")
|> Vl.layers([line_layer, point_layer])
Build Three Models
Now we build three models on the same task: Liquid, LSTM, and GRU. All get the same hidden size and the same training setup — the only difference is the architecture.
# Build all three models with comparable sizes.
# Each takes {batch, 30, 1} input and outputs {batch, 1} prediction.
hidden_size = 32
# --- Liquid ---
liquid = Edifice.build(:liquid,
embed_dim: n_features,
hidden_size: hidden_size,
num_layers: 2,
window_size: seq_len,
dropout: 0.05,
solver: :exact
)
|> Axon.dense(1, name: "liquid_head")
# --- LSTM ---
lstm = Edifice.build(:lstm,
embed_dim: n_features,
hidden_size: hidden_size,
num_layers: 2,
window_size: seq_len,
dropout: 0.05
)
|> Axon.dense(1, name: "lstm_head")
# --- GRU ---
gru = Edifice.build(:gru,
embed_dim: n_features,
hidden_size: hidden_size,
num_layers: 2,
window_size: seq_len,
dropout: 0.05
)
|> Axon.dense(1, name: "gru_head")
IO.puts("Three models built:")
IO.puts(" Liquid — 2-layer LTC with exact solver, hidden=#{hidden_size}")
IO.puts(" LSTM — 2-layer LSTM, hidden=#{hidden_size}")
IO.puts(" GRU — 2-layer GRU, hidden=#{hidden_size}")
IO.puts("\nAll take {batch, #{seq_len}, #{n_features}} → {batch, 1}")
Train All Three
We use mean squared error (MSE) loss — appropriate for regression tasks where we’re predicting a continuous value.
# Training helper for regression (MSE loss)
defmodule SineTrainer do
def train(model, train_data, opts \\ []) do
epochs = Keyword.get(opts, :epochs, 10)
lr = Keyword.get(opts, :lr, 1.0e-3)
label = Keyword.get(opts, :label, "model")
IO.puts("Training #{label}...")
model
|> Axon.Loop.trainer(
:mean_squared_error,
Polaris.Optimizers.adam(learning_rate: lr),
log: 1
)
|> Axon.Loop.run(train_data, Axon.ModelState.empty(), epochs: epochs)
end
def evaluate(model, state, test_x, test_y) do
{_init_fn, predict_fn} = Axon.build(model)
preds = predict_fn.(state, test_x)
mse =
Nx.subtract(preds, test_y)
|> Nx.pow(2)
|> Nx.mean()
|> Nx.to_number()
# Also compute MAE for interpretability
mae =
Nx.subtract(preds, test_y)
|> Nx.abs()
|> Nx.mean()
|> Nx.to_number()
{mse, mae, preds}
end
end
# Train all three models — 10 epochs each.
# On CPU this takes a few minutes total. Liquid may be slightly slower
# because the ODE integration processes timesteps sequentially.
IO.puts("=" |> String.duplicate(60))
liquid_state = SineTrainer.train(liquid, train_data, epochs: 10, lr: 1.0e-3, label: "Liquid")
IO.puts("")
lstm_state = SineTrainer.train(lstm, train_data, epochs: 10, lr: 1.0e-3, label: "LSTM")
IO.puts("")
gru_state = SineTrainer.train(gru, train_data, epochs: 10, lr: 1.0e-3, label: "GRU")
IO.puts("=" |> String.duplicate(60))
# Evaluate on test set
{liquid_mse, liquid_mae, liquid_preds} = SineTrainer.evaluate(liquid, liquid_state, test_x, test_y)
{lstm_mse, lstm_mae, lstm_preds} = SineTrainer.evaluate(lstm, lstm_state, test_x, test_y)
{gru_mse, gru_mae, gru_preds} = SineTrainer.evaluate(gru, gru_state, test_x, test_y)
IO.puts("\n--- Test Results ---")
IO.puts(" Model MSE MAE")
IO.puts(" " <> String.duplicate("-", 35))
IO.puts(" Liquid #{Float.round(liquid_mse, 6) |> to_string() |> String.pad_trailing(11)} #{Float.round(liquid_mae, 6)}")
IO.puts(" LSTM #{Float.round(lstm_mse, 6) |> to_string() |> String.pad_trailing(11)} #{Float.round(lstm_mae, 6)}")
IO.puts(" GRU #{Float.round(gru_mse, 6) |> to_string() |> String.pad_trailing(11)} #{Float.round(gru_mae, 6)}")
What to Look For
On this smooth, continuous signal task (damped sine waves), the liquid network should perform competitively with or better than LSTM/GRU. The continuous-time ODE dynamics are a natural fit for signals that vary smoothly.
Key things to notice:
- Liquid excels at smooth signals — the ODE naturally models continuous dynamics without discretization artifacts
- LSTM/GRU may train faster — their discrete updates parallelize better on GPUs
- If liquid’s error is higher, try increasing `integration_steps` to 2 or 3, or switching to the `:rk4` solver for more accurate dynamics
Let’s visualize predictions vs. ground truth.
# Scatter plot: predicted vs actual for all three models.
# Perfect predictions would fall on the diagonal line.
n_test = Nx.axis_size(test_y, 0)
scatter_data =
Enum.flat_map(0..(n_test - 1), fn i ->
actual = test_y[i][0] |> Nx.to_number()
[
%{"actual" => actual, "predicted" => liquid_preds[i][0] |> Nx.to_number(), "model" => "Liquid"},
%{"actual" => actual, "predicted" => lstm_preds[i][0] |> Nx.to_number(), "model" => "LSTM"},
%{"actual" => actual, "predicted" => gru_preds[i][0] |> Nx.to_number(), "model" => "GRU"}
]
end)
# Diagonal reference line
diag_data = [
%{"actual" => -1.0, "predicted" => -1.0, "model" => "perfect"},
%{"actual" => 1.0, "predicted" => 1.0, "model" => "perfect"}
]
scatter_layer =
Vl.new()
|> Vl.data_from_values(scatter_data)
|> Vl.mark(:circle, opacity: 0.5, size: 40)
|> Vl.encode_field(:x, "actual", type: :quantitative, title: "Actual Value")
|> Vl.encode_field(:y, "predicted", type: :quantitative, title: "Predicted Value")
|> Vl.encode_field(:color, "model", type: :nominal, title: "")
diag_layer =
Vl.new()
|> Vl.data_from_values(diag_data)
|> Vl.mark(:line, stroke_dash: [4, 4], color: "gray")
|> Vl.encode_field(:x, "actual", type: :quantitative)
|> Vl.encode_field(:y, "predicted", type: :quantitative)
Vl.new(width: 500, height: 500, title: "Predicted vs. Actual — Closer to Diagonal = Better")
|> Vl.layers([diag_layer, scatter_layer])
Experiment Suggestions
Now that you have a working liquid network, here are experiments to deepen your understanding:
1. Try Different Solvers
Change the solver: option and compare results. The :exact solver is
analytical and perfect for the linear LTC equation. But try :euler with
different integration_steps: values to see how numerical accuracy affects
learning:
# Experiment: How does solver choice affect training?
# Try uncommenting different lines and re-running the training cell.
#
# solver: :exact # Analytical — fastest, most accurate
# solver: :euler, integration_steps: 1 # Crude — fast but may be inaccurate
# solver: :euler, integration_steps: 5 # Better — more sub-steps
# solver: :rk4 # Good general-purpose solver
# solver: :dopri5 # Adaptive — best accuracy, slowest
IO.puts("Experiment ideas:")
IO.puts("1. Change solver to :euler and see if accuracy drops")
IO.puts("2. Add integration_steps: 5 to euler and see if it recovers")
IO.puts("3. Try :rk4 — it should match :exact closely")
IO.puts("4. Try :dopri5 — adaptive stepping, most accurate for nonlinear ODEs")
2. Vary the Hidden Size
Liquid networks are known for working with very small hidden sizes. Try
hidden_size: 8 or even hidden_size: 4 and see how accuracy compares
to LSTM/GRU at the same size. The ODE dynamics give each neuron more
expressiveness, so liquid networks often need fewer neurons.
3. Try build_with_ffn vs build
The FFN variant adds SwiGLU feed-forward layers between LTC blocks. This
helps when the task requires complex feature transformations in addition
to temporal modeling. Try replacing Edifice.build(:liquid, ...) with
Edifice.Liquid.build_with_ffn(...) using the same options.
4. Irregular Sampling
The real superpower of liquid networks is handling irregular time series. Try modifying the data generation to use non-uniform time steps — random intervals between samples. LSTM/GRU will struggle because they assume uniform spacing; the liquid network’s ODE naturally handles variable dt.
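As a starting point for that experiment, here is a hedged sketch (plain Elixir, illustrative only — field names like `:dt` are my choice, not an Edifice convention) of generating a damped sine sampled at random intervals, so each point carries its own time gap:

```elixir
# Sketch: one damped-sine trajectory sampled at non-uniform intervals.
# Each point records {t, y, dt} so a continuous-time model can consume
# the per-sample dt directly.
:rand.seed(:exsss, {42, 42, 42})
freq = 2.0
decay = 0.1

{points, _t_final} =
  Enum.map_reduce(1..30, 0.0, fn _, t ->
    dt = 0.05 + :rand.uniform() * 0.3  # random gap in (0.05, 0.35)
    t_new = t + dt
    y = :math.exp(-decay * t_new) * :math.sin(freq * t_new)
    {%{t: t_new, y: y, dt: dt}, t_new}
  end)

IO.inspect(Enum.take(points, 3), label: "first 3 irregular samples")
```

To feed this to the models, you would pass the `dt` values alongside the inputs for the liquid network, while the LSTM/GRU baselines see only the values — which is precisely the handicap this experiment is designed to expose.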
5. Longer Sequences
Increase window_size to 60 or 120 and generate longer sequences. Watch
how the different architectures handle longer-range dependencies. Liquid
networks may struggle with very long sequences (the ODE state can “forget”
early inputs), while LSTM’s explicit gates are designed for long-range memory.
What’s Next?
- Architecture Zoo notebook — Compare liquid networks against 10+ other architectures on the same task
- Sequence Modeling notebook — Deeper dive into temporal modeling with Mamba, RetNet, and other sequence models
- The research doc (`notebooks/research/architecture_landscape.md`) — Where liquid networks fit in the broader ML landscape, and what Edifice should build next
Further Reading
- Paper: “Liquid Time-constant Networks” — Hasani et al., AAAI 2021
- Paper: “Closed-form Continuous-time Neural Networks” — Hasani et al., Nature Machine Intelligence 2022
- Company: Liquid AI — MIT spin-off commercializing LNN technology
- Edifice source: `lib/edifice/liquid/liquid.ex` — Full implementation with 5 ODE solvers