"compute" retries

lib/examples/compute_retries.livemd

@shipworthy

journey

# [Optional] Setting Build Key, see https://gojourney.dev/your_keys
# (Using "Journey Livebook Demo" build key)
System.put_env("JOURNEY_BUILD_KEY", "B27AXHMERm2Z6ehZhL49v")

Mix.install(
  [
    {:ecto_sql, "~> 3.13"},
    {:postgrex, "~> 0.22"},
    {:jason, "~> 1.4"},
    {:journey, "~> 0.10"},
    {:kino, "~> 0.19"}
  ],
  start_applications: false
)

Application.put_env(:journey, :log_level, :warning)

# This livebook requires a PostgreSQL database.
# If you don't have one running, you can start one with Docker:
# docker run --rm --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres:16

# Update this configuration to point to your database server
Application.put_env(:journey, Journey.Repo,
  database: "journey_compute_retries",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  log: false,
  port: 5432
)

Application.put_env(:journey, :ecto_repos, [Journey.Repo])

Journey.Repo.__adapter__().storage_up(Journey.Repo.config())

Application.loaded_applications()
|> Enum.map(fn {app, _, _} -> app end)
|> Enum.each(&Application.ensure_all_started/1)

DB Setup

This livebook requires a PostgreSQL database. If you don’t have one running, you can start one with Docker:

docker run --rm --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres:16

What We’ll Cover

In this example, we’ll look into how Journey handles compute failures. What happens if a compute node’s function tries to send an email but the email service is down?

Spoiler alert: Journey will try a few times, and give up. Once the email service is back up, you can kick off another computation using a helper function.

In this livebook, we will create a simple graph with a compute node whose computation function returns an error, and observe Journey’s retry behavior:

the failing computation will be attempted by Journey up to max_retries times, which we set to 4 (default: 3),
once the attempts are exhausted, the computation will fail,
once you have fixed the underlying error (or think you have), you can kick off the computation again with Journey.Tools.retry_computation/2,
introspection tools (the mermaid diagram from Journey.Tools.generate_mermaid_execution/1 and the textual introspection from Journey.Tools.introspect/1) show you the status,
the execution itself carries more metadata on computations, if you need more insight.
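
The retry policy described in the list above can be pictured with a small plain-Elixir sketch. It is not Journey's implementation: RetrySketch and run/2 are hypothetical names, and the only behavior borrowed from the text is "call the function up to max_retries times, then give up".

```elixir
defmodule RetrySketch do
  # Hypothetical helper, not part of Journey: calls `fun` until it returns
  # {:ok, value} or until `max_attempts` calls have been made.
  def run(fun, max_attempts) when is_function(fun, 0) and max_attempts > 0 do
    attempt(fun, max_attempts, 1, [])
  end

  defp attempt(fun, max_attempts, n, errors) do
    case fun.() do
      {:ok, value} ->
        # Success: report the value and the attempt number that produced it.
        {:ok, value, n}

      {:error, reason} when n < max_attempts ->
        # Attempts remain: remember the error and try again.
        attempt(fun, max_attempts, n + 1, [reason | errors])

      {:error, reason} ->
        # Attempts exhausted: report every error, oldest first.
        {:error, :computation_failed, Enum.reverse([reason | errors])}
    end
  end
end

always_failing = fn -> {:error, "email service is down"} end
IO.inspect(RetrySketch.run(always_failing, 4))
```

With a function that always fails, the sketch collects four errors and then gives up, which mirrors the four failed attempts this notebook goes on to show.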

Define the Graph

import Journey.Node

graph = Journey.new_graph(
  "Welcome, but failing",
  "v1",
  [
    input(:name),
    compute(
      :greeting,
      [:name],
      fn values ->
        now = DateTime.utc_now() |> Calendar.strftime("%H:%M:%S UTC")
        welcome = "Hello, #{values.name}, at #{now}, 🤞!"
        IO.puts(welcome)
        {:error, "oh no, failed, #{now}"}
      end,
      # Overriding the default of 3 attempts.
      max_retries: 4
    )
  ]
); :ok

:ok
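
The compute function above always returns {:error, reason}. For contrast, here is a minimal plain-Elixir sketch of a function body that succeeds when its dependency is available. GreetingSketch and the service_up? flag are hypothetical, and the {:ok, value} success shape is an assumption: this notebook only demonstrates the error shape.

```elixir
defmodule GreetingSketch do
  # Hypothetical stand-in for a compute function's body. `values` mirrors the
  # map the notebook's function receives (it reads `values.name` from it).
  # Assumption: a successful computation returns {:ok, value}.
  def greet(values, service_up?) do
    if service_up? do
      {:ok, "Hello, #{values.name}!"}
    else
      {:error, "email service is down"}
    end
  end
end

IO.inspect(GreetingSketch.greet(%{name: "Luigi"}, true))
IO.inspect(GreetingSketch.greet(%{name: "Luigi"}, false))
```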

Visualize the graph:

graph
|> Journey.Tools.generate_mermaid_graph()
|> Kino.Mermaid.new()

graph TD
    %% Graph
    subgraph Graph["🧩 'Welcome, but failing', version v1"]
        execution_id[execution_id]
        last_updated_at[last_updated_at]
        name[name]
        greeting[["greeting
(anonymous fn)"]]

        name -->  greeting
    end

    %% Styling
    classDef defaultNode fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#000000

    %% Apply styles to nodes
    class execution_id,last_updated_at,name,greeting defaultNode

Start an Execution

execution = Journey.start(graph); :ok

:ok

In the new execution the :greeting computation is waiting for :name to be set.

As seen on the diagram:

execution.id
|> Journey.Tools.generate_mermaid_execution()
|> Kino.Mermaid.new()

graph TD
    %% Graph
    subgraph Graph["🧩 'Welcome, but failing', version v1, EXECG5217Z92XXJA1BM8R3LG"]
        execution_id["✅ execution_id"]
        last_updated_at["✅ last_updated_at"]
        name["⬜ name"]
        greeting[["🚫 greeting
(anonymous fn)"]]

        name -->  greeting
    end

    %% Styling
    classDef setNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000
    classDef computingNode fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#000000
    classDef errorNode fill:#f8bbd0,stroke:#b71c1c,stroke-width:2px,color:#000000
    classDef neutralNode fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#000000

    %% Apply styles to nodes
    class last_updated_at,execution_id setNode
    class greeting,name neutralNode

As seen in the values:

Journey.values_all(execution)

%{
  name: :not_set,
  last_updated_at: {:set, 1776922650},
  execution_id: {:set, "EXECG5217Z92XXJA1BM8R3LG"},
  greeting: :not_set
}

As seen on the textual introspection:

Journey.Tools.introspect(execution.id) |> IO.puts()

Execution summary:
- ID: 'EXECG5217Z92XXJA1BM8R3LG'
- Graph: 'Welcome, but failing' | 'v1'
- Archived at: not archived
- Created at: 2026-04-23 05:37:30Z UTC | 0 seconds ago
- Last updated at: 2026-04-23 05:37:30Z UTC | 0 seconds ago
- Duration: 0 seconds
- Revision: 0
- # of Values: 2 (set) / 4 (total)
- # of Computations: 1

Values:
- Set:
  - execution_id: 'EXECG5217Z92XXJA1BM8R3LG' | :input
    set at 2026-04-23 05:37:30Z | rev: 0

  - last_updated_at: '1776922650' | :input
    set at 2026-04-23 05:37:30Z | rev: 0


- Not set:
  - greeting:  | :compute
  - name:  | :input  

Computations:
- Completed:


- Outstanding:
  - greeting: ⬜ :not_set (not yet attempted) | :compute
       🛑 :name | &provided?/1

:ok

:name is Set -> :greeting is Computing with Retries

We’ll set the value for :name, and watch the :greeting computation get unblocked, and fail after a few attempts.

execution = 
  execution
  |> Journey.set(:name, "Luigi"); :ok

:ok

Journey.get below waits for the result, and returns an error once the computation’s 4 attempts are exhausted:

(A side note: retries happen with a small randomized pause – a few seconds – between attempts. Proper backoff is on the roadmap.)

Journey.get(execution, :greeting, wait: :any, timeout: 120_000)

Hello, Luigi, at 05:37:30 UTC, 🤞!

22:37:30.132 [warning] Worker [EXECG5217Z92XXJA1BM8R3LG.CMPET6LMRBXMXGA771YJX9E.greeting] [Welcome, but failing]: async computation completed with an error
Hello, Luigi, at 05:37:31 UTC, 🤞!

22:37:31.706 [warning] Worker [EXECG5217Z92XXJA1BM8R3LG.CMP06YLV03DXGEVYLX46BD1.greeting] [Welcome, but failing]: async computation completed with an error
Hello, Luigi, at 05:37:32 UTC, 🤞!

22:37:32.980 [warning] Worker [EXECG5217Z92XXJA1BM8R3LG.CMP9Y72L34RL30VXH747J21.greeting] [Welcome, but failing]: async computation completed with an error
Hello, Luigi, at 05:37:41 UTC, 🤞!

22:37:41.175 [warning] Worker [EXECG5217Z92XXJA1BM8R3LG.CMP6RL4RJGG99AEBE4Z53T3.greeting] [Welcome, but failing]: async computation completed with an error

{:error, :computation_failed}
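
The uneven gaps between the attempts in the log above (between one and two seconds for the first two retries, about eight seconds for the last) are consistent with the randomized pause mentioned in the side note. A minimal sketch of picking such a pause in plain Elixir follows. PauseSketch is a hypothetical name and the bounds are made up for illustration; they are not Journey's actual values.

```elixir
defmodule PauseSketch do
  # Hypothetical helper, not part of Journey. Returns a pause in milliseconds
  # drawn uniformly from min_ms..max_ms (both bounds are illustrative).
  def random_pause_ms(min_ms, max_ms) when min_ms <= max_ms do
    # :rand.uniform(n) returns an integer in 1..n, so shift it down by one.
    min_ms + :rand.uniform(max_ms - min_ms + 1) - 1
  end
end

pause = PauseSketch.random_pause_ms(1_000, 8_000)
# A caller would typically pass this value to Process.sleep/1 before retrying.
IO.puts("would pause for #{pause} ms before the next attempt")
```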

The computation is now failed, as seen on the diagram:

execution.id
|> Journey.Tools.generate_mermaid_execution()
|> Kino.Mermaid.new()

graph TD
    %% Graph
    subgraph Graph["🧩 'Welcome, but failing', version v1, EXECG5217Z92XXJA1BM8R3LG"]
        execution_id["✅ execution_id"]
        last_updated_at["✅ last_updated_at"]
        name["✅ name"]
        greeting[["❌ greeting
(anonymous fn)"]]

        name -->  greeting
    end

    %% Styling
    classDef setNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000
    classDef computingNode fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#000000
    classDef errorNode fill:#f8bbd0,stroke:#b71c1c,stroke-width:2px,color:#000000
    classDef neutralNode fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#000000

    %% Apply styles to nodes
    class name,last_updated_at,execution_id setNode
    class greeting errorNode

No :greeting value has been set:

Journey.values_all(execution)

%{
  name: {:set, "Luigi"},
  last_updated_at: {:set, 1776922650},
  execution_id: {:set, "EXECG5217Z92XXJA1BM8R3LG"},
  greeting: :not_set
}

And introspect/1 shows the failed computation attempts, including the errors reported by each of the computations:

Journey.Tools.introspect(execution.id) |> IO.puts()

Execution summary:
- ID: 'EXECG5217Z92XXJA1BM8R3LG'
- Graph: 'Welcome, but failing' | 'v1'
- Archived at: not archived
- Created at: 2026-04-23 05:37:30Z UTC | 14 seconds ago
- Last updated at: 2026-04-23 05:37:41Z UTC | 3 seconds ago
- Duration: 11 seconds
- Revision: 9
- # of Values: 3 (set) / 4 (total)
- # of Computations: 4

Values:
- Set:
  - last_updated_at: '1776922650' | :input
    set at 2026-04-23 05:37:30Z | rev: 1

  - name: '"Luigi"' | :input
    set at 2026-04-23 05:37:30Z | rev: 1

  - execution_id: 'EXECG5217Z92XXJA1BM8R3LG' | :input
    set at 2026-04-23 05:37:30Z | rev: 0


- Not set:
  - greeting:  | :compute  

Computations:
- Completed:
  - :greeting (CMP6RL4RJGG99AEBE4Z53T3): ❌ :failed | :compute | rev 9
    started: 2026-04-23 05:37:41Z | completed: 2026-04-23 05:37:41Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:41 UTC"

  - :greeting (CMP9Y72L34RL30VXH747J21): ❌ :failed | :compute | rev 7
    started: 2026-04-23 05:37:32Z | completed: 2026-04-23 05:37:32Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:32 UTC"

  - :greeting (CMP06YLV03DXGEVYLX46BD1): ❌ :failed | :compute | rev 5
    started: 2026-04-23 05:37:31Z | completed: 2026-04-23 05:37:31Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:31 UTC"

  - :greeting (CMPET6LMRBXMXGA771YJX9E): ❌ :failed | :compute | rev 3
    started: 2026-04-23 05:37:30Z | completed: 2026-04-23 05:37:30Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:30 UTC"

- Outstanding:

:ok

Underlying Problem Solved? Invoke Another [re-]Computation (Spoiler: It Wasn’t Solved)

Now, let’s say you think you fixed the root cause of the failure, and want to retry the computation. retry_computation/2 to the rescue.

Calling retry_computation/2 creates another computation attempt:

execution = Journey.Tools.retry_computation(execution.id, :greeting); :ok

:ok

Journey.get(execution, :greeting, wait: {:newer_than, execution.revision}, timeout: 120_000)

Hello, Luigi, at 05:37:44 UTC, 🤞!

22:37:44.837 [warning] Worker [EXECG5217Z92XXJA1BM8R3LG.CMPBTHXTZ61EG875VG2Y9HB.greeting] [Welcome, but failing]: async computation completed with an error

{:error, :computation_failed}

Not surprisingly, the computation is still failing.
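
In application code, the {:error, :computation_failed} tuple returned above is the signal that the attempts are exhausted. A minimal plain-Elixir sketch of branching on it follows. ResultSketch is a hypothetical name, and every return shape other than {:error, :computation_failed} goes through a catch-all, since this notebook does not show them.

```elixir
defmodule ResultSketch do
  # Hypothetical helper: turns the result of a Journey.get/3 call into a
  # next-step decision. Only {:error, :computation_failed} appears in this
  # notebook; anything else is passed through untouched for inspection.
  def next_step({:error, :computation_failed}), do: :fix_root_cause_then_retry
  def next_step(other), do: {:inspect, other}
end

IO.inspect(ResultSketch.next_step({:error, :computation_failed}))
```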

Journey.values_all(execution)

%{
  name: {:set, "Luigi"},
  last_updated_at: {:set, 1776922650},
  execution_id: {:set, "EXECG5217Z92XXJA1BM8R3LG"},
  greeting: :not_set
}

execution.id
|> Journey.Tools.generate_mermaid_execution()
|> Kino.Mermaid.new()

graph TD
    %% Graph
    subgraph Graph["🧩 'Welcome, but failing', version v1, EXECG5217Z92XXJA1BM8R3LG"]
        execution_id["✅ execution_id"]
        last_updated_at["✅ last_updated_at"]
        name["✅ name"]
        greeting[["❌ greeting
(anonymous fn)"]]

        name -->  greeting
    end

    %% Styling
    classDef setNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000
    classDef computingNode fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#000000
    classDef errorNode fill:#f8bbd0,stroke:#b71c1c,stroke-width:2px,color:#000000
    classDef neutralNode fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#000000

    %% Apply styles to nodes
    class name,last_updated_at,execution_id setNode
    class greeting errorNode

Introspection now includes one more failed computation:

Journey.Tools.introspect(execution.id) |> IO.puts()

Execution summary:
- ID: 'EXECG5217Z92XXJA1BM8R3LG'
- Graph: 'Welcome, but failing' | 'v1'
- Archived at: not archived
- Created at: 2026-04-23 05:37:30Z UTC | 15 seconds ago
- Last updated at: 2026-04-23 05:37:44Z UTC | 1 seconds ago
- Duration: 14 seconds
- Revision: 11
- # of Values: 3 (set) / 4 (total)
- # of Computations: 5

Values:
- Set:
  - last_updated_at: '1776922650' | :input
    set at 2026-04-23 05:37:30Z | rev: 1

  - name: '"Luigi"' | :input
    set at 2026-04-23 05:37:30Z | rev: 1

  - execution_id: 'EXECG5217Z92XXJA1BM8R3LG' | :input
    set at 2026-04-23 05:37:30Z | rev: 0


- Not set:
  - greeting:  | :compute  

Computations:
- Completed:
  - :greeting (CMPBTHXTZ61EG875VG2Y9HB): ❌ :failed | :compute | rev 11
    started: 2026-04-23 05:37:44Z | completed: 2026-04-23 05:37:44Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:44 UTC"

  - :greeting (CMP6RL4RJGG99AEBE4Z53T3): ❌ :failed | :compute | rev 9
    started: 2026-04-23 05:37:41Z | completed: 2026-04-23 05:37:41Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:41 UTC"

  - :greeting (CMP9Y72L34RL30VXH747J21): ❌ :failed | :compute | rev 7
    started: 2026-04-23 05:37:32Z | completed: 2026-04-23 05:37:32Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:32 UTC"

  - :greeting (CMP06YLV03DXGEVYLX46BD1): ❌ :failed | :compute | rev 5
    started: 2026-04-23 05:37:31Z | completed: 2026-04-23 05:37:31Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:31 UTC"

  - :greeting (CMPET6LMRBXMXGA771YJX9E): ❌ :failed | :compute | rev 3
    started: 2026-04-23 05:37:30Z | completed: 2026-04-23 05:37:30Z (0s)
    inputs used:
       :name (rev 1)
    error: "oh no, failed, 05:37:30 UTC"

- Outstanding:

:ok

If the information you get via introspection tools is not sufficient, you can load the execution itself and examine it by hand. Here are the three most recent computations in this execution:

execution = Journey.load(execution.id)
execution.computations |> Enum.take(3)

[
  %Journey.Persistence.Schema.Execution.Computation{
    __meta__: #Ecto.Schema.Metadata<:loaded, "computations">,
    id: "CMPBTHXTZ61EG875VG2Y9HB",
    execution_id: "EXECG5217Z92XXJA1BM8R3LG",
    execution: #Ecto.Association.NotLoaded,
    node_name: :greeting,
    computation_type: :compute,
    state: :failed,
    ex_revision_at_start: 10,
    ex_revision_at_completion: 11,
    scheduled_time: nil,
    start_time: 1776922664,
    completion_time: 1776922664,
    deadline: 1776922724,
    last_heartbeat_at: nil,
    heartbeat_deadline: 1776922904,
    error_details: "\"oh no, failed, 05:37:44 UTC\"",
    computed_with: %{name: 1},
    inserted_at: 1776922664,
    updated_at: 1776922664
  },
  %Journey.Persistence.Schema.Execution.Computation{
    __meta__: #Ecto.Schema.Metadata<:loaded, "computations">,
    id: "CMP6RL4RJGG99AEBE4Z53T3",
    execution_id: "EXECG5217Z92XXJA1BM8R3LG",
    execution: #Ecto.Association.NotLoaded,
    node_name: :greeting,
    computation_type: :compute,
    state: :failed,
    ex_revision_at_start: 8,
    ex_revision_at_completion: 9,
    scheduled_time: nil,
    start_time: 1776922661,
    completion_time: 1776922661,
    deadline: 1776922721,
    last_heartbeat_at: nil,
    heartbeat_deadline: 1776922901,
    error_details: "\"oh no, failed, 05:37:41 UTC\"",
    computed_with: %{name: 1},
    inserted_at: 1776922652,
    updated_at: 1776922661
  },
  %Journey.Persistence.Schema.Execution.Computation{
    __meta__: #Ecto.Schema.Metadata<:loaded, "computations">,
    id: "CMP9Y72L34RL30VXH747J21",
    execution_id: "EXECG5217Z92XXJA1BM8R3LG",
    execution: #Ecto.Association.NotLoaded,
    node_name: :greeting,
    computation_type: :compute,
    state: :failed,
    ex_revision_at_start: 6,
    ex_revision_at_completion: 7,
    scheduled_time: nil,
    start_time: 1776922652,
    completion_time: 1776922652,
    deadline: 1776922712,
    last_heartbeat_at: nil,
    heartbeat_deadline: 1776922892,
    error_details: "\"oh no, failed, 05:37:32 UTC\"",
    computed_with: %{name: 1},
    inserted_at: 1776922651,
    updated_at: 1776922652
  }
]
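
The timestamps in these records are Unix epoch seconds. A short plain-Elixir sketch of making them readable follows, using values copied from the first record above. The map is a hand-made stand-in for illustration, not a loaded Computation struct.

```elixir
# Values copied from the most recent computation record shown above.
computation = %{
  start_time: 1_776_922_664,
  completion_time: 1_776_922_664,
  deadline: 1_776_922_724
}

# Convert epoch seconds into a UTC DateTime.
started_at = DateTime.from_unix!(computation.start_time)

# How long after its start the computation's deadline was set.
seconds_allowed = computation.deadline - computation.start_time

IO.puts("started at #{DateTime.to_iso8601(started_at)}")
IO.puts("deadline was #{seconds_allowed} seconds after start")
```

The converted start time lines up with the 05:37:44 UTC in that record's error message. The 60-second gap between start_time and deadline is simply what these records show; whether it is configurable is not covered here.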

Summary

In this Livebook, we set up a graph whose compute node’s function returns an error, and we observed Journey retrying the computation, subject to the node’s retry policy (the max_retries: 4 in the graph definition overrode the default value of 3).

We looked at the state of the execution by rendering its mermaid graph, looking at its values, and doing in-depth introspection with Journey.Tools.introspect/1.

We then kicked off a recomputation on the failed node with Journey.Tools.retry_computation/2, which, given the nature of our failure mode (a hardcoded error), predictably did not fix the problem.

Finally, we took a glimpse at the computation portion of the complete execution structure.