ATP Benchmark Runner smoke workflow

examples/benchmark_smoke.livemd

@penthooose

atp_benchmark_runner

Share to X

Share to Bluesky

More notebooks

ATP Benchmark Runner smoke workflow

This notebook demonstrates the intended local/Livebook workflow:

configure local TPTP storage or Livebook cache
select/download TPTP problems
inspect and optionally build prover Apptainer images on HPC
submit a small benchmark run
monitor progress and render a report

It is intentionally conservative: cells default to bundled smoke examples and dry-run planning until you provide a real hpc_connect session.

Install local project

Mix.install([
  {:atp_benchmark_runner, path: Path.expand("..", __DIR__)}
])

Prepare TPTP problems

Use the Kino panel when running in Livebook. Outside Livebook, the function returns plain fallback data.

AtpBenchmarkRunner.tptp_panel(
  install_examples: true,
  limit: 20,
  cache_subdir: "tptp"
)

For a deterministic smoke run, install the bundled examples and load them directly:

tptp_root =
  AtpBenchmarkRunner.Config.tptp_dir(
    env_file: Path.expand("../.env", __DIR__)
  )

AtpBenchmarkRunner.install_tptp_examples!(root_dir: tptp_root)

problems =
  AtpBenchmarkRunner.load_tptp_problems(
    root_dir: tptp_root,
    include_axioms?: true,
    limit: 10
  )

Enum.map(problems, &amp;{&amp;1.name, &amp;1.logic, &amp;1.rating, &amp;1.expected_status})

Create an HPC session

Fill in your FAU/NHR username and connection details. The session shape follows hpc_connect; no connection is opened until a function needs SSH.

session =
  HpcConnect.new_session(:fritz,
    username: System.fetch_env!("HPC_USER"),
    work_dir: System.get_env("HPC_WORK_DIR", "$HOME/.cache/hpc_connect"),
    vault_dir: System.get_env("HPC_VAULT_DIR", "$HPCVAULT")
  )

Inspect prover images

The ATP runner owns the prover .def files under priv/provers/*/apptainer.def. It uploads those via hpc_connect and reuses the generic hpc_connect build_sif.sh script for building .sif files.

provers = [:tableaux, :vampire, :eprover, :cvc5] |> Enum.map(&amp;AtpBenchmarkRunner.Prover.builtin!/1)

AtpBenchmarkRunner.image_build_plan(session, provers)

Only run the next cell when you want to upload/build images on the cluster:

# AtpBenchmarkRunner.upload_prover_definitions!(session, provers)
# AtpBenchmarkRunner.build_prover_images!(session, provers)
# AtpBenchmarkRunner.smoke_validate_images!(session, provers)

Sync problems and create a run

remote_problems =
  AtpBenchmarkRunner.sync_tptp_problems!(session, problems,
    remote_tptp_dir: System.get_env("ATP_BENCHMARK_RUNNER_REMOTE_TPTP_DIR", "$HPCVAULT/atp_benchmark_runner/tptp")
  )

run =
  AtpBenchmarkRunner.new_run(
    title: "ATP smoke benchmark",
    cluster: :fritz,
    partition: "singlenode",
    walltime: "00:30:00",
    problem_timeout_seconds: 30,
    max_parallel_jobs: 4,
    problems: remote_problems,
    provers: provers,
    remote_root: System.get_env("ATP_BENCHMARK_RUNNER_REMOTE_RUN_DIR", "$HPCVAULT/atp_benchmark_runner/runs")
  )

AtpBenchmarkRunner.save_run!(run)

Dry-run plan and submit

Start with a dry-run plan to inspect generated scripts and remote paths.

AtpBenchmarkRunner.plan(session, run)

Submit only after the plan and image paths look correct:

# submitted_run = AtpBenchmarkRunner.submit(session, run)

Monitor and collect results

# AtpBenchmarkRunner.monitor_panel(session, submitted_run)

# results = AtpBenchmarkRunner.collect_results(session, submitted_run, include_raw_output: false)
# AtpBenchmarkRunner.save_results!(submitted_run, results)
# AtpBenchmarkRunner.report_panel(results, submitted_run)

Other notebooks:

@DeSchoel

Elixir_Curriculum

Habit Tracker

habit_tracker.livemd

tutorial beginner jason kino youtube hidden_cell

2026-1-10
@chops

elixir-phoenix

Dialyzer Error Debugger

dialyzer_debugger.livemd

tutorial advanced debugging kino kino_vega_lite jason

2025-11-11
aar2dee2
@aar2dee2

elixir-web-scraper

Web Scraper

main.livemd

tutorial advanced intermediate kino floki jason httpoison hackney

2023-7-31
@DockYard-Academy

curriculum

Polymorphism

deprecated_polymorphism.livemd

tutorial intermediate jason kino youtube hidden_cell

2023-3-21
Chris Martin
@trbngr

elixir_cqrs_tools

Using cqrs_tools with Commanded

commanded.livemd

tutorial advanced gen-server otp commanded cqrs_tools ecto etso jason

2022-8-18
@DeSchoel

Elixir_Curriculum

Mailbox Server

deprecated_mailbox_server.livemd

tutorial advanced gen-server jason kino youtube hidden_cell

2026-5-23
Ryo Wakabayashi
@RyoWakabayashi

elixir-learning

Japanese embedding with Ollama Ruri

ruri_embedding_with_ollama.livemd

advanced ai nx exla kino hnswlib ollama

2026-5-31

Back