ATP Benchmark Runner smoke workflow
This notebook demonstrates the intended local/Livebook workflow:
- configure local TPTP storage or Livebook cache
- select/download TPTP problems
- inspect and optionally build prover Apptainer images on HPC
- submit a small benchmark run
- monitor progress and render a report
It is intentionally conservative: cells default to bundled smoke examples and dry-run planning until you provide a real hpc_connect session.
Install local project
Mix.install([
{:atp_benchmark_runner, path: Path.expand("..", __DIR__)}
])
Prepare TPTP problems
Use the Kino panel when running in Livebook. Outside Livebook, the function returns plain fallback data.
AtpBenchmarkRunner.tptp_panel(
install_examples: true,
limit: 20,
cache_subdir: "tptp"
)
For a deterministic smoke run, install the bundled examples and load them directly:
tptp_root =
AtpBenchmarkRunner.Config.tptp_dir(
env_file: Path.expand("../.env", __DIR__)
)
AtpBenchmarkRunner.install_tptp_examples!(root_dir: tptp_root)
problems =
AtpBenchmarkRunner.load_tptp_problems(
root_dir: tptp_root,
include_axioms?: true,
limit: 10
)
Enum.map(problems, &{&1.name, &1.logic, &1.rating, &1.expected_status})
Create an HPC session
Fill in your FAU/NHR username and connection details. The session shape follows hpc_connect; no connection is opened until a function needs SSH.
session =
HpcConnect.new_session(:fritz,
username: System.fetch_env!("HPC_USER"),
work_dir: System.get_env("HPC_WORK_DIR", "$HOME/.cache/hpc_connect"),
vault_dir: System.get_env("HPC_VAULT_DIR", "$HPCVAULT")
)
Inspect prover images
The ATP runner owns the prover .def files under priv/provers/*/apptainer.def. It uploads those via hpc_connect and reuses the generic hpc_connect build_sif.sh script for building .sif files.
provers = [:tableaux, :vampire, :eprover, :cvc5] |> Enum.map(&AtpBenchmarkRunner.Prover.builtin!/1)
AtpBenchmarkRunner.image_build_plan(session, provers)
Only run the next cell when you want to upload/build images on the cluster:
# AtpBenchmarkRunner.upload_prover_definitions!(session, provers)
# AtpBenchmarkRunner.build_prover_images!(session, provers)
# AtpBenchmarkRunner.smoke_validate_images!(session, provers)
Sync problems and create a run
remote_problems =
AtpBenchmarkRunner.sync_tptp_problems!(session, problems,
remote_tptp_dir: System.get_env("ATP_BENCHMARK_RUNNER_REMOTE_TPTP_DIR", "$HPCVAULT/atp_benchmark_runner/tptp")
)
run =
AtpBenchmarkRunner.new_run(
title: "ATP smoke benchmark",
cluster: :fritz,
partition: "singlenode",
walltime: "00:30:00",
problem_timeout_seconds: 30,
max_parallel_jobs: 4,
problems: remote_problems,
provers: provers,
remote_root: System.get_env("ATP_BENCHMARK_RUNNER_REMOTE_RUN_DIR", "$HPCVAULT/atp_benchmark_runner/runs")
)
AtpBenchmarkRunner.save_run!(run)
Dry-run plan and submit
Start with a dry-run plan to inspect generated scripts and remote paths.
AtpBenchmarkRunner.plan(session, run)
Submit only after the plan and image paths look correct:
# submitted_run = AtpBenchmarkRunner.submit(session, run)
Monitor and collect results
# AtpBenchmarkRunner.monitor_panel(session, submitted_run)
# results = AtpBenchmarkRunner.collect_results(session, submitted_run, include_raw_output: false)
# AtpBenchmarkRunner.save_results!(submitted_run, results)
# AtpBenchmarkRunner.report_panel(results, submitted_run)