Powered by AppSignal & Oban Pro

ATP Benchmark Runner smoke workflow

examples/benchmark_smoke.livemd

ATP Benchmark Runner smoke workflow

This notebook demonstrates the intended local/Livebook workflow:

  1. configure local TPTP storage or Livebook cache
  2. select/download TPTP problems
  3. inspect and optionally build prover Apptainer images on HPC
  4. submit a small benchmark run
  5. monitor progress and render a report

It is intentionally conservative: cells default to bundled smoke examples and dry-run planning until you provide a real hpc_connect session.

Install local project

Mix.install([
  {:atp_benchmark_runner, path: Path.expand("..", __DIR__)}
])

Prepare TPTP problems

Use the Kino panel when running in Livebook. Outside Livebook, the function returns plain fallback data.

AtpBenchmarkRunner.tptp_panel(
  install_examples: true,
  limit: 20,
  cache_subdir: "tptp"
)

For a deterministic smoke run, install the bundled examples and load them directly:

tptp_root =
  AtpBenchmarkRunner.Config.tptp_dir(
    env_file: Path.expand("../.env", __DIR__)
  )

AtpBenchmarkRunner.install_tptp_examples!(root_dir: tptp_root)

problems =
  AtpBenchmarkRunner.load_tptp_problems(
    root_dir: tptp_root,
    include_axioms?: true,
    limit: 10
  )

Enum.map(problems, &{&1.name, &1.logic, &1.rating, &1.expected_status})

Create an HPC session

Fill in your FAU/NHR username and connection details. The session shape follows hpc_connect; no connection is opened until a function needs SSH.

session =
  HpcConnect.new_session(:fritz,
    username: System.fetch_env!("HPC_USER"),
    work_dir: System.get_env("HPC_WORK_DIR", "$HOME/.cache/hpc_connect"),
    vault_dir: System.get_env("HPC_VAULT_DIR", "$HPCVAULT")
  )

Inspect prover images

The ATP runner owns the prover .def files under priv/provers/*/apptainer.def. It uploads those via hpc_connect and reuses the generic hpc_connect build_sif.sh script for building .sif files.

provers = [:tableaux, :vampire, :eprover, :cvc5] |> Enum.map(&AtpBenchmarkRunner.Prover.builtin!/1)

AtpBenchmarkRunner.image_build_plan(session, provers)

Only run the next cell when you want to upload/build images on the cluster:

# AtpBenchmarkRunner.upload_prover_definitions!(session, provers)
# AtpBenchmarkRunner.build_prover_images!(session, provers)
# AtpBenchmarkRunner.smoke_validate_images!(session, provers)

Sync problems and create a run

remote_problems =
  AtpBenchmarkRunner.sync_tptp_problems!(session, problems,
    remote_tptp_dir: System.get_env("ATP_BENCHMARK_RUNNER_REMOTE_TPTP_DIR", "$HPCVAULT/atp_benchmark_runner/tptp")
  )

run =
  AtpBenchmarkRunner.new_run(
    title: "ATP smoke benchmark",
    cluster: :fritz,
    partition: "singlenode",
    walltime: "00:30:00",
    problem_timeout_seconds: 30,
    max_parallel_jobs: 4,
    problems: remote_problems,
    provers: provers,
    remote_root: System.get_env("ATP_BENCHMARK_RUNNER_REMOTE_RUN_DIR", "$HPCVAULT/atp_benchmark_runner/runs")
  )

AtpBenchmarkRunner.save_run!(run)

Dry-run plan and submit

Start with a dry-run plan to inspect generated scripts and remote paths.

AtpBenchmarkRunner.plan(session, run)

Submit only after the plan and image paths look correct:

# submitted_run = AtpBenchmarkRunner.submit(session, run)

Monitor and collect results

# AtpBenchmarkRunner.monitor_panel(session, submitted_run)
# results = AtpBenchmarkRunner.collect_results(session, submitted_run, include_raw_output: false)
# AtpBenchmarkRunner.save_results!(submitted_run, results)
# AtpBenchmarkRunner.report_panel(results, submitted_run)