Runloop Benchmark API
Define, configure, and run benchmarks against your agents. Runloop ships SWE-Bench Verified and SWE-smith out of the box; the Benchmark API also supports custom benchmarks built from your own scenarios and scorers. Resources include benchmarks, benchmark runs (which can be started, canceled, and completed), benchmark jobs, scenario runs, and downloadable run logs.
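
To make the benchmark run lifecycle concrete, here is a minimal local sketch of the start, cancel, and complete transitions. It does not call the Runloop API or SDK: `RunState`, `BenchmarkRun`, `start_run`, and `record_score` are hypothetical names chosen for illustration, and the rule that a canceled or completed run accepts no further transitions is an assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    """Hypothetical lifecycle states for a benchmark run."""
    RUNNING = "running"
    CANCELED = "canceled"
    COMPLETED = "completed"


@dataclass
class BenchmarkRun:
    """Illustrative in-memory model of a run; not a Runloop SDK type."""
    benchmark_name: str
    state: RunState = RunState.RUNNING
    scenario_scores: dict = field(default_factory=dict)

    def record_score(self, scenario: str, score: float) -> None:
        # Scores are only accepted while the run is still active.
        self._require_running("record a score")
        self.scenario_scores[scenario] = score

    def cancel(self) -> None:
        self._require_running("cancel")
        self.state = RunState.CANCELED

    def complete(self) -> None:
        self._require_running("complete")
        self.state = RunState.COMPLETED

    def _require_running(self, action: str) -> None:
        # Assumption: canceled and completed are terminal states.
        if self.state is not RunState.RUNNING:
            raise ValueError(f"cannot {action}: run is {self.state.value}")


def start_run(benchmark_name: str) -> BenchmarkRun:
    """Start a new run; it begins in the RUNNING state."""
    return BenchmarkRun(benchmark_name)


if __name__ == "__main__":
    run = start_run("my-custom-benchmark")
    run.record_score("scenario-a", 1.0)
    run.complete()
    print(run.state.value, run.scenario_scores)
```

In a real integration, each transition would presumably correspond to a request against the benchmark run resource; consult the Runloop API reference for the actual endpoints, parameters, and state names.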