pybench¶
While Pytest does deterministic tests, Pybench does statistical tests.
pybench discovers benchmark functions, runs them across many seeds, and
statistically detects regressions against a saved baseline.
It reruns each benchmark on the same stored seeds as its baseline, so the comparison is paired (far more sensitive than a two-sample test), and judges the whole benchmark with a within-seed sign-flip permutation test that respects correlation across metrics and steps.