# pybench While Pytest does deterministic tests, Pybench does statistical tests. `pybench` discovers benchmark functions, runs them across many seeds, and **statistically detects regressions** against a saved baseline. It reruns each benchmark on the *same* stored seeds as its baseline, so the comparison is **paired** (far more sensitive than a two-sample test), and judges the whole benchmark with a within-seed sign-flip permutation test that respects correlation across metrics and steps. ```{toctree} :maxdepth: 2 getting_started user_guide how_it_works cli ```