API reference
Auto-generated from the package’s Google-style docstrings (via autodoc +
napoleon). One section per module, in the order they compose a run.
pybench.discovery
Discover bench_* functions by importing Python files under a path.
- exception pybench.discovery.DiscoveryError
Raised when discovery cannot satisfy the request.
- class pybench.discovery.Benchmark(name, fn, config, file)
A discovered benchmark function and its resolved configuration.
- Parameters:
name (str)
fn (Callable[[int], object])
config (BenchConfig)
file (Path)
- pybench.discovery.import_file(file)
Import a Python file as an anonymous module.
- Parameters:
file (Path) – Path to the .py file.
- Returns:
The imported module object.
- Raises:
DiscoveryError – If the file cannot be loaded as a module.
- Return type:
ModuleType
- pybench.discovery.discover(path, names=None)
Find bench_* functions defined under path.
A file is any .py file (recursively, when path is a directory). Only functions defined in the imported file are collected, so a bench_* imported from elsewhere is ignored.
- Parameters:
path (Path) – A benchmark file or a directory to walk.
names (Sequence[str] | None) – If given, keep only these benchmark names (--bench).
- Returns:
Benchmarks sorted by name.
- Raises:
DiscoveryError – If path does not exist, a benchmark name is defined twice, or a requested names entry is not found.
- Return type:
list[Benchmark]
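The discovery rules above can be illustrated with a short standard-library sketch. It is not the package's implementation: discover_sketch, the throwaway module names, and the use of ValueError in place of DiscoveryError are assumptions made for illustration only.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def discover_sketch(path: Path) -> dict:
    """Collect bench_* functions defined in the .py files under path."""
    files = sorted(path.rglob("*.py")) if path.is_dir() else [path]
    found = {}
    for index, file in enumerate(files):
        # Load the file under a throwaway module name (hypothetical scheme).
        module_name = f"_bench_sketch_{index}"
        spec = importlib.util.spec_from_file_location(module_name, file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name, obj in vars(module).items():
            # Keep only functions defined in this file, not imported ones.
            if (
                name.startswith("bench_")
                and inspect.isfunction(obj)
                and obj.__module__ == module_name
            ):
                if name in found:
                    raise ValueError(f"benchmark {name!r} is defined twice")
                found[name] = obj
    # Return the benchmarks sorted by name.
    return dict(sorted(found.items()))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "suite.py").write_text(
        "from os.path import join as bench_imported\n"
        "def bench_b(seed):\n    return float(seed)\n"
        "def bench_a(seed):\n    return 1.0\n"
        "def helper():\n    return None\n"
    )
    discovered = discover_sketch(root)
    names = list(discovered)
```

Here bench_imported is skipped because its __module__ is not the imported file, and helper is skipped because it lacks the bench_ prefix.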
pybench.normalizer
Coerce benchmark return values into a canonical scores mapping.
- pybench.normalizer.Scores
Canonical normalized form: {step: {metric: value}}.
alias of dict[int, dict[str, float]]
- exception pybench.normalizer.NormalizationError
Raised when a benchmark return value has an unsupported shape.
- pybench.normalizer.normalize(result)
Coerce any accepted benchmark return value to canonical Scores.
- Parameters:
result (object) – A float, dict of metrics, or list of step dicts.
- Returns:
{step: {metric: value}}; scalars and bare dicts use step 0, and a bare scalar is stored under the metric name score.
- Raises:
NormalizationError – If the value has an unsupported shape or type.
- Return type:
dict[int, dict[str, float]]
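A minimal sketch of the three accepted shapes follows. It assumes that a list of step dicts is keyed by list position and that plain ints are accepted alongside floats; neither detail is stated above. normalize_sketch is a hypothetical name, and TypeError stands in for NormalizationError.

```python
def normalize_sketch(result):
    """Coerce a scalar, a metric dict, or a list of step dicts to {step: {metric: value}}."""
    if isinstance(result, bool):
        # bool is a subclass of int; treating it as a score is almost surely a bug.
        raise TypeError("bool is not a supported benchmark return value")
    if isinstance(result, (int, float)):
        # Bare scalar: step 0, metric name "score".
        return {0: {"score": float(result)}}
    if isinstance(result, dict):
        # Bare metric dict: everything belongs to step 0.
        return {0: {str(name): float(value) for name, value in result.items()}}
    if isinstance(result, list):
        # Assumption: the step key is the position in the list.
        return {
            step: {str(name): float(value) for name, value in metrics.items()}
            for step, metrics in enumerate(result)
        }
    raise TypeError(f"unsupported return type: {type(result).__name__}")


scalar_scores = normalize_sketch(0.5)
dict_scores = normalize_sketch({"acc": 1})
list_scores = normalize_sketch([{"loss": 2.0}, {"loss": 1.5}])
```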
pybench.validator
Alignment checks between a run’s scores and the stored baseline.
- exception pybench.validator.StepKeyMismatchError(bench, current, baseline)
Raised when the run and baseline step-key sets differ.
- Parameters:
bench (str)
current (set[int])
baseline (set[int])
- Return type:
None
- exception pybench.validator.MetricKeyMismatchError(bench, step, current, baseline)
Raised when the run and baseline metric-key sets differ for a step.
- Parameters:
bench (str)
step (int)
current (set[str])
baseline (set[str])
- Return type:
None
- pybench.validator.validate_alignment(bench, current, baseline)
Assert the run and baseline share identical step and metric keys.
- Parameters:
bench (str) – Benchmark name, for error messages.
current (Mapping[int, Mapping[str, object]]) – Freshly normalized run scores (keyed by step then metric).
baseline (Mapping[int, Mapping[str, object]]) – Stored baseline scores (keyed by step then metric).
- Raises:
StepKeyMismatchError – If the step-key sets differ.
MetricKeyMismatchError – If any step’s metric-key sets differ.
- Return type:
None
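The two checks reduce to set comparisons on the mapping keys. The sketch below shows that idea only. StepKeyMismatch, MetricKeyMismatch, and validate_alignment_sketch are hypothetical stand-ins, and the error messages are invented.

```python
class StepKeyMismatch(Exception):
    """Stand-in for StepKeyMismatchError in this sketch."""


class MetricKeyMismatch(Exception):
    """Stand-in for MetricKeyMismatchError in this sketch."""


def validate_alignment_sketch(bench, current, baseline):
    """Raise unless current and baseline share identical step and metric keys."""
    current_steps, baseline_steps = set(current), set(baseline)
    if current_steps != baseline_steps:
        raise StepKeyMismatch(
            f"{bench}: steps {sorted(current_steps)} != {sorted(baseline_steps)}"
        )
    # Step sets match, so every step exists on both sides.
    for step in sorted(current_steps):
        current_metrics, baseline_metrics = set(current[step]), set(baseline[step])
        if current_metrics != baseline_metrics:
            raise MetricKeyMismatch(
                f"{bench} step {step}: metrics {sorted(current_metrics)} "
                f"!= {sorted(baseline_metrics)}"
            )
```

Checking step keys first means the metric comparison never has to handle a step that is missing on one side.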
pybench.runner
Run a benchmark over a set of seeds and collect per-seed scores.
- pybench.runner.SeedScores
Per-seed raw scores: {step: {metric: [value for each seed]}}.
alias of dict[int, dict[str, list[float]]]
- exception pybench.runner.RunShapeError
Raised when a benchmark returns different keys across seeds.
- pybench.runner.sample_seeds(n, rng)
Sample n distinct-enough random integer seeds.
- Parameters:
n (int) – Number of seeds to draw.
rng (Generator) – Random generator.
- Returns:
A list of n Python ints in [0, 2**32).
- Return type:
list[int]
- pybench.runner.run_benchmark(bench, seeds, *, on_seed=None)
Run bench on each seed and collect aligned per-seed scores.
Runs serially when workers == 1; otherwise fans the seeds out across a process pool (each worker re-imports the benchmark file by path).
- Parameters:
bench (Benchmark) – The benchmark to run.
seeds (list[int]) – Seeds to run, in order; output lists align position-by-position.
on_seed (Callable[[], None] | None) – Optional callback invoked once per completed seed (progress).
- Returns:
Per-seed scores {step: {metric: [value for each seed]}}.
- Raises:
RunShapeError – If a later seed yields different step/metric keys than the first.
- Return type:
dict[int, dict[str, list[float]]]
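The serial path can be sketched as a loop that appends each seed's values to position-aligned lists and compares every seed's key shape with the first. The process-pool path is not shown. collect_per_seed is a hypothetical helper, the benchmark is assumed to return already normalized {step: {metric: value}} scores, and ValueError stands in for RunShapeError.

```python
def collect_per_seed(fn, seeds, on_seed=None):
    """Run fn(seed) for each seed and gather {step: {metric: [value per seed]}}."""
    collected = {}
    expected_shape = None
    for seed in seeds:
        scores = fn(seed)  # assumed shape: {step: {metric: value}}
        shape = {step: set(metrics) for step, metrics in scores.items()}
        if expected_shape is None:
            # The first seed fixes the expected step/metric keys.
            expected_shape = shape
            collected = {
                step: {metric: [] for metric in metrics}
                for step, metrics in scores.items()
            }
        elif shape != expected_shape:
            raise ValueError(f"seed {seed} returned different step/metric keys")
        for step, metrics in scores.items():
            for metric, value in metrics.items():
                collected[step][metric].append(float(value))
        if on_seed is not None:
            on_seed()  # progress callback, once per completed seed
    return collected


progress = []
per_seed = collect_per_seed(
    lambda seed: {0: {"score": seed * 2.0}},
    [1, 2, 3],
    on_seed=lambda: progress.append(1),
)
```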
pybench.stats
Statistical comparison: paired t-test slots + sign-flip permutation meta-test.
For each (step, metric) slot a one-sided paired t-test (in goodness space,
i.e. after the min: sign flip) decides whether the current run regressed.
The benchmark verdict is the within-seed sign-flip permutation p-value of a
continuous severity statistic (see SPECIFICATIONS.md §3).
- pybench.stats.SeedScores
Per-seed raw scores: {step: {metric: [value for each seed]}}.
alias of dict[int, dict[str, list[float]]]
- class pybench.stats.SlotResult(step, metric, baseline_mean, baseline_std, current_mean, current_std, effect_size, p_value, flagged, denom_at_floor)
Comparison outcome for one (step, metric) slot, in raw units.
- Parameters:
step (int)
metric (str)
baseline_mean (float)
baseline_std (float)
current_mean (float)
current_std (float)
effect_size (float)
p_value (float)
flagged (bool)
denom_at_floor (bool)
- denom_at_floor: bool
True when the baseline mean is so small that effect_size is unreliable.
- class pybench.stats.Comparison(slots, n_flagged, n_slots, meta_p, passed)
Full benchmark comparison result.
- Parameters:
slots (list[SlotResult])
n_flagged (int)
n_slots (int)
meta_p (float)
passed (bool)
- pybench.stats.check_alpha_detectable(n_seeds, alpha)
Reject an alpha that no regression could ever satisfy.
The within-seed sign-flip meta-test has only 2**n_seeds arrangements, so the smallest achievable meta_p is 1 / 2**n_seeds. When alpha <= 1 / 2**n_seeds the verdict meta_p < alpha is unsatisfiable (even a maximally severe regression yields a PASS), so flag it loudly rather than report a vacuous green.
- Raises:
ValueError – If alpha is unreachable at this seed count.
- Parameters:
n_seeds (int)
alpha (float)
- Return type:
None
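The arithmetic behind this guard is small enough to sketch directly. check_alpha_sketch is a hypothetical name, and unlike the documented function it returns the p-value floor so the numbers can be inspected.

```python
def check_alpha_sketch(n_seeds, alpha):
    """Return the smallest achievable meta_p, or raise if alpha cannot be beaten."""
    # With n_seeds paired seeds there are 2**n_seeds sign arrangements,
    # and the observed arrangement always counts as at least as extreme as itself.
    floor = 1 / 2**n_seeds
    if alpha <= floor:
        raise ValueError(
            f"alpha={alpha} is unreachable with {n_seeds} seeds "
            f"(smallest achievable meta_p is {floor})"
        )
    return floor


floor_5_seeds = check_alpha_sketch(5, 0.05)  # 1 / 32
```

With alpha=0.05, four seeds are too few (the floor is 1/16 = 0.0625) and five are enough (the floor is 1/32 = 0.03125).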
- pybench.stats.compare(baseline, current, *, alpha=0.05, min_effect=None, n_perm=4096, rng=None)
Compare a paired current run against a baseline.
- Parameters:
baseline (dict[int, dict[str, list[float]]]) – Stored per-seed baseline scores.
current (dict[int, dict[str, list[float]]]) – Per-seed current scores, on the same seeds (paired).
alpha (float) – Per-slot and overall significance threshold.
min_effect (float | None) – Optional minimum relative goodness drop to flag a slot.
n_perm (int) – Number of sign-flip permutations for the meta-test.
rng (Generator | None) – Random generator; a fresh default one is used when None.
- Returns:
A Comparison with per-slot detail and the overall verdict.
- Raises:
ValueError – If baseline and current have mismatched seed counts, or if alpha is unreachable at this seed count (see check_alpha_detectable()).
- Return type:
Comparison
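To make the sign-flip idea concrete, here is a sketch of a one-sided sign-flip test on paired per-seed differences. It is a simplification under stated assumptions: it enumerates all 2**n arrangements exactly instead of sampling n_perm of them, and it uses the plain sum of differences as the statistic, not the severity statistic defined in SPECIFICATIONS.md §3. sign_flip_p_value is a hypothetical name.

```python
from itertools import product


def sign_flip_p_value(diffs):
    """Exact one-sided sign-flip p-value for paired differences.

    diffs[i] is baseline minus current for seed i in goodness space,
    so positive values mean the current run got worse.
    """
    observed = sum(diffs)
    hits = 0
    total = 0
    # Under the null, each seed's difference is equally likely to have either sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(sign * diff for sign, diff in zip(signs, diffs)) >= observed:
            hits += 1
    return hits / total


# Five seeds that all regressed: only the observed arrangement is this extreme.
p_all_worse = sign_flip_p_value([1.0, 2.0, 0.5, 1.5, 3.0])
# One seed worse and one better by the same amount: no evidence of regression.
p_mixed = sign_flip_p_value([1.0, -1.0])
```

The first case lands exactly on the 1 / 2**n_seeds floor that check_alpha_detectable() guards against.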
pybench.store
Read and write the JSONL baseline store.
- pybench.store.SeedScores
Per-seed raw scores: {step: {metric: [value for each seed]}}.
alias of dict[int, dict[str, list[float]]]
- class pybench.store.BaselineRecord(bench, timestamp, git_commit, git_dirty, seeds, scores)
One benchmark’s stored baseline.
- Parameters:
bench (str)
timestamp (str)
git_commit (str | None)
git_dirty (bool | None)
seeds (list[int])
scores (dict[int, dict[str, list[float]]])
- pybench.store.parse_baselines(text)
Parse JSONL baseline content into records keyed by benchmark name.
- Parameters:
text (str) – Raw JSONL content (e.g. a file’s text or git show output).
- Returns:
Mapping of benchmark name to record; blank lines are skipped.
- Return type:
dict[str, BaselineRecord]
- pybench.store.read_baselines(path)
Load all baseline records keyed by benchmark name.
- Parameters:
path (Path) – Path to the JSONL store.
- Returns:
Mapping of benchmark name to record; empty if the file is absent.
- Return type:
dict[str, BaselineRecord]
- pybench.store.write_baselines(path, records)
Rewrite the JSONL store with the given records, one line each.
- Parameters:
path (Path) – Path to the JSONL store; parent directories are created.
records (Iterable[BaselineRecord]) – Records to write (full rewrite).
- Return type:
None
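A round trip through JSONL might look like the sketch below. The on-disk layout is an assumption: one JSON object per line whose keys mirror the BaselineRecord fields, with records kept as plain dicts. The helper names and the sample values are made up for illustration.

```python
import json


def dump_jsonl_sketch(records):
    """Serialize records as JSONL, one JSON object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


def parse_jsonl_sketch(text):
    """Parse JSONL text into {bench: record}, skipping blank lines."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        raw = json.loads(line)
        # JSON object keys are always strings, so step keys are converted back to int.
        raw["scores"] = {int(step): metrics for step, metrics in raw["scores"].items()}
        records[raw["bench"]] = raw
    return records


record = {
    "bench": "bench_a",
    "timestamp": "2024-01-01T00:00:00",  # made-up example value
    "git_commit": None,
    "git_dirty": None,
    "seeds": [1, 2],
    "scores": {0: {"score": [0.5, 0.75]}},
}
text = dump_jsonl_sketch([record])
parsed = parse_jsonl_sketch("\n" + text + "\n")
```

The int conversion on read matters because json.dumps silently turns the integer step keys into strings.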
pybench.git
Capture git provenance (short SHA + dirty flag) with graceful fallback.
- class pybench.git.GitInfo(commit, dirty)
Git provenance recorded with a baseline write.
- Parameters:
commit (str | None)
dirty (bool | None)
- pybench.git.git_metadata(cwd=None)
Return the short HEAD SHA and dirty flag, or nulls if git is unavailable.
- Parameters:
cwd (Path | None) – Directory to inspect; defaults to the current working directory.
- Returns:
GitInfo(commit, dirty). Both are None when cwd is not a git repository or git is not installed.
- Return type:
GitInfo
- pybench.git.file_history(path)
Return commits that touched path, oldest first.
- Parameters:
path (Path) – File whose history to inspect (git is run in its directory).
- Returns:
[(short_sha, date), ...] chronological, [] if the file has no commits, or None if not a git repo / git is unavailable.
- Return type:
list[tuple[str, str]] | None
- pybench.git.file_at_commit(commit, path)
Return the content of path as of commit via git show.
- Parameters:
commit (str) – Commit-ish (e.g. a short SHA).
path (Path) – File to read; git is run in its directory.
- Returns:
The file’s text at that commit, or None if git is unavailable or the path did not exist there.
- Return type:
str | None
pybench.config
Per-benchmark configuration: keyword defaults plus CLI overrides.
- class pybench.config.BenchConfig(n_seeds=30, alpha=0.05, min_effect=None, workers=1)
Resolved configuration for one benchmark.
- Parameters:
n_seeds (int)
alpha (float)
min_effect (float | None)
workers (int)
- pybench.config.extract_config(fn)
Read a benchmark’s keyword-only config defaults from its signature.
- Parameters:
fn (Callable[[...], object]) – The bench_* function to inspect.
- Returns:
A BenchConfig; any of n_seeds, alpha, min_effect, workers not declared on fn keeps its package default.
- Return type:
BenchConfig
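Reading keyword-only defaults from a signature can be done with inspect.signature. The sketch below returns a plain dict instead of a BenchConfig. extract_config_sketch and bench_example are hypothetical, and the package defaults are copied from the BenchConfig signature above.

```python
import inspect

# Defaults taken from BenchConfig(n_seeds=30, alpha=0.05, min_effect=None, workers=1).
PACKAGE_DEFAULTS = {"n_seeds": 30, "alpha": 0.05, "min_effect": None, "workers": 1}


def extract_config_sketch(fn):
    """Overlay fn's keyword-only defaults on the package defaults."""
    config = dict(PACKAGE_DEFAULTS)
    for name, param in inspect.signature(fn).parameters.items():
        if (
            param.kind is inspect.Parameter.KEYWORD_ONLY
            and name in config
            and param.default is not inspect.Parameter.empty
        ):
            config[name] = param.default
    return config


def bench_example(seed, *, n_seeds=10, alpha=0.01):
    """A benchmark that overrides two of the four settings."""
    return float(seed)


example_config = extract_config_sketch(bench_example)
```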
- pybench.config.apply_overrides(config, *, alpha=None, min_effect=None)
Return config with non-None CLI overrides applied.
- Parameters:
config (BenchConfig) – The benchmark’s resolved configuration.
alpha (float | None) – CLI --alpha override, or None to keep the benchmark’s.
min_effect (float | None) – CLI --min-effect override, or None to keep it.
- Returns:
A new BenchConfig with the overrides merged in.
- Return type:
BenchConfig
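One way to get "a new config with overrides merged in" is dataclasses.replace on a frozen dataclass. The sketch assumes BenchConfig behaves like such a dataclass, which is not confirmed above. BenchConfigSketch and apply_overrides_sketch are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BenchConfigSketch:
    """Stand-in with the same fields and defaults as BenchConfig."""

    n_seeds: int = 30
    alpha: float = 0.05
    min_effect: Optional[float] = None
    workers: int = 1


def apply_overrides_sketch(config, *, alpha=None, min_effect=None):
    """Return a copy of config with every non-None override applied."""
    changes = {}
    if alpha is not None:
        changes["alpha"] = alpha
    if min_effect is not None:
        changes["min_effect"] = min_effect
    # replace() builds a new instance; the original config is left untouched.
    return replace(config, **changes)


base = BenchConfigSketch(n_seeds=10)
overridden = apply_overrides_sketch(base, alpha=0.01)
```

Treating None as "keep the benchmark's value" means a CLI override cannot reset min_effect back to None, which matches the behaviour described above.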
pybench.reporter
Terminal output for a benchmark run (Rich, colored).
- class pybench.reporter.BenchOutcome(name, status, n_steps, n_metrics, comparison)
One benchmark’s result, ready to render.
- Parameters:
name (str)
status (str)
n_steps (int)
n_metrics (int)
comparison (Comparison | None)
- pybench.reporter.report(console, outcomes, *, elapsed, verbose)
Render the full run report.
- Parameters:
console (Console) – Rich console to write to.
outcomes (list[BenchOutcome]) – One outcome per benchmark, in display order.
elapsed (float) – Wall-clock seconds for the whole run.
verbose (bool) – Expand the per-slot table under each failing benchmark.
- Return type:
None
- pybench.reporter.report_update(console, updated)
Render the summary of a pybench update.
- Parameters:
console (Console) – Rich console to write to.
updated (list[tuple[str, int]]) – (name, n_seeds) for each rewritten benchmark.
- Return type:
None
- pybench.reporter.report_show(console, records)
Render the current baseline stats for each benchmark.
- Parameters:
console (Console) – Rich console to write to.
records (dict[str, BaselineRecord]) – Baseline records keyed by benchmark name.
- Return type:
None
- pybench.reporter.report_history(console, history)
Render per-benchmark baseline history across commits.
- Parameters:
console (Console) – Rich console to write to.
history (dict[str, list[tuple[str, str, BaselineRecord]]]) – {bench: [(short_sha, date, record), ...]} chronological.
- Return type:
None
pybench.cli
Click-based CLI entry point.