API reference

Auto-generated from the package’s Google-style docstrings (via autodoc + napoleon). One section per module, in the order they compose a run.

pybench.discovery

Discover bench_* functions by importing Python files under a path.

exception pybench.discovery.DiscoveryError

Raised when discovery cannot satisfy the request.

class pybench.discovery.Benchmark(name, fn, config, file)

A discovered benchmark function and its resolved configuration.

Parameters:

name (str)
fn (Callable[[int], object])
config (BenchConfig)
file (Path)

pybench.discovery.import_file(file)

Import a Python file as an anonymous module.

Parameters: file (Path) – Path to the .py file.
Returns: The imported module object.
Raises: DiscoveryError – If the file cannot be loaded as a module.
Return type: ModuleType

pybench.discovery.discover(path, names=None)

Find bench_* functions defined under path.

Every .py file under path is a candidate (a directory is walked recursively). Only functions defined in the imported file itself are collected, so a bench_* imported from elsewhere is ignored.

Parameters:

path (Path) – A benchmark file or a directory to walk.
names (Sequence[str] | None) – If given, keep only these benchmark names (--bench).

Returns:

Benchmarks sorted by name.

Raises:

DiscoveryError – If path does not exist, a benchmark name is defined twice, or a requested names entry is not found.

Return type:

list[Benchmark]
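
The discovery rule described above (import each file, keep only bench_* functions defined in that file, sort by name) can be sketched with the standard library alone. This is a minimal illustration rather than pybench's implementation: load_module and collect_benchmarks are hypothetical names, the sketch returns bare function names instead of Benchmark objects, and it raises ImportError where the real code would raise DiscoveryError.

```python
import importlib.util
import inspect
from pathlib import Path
from types import ModuleType


def load_module(file: Path) -> ModuleType:
    """Import a .py file under a throwaway module name (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(f"_bench_{file.stem}", file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def collect_benchmarks(path: Path) -> list[str]:
    """Return the sorted names of bench_* functions defined under path."""
    files = sorted(path.rglob("*.py")) if path.is_dir() else [path]
    found = []
    for file in files:
        module = load_module(file)
        for name, obj in vars(module).items():
            # Keep only functions whose defining module is this file, so a
            # bench_* that was merely imported from elsewhere is skipped.
            if (
                name.startswith("bench_")
                and inspect.isfunction(obj)
                and obj.__module__ == module.__name__
            ):
                found.append(name)
    return sorted(found)
```

The `__module__` comparison is one plausible way to tell "defined here" from "imported here"; the source does not say which check pybench actually uses.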

pybench.normalizer

Coerce benchmark return values into a canonical scores mapping.

pybench.normalizer.Scores

Canonical normalized form: {step: {metric: value}}.

alias of dict[int, dict[str, float]]

exception pybench.normalizer.NormalizationError

Raised when a benchmark return value has an unsupported shape.

pybench.normalizer.normalize(result)

Coerce any accepted benchmark return value to canonical Scores.

Parameters: result (object) – A float, dict of metrics, or list of step dicts.
Returns: {step: {metric: value}}; scalars and bare dicts use step 0, and a bare scalar is stored under the metric name score.
Raises: NormalizationError – If the value has an unsupported shape or type.
Return type: dict[int, dict[str, float]]
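
To make the three accepted shapes concrete, here is a small sketch of the coercion. It is written from the description above, not from the package source: normalize_sketch is a hypothetical name, TypeError stands in for NormalizationError, and using a step dict's list position as its step key is an assumption.

```python
from numbers import Real


def normalize_sketch(result: object) -> dict[int, dict[str, float]]:
    """Coerce a scalar, a metrics dict, or a list of step dicts."""
    # A bare scalar becomes step 0, metric "score". Rejecting bool here is
    # a choice made for this sketch only.
    if isinstance(result, Real) and not isinstance(result, bool):
        return {0: {"score": float(result)}}
    # A bare dict of metrics becomes step 0.
    if isinstance(result, dict):
        return {0: {str(k): float(v) for k, v in result.items()}}
    # A list of step dicts: ASSUMPTION, the list index is the step key.
    if isinstance(result, list):
        scores = {}
        for step, metrics in enumerate(result):
            if not isinstance(metrics, dict):
                raise TypeError(f"step {step} is not a dict of metrics")
            scores[step] = {str(k): float(v) for k, v in metrics.items()}
        return scores
    raise TypeError(f"unsupported benchmark result: {type(result).__name__}")
```

For example, `normalize_sketch(0.5)` yields `{0: {"score": 0.5}}`, and a two-element list yields steps 0 and 1.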

pybench.validator

Alignment checks between a run’s scores and the stored baseline.

exception pybench.validator.StepKeyMismatchError(bench, current, baseline)

Raised when the run and baseline step-key sets differ.

Parameters:

bench (str)
current (set[int])
baseline (set[int])

exception pybench.validator.MetricKeyMismatchError(bench, step, current, baseline)

Raised when the run and baseline metric-key sets differ for a step.

Parameters:

bench (str)
step (int)
current (set[str])
baseline (set[str])

pybench.validator.validate_alignment(bench, current, baseline)

Assert the run and baseline share identical step and metric keys.

Parameters:

bench (str) – Benchmark name, for error messages.
current (Mapping[int, Mapping[str, object]]) – Freshly normalized run scores (keyed by step then metric).
baseline (Mapping[int, Mapping[str, object]]) – Stored baseline scores (keyed by step then metric).

Raises:

StepKeyMismatchError – If the step-key sets differ.
MetricKeyMismatchError – If any step’s metric-key sets differ.

Return type:

None
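
The two-level check described above (step keys first, then metric keys per step) might look roughly like this. The exception classes and the function name are stand-ins defined locally for the sketch; the real exceptions take the constructor arguments listed earlier in this section.

```python
from collections.abc import Mapping


class StepMismatch(ValueError):
    """Stand-in for StepKeyMismatchError (hypothetical, local to this sketch)."""


class MetricMismatch(ValueError):
    """Stand-in for MetricKeyMismatchError (hypothetical, local to this sketch)."""


def check_alignment(
    bench: str,
    current: Mapping[int, Mapping[str, object]],
    baseline: Mapping[int, Mapping[str, object]],
) -> None:
    """Raise unless both mappings share identical step and metric keys."""
    if set(current) != set(baseline):
        raise StepMismatch(
            f"{bench}: steps {sorted(current)} != baseline {sorted(baseline)}"
        )
    # Step sets are equal here, so indexing baseline[step] is safe.
    for step in sorted(current):
        if set(current[step]) != set(baseline[step]):
            raise MetricMismatch(
                f"{bench} step {step}: metrics {sorted(current[step])} "
                f"!= baseline {sorted(baseline[step])}"
            )
```

Checking steps before metrics means a metric comparison never has to cope with a step that exists on only one side.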

pybench.runner

Run a benchmark over a set of seeds and collect per-seed scores.

pybench.runner.SeedScores

Per-seed raw scores: {step: {metric: [value for each seed]}}.

alias of dict[int, dict[str, list[float]]]

exception pybench.runner.RunShapeError

Raised when a benchmark returns different keys across seeds.

pybench.runner.sample_seeds(n, rng)

Sample n random integer seeds from a range wide enough that collisions are unlikely.

Parameters:

n (int) – Number of seeds to draw.
rng (Generator) – Random generator.

Returns:

A list of n Python ints in [0, 2**32).

Return type:

list[int]
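
As a rough illustration of the contract above (n plain Python ints in [0, 2**32)), the sketch below draws seeds independently. It deliberately swaps in the standard library's random.Random for the Generator type named in the signature, and sample_seeds_sketch is a hypothetical name.

```python
import random


def sample_seeds_sketch(n: int, rng: random.Random) -> list[int]:
    """Draw n independent seeds in [0, 2**32)."""
    # Independent draws do not guarantee uniqueness; with 2**32 possible
    # values, duplicates are merely improbable for small n.
    return [rng.randrange(2**32) for _ in range(n)]
```

Passing an explicitly seeded generator makes the seed list itself reproducible, which matters when a baseline and a later run must be paired on the same seeds.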

pybench.runner.run_benchmark(bench, seeds, *, on_seed=None)

Run bench on each seed and collect aligned per-seed scores.

Runs serially when bench.config.workers == 1; otherwise fans the seeds out across a process pool (each worker re-imports the benchmark file by path).

Parameters:

bench (Benchmark) – The benchmark to run.
seeds (list[int]) – Seeds to run, in order; output lists align position-by-position.
on_seed (Callable[[], None] | None) – Optional callback invoked once per completed seed (progress).

Returns:

Per-seed scores {step: {metric: [value for each seed]}}.

Raises:

RunShapeError – If a later seed yields different step/metric keys than the first.

Return type:

dict[int, dict[str, list[float]]]
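
The serial path can be pictured as a loop that appends each seed's value to a per-slot list and rejects any seed whose keys differ from the first. The sketch below covers only that path, under stated assumptions: collect_serial is a hypothetical name, fn is assumed to return already-normalized {step: {metric: value}} scores, and ValueError stands in for RunShapeError.

```python
from typing import Callable, Optional


def collect_serial(
    fn: Callable[[int], dict[int, dict[str, float]]],
    seeds: list[int],
    on_seed: Optional[Callable[[], None]] = None,
) -> dict[int, dict[str, list[float]]]:
    """Run fn once per seed and gather values position-by-position."""
    collected: dict[int, dict[str, list[float]]] = {}
    expected = None  # key shape established by the first seed
    for seed in seeds:
        scores = fn(seed)
        shape = {step: set(metrics) for step, metrics in scores.items()}
        if expected is None:
            expected = shape
            collected = {
                step: {metric: [] for metric in metrics}
                for step, metrics in scores.items()
            }
        elif shape != expected:
            raise ValueError(f"seed {seed} returned different step/metric keys")
        for step, metrics in scores.items():
            for metric, value in metrics.items():
                collected[step][metric].append(float(value))
        if on_seed is not None:
            on_seed()  # progress hook, called once per completed seed
    return collected
```

Because values are appended in seed order, index i of every list corresponds to seeds[i], which is what later allows a paired comparison.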

pybench.stats

Statistical comparison: paired t-test slots + sign-flip permutation meta-test.

For each (step, metric) slot a one-sided paired t-test (in goodness space, i.e. after the min: sign flip) decides whether the current run regressed. The benchmark verdict is the within-seed sign-flip permutation p-value of a continuous severity statistic (see SPECIFICATIONS.md §3).

pybench.stats.SeedScores

Per-seed raw scores: {step: {metric: [value for each seed]}}.

alias of dict[int, dict[str, list[float]]]

class pybench.stats.SlotResult(step, metric, baseline_mean, baseline_std, current_mean, current_std, effect_size, p_value, flagged, denom_at_floor)

Comparison outcome for one (step, metric) slot, in raw units.

Parameters:

step (int)
metric (str)
baseline_mean (float)
baseline_std (float)
current_mean (float)
current_std (float)
effect_size (float)
p_value (float)
flagged (bool)
denom_at_floor (bool)

denom_at_floor: bool

True when the baseline mean is so small that effect_size is unreliable.

class pybench.stats.Comparison(slots, n_flagged, n_slots, meta_p, passed)

Full benchmark comparison result.

Parameters:

slots (list[SlotResult])
n_flagged (int)
n_slots (int)
meta_p (float)
passed (bool)

pybench.stats.check_alpha_detectable(n_seeds, alpha)

Reject an alpha that no regression could ever satisfy.

The within-seed sign-flip meta-test has only 2**n_seeds arrangements, so the smallest achievable meta_p is 1 / 2**n_seeds. When alpha <= 1 / 2**n_seeds, the failing condition meta_p < alpha can never hold, and even a maximally severe regression would yield a PASS. The function therefore raises loudly instead of letting the run report a vacuous green.

Parameters:

n_seeds (int)
alpha (float)

Raises:

ValueError – If alpha is unreachable at this seed count.

Return type:

None
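
The arithmetic behind this guard is short enough to show directly. The sketch below follows the inequality stated above; check_alpha_sketch is a hypothetical name and the error message is illustrative.

```python
def check_alpha_sketch(n_seeds: int, alpha: float) -> None:
    """Raise if meta_p < alpha can never be true with n_seeds seeds."""
    # Smallest p-value the sign-flip test can produce: one arrangement
    # out of 2**n_seeds.
    floor = 1 / 2**n_seeds
    if alpha <= floor:
        raise ValueError(
            f"alpha={alpha} is unreachable with {n_seeds} seeds "
            f"(smallest possible meta_p is {floor})"
        )
```

With 4 seeds the floor is 1/16 = 0.0625, so alpha = 0.05 is rejected; with 5 seeds the floor is 1/32 = 0.03125 and the same alpha is accepted.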

pybench.stats.compare(baseline, current, *, alpha=0.05, min_effect=None, n_perm=4096, rng=None)

Compare a paired current run against a baseline.

Parameters:

baseline (dict[int, dict[str, list[float]]]) – Stored per-seed baseline scores.
current (dict[int, dict[str, list[float]]]) – Per-seed current scores, on the same seeds (paired).
alpha (float) – Per-slot and overall significance threshold.
min_effect (float | None) – Optional minimum relative goodness drop to flag a slot.
n_perm (int) – Number of sign-flip permutations for the meta-test.
rng (Generator | None) – Random generator; a fresh default one is used when None.

Returns:

A Comparison with per-slot detail and the overall verdict.

Raises:

ValueError – If baseline and current have mismatched seed counts, or if alpha is unreachable at this seed count (see check_alpha_detectable()).

Return type:

Comparison
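
For intuition about the sign-flip meta-test, the sketch below computes an exact one-sided sign-flip p-value for a single slot. It is a simplification under several assumptions: the severity statistic here is just the negated sum of paired differences (the real statistic is defined in SPECIFICATIONS.md §3 and is not reproduced), all 2**n arrangements are enumerated instead of drawing n_perm random ones, and sign_flip_p is a hypothetical name.

```python
from itertools import product


def sign_flip_p(diffs: list[float]) -> float:
    """Exact one-sided sign-flip p-value for a drop in paired differences.

    diffs[i] is (current - baseline) for seed i in goodness space, so a
    regression shows up as negative values.
    """
    observed = -sum(diffs)  # larger means a more severe drop
    hits = 0
    total = 0
    # Under the null, each seed's difference is equally likely to carry
    # either sign, so every sign pattern is one equally likely arrangement.
    for signs in product((1, -1), repeat=len(diffs)):
        stat = -sum(s * d for s, d in zip(signs, diffs))
        if stat >= observed:
            hits += 1
        total += 1
    return hits / total
```

With three seeds that all regress, for example `[-1.0, -2.0, -3.0]`, only the unflipped arrangement is as extreme as the observed one, so the result is 1/8. That matches the 1 / 2**n_seeds floor discussed under check_alpha_detectable.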

pybench.store

Read and write the JSONL baseline store.

pybench.store.SeedScores

Per-seed raw scores: {step: {metric: [value for each seed]}}.

alias of dict[int, dict[str, list[float]]]

class pybench.store.BaselineRecord(bench, timestamp, git_commit, git_dirty, seeds, scores)

One benchmark’s stored baseline.

Parameters:

bench (str)
timestamp (str)
git_commit (str | None)
git_dirty (bool | None)
seeds (list[int])
scores (dict[int, dict[str, list[float]]])

pybench.store.parse_baselines(text)

Parse JSONL baseline content into records keyed by benchmark name.

Parameters: text (str) – Raw JSONL content (e.g. a file’s text or git show output).
Returns: Mapping of benchmark name to record; blank lines are skipped.
Return type: dict[str, BaselineRecord]
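
A sketch of the parsing step, using plain dicts in place of BaselineRecord. It assumes the on-disk field names match the BaselineRecord attributes (bench, scores, and so on), which the source does not state; parse_jsonl_sketch is a hypothetical name.

```python
import json


def parse_jsonl_sketch(text: str) -> dict[str, dict]:
    """Parse JSONL text into {bench name: record dict}, skipping blank lines."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are skipped
        raw = json.loads(line)
        # JSON object keys are always strings, so integer step keys have
        # to be converted back after loading.
        raw["scores"] = {int(step): m for step, m in raw["scores"].items()}
        records[raw["bench"]] = raw
    return records
```

The int conversion is the detail most easily missed: without it, a round-tripped baseline would carry step keys such as "0" that never compare equal to the run's integer steps.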

pybench.store.read_baselines(path)

Load all baseline records keyed by benchmark name.

Parameters: path (Path) – Path to the JSONL store.
Returns: Mapping of benchmark name to record; empty if the file is absent.
Return type: dict[str, BaselineRecord]

pybench.store.write_baselines(path, records)

Rewrite the JSONL store with the given records, one line each.

Parameters:

path (Path) – Path to the JSONL store; parent directories are created.
records (Iterable[BaselineRecord]) – Records to write (full rewrite).

Return type:

None
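
The write side, as described above, is a full rewrite with one JSON object per line and parent directories created on demand. A minimal sketch with plain dicts standing in for BaselineRecord (write_jsonl_sketch is a hypothetical name, and sorting keys is a choice made here for stable diffs, not something the source specifies):

```python
import json
from pathlib import Path


def write_jsonl_sketch(path: Path, records: list[dict]) -> None:
    """Rewrite the store at path with one JSON line per record."""
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [json.dumps(record, sort_keys=True) for record in records]
    # Opening in write mode truncates, so stale records never linger.
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```

Note that json.dumps turns integer step keys into strings, which is why a reader has to convert them back.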

pybench.git

Capture git provenance (short SHA + dirty flag) with graceful fallback.

class pybench.git.GitInfo(commit, dirty)

Git provenance recorded with a baseline write.

Parameters:

commit (str | None)
dirty (bool | None)

pybench.git.git_metadata(cwd=None)

Return the short HEAD SHA and dirty flag, or nulls if git is unavailable.

Parameters: cwd (Path | None) – Directory to inspect; defaults to the current working directory.
Returns: GitInfo(commit, dirty). Both are None when cwd is not a git repository or git is not installed.
Return type: GitInfo
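
One way to get the graceful fallback described above is to treat both "git is missing" and "git exited non-zero" as "no information". The sketch below returns a plain tuple instead of GitInfo; the helper names and the choice of `git rev-parse --short HEAD` plus `git status --porcelain` are assumptions about how such a probe could be written, not a description of pybench's code.

```python
import subprocess
from pathlib import Path
from typing import Optional


def _git(args: list[str], cwd: Optional[Path]) -> Optional[str]:
    """Run a git command; return stdout, or None on any failure."""
    try:
        done = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=False
        )
    except OSError:  # git not installed, or cwd does not exist
        return None
    return done.stdout if done.returncode == 0 else None


def git_metadata_sketch(
    cwd: Optional[Path] = None,
) -> tuple[Optional[str], Optional[bool]]:
    """Return (short_sha, dirty), or (None, None) when git cannot answer."""
    sha = _git(["rev-parse", "--short", "HEAD"], cwd)
    status = _git(["status", "--porcelain"], cwd)
    if sha is None or status is None:
        return (None, None)
    # Any porcelain output means uncommitted or untracked changes.
    return (sha.strip(), bool(status.strip()))
```

Returning both values as None together keeps callers from having to reason about a half-populated result.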

pybench.git.file_history(path)

Return commits that touched path, oldest first.

Parameters: path (Path) – File whose history to inspect (git is run in its directory).
Returns: [(short_sha, date), ...] in chronological order, [] if the file has no commits, or None if not a git repo / git is unavailable.
Return type: list[tuple[str, str]] | None

pybench.git.file_at_commit(commit, path)

Return the content of path as of commit via git show.

Parameters:

commit (str) – Commit-ish (e.g. a short SHA).
path (Path) – File to read; git is run in its directory.

Returns:

The file’s text at that commit, or None if git is unavailable or the path did not exist there.

Return type:

str | None

pybench.config

Per-benchmark configuration: keyword defaults plus CLI overrides.

class pybench.config.BenchConfig(n_seeds=30, alpha=0.05, min_effect=None, workers=1)

Resolved configuration for one benchmark.

Parameters:

n_seeds (int)
alpha (float)
min_effect (float | None)
workers (int)

pybench.config.extract_config(fn)

Read a benchmark’s keyword-only config defaults from its signature.

Parameters: fn (Callable[[...], object]) – The bench_* function to inspect.
Returns: A BenchConfig; any of n_seeds, alpha, min_effect, workers not declared on fn keep their package default.
Return type: BenchConfig
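
Reading configuration from a signature can be done with inspect.signature. The sketch below redefines a local BenchConfig dataclass with the defaults shown in this section, purely so the example is self-contained; whether the real class is a dataclass is an assumption, and extract_config_sketch is a hypothetical name.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass(frozen=True)
class BenchConfig:
    """Local stand-in mirroring the documented defaults."""

    n_seeds: int = 30
    alpha: float = 0.05
    min_effect: Optional[float] = None
    workers: int = 1


def extract_config_sketch(fn: Callable[..., object]) -> BenchConfig:
    """Build a BenchConfig from fn's keyword-only parameter defaults."""
    known = {f.name for f in fields(BenchConfig)}
    declared = {}
    for name, param in inspect.signature(fn).parameters.items():
        if (
            param.kind is inspect.Parameter.KEYWORD_ONLY
            and name in known
            and param.default is not inspect.Parameter.empty
        ):
            declared[name] = param.default
    # Anything not declared on fn falls back to the dataclass default.
    return BenchConfig(**declared)
```

A benchmark written as `def bench_sort(seed, *, n_seeds=10, alpha=0.01)` would resolve to n_seeds=10 and alpha=0.01, with min_effect and workers left at their defaults.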

pybench.config.apply_overrides(config, *, alpha=None, min_effect=None)

Return config with non-None CLI overrides applied.

Parameters:

config (BenchConfig) – The benchmark’s resolved configuration.
alpha (float | None) – CLI --alpha override, or None to keep the benchmark’s.
min_effect (float | None) – CLI --min-effect override, or None to keep it.

Returns:

A new BenchConfig with the overrides merged in.

Return type:

BenchConfig
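
Merging overrides without mutating the input maps naturally onto dataclasses.replace. As before, the BenchConfig below is a local stand-in with the documented defaults (the real class may not be a dataclass), and apply_overrides_sketch is a hypothetical name.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BenchConfig:
    """Local stand-in mirroring the documented defaults."""

    n_seeds: int = 30
    alpha: float = 0.05
    min_effect: Optional[float] = None
    workers: int = 1


def apply_overrides_sketch(
    config: BenchConfig,
    *,
    alpha: Optional[float] = None,
    min_effect: Optional[float] = None,
) -> BenchConfig:
    """Return a copy of config with every non-None override applied."""
    candidates = {"alpha": alpha, "min_effect": min_effect}
    overrides = {k: v for k, v in candidates.items() if v is not None}
    return replace(config, **overrides)
```

One consequence of using None as "no override" is that an override cannot reset min_effect back to None; it can only replace it with another number.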

pybench.reporter

Terminal output for a benchmark run (Rich, colored).

class pybench.reporter.BenchOutcome(name, status, n_steps, n_metrics, comparison)

One benchmark’s result, ready to render.

Parameters:

name (str)
status (str)
n_steps (int)
n_metrics (int)
comparison (Comparison | None)

pybench.reporter.report(console, outcomes, *, elapsed, verbose)

Render the full run report.

Parameters:

console (Console) – Rich console to write to.
outcomes (list[BenchOutcome]) – One outcome per benchmark, in display order.
elapsed (float) – Wall-clock seconds for the whole run.
verbose (bool) – Expand the per-slot table under each failing benchmark.

Return type:

None

pybench.reporter.report_update(console, updated)

Render the summary of a pybench update.

Parameters:

console (Console) – Rich console to write to.
updated (list[tuple[str, int]]) – (name, n_seeds) for each rewritten benchmark.

Return type:

None

pybench.reporter.report_show(console, records)

Render the current baseline stats for each benchmark.

Parameters:

console (Console) – Rich console to write to.
records (dict[str, BaselineRecord]) – Baseline records keyed by benchmark name.

Return type:

None

pybench.reporter.report_history(console, history)

Render per-benchmark baseline history across commits.

Parameters:

console (Console) – Rich console to write to.
history (dict[str, list[tuple[str, str, BaselineRecord]]]) – {bench: [(short_sha, date, record), ...]} chronological.

Return type:

None

pybench.cli

Click-based CLI entry point.