Performance Baselines¶

This page defines the reproducible benchmark workflow for the issue #39 optimization slices across the Pull and Push drain paths.

Scope¶

Current benchmarks cover:

dequeue + ack (single)
dequeue + ack (batch size 15, sustained-drain profile)
dequeue + ack (batch size 32)
dequeue + nack (single)
dequeue + repeated extend on an active lease (single)
duplicate ack/nack retry path under parallel load (contention profile)
mixed ingress + pull drain profile with latency percentiles (p95_ms, p99_ms)
mixed ingress + push drain saturation profile with ingress reject and delivery counts
mixed ingress + push skewed-target saturation profile (fast + slow target) for cross-target drain fairness checks
adaptive backpressure A/B runtime harness (off vs on) for mixed-focused saturation analysis, with an optional pull reference run; it produces final metrics/health artifacts and side-by-side comparison tables that include the Pull ACK conflict ratio

Each benchmark runs against both queue backends:

memory
sqlite

Benchmarks are implemented in:

internal/pullapi/bench_test.go
internal/dispatcher/bench_test.go
scripts/adaptive-ab.sh (runtime A/B harness, not a go test benchmark)

Reproducible Runbook¶

Run all commands from the repository root.

Capture a baseline before changes:

make bench-pull-baseline

This writes ./.bench/pull-baseline.txt.

Apply code changes.
Capture current results:

make bench-pull

This writes ./.bench/pull.txt.

Compare baseline vs current:

make bench-pull-compare

This writes ./.bench/pull-compare.txt and prints a benchstat diff table.

Isolated Extend Check¶

To validate only the active-lease extend path:

make bench-pull-extend
make bench-pull-extend-compare

This uses a longer, higher-count run (-benchtime=5s, -count=10) to reduce variance from unrelated Pull-path benchmarks.

Sustained Drain Check (Batch 15)¶

For an issue-#39-style pull workload (dequeue + batch ack with batch=15):

make bench-pull-drain-baseline
make bench-pull-drain
make bench-pull-drain-compare

This writes:

./.bench/pull-drain-baseline.txt
./.bench/pull-drain.txt
./.bench/pull-drain-compare.txt

The drain profile uses a longer run (-benchtime=5s, -count=10) for lower variance.

ACK/NACK Contention Check¶

For high-parallel duplicate-retry pressure on Pull ack/nack:

make bench-pull-contention-baseline
make bench-pull-contention
make bench-pull-contention-compare

This writes:

./.bench/pull-contention-baseline.txt
./.bench/pull-contention.txt
./.bench/pull-contention-compare.txt

The contention profile runs with GOMAXPROCS=4 and -cpu 1,4 to expose scaling behavior and conflict-path costs.
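To make the "duplicate retry" conflict path concrete, here is a minimal sketch. It assumes a hypothetical in-memory leaseStore type; it is not the project's queue backend or the benchmark code. Many goroutines ack the same lease, the first one wins, and every duplicate is counted as a conflict.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// leaseStore is a hypothetical stand-in for a queue backend.
// It only tracks which lease IDs are still active.
type leaseStore struct {
	mu     sync.Mutex
	active map[string]bool
}

func newLeaseStore(ids ...string) *leaseStore {
	s := &leaseStore{active: make(map[string]bool)}
	for _, id := range ids {
		s.active[id] = true
	}
	return s
}

// ack returns true for the first caller and false (a conflict)
// for every duplicate retry of the same lease.
func (s *leaseStore) ack(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.active[id] {
		return false
	}
	delete(s.active, id)
	return true
}

// duplicateAcks fires n concurrent acks at a single lease and
// returns how many succeeded and how many hit the conflict path.
func duplicateAcks(n int) (acked, conflicts int64) {
	s := newLeaseStore("lease-1")
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.ack("lease-1") {
				atomic.AddInt64(&acked, 1)
			} else {
				atomic.AddInt64(&conflicts, 1)
			}
		}()
	}
	wg.Wait()
	return acked, conflicts
}

func main() {
	acked, conflicts := duplicateAcks(64)
	fmt.Printf("acked=%d conflicts=%d\n", acked, conflicts)
}
```

With a single mutex, every duplicate still pays for lock acquisition. That is the kind of cost a run at -cpu 1,4 is meant to expose as parallelism increases.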

Mixed Ingress + Drain Tail-Latency Check¶

For a mixed workload (concurrent ingress writes while pull workers dequeue+ack in the background):

make bench-pull-mixed-baseline
make bench-pull-mixed
make bench-pull-mixed-compare

This writes:

./.bench/pull-mixed-baseline.txt
./.bench/pull-mixed.txt
./.bench/pull-mixed-compare.txt

BenchmarkMixedIngressDrain also reports custom metrics per backend:

p95_ms
p99_ms
ingress_rejects
drain_errors

Push Ingress + Drain Saturation Check¶

For push-mode saturation behavior (ingress while dispatcher drains a single-target route):

make bench-push-mixed-baseline
make bench-push-mixed
make bench-push-mixed-compare

This writes:

./.bench/push-mixed-baseline.txt
./.bench/push-mixed.txt
./.bench/push-mixed-compare.txt

BenchmarkPushIngressDrainSaturation reports:

ingress_rejects
ingress_rejects_queue_full
ingress_rejects_adaptive_backpressure
ingress_rejects_memory_pressure
ingress_rejects_other
p95_ms
p99_ms
deliveries

Push Ingress + Skewed-Target Saturation Check¶

For push-mode cross-target fairness under saturation (single route with one fast and one slow target):

make bench-push-skewed-baseline
make bench-push-skewed
make bench-push-skewed-compare

This writes:

./.bench/push-skewed-baseline.txt
./.bench/push-skewed.txt
./.bench/push-skewed-compare.txt

BenchmarkPushIngressDrainSkewedTargets reports:

ingress_rejects
ingress_rejects_queue_full
ingress_rejects_adaptive_backpressure
ingress_rejects_memory_pressure
ingress_rejects_other
p95_ms
p99_ms
deliveries_fast
deliveries_slow

Adaptive Backpressure A/B Runtime Check (Issues #53/#54/#55/#56)¶

For explicit adaptive_backpressure.enabled=off vs on runs (same load profile):

make adaptive-ab

make adaptive-ab defaults to the currently open validation scope (mixed).

Scenario-specific runs:

make adaptive-ab-pull
make adaptive-ab-mixed
make adaptive-ab-all
make adaptive-ab-mixed-saturation

Guardrail checks on an existing mixed run:

make adaptive-ab-guardrail-check RUN_ROOT=.artifacts/adaptive-ab/<run-id>
make adaptive-ab-lag-guardrail-check RUN_ROOT=.artifacts/adaptive-ab/<run-id>

One-shot calibrated run + guardrail check:

make adaptive-ab-mixed-guardrail
make adaptive-ab-mixed-lag-guardrail

make adaptive-ab-all executes:

pull-off, pull-on (reference profile)
mixed-off, mixed-on (remaining decision profile)

make adaptive-ab-mixed-saturation is a calibrated high-pressure profile for issue validation (#53/#54/#55/#56):

duration: 30s per mode
ingress workers: 256
mixed drain workers: 8
dequeue batch: 5
queue max depth: 2000

Use it when the default make adaptive-ab profile does not reach sustained pressure on your host.

Decision note: these runs are intended for relative same-host A/B evidence and tuning guidance. They are not a standalone basis for global default policy across heterogeneous production hardware.

Artifacts are written under:

./.artifacts/adaptive-ab/<run-id>/<scenario>-<mode>/

Each run directory includes:

final-metrics.txt
final-health.json
monitor-output.log
run-meta.json (binary hash/version, git revision, runtime profile)
summary.env and summary.json

Comparison tables are generated as:

./.artifacts/adaptive-ab/<run-id>/comparison-pull.md
./.artifacts/adaptive-ab/<run-id>/comparison-mixed.md
./.artifacts/adaptive-ab/<run-id>/comparison.md
./.artifacts/adaptive-ab/<run-id>/guardrail-mixed.md (when guardrail target/script is used)
./.artifacts/adaptive-ab/<run-id>/guardrail-lag-mixed.md (when lag/age guardrail target/script is used)

The comparison table includes:

hookaido_ingress_adaptive_backpressure_applied_total
hookaido_ingress_rejected_by_reason_total{reason="adaptive_backpressure",status="503"}
hookaido_ingress_rejected_by_reason_total{reason="queue_full",status="503"}
hookaido_queue_ready_lag_seconds
hookaido_queue_oldest_queued_age_seconds
ingress p95_ms / p99_ms
accepted request rate (requests/sec)
hookaido_pull_acked_total (sum across routes)
hookaido_pull_ack_conflict_total (sum across routes)
hookaido_pull_nack_conflict_total (sum across routes)
pull_ack_conflict_ratio_percent (ack_conflict / acked * 100)
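The "sum across routes" counters and the conflict ratio can be reproduced by hand from a metrics dump. The sketch below assumes that final-metrics.txt uses the Prometheus text exposition format without trailing timestamps, and that routes appear as a route label; both are assumptions, and the sample values are invented for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleMetrics is invented data in Prometheus text format.
const sampleMetrics = `# TYPE hookaido_pull_acked_total counter
hookaido_pull_acked_total{route="/a"} 900
hookaido_pull_acked_total{route="/b"} 100
# TYPE hookaido_pull_ack_conflict_total counter
hookaido_pull_ack_conflict_total{route="/a"} 18
hookaido_pull_ack_conflict_total{route="/b"} 2
`

// sumMetric adds up every sample of one metric name across all label sets.
// It assumes the value is the last whitespace-separated field on the line.
func sumMetric(text, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and HELP/TYPE comments
		}
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue // different metric name
		}
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue
		}
		total += v
	}
	return total
}

// conflictRatioPercent implements ack_conflict / acked * 100,
// returning 0 when nothing was acked.
func conflictRatioPercent(ackConflicts, acked float64) float64 {
	if acked == 0 {
		return 0
	}
	return ackConflicts / acked * 100
}

func main() {
	acked := sumMetric(sampleMetrics, "hookaido_pull_acked_total")
	conflicts := sumMetric(sampleMetrics, "hookaido_pull_ack_conflict_total")
	fmt.Printf("pull_ack_conflict_ratio_percent=%.2f\n",
		conflictRatioPercent(conflicts, acked))
}
```

Matching on the exact metric name matters here: a prefix match would not separate the acked and ack_conflict counters reliably.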

Guardrail defaults for #55:

aggregate pull_ack_conflict_ratio_percent <= 5.0
minimum aggregate pull_acked_total >= 100 per mode (mixed-off, mixed-on)
per-route pull_ack_conflict_ratio_percent <= 5.0 when route pull_acked_total >= 50
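The three #55 defaults above can be read as one check per mode. The following sketch encodes them under stated assumptions: routeStats and checkConflictGuardrail are hypothetical names, and the real guardrail script may structure its checks and output differently.

```go
package main

import "fmt"

// routeStats holds per-route counters for one mode (hypothetical shape).
type routeStats struct {
	Route        string
	Acked        float64
	AckConflicts float64
}

// Thresholds copied from the documented #55 defaults.
const (
	maxRatioPercent  = 5.0
	minAggregateAcks = 100
	minRouteAcks     = 50
)

func ratioPercent(conflicts, acked float64) float64 {
	if acked == 0 {
		return 0
	}
	return conflicts / acked * 100
}

// checkConflictGuardrail returns one message per violated rule for a mode.
func checkConflictGuardrail(mode string, routes []routeStats) []string {
	var violations []string
	var acked, conflicts float64
	for _, r := range routes {
		acked += r.Acked
		conflicts += r.AckConflicts
		// The per-route rule applies only once the route has enough acks.
		if r.Acked >= minRouteAcks {
			if rp := ratioPercent(r.AckConflicts, r.Acked); rp > maxRatioPercent {
				violations = append(violations,
					fmt.Sprintf("%s: route %s ratio %.2f%% exceeds %.1f%%",
						mode, r.Route, rp, maxRatioPercent))
			}
		}
	}
	if acked < minAggregateAcks {
		violations = append(violations,
			fmt.Sprintf("%s: only %.0f acks, need %d", mode, acked, minAggregateAcks))
	}
	if rp := ratioPercent(conflicts, acked); rp > maxRatioPercent {
		violations = append(violations,
			fmt.Sprintf("%s: aggregate ratio %.2f%% exceeds %.1f%%",
				mode, rp, maxRatioPercent))
	}
	return violations
}

func main() {
	routes := []routeStats{
		{Route: "/a", Acked: 900, AckConflicts: 18},
		{Route: "/b", Acked: 60, AckConflicts: 6},
	}
	for _, v := range checkConflictGuardrail("mixed-on", routes) {
		fmt.Println(v)
	}
}
```

In this invented example the aggregate ratio passes while route /b fails on its own, which is the situation the per-route rule exists to catch.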

Guardrail defaults for #56:

aggregate queue_ready_lag_seconds <= 30 per mode (mixed-off, mixed-on)
aggregate queue_oldest_queued_age_seconds <= 30 per mode (mixed-off, mixed-on)
delta (on-off) queue_ready_lag_seconds <= 10
delta (on-off) queue_oldest_queued_age_seconds <= 10
minimum accepted_total >= 100 per mode
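The #56 defaults combine absolute per-mode limits with limits on the on-minus-off delta. The sketch below shows one way to express that; modeSample and checkLagGuardrail are hypothetical names, not the harness's actual types.

```go
package main

import "fmt"

// modeSample holds the aggregate values for one mode (hypothetical shape).
type modeSample struct {
	ReadyLagSeconds  float64
	OldestAgeSeconds float64
	AcceptedTotal    float64
}

// Thresholds copied from the documented #56 defaults.
const (
	maxAbsoluteSeconds = 30.0
	maxDeltaSeconds    = 10.0
	minAcceptedTotal   = 100
)

// checkLagGuardrail returns one message per violated rule.
func checkLagGuardrail(off, on modeSample) []string {
	var violations []string
	modes := []struct {
		name string
		s    modeSample
	}{{"mixed-off", off}, {"mixed-on", on}}

	// Absolute limits and the sample-size floor apply to each mode.
	for _, m := range modes {
		if m.s.ReadyLagSeconds > maxAbsoluteSeconds {
			violations = append(violations, m.name+": ready lag above absolute limit")
		}
		if m.s.OldestAgeSeconds > maxAbsoluteSeconds {
			violations = append(violations, m.name+": oldest queued age above absolute limit")
		}
		if m.s.AcceptedTotal < minAcceptedTotal {
			violations = append(violations, m.name+": too few accepted requests")
		}
	}

	// Delta limits compare on against off.
	if on.ReadyLagSeconds-off.ReadyLagSeconds > maxDeltaSeconds {
		violations = append(violations, "delta: ready lag grew too much with adaptive on")
	}
	if on.OldestAgeSeconds-off.OldestAgeSeconds > maxDeltaSeconds {
		violations = append(violations, "delta: oldest queued age grew too much with adaptive on")
	}
	return violations
}

func main() {
	off := modeSample{ReadyLagSeconds: 5, OldestAgeSeconds: 6, AcceptedTotal: 1000}
	on := modeSample{ReadyLagSeconds: 20, OldestAgeSeconds: 14, AcceptedTotal: 900}
	for _, v := range checkLagGuardrail(off, on) {
		fmt.Println(v)
	}
}
```

Here both modes stay under the 30-second absolute limit, yet the 15-second increase in ready lag still fails the delta rule.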

Reproducibility Defaults¶

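The Make targets enforce the settings below. GOMAXPROCS=1 and -cpu 1 are the defaults; the contention, mixed, and push profiles override them as listed: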

GOMAXPROCS=1
-cpu 1
-count=5 and -benchtime=3s for the default pull suite
-count=10 and -benchtime=5s for isolated extend/drain profiles
GOMAXPROCS=4 and -cpu 1,4 for the contention profile
GOMAXPROCS=4 and -cpu 4 for the mixed ingress+drain profile
GOMAXPROCS=4 and -cpu 4 for the push saturation profile
GOMAXPROCS=4 and -cpu 4 for the push skewed-target profile
adaptive A/B harness defaults: duration=120s, ingress_workers=16, mixed_drain_workers=8, dequeue_batch=15, queue_max_depth=50000

This reduces host variance and gives stable median trends across runs.

Interpreting Results¶

Focus first on sec/op deltas for the same benchmark/backend pair.
Use B/op and allocs/op to catch regressions hidden by throughput changes.
For SQLite, compare both single and batch paths; batch wins should show up most clearly in AckBatch32.
For mixed profile runs, track p95_ms/p99_ms first, then check ingress_rejects and drain_errors to interpret latency shifts.
For push saturation runs, track p95_ms/p99_ms and ingress_rejects_queue_full first, then compare deliveries.
For push skewed-target runs, track p95_ms/p99_ms, deliveries_slow, and ingress_rejects_queue_full; improving slow-target drain without growing queue-full rejects or tail latency indicates better cross-target fairness.
For adaptive A/B runs, first confirm adaptive_applied_total=0 in the off run, then compare the queue_full delta and the latency/rate trade-offs in the on run.
For mixed A/B (#55), track pull_ack_conflict_ratio_percent alongside ingress metrics; large conflict-ratio regressions can hide behind stable ingress acceptance.
For #55 regression acceptance, use guardrail-mixed.md as the pass/fail artifact and inspect the per-route drill-down section to localize conflict spikes.
For lag/age regression acceptance (#56), use guardrail-lag-mixed.md and investigate sustained queue lag/age when absolute or delta thresholds fail.
Keep policy decisions tied to workload SLOs: same-host gains do not imply cross-environment default changes.
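For a quick sanity check of a sec/op delta, the median of each run set can be compared directly. The sketch below is a simplification with invented numbers and hypothetical helper names. It is not benchstat's algorithm, which also assesses statistical significance.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of xs (mean of the two middle values
// for even lengths), or 0 for an empty slice.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// percentDelta reports the change of the current median relative to the
// baseline median; negative values mean the current run is faster.
func percentDelta(baseline, current []float64) float64 {
	mb := median(baseline)
	if mb == 0 {
		return 0
	}
	return (median(current) - mb) * 100 / mb
}

func main() {
	// Invented ns/op samples from five runs each (matching -count=5).
	baseline := []float64{100, 110, 90, 105, 95}
	current := []float64{80, 85, 75, 90, 70}
	fmt.Printf("sec/op delta: %+.1f%%\n", percentDelta(baseline, current))
}
```

Only compare samples for the same benchmark/backend pair; mixing memory and sqlite results would make the delta meaningless.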

Notes¶

Benchmark artifacts are written under ./.bench/ and ignored by git.
Adaptive A/B artifacts are written under ./.artifacts/adaptive-ab/ and ignored by git.
bench-pull-compare uses a pinned benchstat module version in the Makefile to avoid tool drift.
For production threshold tuning of adaptive ingress pressure, see Adaptive Backpressure Tuning.