Mutation Testing Baseline
Hunch uses cargo-mutants to measure assertion
quality, not just code coverage. Mutation testing mutates the source
(flips == to !=, replaces + with -, etc.) and runs the test suite
against each mutated build. A mutation that survives all tests means
no test would actually catch that bug: the line might be 100% covered,
yet no assertion would detect a real regression there.
This complements code coverage (#145): coverage tells us which lines run; mutation testing tells us which lines have strong assertions.
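A minimal sketch of the covered-but-unasserted case, using a hypothetical `is_hd` helper that is not part of Hunch. The first test executes every line of the function, yet a `>=` → `>` mutant survives it because neither input sits on the boundary. The second test pins the boundary and kills that mutant.

```rust
/// Hypothetical helper (not from Hunch): is a video height "HD"?
fn is_hd(height: u32) -> bool {
    height >= 720
}

#[cfg(test)]
mod tests {
    use super::is_hd;

    /// Full line coverage, but `>=` -> `>` survives: 1080 and 480
    /// give the same answers under both operators.
    #[test]
    fn weak_covers_but_misses_boundary() {
        assert!(is_hd(1080));
        assert!(!is_hd(480));
    }

    /// Kills `>=` -> `>`: under that mutant `is_hd(720)` returns false.
    #[test]
    fn strong_pins_boundary() {
        assert!(is_hd(720));
        assert!(!is_hd(719));
    }
}
```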
How it runs
Run cargo mutants locally during test-quality work or when adding
fixtures around a tricky function. The mutation-killing PRs landed
during the v1.1.x → v2.0.0 cycle (#180–#185) used this exact loop.
The nightly `mutants.yml` workflow that previously ran on a schedule was removed in #216, along with the rest of the over-engineered CI for a hobby-scale crate. The tooling and the local workflow are unchanged; the surviving-mutants triage in this doc still applies when you run `cargo mutants` locally.
You can still capture results in the same shape the old job produced — see Local usage below.
First nightly run results (2026-04-18)
First real run after #169/#170 landed: run 24615983143.
12 minutes wall-clock on ubuntu-latest with --jobs 4.
| Outcome | Count |
|---|---|
| ✅ Caught | 115 |
| ⚠️ Missed | 30 |
| ⏱️ Timeout | 0 |
| 🚫 Unviable | 11 |
| Total | 156 |
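Overall kill rate: 73.7% of all 156 mutants (115/156), or 79.3% (115/145) once the 11 unviable mutants are excluded, which is how the per-file rates below are computed. Either way it is below the ≥ 80% target, but with a clear story.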
Per-file breakdown
| File | Caught | Missed | Unviable | Kill rate |
|---|---|---|---|---|
src/properties/title/clean.rs | 82 | 16 | 1 | 83.7% ✅ already over target |
src/pipeline/mod.rs | 33 | 14 | 10 | 70.2% ⚠️ drags the average |
title/clean.rs already exceeds the 80% target — the PR-C #138
kitchen-sink coverage was effective. pipeline/mod.rs is the laggard;
the 14 surviving mutants there are the highest-leverage triage target
for the next coverage-improvement loop.
Categories of the 30 surviving mutants
Grouped by mutation kind for batch-fixing efficiency:
| Category | Count | Examples | Likely fix |
|---|---|---|---|
Comparison-operator boundaries (< ↔ <=, > ↔ >=) | 13 | pipeline/mod.rs:333:39, title/clean.rs:154:30 | Add fixtures at boundary values |
Logical operator (&& ↔ ||) | 4 | title/clean.rs:154:34, :225:28, :306:27, :492:9 | Test both branches independently |
Arithmetic (+/-/*) | 4 | title/clean.rs:304:26, pipeline/mod.rs:422:33, :555:35 | Assert exact computed values, not just non-zero |
Logical negation deletion (delete !) | 2 | pipeline/mod.rs:325:16, :391:12 | Test the inverse-condition path |
Function-stub replacements (returns 0, 1, -1, "") | 5 | title/clean.rs:372:9 (casing_score 3×), :303:5 (strip_extension) | Assert specific return values, not just non-empty/non-zero |
Equality (== ↔ !=) | 2 | pipeline/mod.rs:565:51, title/clean.rs:502:65 | Test the negative case |
The full surviving-mutant list is in mutants.out/missed.txt. It was uploaded as the mutants-out artifact of the nightly job (since removed in #216), and a local run writes the same file.
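The grouping above can be partly automated. This is a rough sketch that assumes each `missed.txt` line has the `path:line:col: replace A with B in fn` or `path:line:col: delete ! in fn` shape shown elsewhere in this doc. The `classify` and `tally` helpers and the category labels are our own, not part of cargo-mutants.

```rust
use std::collections::BTreeMap;

/// Buckets one `missed.txt` line into the categories used in the table above.
/// Heuristic: keys off the operator text that appears after ": replace ".
fn classify(line: &str) -> &'static str {
    if line.contains(": delete !") {
        return "logical negation deletion";
    }
    // The text between ": replace " and " with " is the original token(s).
    let from = match line.split_once(": replace ") {
        Some((_, rest)) => rest.split_once(" with ").map_or(rest, |(from, _)| from),
        None => return "other",
    };
    match from {
        "&&" | "||" => "logical operator",
        "==" | "!=" => "equality",
        "<" | "<=" | ">" | ">=" => "comparison boundary",
        "+" | "-" | "*" | "/" | "+=" | "-=" | "*=" | "/=" => "arithmetic",
        // Whole-function replacements read like `name -> RetType`.
        f if f.contains(" -> ") => "function stub",
        _ => "other",
    }
}

/// Counts missed mutants per category, skipping blank lines.
fn tally<'a>(lines: impl IntoIterator<Item = &'a str>) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for line in lines.into_iter().filter(|l| !l.trim().is_empty()) {
        *counts.entry(classify(line)).or_insert(0) += 1;
    }
    counts
}
```

Typical use would be `tally(std::fs::read_to_string("mutants.out/missed.txt")?.lines())`. Anything that lands in `"other"` needs a manual look.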
Hot spot: pick_better_casing::casing_score
Three mutations to this function survived, all of them function-stub
replacements (return 0, return 1, return -1). In addition, the >=
boundary check in its caller at :388:24 survived mutation. The function's
tests don't actually pin its return value: they presumably check that
the right branch is selected downstream, but never assert what the
score IS. This is the single highest-leverage fix in the surviving
set: pinning casing_score's output for half a dozen representative
inputs, plus one tie case for the caller's >= comparison, would kill
all 4 mutants in one tiny PR.
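A sketch of what "pinning" looks like. Hunch's real `casing_score` signature and scoring rules aren't reproduced in this doc, so the function below is a hypothetical stand-in returning `i32`. The point is the shape of the test. Because one expected value is 2, each surviving stub (return 0, return 1, return -1) fails at least one assertion.

```rust
/// HYPOTHETICAL stand-in for `pick_better_casing::casing_score`; the real
/// scoring rules live in src/properties/title/clean.rs and will differ.
fn casing_score(s: &str) -> i32 {
    let letters: Vec<char> = s.chars().filter(|c| c.is_alphabetic()).collect();
    if letters.is_empty() {
        return 0; // nothing to judge
    }
    let upper = letters.iter().filter(|c| c.is_uppercase()).count();
    if upper == letters.len() {
        -2 // ALL CAPS
    } else if upper == 0 {
        -1 // all lowercase
    } else if letters[0].is_uppercase() {
        2 // leading capital plus some lowercase
    } else {
        1 // other mixed casing, e.g. "iPhone"
    }
}

#[cfg(test)]
mod tests {
    use super::casing_score;

    /// Pin exact scores, not just "which branch wins downstream".
    /// Distinct expected values make every constant-return stub fail.
    #[test]
    fn casing_score_is_pinned() {
        let cases = [
            ("The Matrix", 2),
            ("THE MATRIX", -2),
            ("the matrix", -1),
            ("iPhone", 1),
            ("2001", 0),
        ];
        for (input, expected) in cases {
            assert_eq!(casing_score(input), expected, "input: {input:?}");
        }
    }
}
```

Killing the caller's `>=` survivor also needs a fixture where both candidates score the same, with an assertion on which one `pick_better_casing` keeps.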
Triage actions (deferred to follow-up PRs)
- Pin `casing_score` return values: kills 4 mutants in one PR.
- Add boundary-value fixtures for the `pipeline/mod.rs` Pass 1/Pass 2 `<`/`>` checks: kills ~6 mutants.
- Independent-branch tests for the four `&&` survivors: kills 4.
- Assertion-tightening pass on `strip_extension` (assert exact output, not just non-empty): kills 4.
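For the `&&` survivors, note that `a && b` and `a || b` agree whenever `a == b`. Fixtures where both conditions hold (or both fail) can therefore never kill an `&&` → `||` mutant. Here is a minimal sketch with a hypothetical `should_strip` predicate that is not Hunch code:

```rust
/// HYPOTHETICAL predicate: strip a token only when it is bracketed AND a
/// known tag. Stands in for the shape of the `&&` survivors in title/clean.rs.
fn should_strip(is_bracketed: bool, is_known_tag: bool) -> bool {
    is_bracketed && is_known_tag
}

#[cfg(test)]
mod tests {
    use super::should_strip;

    #[test]
    fn each_operand_can_veto_on_its_own() {
        // Rows where && and || agree: on their own, these let the mutant survive.
        assert!(should_strip(true, true));
        assert!(!should_strip(false, false));
        // Rows where they disagree: each one kills `&&` -> `||`.
        assert!(!should_strip(true, false));
        assert!(!should_strip(false, true));
    }
}
```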
Scope (first slice)
The full crate has ~2,876 mutants and would take ~10 hours single-threaded. This first slice scoped the nightly run to the two highest-value targets identified in the Mutation testing epic (#146):
| File | Mutants | Why |
|---|---|---|
src/pipeline/mod.rs | ~57 | Orchestration core — every property runs through here |
src/properties/title/clean.rs | ~99 | Busiest property module; PR-C #138 added kitchen-sink coverage |
Combined run with --jobs 4 on a GitHub-hosted ubuntu runner: ~12–15 min.
Roadmap
Long-term ideas, not actively planned post-#216:
- Re-enable a nightly workflow if the project ever grows past hobby-scale (multi-developer, downstream library users filing regression-class bugs). The triage protocol below is the workflow.
- Hard kill-rate gate — only meaningful with a recurring run.
- Diff-only PR check — useful with a CI cadence; manual on demand for now.
Local usage
Install once (`--locked` builds cargo-mutants against the dependency versions pinned in its own Cargo.lock):

```sh
cargo install cargo-mutants --locked
```

Run against one file (~5 min for a small file):

```sh
cargo mutants --file src/properties/year.rs --no-shuffle
```
Run against the same scope the nightly job used:

```sh
cargo mutants --no-shuffle --jobs 4 \
  --file src/pipeline/mod.rs \
  --file src/properties/title/clean.rs
```
Outputs land in ./mutants.out/:
| File | Contents |
|---|---|
outcomes.json | Machine-readable per-mutant results + counts |
missed.txt | Surviving mutants (the interesting ones) |
caught.txt | Killed mutants (good — your tests work) |
timeout.txt | Mutants whose test run hung past the timeout, usually infinite-loop mutations |
unviable.txt | Mutants that didn't compile (usually safe to ignore) |
mutants.out/ is gitignored.
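To turn a local run into an Outcome table like the one above, a small helper can count lines in the text files. This is a sketch: the `Summary` type and the `summarize` and `kill_rate` functions are hypothetical, and it assumes one mutant per non-empty line. It computes the kill rate as caught / (caught + missed), which reproduces the per-file percentages in this doc (e.g. 82 / (82 + 16) = 83.7%). Timeouts and unviable mutants are left out of the denominator.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Per-outcome counts for one `mutants.out/` directory.
#[derive(Debug, PartialEq)]
struct Summary {
    caught: usize,
    missed: usize,
    timeout: usize,
    unviable: usize,
}

/// Counts non-empty lines in `<dir>/<name>`; a missing file counts as zero.
fn count_lines(dir: &Path, name: &str) -> io::Result<usize> {
    match fs::read_to_string(dir.join(name)) {
        Ok(text) => Ok(text.lines().filter(|l| !l.trim().is_empty()).count()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

fn summarize(dir: &Path) -> io::Result<Summary> {
    Ok(Summary {
        caught: count_lines(dir, "caught.txt")?,
        missed: count_lines(dir, "missed.txt")?,
        timeout: count_lines(dir, "timeout.txt")?,
        unviable: count_lines(dir, "unviable.txt")?,
    })
}

/// caught / (caught + missed); `None` when no viable mutant ran.
fn kill_rate(caught: usize, missed: usize) -> Option<f64> {
    let viable = caught + missed;
    (viable > 0).then(|| caught as f64 / viable as f64)
}
```

With the first nightly numbers, `kill_rate(115, 30)` gives about 0.793.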
Worked example: src/properties/year.rs
A pre-PR smoke run on year.rs (20 mutants, ~5 min) produced 3 surviving
mutants that demonstrate the categories that also show up in larger runs:
Equivalent mutation (accepted survival)
src/properties/year.rs:19:15: replace < with <= in find_matches
```rust
let mut pos = 0;
while pos < input.len() { // mutation: pos <= input.len()
    let Some(m) = YEAR_RE.find_at(input, pos) else {
        break;
    };
    // ... rest of the loop body (uses `m`, advances `pos`) elided
}
```
When pos == input.len(), Regex::find_at returns None and the loop
exits via the else branch on the next line — so < and <= produce
identical observable behaviour. Equivalent mutation; document and
move on.
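The equivalence argument can be checked with a std-only stand-in. `find_digit_at` below is hypothetical. It mimics the one property of `Regex::find_at` that the argument relies on: it returns `None` when the search starts at `input.len()`. Running the loop with `<` and with the mutated `<=` gives identical results.

```rust
/// HYPOTHETICAL stand-in for `YEAR_RE.find_at(input, pos)`: byte position of
/// the next ASCII digit at or after `start`. Like the regex call, it yields
/// `None` when `start == input.len()`. Assumes ASCII input so `start` is
/// always a char boundary.
fn find_digit_at(input: &str, start: usize) -> Option<usize> {
    input[start..].find(|c: char| c.is_ascii_digit()).map(|i| start + i)
}

/// Collects all matches; `mutated` swaps the loop guard from `<` to `<=`.
fn scan(input: &str, mutated: bool) -> Vec<usize> {
    let mut found = Vec::new();
    let mut pos = 0;
    loop {
        let in_bounds = if mutated { pos <= input.len() } else { pos < input.len() };
        if !in_bounds {
            break;
        }
        let Some(m) = find_digit_at(input, pos) else {
            break; // under `<=`, this is also where pos == input.len() exits
        };
        found.push(m);
        pos = m + 1;
    }
    found
}
```

For example, `scan("a1b22", false)` and `scan("a1b22", true)` both return the digit positions 1, 3 and 4.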
Real test gaps (backlog — file as follow-up issues)
src/properties/year.rs:26:22: replace > with < in find_matches
src/properties/year.rs:29:20: replace < with > in find_matches
```rust
// Boundary: no digit before or after.
if m.start() > 0 && bytes[m.start() - 1].is_ascii_digit() { // L26
    continue;
}
if m.end() < bytes.len() && bytes[m.end()].is_ascii_digit() { // L29
    continue;
}
```
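Both mutations bypass the boundary check: on a usize, m.start() < 0 is
always false, and a match never ends past the input, so m.end() > bytes.len()
is always false too. The && short-circuits and the digit test never runs.
They survive because no test exercises a year with a digit immediately
before or after it. Trivial fix: add fixtures like 12020.mkv (digit
prefix) and 20201.mkv (digit suffix) and assert the boundary rejection,
plus 2020 (a bare year touching both ends of the input) to confirm the
checks still accept a clean match.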
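Here is a std-only sketch of those fixtures. `find_years` is a hypothetical stand-in for `find_matches` that treats any four ASCII digits as a candidate. Its two boundary checks mirror the excerpt above. Under either surviving mutant, one check never runs, so `12020.mkv` or `20201.mkv` yields a year and the matching assertion fails. That failure is what kills the mutant.

```rust
/// HYPOTHETICAL stand-in for year.rs `find_matches`: every run of four ASCII
/// digits is a candidate, and the two boundary checks reject candidates that
/// are glued to another digit.
fn find_years(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut years = Vec::new();
    let mut start = 0;
    while start + 4 <= bytes.len() {
        let end = start + 4;
        if !bytes[start..end].iter().all(u8::is_ascii_digit) {
            start += 1;
            continue;
        }
        // Boundary: no digit before or after (same shape as L26 / L29).
        if start > 0 && bytes[start - 1].is_ascii_digit() {
            start += 1;
            continue;
        }
        if end < bytes.len() && bytes[end].is_ascii_digit() {
            start += 1;
            continue;
        }
        years.push(&input[start..end]);
        start = end;
    }
    years
}

#[cfg(test)]
mod tests {
    use super::find_years;

    #[test]
    fn bare_year_touching_both_ends_is_accepted() {
        assert_eq!(find_years("2020"), vec!["2020"]);
    }

    #[test]
    fn digit_prefix_is_rejected() {
        // Kills the L26 `>` -> `<` mutant.
        assert!(find_years("12020.mkv").is_empty());
    }

    #[test]
    fn digit_suffix_is_rejected() {
        // Kills the L29 `<` -> `>` mutant.
        assert!(find_years("20201.mkv").is_empty());
    }
}
```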
These two are not fixed in this PR, and that's deliberate: this PR sets up the infrastructure to surface findings; fixing them is the next loop.
Triage protocol
When a local cargo mutants run produces surviving mutants:
- Equivalent mutation? (the mutation produces identical observable behaviour) → add a one-line entry to the “Accepted equivalents” table below with the mutation string + a one-sentence rationale.
- Real test gap? → file a `tech-debt` issue with the mutation string in the title, or fix it directly in the same PR if scope allows.
- Tool bug / unviable mis-classification? → file upstream at https://github.com/sourcefrog/cargo-mutants.
Accepted equivalents
| Mutation | Why it’s equivalent | Accepted on |
|---|---|---|
src/properties/year.rs:19:15: replace < with <= in find_matches | find_at(input, input.len()) returns None; < and <= produce identical loop behaviour. | 2026-04-18 (smoke run) |
(Future entries get appended as they’re triaged.)
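During triage it helps to drop already-accepted equivalents from a fresh `missed.txt` so only new survivors remain. Here is a minimal sketch. `new_survivors` is a hypothetical helper that matches the exact mutation strings from the table. If edits shift an entry's line:column position, it stops matching and resurfaces for re-triage.

```rust
use std::collections::HashSet;

/// Mutation strings copied from the "Accepted equivalents" table.
const ACCEPTED: &[&str] = &[
    "src/properties/year.rs:19:15: replace < with <= in find_matches",
];

/// Returns the `missed.txt` lines that are not accepted equivalents.
fn new_survivors(missed_txt: &str) -> Vec<&str> {
    let accepted: HashSet<&str> = ACCEPTED.iter().copied().collect();
    missed_txt
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !accepted.contains(line))
        .collect()
}
```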
References
- cargo-mutants book
- Epic #146
- Sibling: code coverage #145 / coverage.md
- Industry benchmark: 80% kill rate is the rough north star for parser code (mature mutation-tested Rust crates land 75–90%).