Mutation Testing Baseline

Hunch uses cargo-mutants to measure assertion quality, not just code coverage. Mutation testing mutates the source (flips == to !=, replaces + with -, etc.) and runs the test suite against each mutated build. A mutation that survives all tests means no test would actually catch that bug: the line might have 100% coverage, yet the suite would still miss a real regression there.

This complements code coverage (#145): coverage tells us which lines run; mutation testing tells us which lines have strong assertions.
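Here is a minimal sketch of that difference. It uses a hypothetical helper, end_offset (not Hunch code), and assumes cargo-mutants tries its usual arithmetic swaps for + (to - and to *). Both tests execute the same line, so coverage cannot tell them apart. Only the second test pins the result tightly enough to kill every operator swap.

```rust
/// Hypothetical helper (not Hunch code): byte offset just past a match.
fn end_offset(start: i32, len: i32) -> i32 {
    start + len
}

#[cfg(test)]
mod tests {
    use super::end_offset;

    // Weak: the line is covered, but if `+` is swapped for `*` this still
    // passes, because 2 * 2 == 2 + 2.
    #[test]
    fn covers_the_line() {
        assert_eq!(end_offset(2, 2), 4);
    }

    // Strong: 3 + 4 = 7, while 3 - 4 = -1 and 3 * 4 = 12, so either
    // operator swap fails this assertion.
    #[test]
    fn pins_the_result() {
        assert_eq!(end_offset(3, 4), 7);
    }
}
```

With only the first test in place, a cargo mutants run should report the + → * mutant as missed, even though coverage for end_offset is 100%.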

How it runs

Run cargo mutants locally during test-quality work or when adding fixtures around a tricky function. The mutation-killing PRs that landed during the v1.1.x → v2.0.0 cycle (#180–#185) used this exact loop.

The nightly mutants.yml workflow that previously ran on a schedule was removed in #216 along with the rest of the over-engineered CI for a hobby-scale crate. The tooling and the local workflow are unchanged; the surviving-mutants triage in this doc still applies when you run cargo mutants locally.

You can still capture results in the same shape the old job produced — see Local usage below.

First nightly run results (2026-04-18)

First real run after #169/#170 landed (run 24615983143): 12 minutes wall-clock on ubuntu-latest with --jobs 4.

| Outcome | Count |
|---|---|
| ✅ Caught | 115 |
| ⚠️ Missed | 30 |
| ⏱️ Timeout | 0 |
| 🚫 Unviable | 11 |
| Total | 156 |

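Overall kill rate: 73.7% (115 of all 156 mutants; target: ≥ 80%). Excluding the 11 unviable mutants, as the per-file kill rates below do, it is 79.3% (115 of 145). Either way it is below target, but with a clear story.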

Per-file breakdown

| File | Caught | Missed | Unviable | Kill rate |
|---|---|---|---|---|
| src/properties/title/clean.rs | 82 | 16 | 1 | 83.7% ✅ already over target |
| src/pipeline/mod.rs | 33 | 14 | 10 | 70.2% ⚠️ drags the average |

title/clean.rs already exceeds the 80% target — the PR-C #138 kitchen-sink coverage was effective. pipeline/mod.rs is the laggard; the 14 surviving mutants there are the highest-leverage triage target for the next coverage-improvement loop.
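The percentages above can be reproduced from the raw counts. The Counts type and pct helper below are illustrative only and are not part of Hunch or cargo-mutants. The per-file column divides by caught + missed, while the 73.7% headline divides by all 156 mutants, including the unviable ones. Timeouts were 0 in this run, so the sketch leaves them out.

```rust
/// Outcome counts for one scope, copied from the tables above.
/// (Timeouts are omitted: the first nightly run had none.)
struct Counts {
    caught: u32,
    missed: u32,
    unviable: u32,
}

/// `numerator / denominator` as a percentage, rounded to one decimal place.
fn pct(numerator: u32, denominator: u32) -> f64 {
    (f64::from(numerator) / f64::from(denominator) * 1000.0).round() / 10.0
}

impl Counts {
    /// Caught ÷ (caught + missed): the definition behind the per-file column.
    fn kill_rate_viable(&self) -> f64 {
        pct(self.caught, self.caught + self.missed)
    }

    /// Caught ÷ every mutant, unviable included: the 73.7% headline figure.
    fn kill_rate_all(&self) -> f64 {
        pct(self.caught, self.caught + self.missed + self.unviable)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn close(a: f64, b: f64) -> bool {
        (a - b).abs() < 1e-9
    }

    #[test]
    fn matches_the_tables() {
        let clean = Counts { caught: 82, missed: 16, unviable: 1 };
        let pipeline = Counts { caught: 33, missed: 14, unviable: 10 };
        let overall = Counts { caught: 115, missed: 30, unviable: 11 };
        assert!(close(clean.kill_rate_viable(), 83.7));
        assert!(close(pipeline.kill_rate_viable(), 70.2));
        assert!(close(overall.kill_rate_all(), 73.7));
        assert!(close(overall.kill_rate_viable(), 79.3));
    }
}
```

Both definitions put the run below the 80% target. The viable-only figure (79.3%) is the one that is directly comparable with the per-file column.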

Categories of the 30 surviving mutants

Grouped by mutation kind for batch-fixing efficiency:

| Category | Count | Examples | Likely fix |
|---|---|---|---|
| Comparison-operator boundaries (< ↔ <=, > ↔ >=) | 13 | pipeline/mod.rs:333:39, title/clean.rs:154:30 | Add fixtures at boundary values |
| Logical operator (&& → \|\|) | 4 | title/clean.rs:154:34, :225:28, :306:27, :492:9 | Test both branches independently |
| Arithmetic (+ / - / *) | 4 | title/clean.rs:304:26, pipeline/mod.rs:422:33, :555:35 | Assert exact computed values, not just non-zero |
| Logical negation deletion (delete !) | 2 | pipeline/mod.rs:325:16, :391:12 | Test the inverse-condition path |
| Function-stub replacements (return 0, 1, -1, "") | 5 | title/clean.rs:372:9 (casing_score 3×), :303:5 (strip_extension) | Assert specific return values, not just non-empty/non-zero |
| Equality (== → !=) | 2 | pipeline/mod.rs:565:51, title/clean.rs:502:65 | Test the negative case |
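The first two rows often travel together. The sketch below uses a hypothetical in_year_range check (not Hunch code). A single mid-range fixture such as 2020 executes the line but leaves three mutants alive: >= → >, <= → <, and && → ||. All three still return true for 2020. Fixtures at the edges kill the boundary mutants, and inputs that fail exactly one side of the && kill the logical-operator mutant.

```rust
/// Hypothetical check (not Hunch code): is `y` a plausible release year?
/// Written as two comparisons rather than a range so that each operator
/// is its own mutation site.
fn in_year_range(y: u32) -> bool {
    y >= 1900 && y <= 2099
}

#[cfg(test)]
mod tests {
    use super::in_year_range;

    #[test]
    fn edges_are_inclusive() {
        // Kills `>=` -> `>` and `<=` -> `<`: the edges themselves must pass.
        assert!(in_year_range(1900));
        assert!(in_year_range(2099));
    }

    #[test]
    fn each_side_can_fail_alone() {
        // Kills `&&` -> `||`: each input fails exactly one side of the `&&`.
        assert!(!in_year_range(1899)); // only `y <= 2099` holds
        assert!(!in_year_range(2100)); // only `y >= 1900` holds
    }
}
```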

The full surviving-mutant list was written to mutants.out/missed.txt, which was published as the mutants-out artifact of the now-removed nightly workflow.

Hot spot: pick_better_casing::casing_score

Three mutations to this function survived, all of them function-stub replacements (return 0, return 1, return -1). Its caller at :388:24 also lost its >= boundary check. The function's tests don't actually pin its return value. They presumably check that the right branch is selected downstream, but they never assert what the score is. This is the single highest-leverage fix in the surviving set: pinning casing_score's output for half a dozen representative inputs, plus one tie to cover the caller's >= boundary, would kill 4 mutants in one tiny PR.
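The following sketch shows what pinning could look like. The scoring rule, signatures and fixtures are hypothetical stand-ins. Hunch's real casing_score may score differently, so the expected values must come from its actual behaviour. The underlying idea still applies: assert exact scores that differ from 0, 1 and -1, and add one tie so that the caller's >= boundary is exercised.

```rust
/// Hypothetical scoring rule (not Hunch's): +1 per Title-cased word,
/// -1 per ALL-CAPS word of two or more letters, 0 otherwise.
fn casing_score(title: &str) -> i32 {
    let mut score: i32 = 0;
    for word in title.split_whitespace() {
        let letters: Vec<char> = word.chars().filter(|c| c.is_alphabetic()).collect();
        if letters.len() >= 2 && letters.iter().all(|c| c.is_uppercase()) {
            score -= 1; // shouting, e.g. "MATRIX"
        } else if letters.first().is_some_and(|c| c.is_uppercase()) {
            score += 1; // Title-cased, e.g. "Matrix"
        }
    }
    score
}

/// Hypothetical caller: keeps `a` on a tie. That `>=` is the boundary
/// a surviving mutant could flip to `>`.
fn pick_better_casing<'a>(a: &'a str, b: &'a str) -> &'a str {
    if casing_score(a) >= casing_score(b) { a } else { b }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn casing_score_values_are_pinned() {
        // 2 and -2 differ from every stub value (0, 1, -1), and 0 differs
        // from 1 and -1, so each stub fails at least one assertion.
        assert_eq!(casing_score("The Matrix"), 2);
        assert_eq!(casing_score("THE MATRIX"), -2);
        assert_eq!(casing_score("the matrix"), 0);
    }

    #[test]
    fn tie_keeps_first_candidate() {
        // Both score 1, so only the `>=` boundary decides the winner.
        assert_eq!(pick_better_casing("The matrix", "the Matrix"), "The matrix");
    }
}
```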

Triage actions (deferred to follow-up PRs)

  • Pin casing_score return values — kills 4 mutants in one PR
  • Add boundary-value fixtures for pipeline/mod.rs Pass 1/Pass 2 </> checks — kills ~6 mutants
  • Independent-branch tests for the four && survivors — kills 4
  • Assertion-tightening pass on strip_extension (assert exact output, not just non-empty) — kills 4
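For the assertion-tightening item, a weak assertion and an exact one differ roughly as shown below. This strip_extension is a hypothetical stand-in that is assumed to drop everything after the last dot. It is not Hunch's implementation.

```rust
/// Hypothetical stand-in for Hunch's `strip_extension`: drops the text
/// after the last '.', unless that dot is the first character.
fn strip_extension(name: &str) -> &str {
    match name.rsplit_once('.') {
        Some((stem, _ext)) if !stem.is_empty() => stem,
        _ => name,
    }
}

#[cfg(test)]
mod tests {
    use super::strip_extension;

    #[test]
    fn weak_assertion() {
        // Kills an "" stub, but any stub returning a fixed non-empty
        // string would still pass.
        assert!(!strip_extension("Movie.2020.mkv").is_empty());
    }

    #[test]
    fn exact_assertions() {
        // Three different expected outputs: no constant stub passes them all.
        assert_eq!(strip_extension("Movie.2020.mkv"), "Movie.2020");
        assert_eq!(strip_extension("README"), "README");
        assert_eq!(strip_extension(".hidden"), ".hidden");
    }
}
```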

Scope (first slice)

The full crate has ~2,876 mutants and would take ~10 hours to run single-threaded. This first slice scoped the nightly run to the two highest-value targets identified in the Mutation testing epic (#146):

| File | Mutants | Why |
|---|---|---|
| src/pipeline/mod.rs | ~57 | Orchestration core: every property runs through here |
| src/properties/title/clean.rs | ~99 | Busiest property module; PR-C #138 added kitchen-sink coverage |

Combined run with --jobs 4 on a GitHub-hosted ubuntu runner: ~12–15 min.

Roadmap

Long-term ideas, not actively planned post-#216:

  • Re-enable a nightly workflow if the project ever grows past hobby-scale (multi-developer, downstream library users filing regression-class bugs). The triage protocol below is the workflow.
  • Hard kill-rate gate — only meaningful with a recurring run.
  • Diff-only PR check — useful with a CI cadence; manual on demand for now.

Local usage

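Install once. The --locked flag makes cargo install build cargo-mutants against the dependency versions in its own Cargo.lock, rather than the newest compatible ones:

```sh
cargo install cargo-mutants --locked
```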

Run against one file (~5 min for a small file):

```sh
cargo mutants --file src/properties/year.rs --no-shuffle
```

Run against the same scope the old nightly job used:

```sh
cargo mutants --no-shuffle --jobs 4 \
  --file src/pipeline/mod.rs \
  --file src/properties/title/clean.rs
```

Outputs land in ./mutants.out/:

| File | Contents |
|---|---|
| outcomes.json | Machine-readable per-mutant results + counts |
| missed.txt | Surviving mutants (the interesting ones) |
| caught.txt | Killed mutants (good: your tests work) |
| timeout.txt | Mutants whose tests hung, usually infinite-loop mutations |
| unviable.txt | Mutants that didn't compile (usually ignorable; 11 of 156 in the first nightly run) |

mutants.out/ is gitignored.
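For triage, it can help to group survivors by file, the way the per-file table does. The following std-only sketch assumes that each line of missed.txt has the path:line:col: description shape used by the mutation strings in this doc, for example src/properties/year.rs:19:15: replace < with <= in find_matches. Missed, parse_missed and count_by_file are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One surviving mutant, parsed from a `missed.txt` line such as
/// `src/properties/year.rs:19:15: replace < with <= in find_matches`.
#[derive(Debug, PartialEq)]
struct Missed<'a> {
    file: &'a str,
    line: u32,
    col: u32,
    description: &'a str,
}

/// Returns `None` for lines without the assumed `path:line:col: text` shape.
fn parse_missed(line: &str) -> Option<Missed<'_>> {
    // The location contains no ": " (colon + space), so the first one ends it.
    let (location, description) = line.split_once(": ")?;
    // Split from the right so a path containing ':' would still work.
    let mut parts = location.rsplitn(3, ':');
    let col: u32 = parts.next()?.parse().ok()?;
    let line_no: u32 = parts.next()?.parse().ok()?;
    let file = parts.next()?;
    Some(Missed { file, line: line_no, col, description })
}

/// Number of surviving mutants per source file, sorted by path.
fn count_by_file(missed_txt: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for m in missed_txt.lines().filter_map(parse_missed) {
        *counts.entry(m.file).or_insert(0) += 1;
    }
    counts
}
```

You could feed it the contents of mutants.out/missed.txt (for example via std::fs::read_to_string). Lines that don't match the assumed shape are skipped silently rather than reported.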

Worked example: src/properties/year.rs

A pre-PR smoke run on year.rs (20 mutants, ~5 min) produced 3 surviving mutants that illustrate the kinds of survivors seen in the nightly results above:

Equivalent mutation (accepted survival)

src/properties/year.rs:19:15: replace < with <= in find_matches
```rust
let mut pos = 0;
while pos < input.len() {       // mutation: pos <= input.len()
    let Some(m) = YEAR_RE.find_at(input, pos) else {
        break;
    };
    // (rest of the loop body elided)
}
```

When pos == input.len(), Regex::find_at returns None and the loop exits via the else branch on the next line — so < and <= produce identical observable behaviour. Equivalent mutation; document and move on.
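One way to convince yourself that a mutant is equivalent is to run both guards side by side. The std-only sketch below assumes a str::find digit search as a stand-in for YEAR_RE.find_at, which needs the regex crate. digit_runs is a hypothetical function whose loop has the same shape as the excerpt above.

```rust
/// Hypothetical scanner shaped like the `find_matches` loop above.
/// `inclusive = false` is the original guard (`pos < len`);
/// `inclusive = true` is the surviving mutant (`pos <= len`).
fn digit_runs(input: &str, inclusive: bool) -> Vec<&str> {
    let keep_going = |pos: usize| {
        if inclusive { pos <= input.len() } else { pos < input.len() }
    };
    let mut runs = Vec::new();
    let mut pos = 0;
    while keep_going(pos) {
        // Stand-in for `YEAR_RE.find_at(input, pos)`: at pos == len the
        // remaining slice is empty, so this is None and the loop breaks.
        let Some(offset) = input[pos..].find(|c: char| c.is_ascii_digit()) else {
            break;
        };
        let start = pos + offset;
        let len = input[start..]
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(input.len() - start);
        runs.push(&input[start..start + len]);
        pos = start + len;
    }
    runs
}

#[cfg(test)]
mod tests {
    use super::digit_runs;

    #[test]
    fn guards_are_equivalent() {
        for input in ["", "abc", "abc123", "12a34", "Movie.2020.mkv"] {
            assert_eq!(digit_runs(input, false), digit_runs(input, true));
        }
    }
}
```

The extra iteration that the mutant allows always lands on an empty slice, so it can only ever hit the break.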

Real test gaps (backlog — file as follow-up issues)

src/properties/year.rs:26:22: replace > with < in find_matches
src/properties/year.rs:29:20: replace < with > in find_matches
```rust
// Boundary: no digit before or after.
if m.start() > 0 && bytes[m.start() - 1].is_ascii_digit() {  // L26
    continue;
}
if m.end() < bytes.len() && bytes[m.end()].is_ascii_digit() { // L29
    continue;
}
```

Both mutations disable their boundary check. The inverted comparison (m.start() < 0 on a usize, or m.end() > bytes.len()) is always false, so && short-circuits and the digit test never runs. They survive because no test feeds in a year that is directly preceded or followed by another digit, which is the only input where the check changes the result. Trivial fix: add fixtures like 2020 (year alone), 12020.mkv (digit prefix) and 20201.mkv (digit suffix), and assert that the digit-adjacent ones are rejected.
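Here is a sketch of those fixtures against a regex-free stand-in. find_years is hypothetical. It assumes YEAR_RE matches four-digit 19xx/20xx years and replaces find_at with a byte-window scan. It keeps the same two boundary checks as the lines marked L26 and L29 in the excerpt above.

```rust
/// Hypothetical, regex-free stand-in for `find_matches`: returns the byte
/// offsets of four-digit 19xx/20xx years that are not glued to other digits.
fn find_years(input: &str) -> Vec<usize> {
    let bytes = input.as_bytes();
    let mut found = Vec::new();
    let mut pos = 0;
    while pos + 4 <= bytes.len() {
        let window = &bytes[pos..pos + 4];
        let looks_like_year = window.iter().all(|b| b.is_ascii_digit())
            && (window.starts_with(b"19") || window.starts_with(b"20"));
        if looks_like_year {
            let (start, end) = (pos, pos + 4);
            // Same two boundary checks as L26 / L29 in year.rs.
            let digit_before = start > 0 && bytes[start - 1].is_ascii_digit();
            let digit_after = end < bytes.len() && bytes[end].is_ascii_digit();
            if !digit_before && !digit_after {
                found.push(start);
            }
        }
        pos += 1;
    }
    found
}

#[cfg(test)]
mod tests {
    use super::find_years;

    #[test]
    fn year_alone_is_accepted() {
        // Touches both ends of the input, so both guards must short-circuit.
        assert_eq!(find_years("2020"), vec![0]);
    }

    #[test]
    fn digit_prefix_is_rejected() {
        // Kills `start > 0` -> `start < 0`: that mutant would return [1].
        assert!(find_years("12020.mkv").is_empty());
    }

    #[test]
    fn digit_suffix_is_rejected() {
        // Kills `end < len` -> `end > len`: that mutant would return [0].
        assert!(find_years("20201.mkv").is_empty());
    }
}
```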

These two are deliberately not fixed in this PR. This PR sets up the infrastructure to surface findings; fixing them is the next loop.

Triage protocol

When a local cargo mutants run produces surviving mutants:

  1. Equivalent mutation? (the mutation produces identical observable behaviour) → add a one-line entry to the “Accepted equivalents” table below with the mutation string + a one-sentence rationale.
  2. Real test gap? → file a tech-debt issue with the mutation string in the title, or fix it directly in the same PR if scope allows.
  3. Tool bug / unviable mis-classification? → file upstream at https://github.com/sourcefrog/cargo-mutants.

Accepted equivalents

| Mutation | Why it’s equivalent | Accepted on |
|---|---|---|
| src/properties/year.rs:19:15: replace < with <= in find_matches | find_at(input, input.len()) returns None; < and <= produce identical loop behaviour. | 2026-04-18 (smoke run) |

(Future entries get appended as they’re triaged.)

References

  • cargo-mutants book
  • Epic #146
  • Sibling: code coverage #145 / coverage.md
  • Industry benchmark: 80% kill rate is the rough north star for parser code (mature mutation-tested Rust crates land 75–90%).