Changelog
This page is rendered from CHANGELOG.md
at the root of the repository (single source of truth). Format follows
Keep a Changelog and the project adheres
to Semantic Versioning per the
API Stability Policy.
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
2.0.1 - 2026-04-26
Fixed
- False `AVCHD` video profile for bare `AVC` token. `AVC` is the codec name for H.264 and carries no profile information on its own. `AVCHD` (Advanced Video Codec High Definition) is a specific consumer camcorder delivery format and should only fire on the literal `avchd` token. The incorrect mapping caused filenames containing a bare `AVC` (e.g. multi-audio CJK releases) to gain a spurious `video_profile: "Advanced Video Codec High Definition"` field. Fixed by removing the `avc` entry from the `[exact]` table in `video_profile.toml` while keeping `avchd`. Regression fixture added to `tests/fixtures/community.yml`. (#237, #238)
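Here is a minimal sketch of the D2 split that this fix touches. It assumes a case-insensitive exact-match lookup. The vocabulary lives in an embedded TOML-style string, standing in for what `include_str!` would pull in, and Rust only performs the lookup. The table contents, `parse_exact`, and `video_profile` are hypothetical and are not hunch's code. Once the `avc` row is gone, a bare `AVC` token simply has no profile.

```rust
use std::collections::HashMap;

// Illustrative stand-in for an embedded rules file (not hunch's real contents).
const VIDEO_PROFILE_EXACT: &str = r#"
[exact]
avchd = "Advanced Video Codec High Definition"
"#;

/// Hypothetical parser: collects `key = "value"` lines under `[exact]` only.
fn parse_exact(src: &str) -> HashMap<String, String> {
    let mut table = HashMap::new();
    let mut in_exact = false;
    for line in src.lines().map(str::trim) {
        if line.starts_with('[') {
            in_exact = line == "[exact]";
        } else if in_exact {
            if let Some((key, value)) = line.split_once('=') {
                table.insert(
                    key.trim().to_string(),
                    value.trim().trim_matches('"').to_string(),
                );
            }
        }
    }
    table
}

/// Exact, ASCII-case-insensitive lookup (assumed). No `avc` key means no profile.
fn video_profile(table: &HashMap<String, String>, token: &str) -> Option<String> {
    table.get(&token.to_ascii_lowercase()).cloned()
}
```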
Docs
- Documented the D2 boundary (vocabulary in TOML, logic in Rust) with a decision table in `DESIGN.md` and per-module “Why this lives in Rust” header docstrings on the 15 inline-regex property modules (`date`, `episodes`, `release_group`, `title`, `part`, `website`, `episode_count`, `bonus`, `uuid`, `year`, `version`, `crc32`, `aspect_ratio`, `size`, `bit_rate`). Closes the audit thread from the now-resolved #143 epic. Pure docs; no behavior change. Net diff: +153 lines across 16 files.
- README polish. Replaced the stale `Coverage` badge (the underlying CI job was deleted in #216, so the 94.34% figure is frozen) with the standard four-badge row: CI status, crates.io version, docs.rs, and license. Scaled back the “Real-world accuracy” section to point only at the live compatibility report. The prior personal-library anecdote (“99.8% across 7,838 files”) was a single ad-hoc data point, not a reproducible measurement, and the section header now matches what is actually claimed (“Accuracy”). Dropped the inline Contributing and License sections: the new license badge links to `LICENSE`, and `CONTRIBUTING.md` stays in the repo root next to the README. Net diff: −13 lines.
CI
- `cargo semver-checks` is now a required CI gate (previously advisory). Any PR that introduces a SemVer-incompatible public API change now hard-fails CI instead of only emitting a warning. Enforced via `obi1kenobi/cargo-semver-checks-action`. (#229)
Dependencies
- Bumped `dependabot/fetch-metadata` 2.5.0 → 3.1.0
- Bumped `taiki-e/install-action` 2.75.17 → 2.75.20
- Bumped `obi1kenobi/cargo-semver-checks-action` 2.8 → 2.9
- Bumped Rust minor/patch toolchain group (Dependabot auto-merge)
2.0.0 - 2026-04-20
Removed
- `benches/` directory and the `cargo bench` harness. The Criterion setup (5 micro-benches) was over-engineered for a hobby-scale filename parser, and the benchmark workflow it served was deleted in #217. Dropping the harness now eliminates ~100 LOC of dev infra plus four doc pages whose content went stale the moment the workflow stopped publishing snapshots. The removal also covers the `criterion` dev-dependency, the `[[bench]]` Cargo entry, and the dependent mdbook pages (`benchmarks.md`, `benchmark-dashboard.md`, `release-trajectory.md`). (#218 follow-up)
- `fuzz/` directory and the `cargo-fuzz` infrastructure: two fuzz targets (`parse_filename`, `parse_with_context`), their corpus seeds, and the `contributor-guide/fuzzing.md` mdbook page. The fuzzing workflow was deleted in #217, and manual contributor fuzzing isn't happening in practice. The library is small and deterministic enough that the existing 612-test integration suite is the right testing layer at this scale. (#218 follow-up, #222)
- CI workflow over-engineering. The `coverage` (cargo-llvm-cov), `api-surface` (cargo-public-api drift gate), and `mutants` (nightly cargo-mutants) jobs were all dropped in #216, alongside the entire `mutants.yml` workflow. The `benchmark.yml` and `fuzz.yml` workflows were dropped in #217. Rationale: a single-author hobby crate does not need 7 quality-gate workflows running on every PR. The four jobs that matter remain: `fmt`, `clippy`, `test` (Linux/macOS/Windows), and `audit`. The `semver` advisory job also survives. The mutation-test work that landed in #180–#185 left permanent regression coverage in `tests/`, so that quality investment outlives the workflow.
Changed
- `#[must_use]` on `HunchResult` and `Pipeline`. Catches the easy mistake of dropping a parsed result, or of constructing a `Pipeline` without ever calling `.run()`. Also added explicit `#[must_use]` on the four `HunchResult` accessors that return non-must-use types (`confidence()`, `is_movie()`, `is_episode()`, `is_extra()`). The remaining accessors return `Option<T>` / `Vec<T>` and were left without the attribute. (#205, bundled in #218)
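The sketch below shows what the attribute buys, using a hypothetical `ParsedResult` as a stand-in for `HunchResult`. Marking the type `#[must_use]` makes any discarded return value of that type trigger a warning. Marking the accessor is what makes a bare `r.confidence();` warn, because that accessor returns a plain `u8`.

```rust
/// Hypothetical stand-in for `HunchResult`.
#[must_use]
#[derive(Debug)]
pub struct ParsedResult {
    title: Option<String>,
    confidence: u8,
}

impl ParsedResult {
    pub fn title(&self) -> Option<&str> {
        self.title.as_deref()
    }

    /// `u8` is not must-use on its own, so the attribute is what makes a
    /// bare `r.confidence();` statement warn.
    #[must_use]
    pub fn confidence(&self) -> u8 {
        self.confidence
    }
}

/// Because `ParsedResult` is `#[must_use]`, a bare `parse("x");` statement
/// warns with "unused `ParsedResult` that must be used".
pub fn parse(name: &str) -> ParsedResult {
    ParsedResult {
        title: Some(name.to_string()),
        confidence: if name.is_empty() { 0 } else { 1 },
    }
}
```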
Refactored
- Moved `rules/` to `src/rules/` for compile-time co-location. The 21 TOML data files are embedded into the binary at compile time via `include_str!` from `pipeline/rule_registry.rs`. They are not external configuration, not user-tunable at runtime, and have no purpose outside this crate. A top-level `rules/` was misleading: it read as nginx-style runtime config when it is actually frozen Rust data. The `../../rules/X.toml` paths in `rule_registry.rs` were the classic “this should be local” code smell pointing here. Pure restructure with zero behavior change: all 21 `include_str!` paths + 17 doc-comment refs were updated, and file history was preserved via `git mv`. (#223)
Docs
- Slimmed `README.md` from 178 → 89 lines (−50%). Now that there is a proper mdbook at https://lijunzh.github.io/hunch, the README can stop trying to be canonical for everything. The verbose `--batch -r` tip and the four “Known Limitations” subsections (~60 lines of edge-case essays) moved to the new `docs/src/user-guide/known-limitations.md` mdbook page, which the README links to. Documentation table tightened: dropped the dead bench dashboard row (page deleted) and added Migration Guide + Known Limitations rows. (#224)
- New `docs/src/about/migration-v2.md` page consolidating the v2.0.0 breaking changes (`Property::BitRate` removal + deep-import deprecation) in one mdbook destination, so callers don't have to scrape the changelog. Linked from `SUMMARY.md`. (#201, bundled in #218)
- `DESIGN.md` pipeline module map updated from the stale 5-file list to the actual 9 files (`mod`, `matching`, `context`, `token_context`, `zone_rules`, `invariance`, `pass2_helpers`, `proper_count`, `rule_registry`). (#200, bundled in #218)
- `DESIGN.md` D9 now documents the third class of property matchers: derived properties, which are computed at result-build time from another property's value. Currently the only one is `Property::Mimetype`, derived from `Container`. (#203, bundled in #218)
- `README.md` no longer duplicates the guessit pass-rate stats that live in the live compatibility report. The README now links to that report, and the per-property numbers stay in their single source of truth (regenerated from `cargo test -- --ignored guessit_compat`). The hard-coded `# 295 tests` comment in the contribution snippet is also gone. It had drifted to ~612, and the count was never load-bearing. (#202, #204, bundled in #218)
Fixed
- `Show/Extras/Bonus.mkv` no longer inherits unrelated sibling titles via the ancestor cache. The CLI's inheritance-blocking predicate (previously `is_sample_dir`, now `is_inheritance_blocking_dir`) covered `sample` / `samples` / `subs` / `subtitles` / `featurettes` but missed the equally common `extras` / `extra` / `specials` / `bonus`. In `--batch -r` mode, that gap let an unrelated movie title at the batch root leak into Extras subtrees of an adjacent show. (#208)
- CJK fansub patterns `[Nth - NN]` and `[总第NN]` are now parsed as episode markers instead of being absorbed into the title. Catches real-world filenames from the Re:Zero / 12 Kingdoms / similar fansub release groups. (#212, #213)
- Ancestor-path `Source` matches are dropped when the filename itself carries a Source token. This prevents directory-level source hints from overriding the more specific filename-level signal in `--batch -r` mode; for example, `BluRay/Show.S01E01.WEB-DL.mkv` no longer resolves to `BluRay`. (#212, #215)
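Here is a sketch of the widened predicate. It assumes a case-insensitive match on the directory's final path component. `blocks_inheritance` is a hypothetical stand-in for `is_inheritance_blocking_dir`, and its name list is the union of the names given in the entry above.

```rust
use std::path::Path;

/// Union of the directory names listed in the changelog entry.
const BLOCKING_DIRS: &[&str] = &[
    "sample", "samples", "subs", "subtitles", "featurettes",
    "extras", "extra", "specials", "bonus",
];

/// Hypothetical predicate: true when the last path component is a blocking
/// directory name, ignoring ASCII case (an assumption, not confirmed behavior).
fn blocks_inheritance(dir: &Path) -> bool {
    dir.file_name()
        .and_then(|name| name.to_str())
        .is_some_and(|name| BLOCKING_DIRS.iter().any(|b| name.eq_ignore_ascii_case(b)))
}
```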
Security
- `list_media_files` now skips symlinks, mirroring the hardening already applied to `walk_dir_inner` for `--batch -r`. The function backs both `--context` mode and `--batch <dir>` (without `-r`). Its previous use of `Path::is_file()` followed symlinks. That allowed an attacker who controls files inside the user-chosen directory to inject crafted basenames from outside that directory into the parser. Hunch only reads basenames (not file contents), so the impact was low, but matching `walk_dir`'s defense keeps both CLI entry points consistent. (#209)
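The contrast between the two checks can be sketched with std alone. `list_regular_files` below is a hypothetical helper, not hunch's `list_media_files`, and it does no media-extension filtering. `DirEntry::file_type()` reports a symlink as a symlink, so `is_file()` on it is false. `Path::is_file()`, by contrast, follows the link to its target.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: regular files directly inside `dir`, symlinks excluded.
fn list_regular_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // DirEntry::file_type() does not traverse symlinks, so a link to a
        // file reports is_symlink() == true and is_file() == false.
        if entry.file_type()?.is_file() {
            files.push(entry.path());
        }
    }
    files.sort(); // read_dir order is platform-dependent
    Ok(files)
}
```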
Added
- `HunchResult::is_movie()`, `is_episode()`, `is_extra()` convenience methods. These are pure derived getters over the existing `media_type()` typed accessor. All three return `false` when the media type is unknown rather than defaulting to a guess. Callers that need to distinguish “definitely not X” from “unknown” should still use `media_type()` directly. (#156)
- `Property::AudioBitRate`, `Property::VideoBitRate`, and `Property::Mimetype` variants, with matching `HunchResult::audio_bit_rate()`, `video_bit_rate()`, and `mimetype()` accessors. The bit-rate split is classified by unit (`Kbps` → audio, `Mbps` → video). Mimetype is a pure derivation from the container extension (mp4 → video/mp4, mkv → video/x-matroska, etc.); an unknown container yields `None` and is never fabricated. All three properties moved from 0% to 100% accuracy on the compatibility corpus. (#158, #165)
- DVD region codes R0–R6 in the property exact-match table. Previously only R5 was recognized. R7–R9 are intentionally omitted to limit false positives on niche release-group tokens. (#156)
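Here is a sketch of both kinds of derived getter, under stated assumptions. `ResultSketch` and its fields are hypothetical stand-ins for `HunchResult`. Only the two container mappings named above are included; the full table is not reproduced. Comparing against `Some(..)` is what makes an unknown media type answer `false` to all three predicates.

```rust
/// Illustrative subset; the real MediaType is #[non_exhaustive] and richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaType {
    Movie,
    Episode,
    Extra,
}

/// Hypothetical stand-in for HunchResult.
struct ResultSketch {
    media_type: Option<MediaType>,
    container: Option<&'static str>,
}

impl ResultSketch {
    fn media_type(&self) -> Option<MediaType> {
        self.media_type
    }

    // An unknown media type (None) makes all three predicates false.
    fn is_movie(&self) -> bool {
        self.media_type() == Some(MediaType::Movie)
    }
    fn is_episode(&self) -> bool {
        self.media_type() == Some(MediaType::Episode)
    }
    fn is_extra(&self) -> bool {
        self.media_type() == Some(MediaType::Extra)
    }

    /// Derived property: computed from the container, never guessed.
    fn mimetype(&self) -> Option<&'static str> {
        match self.container?.to_ascii_lowercase().as_str() {
            "mp4" => Some("video/mp4"),
            "mkv" => Some("video/x-matroska"),
            _ => None, // unknown containers stay None (other mappings omitted)
        }
    }
}
```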
Changed
- ⚠️ BREAKING: removed the `Property::BitRate` variant. It was deprecated in this same release wave (#165) and has been unreachable from any parser path since the bit-rate split landed. The regex captures `[KkMm]`, and both branches map to `Property::AudioBitRate` (Kbps) or `Property::VideoBitRate` (Mbps). The previous “defensive fallback” was dead code. Removing it now, under the v2.0.0 major bump, avoids forcing a v3.0.0 just to delete one variant later.
Migration: if your code matches on `Property::BitRate`, switch to the unit-typed variants. The `#[non_exhaustive]` annotation already requires a wildcard arm, so the change is usually confined to a single arm:
```rust
match prop {
    // Before (v1.x):
    // Property::BitRate => handle_either(value),

    // After (v2.0.0):
    Property::AudioBitRate => handle_audio(value),
    Property::VideoBitRate => handle_video(value),
    _ => {}
}
```
The matching `bit_rate` JSON output key is also gone; downstream JSON consumers should read `audio_bit_rate` / `video_bit_rate`. (#144, #165)
- ⚠️ BREAKING: public module surface dramatically reduced. Four sub-modules were demoted from `pub mod` to `pub(crate) mod`: `matcher`, `properties`, `tokenizer`, and `zone_map`. The intended public API (`hunch()`, `hunch_with_context()`, `Pipeline`, `HunchResult`, `Confidence`, `MediaType`, `Property`) is unchanged. It remains reachable at the crate root via the existing `pub use` re-exports in `src/lib.rs`.
What this breaks: any downstream code using deep import paths like `use hunch::matcher::span::Property;` or `use hunch::tokenizer::Token;`.
Migration: switch deep imports to the crate-root re-exports:
```rust
// Before (v1.x):
// use hunch::matcher::span::Property;
// use hunch::matcher::span::MatchSpan; // no longer reachable

// After (v2.0.0):
use hunch::Property; // re-exported at crate root
// MatchSpan is now internal; use HunchResult accessors instead
```
Public surface impact: 853 lines → 202 lines (76% reduction). Internal helpers like `matcher::engine::resolve_conflicts`, `regex_utils::{CharClass, BoundarySpec, BoundedRegex}`, `tokenizer::{Token, Segment, BracketGroup}`, and `zone_map::ZoneMap` are no longer part of the SemVer contract.
Why: the `pub mod` declarations were accidentally leaking ~188 internal items into the public API. Locking these in as v2.0.0 commitments would have made every internal refactor a SemVer hazard. The audit also surfaced genuinely dead code (4 unused methods, 2 unused re-exports, 6 unused fields). That code was removed or marked `#[allow(dead_code)]` with an explanatory note. (#144)
- ⚠️ BREAKING: `MatchSpan` builder methods renamed from `as_*` to `with_*`: `as_extension` → `with_extension`, `as_path_based` → `with_path_based`, and `as_reclaimable` → `with_reclaimable`. These were never user-facing (they are now `pub(crate)`), so the rename only affects internal callers and downstream code needs no migration. The rename brings them in line with the existing `with_priority` / `with_source` builders. It also resolves the `clippy::wrong_self_convention` lint, since consuming builders conventionally use `with_*`. (#144)
- ⚠️ BREAKING: public enums now carry `#[non_exhaustive]`. Affected enums: `Property`, `MediaType`, `Confidence`, `SegmentKind`, `Source`, `ZoneScope`, `Separator`, `BracketKind`, and `CharClass` (every public enum reachable from the crate). Downstream code that matches exhaustively on these enums must add a wildcard arm:
```rust
match prop {
    Property::Title => ...,
    // ... existing arms ...
    _ => ..., // ← now required
}
```
Why: this lets future minor releases add new variants without re-breaking the API every time; the bit-rate split in #165 was the immediate trigger. `Confidence` and `SegmentKind` were caught by the v2.0.0 prerelease audit (#196). Every `pub enum` in the crate is now consistently `#[non_exhaustive]`. (#172, #196)
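The crate-root re-export pattern can be sketched in one file. The module and variant names below are illustrative and do not reflect hunch's exact layout. Note that `#[non_exhaustive]` is only enforced in downstream crates; inside the defining crate an exhaustive match still compiles. The wildcard arm here therefore stands in for what callers must write.

```rust
// Hypothetical single-file layout; not hunch's actual module tree.
pub(crate) mod matcher {
    pub mod span {
        /// Illustrative subset of variants.
        #[non_exhaustive]
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Property {
            Title,
            AudioBitRate,
            VideoBitRate,
            Mimetype,
        }
    }
}

// Crate-root re-export: downstream code writes `use hunch::Property;`
// even though `matcher` itself is no longer public.
pub use matcher::span::Property;

pub fn describe(prop: Property) -> &'static str {
    match prop {
        Property::Title => "title",
        Property::AudioBitRate | Property::VideoBitRate => "bit rate",
        // Downstream crates must keep an arm like this; in-crate it simply
        // catches the remaining variants.
        _ => "other",
    }
}
```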
Fixed
- Website false positives on country-code TLDs inside language abbreviations. Filenames like `Community.s02e20.rus.eng.720p.mkv` no longer extract `s02e20.ru` as a website. The TLD alternation now requires a trailing word boundary. As a result, `.ru` cannot match inside `.rus`, `.com` cannot match inside `.community`, and so on. (#163, #167)
- Anime-release bit-rate notation (`kbit`, `mbits`) is now parsed correctly via suffix alternation. (#165)
- `DD5.1.448kbps`-style filenames no longer mis-parse the leading digits as part of the bit rate (regex bound tightened to `\d{1,2}`). (#165)
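Here is a stdlib-only sketch of the unit rule. It assumes the tokenizer has already isolated a token such as `448kbps`, made of digits (with an optional decimal point) followed by a suffix. Suffixes starting with `k` mean audio, and suffixes starting with `m` mean video. `classify_bit_rate` and its suffix list are hypothetical; hunch itself uses a regex with suffix alternation.

```rust
#[derive(Debug, PartialEq)]
enum BitRateKind {
    Audio, // kilobit units
    Video, // megabit units
}

/// Hypothetical classifier for an isolated token such as "448kbps" or "10Mbits".
fn classify_bit_rate(token: &str) -> Option<(f64, BitRateKind)> {
    let lower = token.to_ascii_lowercase();
    // Split at the first letter: numeric part first, unit suffix after.
    let split = lower.find(|c: char| c.is_ascii_alphabetic())?;
    let (number, unit) = lower.split_at(split);
    let value: f64 = number.parse().ok()?; // fails on an empty numeric part
    let kind = match unit {
        "kbps" | "kbit" | "kbits" => BitRateKind::Audio,
        "mbps" | "mbit" | "mbits" => BitRateKind::Video,
        _ => return None,
    };
    Some((value, kind))
}
```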
Internal / Infrastructure
This release lands a substantial documentation investment motivated by the project moving from “experimental, no users” to “users filing real bug reports.” None of the items below change parser behavior, but they meaningfully improve the project’s ability to catch regressions before they ship.
What survived to v2.0.0:
- Documentation portal at https://lijunzh.github.io/hunch/ built with mdbook. (#188, #190)
- Release pipeline hardening — PR-time CI now also runs on release branches; release workflow is more defensive. (#150, #151, #152, #159)
- Misc test additions pinning behaviors against future regressions: `TitleStrategy` fallback ordering (#154, #161), `cli_walk_dir` safety boundaries (#153, #162), and parse-torrent-name corpus pins (#157, #164).
- Mutation-killing test additions from the cargo-mutants triage pass survive as permanent regression coverage in `tests/`, even though the nightly `mutants.yml` workflow itself was dropped: 29 mutants killed across #175, #180, #181, #182, #183, #184, #185.
What was added during the cycle and then rolled back (see Removed):
The CI infrastructure burst between v1.1.x and v2.0.0 — cargo-llvm-cov coverage tracking (#145, #168), nightly cargo-mutants (#146, #169, #170, #173), cargo-fuzz (#147, #174), continuous benchmarking via criterion + github-action-benchmark (#148, #176, #177, #178, #179, #186, #189, #191, #192, #194), and the cargo-public-api surface tripwire (#144, #171) — all got built and then trimmed in #216, #217, #222 once we acknowledged this is a single-developer hobby crate. The investment paid for itself in permanent test additions (above) and in the public API audit it drove (#197), but the workflows themselves were over-engineered for the project’s actual scale.
1.1.8 - 2026-04-17
Changed
- `--batch -r` now bounds recursion depth and skips symlinks. Recursive directory walks (`hunch --batch <dir> -r`) stop at 32 levels deep and silently skip symbolic links, whether they point to files or directories. This defends against denial of service via deeply nested trees (stack overflow) and via symlink loops (infinite recursion). Some users have curated libraries built from symlinks, such as a `Movies/` directory of NAS symlinks. Those users will see fewer or zero results in v1.1.8 and should either resolve the symlinks before invoking hunch or run hunch on the original directory tree. (#137)
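Here is a minimal sketch of a bounded, symlink-skipping walk. `walk` is hypothetical and is not hunch's `walk_dir_inner`. The root counts as depth 0 here, which is an assumption about how the 32-level cap is counted.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Depth cap from the entry above; the root directory is depth 0 (assumed).
const MAX_DEPTH: usize = 32;

/// Hypothetical walker: collects regular files and never follows symlinks.
fn walk(dir: &Path, depth: usize, out: &mut Vec<PathBuf>) -> io::Result<()> {
    if depth > MAX_DEPTH {
        return Ok(()); // silently stop past the cap
    }
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not traverse symlinks, so links (to files or to
        // directories, including loops) are seen as symlinks and skipped.
        let ft = entry.file_type()?;
        if ft.is_symlink() {
            continue;
        } else if ft.is_dir() {
            walk(&entry.path(), depth + 1, out)?;
        } else if ft.is_file() {
            out.push(entry.path());
        }
    }
    Ok(())
}
```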
Fixed
- Anime titles containing `" - "` and `"Part N"`: in `[Group] Show - Sub Part 2 - 13 [tags]`-style filenames, the title is now extracted as the full `Show - Sub Part 2`. Previously the parser truncated at the first `" - "` and incorrectly extracted `Part 2` as a standalone `part` property. (#124, #127)
Refactored
- Pipeline `rule_registry` extracted from `pipeline/mod.rs` into its own module. This centralizes the legacy / TOML rule registration so that the pipeline module stays at the orchestration layer of abstraction. (#134)
- Title `find_title_boundary` renamed for clarity, with documented semantics and a pinned caveat preventing accidental re-introduction of the pre-rename behavior. (#128 Debt #4, #133)
- Title fallback extractors unified behind a new `TitleStrategy` trait. The 5–6 ad-hoc extractor functions are now first-class strategy types in `properties/title/strategies/`, registered in a single ordered fallback list. (#128 Debt #1, #132)
- Part reclaimable when Episode present. `Part N` matches in the same set as an `Episode` match are now marked reclaimable, so the existing title-absorption step can fold them into the title uniformly. This replaces the bespoke `absorb_part_into_title` post-hoc corrector, in line with the D10 “no post-hoc correctors” tripwire. (#128 Debt #3, #131)
- `clean_title` decomposed into composable transforms (`strip_*`, `normalize_separators`, `trim_trailing_punct`, `strip_trailing_keywords`, `clean_title_preserve_dashes`, `DashPolicy`). Each transform is individually testable and composable, and `clean_title` becomes a thin orchestrator. (#128 Debt #2, #130)
- `mark_reclaimable_when_episode_present` visibility tightened from `pub` to `pub(crate)`. It is an internal-only helper that was never intended as part of the public API surface. (release-prep)
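The ordered-fallback shape can be sketched as a list of trait objects in which the first strategy that returns `Some` wins. The trait methods and both example strategies (`UntilYear`, `WholeStem`) are hypothetical. They are also much simpler than the real extractors in `properties/title/strategies/`.

```rust
/// Hypothetical trait shape; the real `TitleStrategy` may differ.
trait TitleStrategy {
    fn name(&self) -> &'static str;
    fn extract(&self, input: &str) -> Option<String>;
}

/// Words before the first 4-digit year token, e.g. "Movie.Name.2010.1080p".
struct UntilYear;

impl TitleStrategy for UntilYear {
    fn name(&self) -> &'static str {
        "until_year"
    }
    fn extract(&self, input: &str) -> Option<String> {
        let words: Vec<&str> = input.split('.').collect();
        let pos = words
            .iter()
            .position(|w| w.len() == 4 && w.bytes().all(|b| b.is_ascii_digit()))?;
        (pos > 0).then(|| words[..pos].join(" "))
    }
}

/// Last resort: the whole input with dots turned into spaces.
struct WholeStem;

impl TitleStrategy for WholeStem {
    fn name(&self) -> &'static str {
        "whole_stem"
    }
    fn extract(&self, input: &str) -> Option<String> {
        let text = input.replace('.', " ");
        let text = text.trim();
        (!text.is_empty()).then(|| text.to_string())
    }
}

/// Single ordered fallback list: the first strategy returning Some wins.
fn extract_title(
    strategies: &[Box<dyn TitleStrategy>],
    input: &str,
) -> Option<(&'static str, String)> {
    strategies
        .iter()
        .find_map(|s| s.extract(input).map(|title| (s.name(), title)))
}
```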
Tests
- Three regression scenarios pinned as named tests in dedicated files: flat-batch warning hint, parent-context propagation, and wrong-type path inference. Prevents silent regression of behaviors that previously had only ad-hoc coverage. (#138)
- `tests/cli_walk_dir_safety.rs` added alongside #137 with four scenarios:
  - deep-tree depth bound (40 levels, control file at depth 1);
  - realistic-depth happy path (depth 6);
  - `cfg(unix)` symlink-loop containment (counts occurrences to prove links are not followed);
  - outside-root symlink-escape rejection. (#137)
Docs
- `SECURITY.md` added at repo root with a threat model, a vulnerability reporting procedure (private GitHub Security Advisories), and explicit in-scope / out-of-scope categorization. (#139)
- API Stability Policy added to `CONTRIBUTING.md`, documenting the hard vs. soft public-API contract. `hunch::Pipeline`, `HunchResult`, `MediaType`, `Confidence`, `Property`, and the top-level `hunch()` / `hunch_with_context()` functions are SemVer-stable. The `properties::*` submodules are explicitly unstable. (#139)
- `DESIGN.md` promoted to a root-level document (was `docs/design.md`). Adds D10 “Refactor before accreting” with three concrete tripwire rules: no post-hoc correctors, no parallel matchers, no growing dispatchers. (#129, #135)
- `docs/user_manual.md` updated to document `-r` recursion behavior: symlinks are skipped (loop-safe), and traversal stops at 32 levels deep. (release-prep, paired with #137)
- Doc drift cleanup: README, CONTRIBUTING, user_manual, and compatibility cross-references audited and refreshed against the current source. (#136)
- Compatibility report refreshed: 1072 / 1311 fixtures pass (81.8%), up from 1071 / 1309 in v1.1.7 (two fixtures added, one new pass). (release-prep)
CI
- `cargo-semver-checks` PR-time gate added. It detects accidental SemVer-incompatible changes to the public Rust API by comparing the PR head against the latest crates.io release, and blocks breaking changes within a major version line. (#142)
- Cross-OS PR matrix: `Check` and `Test` jobs now run on ubuntu-latest, macos-latest, and windows-latest. This catches platform-conditional compile errors and path-handling differences before release time. (#141)
- Security hardening of CI workflows. (#140)
  - All third-party actions are SHA-pinned with version comments, which defends against tag-republishing supply-chain attacks.
  - `cargo audit` now hard-fails on RUSTSEC vulnerabilities; it was previously silenced by `|| true`.
  - Dependabot auto-merge is metadata-gated to patches only and to dev/CI-tooling minor bumps. Major bumps and runtime-dependency minor bumps now require manual review.
  - Two yanked transitive dev-deps refreshed: `js-sys` 0.3.88 → 0.3.95 and `wasm-bindgen` 0.2.111 → 0.2.118.
  - Default `permissions: contents: read` on `ci.yml`.
Repository governance
- `.gitignore` hardened with broad patterns against accidental secret / credential commits (`.env*`, `*.pem`, `*.key`, `id_rsa*`, `secrets*`, `credentials.json`, `service-account*.json`). (#139)
1.1.7 - 2026-03-23
Fixed
- Bracket metadata leakage: bracketed metadata in CJK/anime filenames no longer leaks into `episode_title`. Release-group extraction also now prefers the actual first bracket group instead of bracket fragments. (#92)
- Generic category directories: library/category directories like `English/`, `Japanese/`, `Anime/`, and CJK bonus folders are filtered more aggressively so they do not become titles. (#95)
- Parent-context fallback in batch mode: files in sparse extras/specials subdirectories now fall back to parent-directory context more reliably during recursive batch parsing. (#96)
- Empty intermediate directory propagation: recursive batch parsing now preserves useful parent context through empty/intermediate directory layers instead of dropping title hints. (#98)
- Explicit movie signals override `tv/` path hints: filenames and parent directories containing strong movie cues such as `The Movie`, `... Movie`, and `劇場版` now classify as `type=movie`, even inside TV-oriented directory trees. (#99)
- Natural-language first brackets: filenames like `[Kimetsu no Yaiba Mugen Ressha Hen][JPN+ENG]...` now treat the first bracket as `title` when it looks like natural language rather than a release group. (#100)
Docs
- Added a README Known Limitations section documenting the main remaining edge-case categories and their tradeoffs. (#103)
1.1.6 - 2026-03-22
Added
- `MediaType::Extra`: new media type variant for supplementary content (NCED, NCOP, OP, ED, SP, PV, CM, OVA, OAD, ONA, Menu, Tokuten). Files with `episode_details` but no episode/season/date markers now return `type=extra` instead of `type=episode`. The specific marker remains accessible via `episode_details()`. (#89)
- Recursive `--batch -r`: a new `-r` / `--recursive` flag walks the full directory tree and groups siblings per directory. This enables cross-file title extraction for deeply nested libraries (`tv/Show/Season 1/01.mkv` → `title: "Show"`). (#66)
- Library ergonomics. (#73)
  - `Property` is re-exported at the crate root (`use hunch::Property`).
  - 10 new typed accessors on `HunchResult`: `episode_details()`, `language()`, `languages()`, `subtitle_language()`, `subtitle_languages()`, `bonus()`, `date()`, `film()`, `disc()`, and `media_type()`.
  - `MatchSpan::value` implements `AsRef<str>`.
- Flat `--batch` warning: when `--batch <dir>` is used without `-r` and subdirectories contain media files that are being skipped, hunch prints a hint to stderr suggesting `--batch -r`. (#74)
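The `extra` vs `episode` rule can be written as a small decision function. `Signals` is a hypothetical summary of which properties matched. Falling back to `Movie` when there are no markers at all is a simplification for this sketch and does not represent hunch's full type inference.

```rust
#[derive(Debug, PartialEq, Eq)]
enum MediaType {
    Movie,
    Episode,
    Extra,
}

/// Hypothetical summary of which properties matched in a filename.
struct Signals {
    episode_details: bool, // e.g. NCOP, OVA, SP
    episode: bool,
    season: bool,
    date: bool,
}

fn classify(s: &Signals) -> MediaType {
    let has_marker = s.episode || s.season || s.date;
    if s.episode_details && !has_marker {
        MediaType::Extra // bonus marker alone: supplementary content
    } else if has_marker {
        MediaType::Episode
    } else {
        MediaType::Movie // simplification for this sketch
    }
}
```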
Fixed
- “Movie N” parsed as episode: `Detective.Conan.Movie.10...` in a `movie/` directory now returns `type=movie`. Bare-number matches at `HEURISTIC` priority lose to movie-directory path context, while strong S/E markers still win. (#88)
- Missing anime bonus markers: SP, OVA, OAD, ONA, OP, ED, and MENU tokens now emit `episode_details`, fixing classification of common anime BD bonus content. (#68)
- Batch mode parent dir fallback: `--batch` now passes `parent_dir/filename` to the pipeline so that `extract_title_from_parent()` has directory context. Fixes ~860 files that previously parsed without a title. (#62)
- Batch siblings invariance: siblings passed to the invariance engine now include the parent directory path. As a result, the invariant title text (e.g., “Paw Patrol”) is correctly identified and suppressed from episode titles. (#63)
Changed
- Named priority constants: the new `src/priority.rs` module exposes `STRUCTURAL`, `KEYWORD`, `VOCABULARY`, `DEFAULT`, `HEURISTIC`, `POSITIONAL` tiers (and others) as named constants, replacing magic integers throughout the codebase. (#85)
- Named zone rules: zone rules are now referred to by descriptive names (e.g., `language_in_title_zone`) instead of numbers (Rule 1, Rule 2, …). (#86)
Docs
- Added `--batch -r` flag to CLI help, README, and user manual. (#69)
- Added P5 principle (surface ambiguity) and updated D6 in design.md. (#76)
- Restructured design.md: separated principles, decisions, and boundaries into distinct sections. (#77, #78)
- Added Mission section to design.md — hunch is not a guessit port. (#79)
- Scoped D7 to reflect reality; acknowledged D9 matcher classes. (#84)
Tests
- Added CLI integration tests for the flat-batch subdirectory warning. (#75)
1.1.5 - 2026-03-20
Added
- CJK episode markers (`第N話`, `第N集`, `第N回`, `第N话`): structural pattern recognition for Japanese and Chinese episode numbering. Includes full-width digit normalization (`０`–`９` → `0`–`9`). (#46)
- Anime bonus vocabulary: NCOP, NCED, PV, and CM tokens emit `EpisodeDetails`, correctly classifying bonus content as episodes. (#46)
- Path-based type inference: directory names (`tv/`, `anime/`, `donghua/`, `Season N/`, `sN/`) force `MediaType::Episode` even when the filename alone lacks episode markers. (#46)
- `InvarianceReport` with year/episode signal detection: cross-file sequential analysis identifies bare numbers as episodes and suppresses invariant years from metadata. (#47, #48)
- Source tagging (`Structural`, `Context`, `Heuristic`) on all `MatchSpan`s; heuristic-only results cap confidence at Medium. (#47, #48)
- 28 new integration tests (370 → 386 total) covering CJK markers, path inference, invariance signals, cross-feature interactions, and panic-safety edge cases.
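Here is a stdlib-only sketch of the `第N話` family with full-width digit normalization. It assumes the marker consists of `第`, then one or more ASCII or full-width digits, then one of `話 集 回 话`. `digit_value` and `cjk_episode` are hypothetical; hunch's implementation uses regexes.

```rust
/// Value of an ASCII digit or a full-width digit (U+FF10..=U+FF19).
fn digit_value(c: char) -> Option<u32> {
    match c {
        '0'..='9' => c.to_digit(10),
        '\u{FF10}'..='\u{FF19}' => Some(c as u32 - 0xFF10),
        _ => None,
    }
}

/// Hypothetical sketch: episode number from the first 第N話 / 第N集 / 第N回 / 第N话.
fn cjk_episode(input: &str) -> Option<u32> {
    let chars: Vec<char> = input.chars().collect();
    for (i, &c) in chars.iter().enumerate() {
        if c != '第' {
            continue;
        }
        let mut n: u32 = 0;
        let mut j = i + 1;
        while let Some(d) = chars.get(j).and_then(|&ch| digit_value(ch)) {
            n = n.checked_mul(10)?.checked_add(d)?; // give up on overflow
            j += 1;
        }
        // Require at least one digit and a recognized counter character.
        if j > i + 1 && matches!(chars.get(j).copied(), Some('話' | '集' | '回' | '话')) {
            return Some(n);
        }
    }
    None
}
```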
Changed
- `find_invariant_text` now returns `(usize, String)`. The pre-computed byte offset eliminates the fragile `input.find()` re-search, which could match the wrong occurrence for short or repeated title strings.
- `find_invariant_text` accepts `&[&[UnclaimedGap]]` instead of cloning all gap `Vec`s (zero-copy).
- Year signal expansion sorts signals by `.start` before the loop, preventing non-adjacent text from being glued into titles.
- Heuristic eviction guard: `apply_invariance_signals` now checks for non-heuristic overlaps before evicting heuristic matches. This prevents data loss when a codec or screen-size match occupies the same span.
- Trailing Part regex hoisted to `LazyLock<Regex>` (was compiled per call in episode title extraction).
- `is_episode_directory` uses `strip_prefix('s')` instead of `component[1..]` byte indexing for safe UTF-8 handling.
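The sketch below shows why `strip_prefix('s')` is the safe choice. Slicing `component[1..]` panics when the first character is multi-byte (for example `é1`), whereas `strip_prefix` simply returns `None`. `is_season_dir` is a hypothetical sketch that covers only the lowercase `sN` form.

```rust
/// Hypothetical check for a lowercase `sN` season directory (e.g. `s01`).
fn is_season_dir(component: &str) -> bool {
    // strip_prefix never splits a multi-byte char; `&component[1..]` would
    // panic on inputs such as "é1" because byte 1 is not a char boundary.
    component
        .strip_prefix('s')
        .is_some_and(|rest| !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()))
}
```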
Fixed
- `CODEC_NUMBERS` shared constant (264, 265, 128) extracted from duplicated checks in `invariance.rs` and `episodes/mod.rs`. (DRY)
- Stale SP comment orphan removed from `anime_bonus.toml`.
- Unused `_input` parameter removed from `apply_invariance_signals`.
- `.unwrap()` → `.expect()` on CJK regex capture groups.
1.1.4 - 2026-03-20
Added
- Cross-file context for title extraction (`run_with_context`, `hunch_with_context`): when sibling filenames are provided, hunch identifies the invariant text across files as the title. Dramatically improves CJK and non-standard filename parsing. (#47)
- CLI `--context <dir>` flag: uses sibling files from a directory for improved title detection.
- CLI `--batch <dir>` flag: parses all media files in a directory with mutual cross-file context.
- `Confidence` enum on `HunchResult` (`High | Medium | Low`), based on structural signals (tech anchors, title quality, cross-file context).
- Low-confidence CLI warning suggesting `--context` when results are uncertain.
- Architecture documentation for cross-file context design decisions. (#48)
- 10 matching constraint tests covering `not_before`, `not_after`, `requires_context`, `requires_nearby`, side effects, compound windows, zone scoping, and reclaimable matches.
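Here is a simplified sketch of the invariance idea. It splits sibling filenames into tokens separated by dots, spaces, or underscores, then keeps the longest token prefix that all siblings share. `invariant_title` is hypothetical. Hunch's engine works on unclaimed gaps after matching rather than on raw token prefixes.

```rust
/// Hypothetical sketch: longest run of leading tokens shared by every sibling.
fn invariant_title(siblings: &[&str]) -> Option<String> {
    if siblings.len() < 2 {
        return None; // nothing to compare against
    }
    let is_sep = |c: char| c == '.' || c == ' ' || c == '_';
    let tokenized: Vec<Vec<&str>> = siblings
        .iter()
        .map(|s| s.split(is_sep).filter(|t| !t.is_empty()).collect())
        .collect();
    let first = &tokenized[0];
    // Shared prefix length with each sibling; the minimum is common to all.
    let shared = tokenized
        .iter()
        .map(|toks| first.iter().zip(toks).take_while(|(a, b)| a == b).count())
        .min()?;
    (shared > 0).then(|| first[..shared].join(" "))
}
```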
Changed
- Pipeline refactored into `pass1()` / `pass2()` for reuse by cross-file context. No behavior change for existing `run()` callers.
- `Token::lower()` now cached: lowercased text is computed once at tokenization, eliminating 6+ redundant allocations per token in matching.
- `trim_title_suffix` is zero-alloc: it uses `&str` slices instead of cloning in a loop.
- CLI deps feature-gated: `clap` and `env_logger` are now behind the `cli` feature (enabled by default). Library consumers no longer pull in CLI dependencies.
- `--batch` now properly conflicts with positional filename args.
- `list_media_files` signature: `&PathBuf` → `&Path` (idiomatic Rust).
Fixed
- Stale doc-links pointing to `hunch` instead of `hunch_with_context`.
- The `Pipeline` doc comment had merged with the `SegmentScope` doc (missing blank line).
- ARCHITECTURE.md pass rate updated to 81.8%.
- README.md: removed references to the deleted `options.rs`, updated test count to 333.
1.1.3 - 2026-03-19
Changed
- Overall pass rate: 81.7% → 82.2% (1,069 → 1,076 / 1,309).
- Structure-aware neighbor-context disambiguation: replaced fragile positional heuristics (“first half of title zone”, “before the anchor”, “unmatched bytes ratio”) with principled structural reasoning based on what actually surrounds each token. The new `token_context` module provides:
- Neighbor roles: Score adjacent tokens as title words vs tech tokens.
- Peer reinforcement: Adjacent tokens of the same property type (e.g., FRENCH next to ENGLISH) signal a metadata cluster.
- Structural separators: Tokens after “ - “ or in brackets are metadata, not title content.
- Structural fallback: Edge-of-segment tokens use position relative to first tech anchor as tiebreaker.
- Duplicate detection: Same value in firm tech context elsewhere drops the title-zone instance.
- Structure-aware episode title extraction — episode title is now extracted from whichever path segment contains the episode anchor, not hardcoded to the leaf filename.
- TOML-driven disambiguation: new `requires_nearby` and `reclaimable` fields in TOML rules reduce Rust-side special-casing.
Improved
- language: 80.3% → 81.0% — neighbor context + peer reinforcement.
- title: 91.8% → 92.0% — better language filtering.
- episode_title: 73.6% → 76.1% — parent-dir extraction, boundary fixes.
- other: 88.8% → 89.1%, via TOML-driven `requires_nearby` for “Proper”.
Fixed
- Episode title extraction from parent directories when the leaf filename contains only a numeric code (e.g., `Bones.S12E02.The.Brain.In.The.Bot.1080p.WEB-DL/161219_06.mkv` → episode_title: “The Brain In The Bot”).
- Language “FR” after a “ - ” separator is no longer dropped (`Love Gourou (Mike Myers) - FR` → language: French).
- Adjacent language tokens now reinforce each other as metadata (`QC.FRENCH.ENGLISH.NTSC` → both languages detected).
- JSON numeric coercion limited to semantically numeric properties.
- Added BDMux/BRMux/BDRipMux/BRRipMux source patterns.
- Multi-segment alternative_title with earliest-boundary fix.
Refactored
- `Property` enum uses the `define_properties!` macro (DRY).
- 8 positional args replaced with a `MatchContext` struct.
- `known_tokens.rs` renamed to `validation.rs`.
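Here is a sketch of the single-source-of-truth idea behind a `define_properties!`-style macro. One list of `Variant => "key"` pairs generates both the enum and its key lookup, so the two cannot drift apart. The macro body and the three variants are hypothetical; the real macro's shape is not shown in this changelog.

```rust
/// Hypothetical macro shape: one list of (Variant => "json_key") pairs.
macro_rules! define_properties {
    ($($variant:ident => $key:literal),+ $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Property { $($variant),+ }

        impl Property {
            /// Every variant, in declaration order.
            pub const ALL: &'static [Property] = &[$(Property::$variant),+];

            /// Snake-case key used in JSON output.
            pub fn key(self) -> &'static str {
                match self { $(Property::$variant => $key),+ }
            }
        }
    };
}

define_properties! {
    Title => "title",
    Year => "year",
    VideoCodec => "video_codec",
}
```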
Removed
- `Options` struct, `hunch_with()`, and the `--type` / `--name-only` CLI flags. These were dead code from v1.0.0 (never wired into the pipeline). The `src/options.rs` module was deleted.
1.1.2 - 2026-02-28
Fixed
- docs.rs build: added `rust-version = "1.85"` and `[package.metadata.docs.rs]` to `Cargo.toml`. Edition 2024 requires Rust 1.85+, and docs.rs needs this hint to select a compatible toolchain. Versions 1.0.0–1.1.1 failed to build on docs.rs for this reason.
1.1.1 - 2026-02-28
Fixed
- `cargo fmt`: applied rustfmt to all files modified in v1.1.0. No logic changes; line wrapping only.
1.1.0 - 2026-02-28
Added
- Structured logging: integrated the `log` crate with `debug!` and `trace!` instrumentation across the full pipeline. Each stage (tokenize, zone map, matching, conflict resolution, zone disambiguation, title extraction) emits diagnostic messages. Runtime cost is negligible when no logger is installed.
- `--verbose` / `-v` CLI flag: enables `hunch=debug` logging via `env_logger`. Users can also set `RUST_LOG=hunch=trace` for per-match detail.
- `env_logger` dependency: powers CLI log output.
- `#![warn(missing_docs)]`: compiler lint that prevents future doc regressions.
- 15 new doc-tests: all rustdoc examples are compiled and run as part of `cargo test` (total: 295 tests).
Changed
- Comprehensive Rustdoc coverage — 81 missing-doc warnings → 0:
  - All 49 `Property` enum variants documented with example values.
  - `HunchResult`, `Options`, `Pipeline`, `MatchSpan`, `MediaType` enriched with usage examples and cross-links.
  - `hunch_with()` fully documented with two worked examples.
  - Crate-level docs (`lib.rs`) expanded: Quick Start, Options, Property access, Multi-valued, JSON output, Logging, Architecture.
  - All 15 `find_matches()` functions documented.
  - `SideEffect`, `BoundedRegex`, `TitleYear` fields documented.
  - Internal modules (`matcher`, `properties`) marked with stability notes.
- README.md — added Logging section, `--verbose` flag, `Options` example, API Documentation section with docs.rs links, updated test count (295).
- CLI error handling — JSON serialization errors now print to stderr and `exit(1)` instead of silently producing empty output.
Fixed
- ~30 bare `.unwrap()` calls replaced with descriptive `.expect()` messages across `zone_map.rs`, `bit_rate.rs`, `size.rs`, `uuid.rs`, `crc32.rs`, `year.rs`, `version.rs`, `proper_count.rs`, `release_group/mod.rs`, `episodes/mod.rs`, `episodes/patterns.rs`.
- O(n²) comment added to `resolve_conflicts()` documenting algorithmic complexity and future optimization path.
- `#[allow(dead_code)]` on `Options` annotated with TODO explaining planned `media_type`/`expected_title` wiring.
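The O(n²) note refers to greedy span-based conflict resolution. In this minimal sketch, each candidate is checked against every match already kept. The `Span` type, the priority field and the ordering (higher priority, then longer span, then earlier start) are illustrative assumptions, not hunch's actual resolution policy.

```rust
/// Hypothetical match: byte range [start, end) plus a priority.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub priority: i32,
    pub value: String,
}

impl Span {
    /// Half-open ranges overlap when each starts before the other ends.
    fn overlaps(&self, other: &Span) -> bool {
        self.start < other.end && other.start < self.end
    }
    fn len(&self) -> usize {
        self.end - self.start
    }
}

/// Keep a non-overlapping subset. Every candidate is compared against every
/// match already kept, so the worst case is O(n²) comparisons.
pub fn resolve_conflicts(mut spans: Vec<Span>) -> Vec<Span> {
    // Strongest first: higher priority, then longer span, then earlier start.
    spans.sort_by(|a, b| {
        b.priority
            .cmp(&a.priority)
            .then(b.len().cmp(&a.len()))
            .then(a.start.cmp(&b.start))
    });
    let mut kept: Vec<Span> = Vec::new();
    for s in spans {
        if kept.iter().all(|k| !k.overlaps(&s)) {
            kept.push(s);
        }
    }
    // Restore left-to-right order for downstream consumers.
    kept.sort_by_key(|s| s.start);
    kept
}
```

Sorting first keeps the greedy pass simple. An interval tree or sweep line is one possible way to bring the overlap checks below O(n²).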
1.0.1 - 2026-02-28
Fixed
- Documentation patch — v1.0.0 shipped with incorrect compatibility numbers in README. This release corrects all documentation to match actual test results (81.7%, 1,069 / 1,309).
- Updated COMPATIBILITY.md version reference to v1.0.1.
- Added missing CHANGELOG entries for v1.0.0 and v1.0.1.
1.0.0 - 2026-02-28
Changed
- Stable release — first non-pre-release version.
- Removed “in progress” / “developing” warnings from all documentation.
- Updated all compatibility numbers to match current test results.
- CLI description updated.
Summary
- 81.7% compatibility with guessit’s 1,309-case YAML test suite.
- 22 properties at 95%+ accuracy, 16 at 100%.
- All 49 properties implemented (3 intentionally diverged).
- No dependency on network access, databases, or ML models.
- Single binary, TOML rules embedded at compile time.
0.3.1 - 2026-02-27
Fixed
- Language/subtitle_language disambiguation — Add zone Rule 8 to suppress Language matches contained within SubtitleLanguage spans. Fixes cases like `ENG.-.FR Sub` where `FR` was incorrectly detected as both language and subtitle_language.
- Subtitle language 2-letter codes — Add ISO 639-1 codes (FR, SV, DE, etc.) to the `LANG SUBS` regex. Patterns like `FR Sub` and `SV Sub` now correctly produce subtitle_language matches.
- Bracket subtitle over-matching — Tighten the `SUB_LANG` regex separator class to exclude `)}]`, preventing greedy matches that consumed content past closing brackets (e.g., `St{Fr-Eng}.Chaps]`). Multi-language bracket patterns like `St{Fr-Eng}` now correctly extract both languages.
- Remove unused `is_episode_property` — Dead code cleanup.
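The zone Rule 8 fix amounts to a span-containment filter. This minimal sketch assumes matches carry byte offsets and a property kind. `Kind`, `Match` and `suppress_contained_languages` are hypothetical names, not the crate's types.

```rust
/// Hypothetical, simplified property kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Language,
    SubtitleLanguage,
}

/// Hypothetical match record: kind plus half-open byte range.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Match {
    pub kind: Kind,
    pub start: usize,
    pub end: usize,
}

/// Drop every Language match whose span lies entirely inside some
/// SubtitleLanguage span (e.g. `FR` inside `FR Sub`).
pub fn suppress_contained_languages(matches: Vec<Match>) -> Vec<Match> {
    let subs: Vec<(usize, usize)> = matches
        .iter()
        .filter(|m| m.kind == Kind::SubtitleLanguage)
        .map(|m| (m.start, m.end))
        .collect();
    matches
        .into_iter()
        .filter(|m| {
            m.kind != Kind::Language
                || !subs.iter().any(|&(s, e)| s <= m.start && m.end <= e)
        })
        .collect()
}
```

For `ENG.-.FR Sub`, the `ENG` language match at bytes 0..3 survives, while `FR` at 6..8 is removed because it sits inside the subtitle span 6..12.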
Changed
- language.yml pass rate — 66.7% → 100% (ratcheted to 98%).
- Enable Language rules in directory segments — Language TOML matching now applies to directory components with per-directory zone filtering.
- LC-AAC audio profile — Added Low Complexity pattern.
- Space-separated episode numbers — Zero-padded episode numbers with spaces are now detected.
- Spanish season keyword — `Temp` recognized as Temporada.
- Bonus without film/year — Implies episode media type.
- Portuguese ‘pt’ code — Added ISO 639-1 code for language matching.
- Multi-dot release groups — Names like `YTS.LT` are merged.
- Mid-filename bracket release groups — Detection improved.
- Bracket trailing strip — Metadata cleanup for release groups.
- Episode title paren fix — Don’t truncate at parens with digits.
- Bracket ‘/’ skip — Skip bracket groups with slashes in RG detection.
- Episode title separator — Strip leading separators.
- Per-directory Other rules — Other property matching with zone filtering.
- Compound bracket groups — Tokenizer model improvements.
0.3.0 - 2026-02-26
Added
- Two-pass pipeline — Release group extraction runs after conflict resolution (Pass 2), using resolved match positions instead of a 130-token exclusion list.
- Position-based release group validation — `is_position_claimed()` checks candidate spans against resolved tech matches. Replaces the DRY-violating `is_known_token()` function.
- Bracket group model — `BracketGroup` struct in tokenizer tracks matched bracket pairs (Square, Round, Curly) with positions and content.
- Per-directory zone maps — `SegmentZone` provides title/tech zone boundaries for directory segments. TOML zone-scope filtering now works for directory tokens.
- TokenStream in Pass 2 — All positional extractors (release_group, title, episode_title, film_title, alternative_title) receive the full `TokenStream` for bracket-aware and path-aware parsing.
- Suspicious Other detection — `Other:Proper` in episode titles is treated as title content when the original token text is not a release tag and the next word is not a tech token.
- Episode title separator splitting — show title repetition after `-` is correctly split from the actual episode title.
- Trailing Part stripping — “Part N” at the end of episode titles is stripped (Part is extracted as a separate property).
- EpisodeCount/SeasonCount boundary — episode title extraction starts after episode_count matches, not just episode matches.
- Title: leading tech skip — when filename starts with codec tokens, title extraction skips to the first non-tech gap.
- Zone Rule 1 duplicate language detection — drops language in title zone when the same language appears in the tech zone.
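The bracket group model can be illustrated with a stack-based scan over the input. This sketch uses assumed field names (`open`, `close`, `content`), and the real `BracketGroup` struct may differ. Here, a closer that does not match the most recent opener is simply ignored.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BracketKind {
    Square,
    Round,
    Curly,
}

/// Hypothetical shape of a matched pair: byte offsets of the open and close
/// characters plus the inner text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BracketGroup {
    pub kind: BracketKind,
    pub open: usize,
    pub close: usize,
    pub content: String,
}

/// Find matched bracket pairs with a stack. Unmatched or mismatched closers
/// are skipped rather than treated as errors.
pub fn bracket_groups(input: &str) -> Vec<BracketGroup> {
    let mut stack: Vec<(BracketKind, usize)> = Vec::new();
    let mut groups = Vec::new();
    for (i, c) in input.char_indices() {
        let kind = match c {
            '[' | ']' => BracketKind::Square,
            '(' | ')' => BracketKind::Round,
            '{' | '}' => BracketKind::Curly,
            _ => continue,
        };
        if matches!(c, '[' | '(' | '{') {
            stack.push((kind, i));
        } else if let Some(&(top_kind, open)) = stack.last() {
            if top_kind == kind {
                stack.pop();
                // Brackets are 1-byte ASCII, so open + 1 is a char boundary.
                groups.push(BracketGroup {
                    kind,
                    open,
                    close: i,
                    content: input[open + 1..i].to_string(),
                });
            }
        }
    }
    // Inner groups close first; sort so groups read left to right.
    groups.sort_by_key(|g| g.open);
    groups
}
```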
Changed
- Overall pass rate: 79.0% → 80.0% (1,034 → 1,047 / 1,309).
- title: 90.1% → 91.6% — leading codec, language dedup, asterisks.
- release_group: 89.1% → 90.2% — post-resolution, SC/SDH context.
- episode_title: 70.1% → 74.1% — boundaries, Part strip, suspicious Other.
- other: 83.7% → 84.8% — Zone Rule 5 post-RG, HQ adjacency.
- `release_group::find_matches()` signature changed to accept `(input, resolved_matches, zone_map, token_stream)`.
- All Pass 2 extractors now accept a `token_stream` parameter.
- Zone Rule 5 moved to `apply_post_release_group_rules()` so it can see release group positions.
Fixed
- video_codec.toml: HEVC suffix regex `hevc.+` → `hevc[a-zA-Z0-9_]+` to prevent multi-token window over-matching (e.g., `HEVC.Atmos-GROUP`).
- video_profile.toml: SC/SCH/SDH require a preceding codec token (`requires_before`). Prevents false positives where SC is a release group name or SDH is a subtitle tag.
- Title asterisk stripping: `*` treated as a separator character.
- Episode title REPACK/REAL: checks the original input text, not just the Other match value, to distinguish metadata from title content.
Removed
- `is_known_token()` — 130-token exclusion list replaced by position-based overlap detection plus a 20-token curated non-group list.
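Position-based overlap detection replaces per-token lookups with a range intersection test against resolved tech matches. This minimal sketch assumes half-open byte ranges. The `NON_GROUPS` entries below are placeholders, since the actual 20-token curated list is not given here.

```rust
use std::ops::Range;

/// Hypothetical check: a release-group candidate is rejected when it overlaps
/// any span already claimed by a resolved tech match (codec, source, ...).
pub fn is_position_claimed(candidate: &Range<usize>, resolved: &[Range<usize>]) -> bool {
    resolved
        .iter()
        .any(|r| candidate.start < r.end && r.start < candidate.end)
}

/// Placeholder entries standing in for the curated non-group list.
const NON_GROUPS: &[&str] = &["sample", "proof"];

/// Accept a candidate only if it is unclaimed and not a known non-group word.
pub fn accept_release_group(text: &str, span: Range<usize>, resolved: &[Range<usize>]) -> bool {
    !is_position_claimed(&span, resolved)
        && !NON_GROUPS.iter().any(|w| w.eq_ignore_ascii_case(text))
}
```

For `Movie.2019.1080p.x264-GROUP`, the year, resolution and codec spans are claimed, so only `GROUP` (bytes 22..27) remains a valid candidate.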
0.2.2 - 2026-02-26
Added
- `requires_before` constraint in TOML rule engine — symmetric with `requires_after`. A match is rejected unless the previous token (lowercased) is in the list.
- Zone Rule 8: Source subsumption dedup — when both a generic source (TV) and a specific source (HDTV) exist, the generic is dropped.
- AmazonHD side_effect — `AmazonHD` now emits both `streaming_service:Amazon Prime` and `other:HD`.
- Tier 2 anchor expansion — `dvd`, `dvdr`, `bd`, `pal`, `ntsc`, `secam` added as unambiguous tech vocabulary for zone boundary detection.
- Year-as-anchor for zone filtering — when title content before a year is ≥6 bytes, the year enables zone filtering even without Tier 1/2 anchors. Fixes titles like `A.Common.Title.Special.2014`.
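This sketch illustrates how `requires_before` and `requires_after` gate a match. It assumes the rule's lists are already parsed from TOML and that tokens are compared lowercased, as described above. `NeighborRule`, `side_ok` and `neighbors_ok` are hypothetical names, and the example vocabularies are illustrative only.

```rust
/// Hypothetical in-memory form of a TOML rule's neighbor constraints.
/// An empty list means "no constraint on that side".
pub struct NeighborRule<'a> {
    pub requires_before: &'a [&'a str],
    pub requires_after: &'a [&'a str],
}

/// True if `list` is empty, or if the neighbor exists and its lowercased
/// text is one of the listed tokens.
fn side_ok(list: &[&str], neighbor: Option<&str>) -> bool {
    if list.is_empty() {
        return true;
    }
    match neighbor {
        Some(tok) => {
            let lower = tok.to_lowercase();
            list.iter().any(|w| *w == lower.as_str())
        }
        None => false,
    }
}

/// Accept the token at `idx` only if both neighbor constraints hold.
pub fn neighbors_ok(tokens: &[&str], idx: usize, rule: &NeighborRule<'_>) -> bool {
    let before = idx.checked_sub(1).and_then(|i| tokens.get(i)).copied();
    let after = tokens.get(idx + 1).copied();
    side_ok(rule.requires_before, before) && side_ok(rule.requires_after, after)
}
```

A token at the start or end of the stream has no neighbor on that side, so any non-empty constraint on that side rejects it.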
Changed
- Overall pass rate: 76.6% → 79.1% (1,003 → 1,036 / 1,309).
- edition: 97.6% → 100% on per-property accuracy.
- source: 95.4% → 97.5% — BD standalone, source dedup.
- title: 89.1% → 90.8% — bracket group boundary detection, year-as-anchor zone filtering, Edition Collector pattern, parent dir after-match extraction.
- other: 81.7% → 84.5% — HQ/LD unrestricted, Complete context, SCR screener, FanSub pruning, Dubbed not_after.
- language: 77.5% → 84.5% — FLEMISH nl-be, Tier 2 anchor improvements.
- episode_title: 70.1% → 72.1% — Date-based anchoring, Part exclusion.
- year: 96.1% → 96.5% — first-paren disambiguation.
- `release_group` module split into `mod.rs` + `known_tokens.rs` (626 lines → 312 + 190).
Fixed
- HQ standalone → Other:High Quality (was audio_profile:High Quality). AudioProfile HQ now requires AAC prefix.
- LD/HQ moved from tech_only to unrestricted zone scope (fixes detection when appearing before the first Tier 2 tech token).
- Dubbed no longer emits Other:Dubbed after language names (GERMAN.DUBBED → just language, not Other).
- Complete now requires contextual preceding token (season, language, number, source) to avoid false-positive matching on title words.
- `Fix` now requires tech tokens on both sides (`requires_before` + `requires_after`), per guessit semantics.
- Edition Collector 2-token pattern added (French reversed form).
- Bracket group titles now apply `find_title_boundary` (`[Ayako] Infinite Stratos - IS` → `Infinite Stratos`).
- Episode titles no longer stop at Part matches (`Elements.Part.1.Skyhooks` → full episode title).
- Zone Rule 5 extended with adjacency gap and Fan Subtitled value.
0.2.1 - 2026-02-26
Added
- `bit_rate` property — detects audio/video bit rates from filename patterns (`320Kbps`, `19.1Mbps`, `1.5Mbps`). Emitted as a single `bit_rate` (not split into audio/video — see COMPATIBILITY.md).
- `episode_format` property — detects “Minisode” / “Minisodes”.
- `week` property — detects “Week 45” in episode context.
- Zone map (`ZoneMap`) — two-phase anchor detection for structural filename analysis. Tier 1+2 anchors establish `tech_zone_start`; Tier 3 year disambiguation uses that boundary.
- `zone_scope` in TOML rules — `tech_only` and `after_anchor` scopes suppress ambiguous tokens in the title zone at match time.
- Source side-effects in TOML — `source.toml` now emits Other:Rip, Other:Screener, Other:Reencoded via declarative side_effects.
- Zone Rule 7 — promotes Blu-ray → Ultra HD Blu-ray when UHD/4K/2160p signals exist elsewhere in the filename.
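The ZoneMap and `zone_scope` entries describe a two-step idea. First, find where the tech zone starts. Then suppress `tech_only` rules before that point. The sketch below is simplified: it collapses Tier 1 and Tier 2 anchors into one hypothetical `ANCHORS` list and omits Tier 3 year disambiguation and `after_anchor` entirely.

```rust
/// Hypothetical anchor vocabulary; the real tiered lists are not shown here.
const ANCHORS: &[&str] = &["1080p", "720p", "2160p", "x264", "x265", "hevc", "bluray", "webrip"];

/// Simplified stand-in for a rule's zone scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZoneScope {
    Unrestricted,
    TechOnly,
}

/// Index of the first token that is an unambiguous tech anchor, if any.
pub fn tech_zone_start(tokens: &[&str]) -> Option<usize> {
    tokens
        .iter()
        .position(|t| ANCHORS.iter().any(|a| a.eq_ignore_ascii_case(t)))
}

/// A `TechOnly` rule may only match at or after the tech zone start.
/// With no anchor at all, nothing is suppressed.
pub fn in_scope(scope: ZoneScope, token_idx: usize, zone_start: Option<usize>) -> bool {
    match (scope, zone_start) {
        (ZoneScope::Unrestricted, _) | (_, None) => true,
        (ZoneScope::TechOnly, Some(start)) => token_idx >= start,
    }
}
```

In `The.Italian.Job.2003.1080p.BluRay.x264`, the tech zone starts at `1080p`. A `tech_only` language rule therefore cannot fire on `Italian`.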
Changed
- Overall pass rate: 78.2% → 76.6% (1,023 → 1,003 / 1,309). Slight regression from eliminating dual-pipeline overlap; source-specific accuracy improved (91% → 100%). See architecture notes below.
- Source: 91.3% → 100% on rules/source.yml fixture.
- Year: 95.2% → 96.1% — improved boundary handling.
Architecture
- Phase A + A.1 complete — ZoneMap, zone_scope filtering, year disambiguation all integrated into pipeline.
- Dual-pipeline eliminated — source.rs retired to TOML-only; subtitle_language.rs trimmed to algorithmic-only (no TOML overlap); language.rs already cooperative (bracket codes only).
- ValuePattern retired — year.rs uses plain Regex; ValuePattern struct and related code deleted from regex_utils.rs.
- Dead legacy code removed — other.rs gutted (282→75 lines); source.rs gutted (288→80 lines).
- File splits for clarity:
  - `pipeline.rs` (808 lines) → `pipeline/` module: mod.rs (600), zone_rules.rs (165), proper_count.rs (68)
  - `title.rs` (1043 lines) → `title/` module: mod.rs (365), clean.rs (266), secondary.rs (253)
  - `episodes/mod.rs` `find_matches` (640-line function) → 25-line orchestrator + 6 named category functions
- Renamed `other_weak.toml` → `other_positional.toml` for clarity.
- `episode_details.toml` tagged with `zone_scope = "tech_only"`, retiring zone Rule 4.
- Zone Rule 1 (language in title zone) now uses ZoneMap boundaries directly instead of re-deriving from match positions.
- `cargo clippy` clean — zero warnings.
Fixed
- Title: “The 100” pattern — absolute episode candidates before the first S/E span are now skipped.
- Title: trailing keywords — strip trailing `Episode`/`Ep` words and `-xNN` bonus markers.
- Title: trailing punctuation — strip trailing colons, hyphens, commas, semicolons.
- Title: year-as-title — uses ZoneMap year disambiguation for structural handling (e.g., “2001.A.Space.Odyssey.1968”).
- Release group: language prefixes — `HUN-nIk` → `nIk`, `TrueFrench-Scarface45` → `Scarface45`.
- Episode title: Part boundary — `Property::Part` stops extraction.
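The language-prefix fix for release groups can be sketched as a split on the first `-` followed by a vocabulary check. `LANG_PREFIXES` and `strip_language_prefix` are hypothetical. The crate's actual prefix list and matching rules may differ.

```rust
/// Hypothetical prefix vocabulary, for illustration only.
const LANG_PREFIXES: &[&str] = &["hun", "truefrench", "french", "ger", "ita"];

/// Strip a leading language tag joined to the group by `-`,
/// e.g. `HUN-nIk` -> `nIk`. The remaining name keeps its original case.
pub fn strip_language_prefix(candidate: &str) -> &str {
    match candidate.split_once('-') {
        Some((prefix, rest))
            if !rest.is_empty()
                && LANG_PREFIXES.iter().any(|p| p.eq_ignore_ascii_case(prefix)) =>
        {
            rest
        }
        _ => candidate,
    }
}
```

Candidates whose first segment is not a known language tag are returned unchanged, so hyphenated group names survive.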
Intentional divergences (documented)
- `audio_bit_rate`/`video_bit_rate`: single `bit_rate` property.
- `mimetype`: trivially derived from `container`; redundant.
0.2.0 - 2026-02-25
Added
- TOML side effects — one pattern match can emit multiple properties (e.g., `DVDRip` → Source:DVD + Other:Rip). Declarative, no callbacks.
- Neighbor constraints — `not_before`, `not_after`, `requires_after` for context-aware TOML matching.
- Path-segment tokenizer — tokenizes all path segments with `SegmentKind` (Directory vs Filename).
- Property-scoped `SegmentScope` — each TOML rule set declares whether it matches directory tokens (`AllSegments` for unambiguous tech properties, `FilenameOnly` for ambiguous ones).
- `absolute_episode` property — detects absolute episode numbers (anime-style) when both S/E markers and standalone ranges coexist. 0% → 90%.
- `film_title` property — extracts franchise title from `-fNN-` patterns (e.g., James Bond). 0% → 87.5%.
- `alternative_title` property — extracts content after title boundary separators (`-`, `--`, `(`). 0% → 43.8%.
- Title boundary detection — structural separators (`-`, `--`, `(`) stop title extraction at subtitle/director content.
- Single-word input handling — bare words without path/extension are treated as title.
- Italian `Stagione` season keyword support.
- `audio_channels.toml` — standalone channel count detection (5.1, 7.1, 2ch, mono, stereo).
- Subtitle language capture groups — `SUB.FR`/`FR-SUB` patterns extract the language code via `{1}` template.
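A TOML side effect means one matched token yields several property/value pairs without any callback code. The sketch below models an already-parsed rule table in Rust. It assumes exact case-insensitive token patterns rather than the regexes the real rule files may use. `Rule`, `RULES` and `emit` are hypothetical.

```rust
/// Hypothetical in-memory form of one rule: the primary property/value plus
/// declarative side effects emitted from the same match (no callbacks).
pub struct Rule {
    pub pattern: &'static str,
    pub property: &'static str,
    pub value: &'static str,
    pub side_effects: &'static [(&'static str, &'static str)],
}

/// Illustrative rule table; real rules live in TOML files.
const RULES: &[Rule] = &[
    Rule { pattern: "dvdrip", property: "source", value: "DVD", side_effects: &[("other", "Rip")] },
    Rule { pattern: "hdtv", property: "source", value: "HDTV", side_effects: &[] },
];

/// Look up a token case-insensitively and emit the primary property
/// followed by any side effects; unknown tokens emit nothing.
pub fn emit(token: &str) -> Vec<(&'static str, &'static str)> {
    RULES
        .iter()
        .find(|r| r.pattern.eq_ignore_ascii_case(token))
        .map(|r| {
            let mut out = vec![(r.property, r.value)];
            out.extend_from_slice(r.side_effects);
            out
        })
        .unwrap_or_default()
}
```

Keeping side effects as data means a new combined tag only needs a rule entry, not new matcher code.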
Changed
- Overall pass rate: 75.1% → 77.3% (983 → 1,012 / 1,309 test cases).
- `fancy_regex` removed entirely — all regex is now standard `regex` crate only (linear-time, ReDoS-immune). 🎉
- 4 legacy matchers fully retired to TOML-only: frame_rate, container, screen_size, audio_codec.
- `language.rs` gutted — TOML handles tokens, Rust handles only bracket/brace multi-language codes (`[ENG+RU+PT]`, `{Fr-Eng}`).
- 8 dead modules cleaned — removed vestigial `ValuePattern` code from video_codec, audio_profile, color_depth, country, edition, episode_details, streaming_service, video_profile.
- Directory selection — title extraction now walks directories deepest-first (closest to filename preferred).
- Language zone rule improved — fixes “The Italian Job” case where “Italian” was matched as language instead of title word.
- Case-insensitive dedup for language/subtitle_language values.
- All clippy warnings resolved.
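Case-insensitive dedup keeps the first spelling of each value and drops later ones that differ only in case. This minimal sketch assumes values arrive as owned strings. `dedup_case_insensitive` is a hypothetical helper name.

```rust
use std::collections::HashSet;

/// Keep the first occurrence of each value, comparing case-insensitively,
/// so `French` and `FRENCH` collapse to a single entry. Order is preserved.
pub fn dedup_case_insensitive(values: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    values
        .into_iter()
        // `insert` returns false for a lowercase key already seen.
        .filter(|v| seen.insert(v.to_lowercase()))
        .collect()
}
```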
Property improvements
| Property | v0.1.2 | v0.2.0 |
|---|---|---|
| video_codec | 94.0% | 98.6% |
| screen_size | 93.7% | 98.4% |
| audio_codec | 91.2% | 97.8% |
| title | 84.6% | 87.9% |
| subtitle_language | 49.4% | 77.8% |
| language | 77.5% | 84.5% |
| episode_title | 69.7% | 70.6% |
| absolute_episode | 0% | 90.0% |
| film_title | 0% | 87.5% |
| alternative_title | 0% | 43.8% |
Dependencies
- Removed: `fancy-regex` (was fallback for lookaround patterns).
- All regex matching is now guaranteed linear-time via the `regex` crate.
0.1.2 - 2026-02-24
Added
- ARCHITECTURE.md — layered architecture design document with decision log (D001–D005) covering TOML rules, regex-only, tokenizer, and offline-only constraints.
- VideoApi property — DXVA (DirectX Video Acceleration) detection.
- Proof detection — standalone `PROOF` tag in Other flags.
- DOKU support — German `DOKU` now maps to “Documentary” (like `DOCU`).
- Español Castellano — combined pattern maps to Catalan correctly.
- DTS.HD-MA — dot-separated `DTS.HD-MA` now matches as DTS-HD.
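One way to make `DTS.HD-MA` match the same entry as `DTS-HD` is to normalize separators before lookup. The project itself uses regex patterns for this. The std-only sketch below swaps in separator normalization instead. `normalize` and `audio_codec` are hypothetical helpers, and they ignore the Master Audio profile.

```rust
/// Hypothetical normalization: treat `.`, `-`, `_` and spaces as one
/// separator class, so `DTS.HD-MA`, `DTS-HD.MA` and `dts hd ma` compare equal.
pub fn normalize(token: &str) -> String {
    token
        .split(|c: char| matches!(c, '.' | '-' | '_' | ' '))
        .filter(|part| !part.is_empty())
        .map(|part| part.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join("-")
}

/// Map a normalized token to an audio codec value (illustrative subset).
pub fn audio_codec(token: &str) -> Option<&'static str> {
    match normalize(token).as_str() {
        "dts-hd-ma" | "dts-hd" => Some("DTS-HD"),
        "dts" => Some("DTS"),
        _ => None,
    }
}
```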
Changed
- Overall pass rate: 61.6% → 75.1% (806 → 983 / 1,309 test cases).
- proper_count — `REAL` keyword scanned case-insensitively but only in the technical zone (prevents false positives on titles like “Real Time With Bill Maher”).
- All clippy warnings resolved (regex-in-loop, collapsible-if, char arrays).
- Updated ARCHITECTURE.md with architecture decisions and v0.2 roadmap.
- Updated README.md with current compatibility stats.
0.1.1 - 2026-02-22
Added
- Pre-built binaries for 5 platforms in GitHub Releases.
- `cargo-binstall` support — install without compiling.
Fixed
- All clippy warnings resolved.
- `cargo fmt` applied consistently.
- CI workflow now callable as a reusable workflow.
0.1.0 - 2026-02-22
Added
- Initial release — Rust port of Python’s guessit.
- 27 property matchers covering all 49 guessit properties.
- Span-based conflict resolution engine.
- CLI binary (`hunch "filename.mkv"`) with JSON output.
- Library API: `hunch()` and `hunch_with()` entry points.
- 140 unit tests + doc-tests.
- Validation against guessit’s 1,309-case test suite (53.6% pass rate).
- 191 Rust tests (140 unit + 22 regression + 27 integration + 2 doc-tests).
- Benchmark suite (`benches/parse.rs`).
Properties at 95%+ accuracy
video_codec, container, aspect_ratio, year, edition, crc32, website, source, audio_codec, screen_size, audio_channels, date.
Properties at 100% accuracy
color_depth, streaming_service, bonus, episode_details, film.