Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Grader

Grader assigns a difficulty label to a puzzle by simulating a human solver step by step.

pub trait Grader {
    type Config: Default + Serialize + DeserializeOwned;
    const NAME: &'static str;
    fn new(config: Self::Config) -> Self;
    fn grade(&self, puzzle: &Puzzle) -> GradeResult;
}

/// Cross-puzzle grader options. Per-impl Configs typically embed
/// this with `#[serde(flatten)]`.
pub struct GraderParams {
    pub time_budget: Duration,
    pub profile: String,         // grading profile name
    pub max_trial_depth: Option<u32>,
}

StandardGrader works for every genre by firing named human-pattern techniques in priority order; the techniques are generic over rule kind, so genres that share rule shapes share deductions. Genres with bespoke patterns can append their own Technique impls to the list.

Forward logic: techniques and families

A technique is a small named function over (puzzle, game) that observes a DeductionBuilder — recording which cells the pattern matched and what moves they force — and returns nothing. The grader owns the builder’s lifecycle; the technique just describes what it saw. Techniques are grouped into families by their underlying rule shape — Sudoku families like Naked Singles, Hidden Pairs, X-Wing; Akari families like LosCross-singles, clue-saturation, mutual-visibility-elim. Each family is one module; each technique is one function in it.

The Technique trait is defined above (see “Propagation engine”). A technique declares PHASE: Propagation for cheap per-move fixpoint inference, Pattern for the higher-cost matchers that only run when propagation stalls. The Grading section here focuses on Pattern-phase techniques and how the grader sequences them; Propagation-phase techniques participate in the inner propagation loop instead.

TARGETS examples (across both phases):

TechniquePHASETARGETSFires on…
Naked Singles (Sudoku)Propagation&[RuleKind::Distinct]Any Sudoku-shape puzzle
Bulb-clears-LOS (Akari)Propagation&[RuleKind::ExactCount]Akari
Hidden Pairs / X-Wing (Sudoku)Pattern&[RuleKind::Distinct]Sudoku, when propagation stalls
Hidden Cage Sum (Killer)Pattern&[RuleKind::Sum]Killer / Sandwich / Frame — skipped on vanilla
Bulb-Lit Single (Akari)Pattern&[RuleKind::ExactCount]Akari (clue or floor cells)
Trial-1 / Trial-2 / clue-coverPattern&[]Universal backward fallbacks

The grader sets up the builder, calls fire, then converts the builder into a Deduction (or skips the step if nothing was recorded). Every technique skips the bookkeeping — no DeductionBuilder::new() at the top, no into_deduction() at the bottom, no Option<…> plumbing. Just observe.

Deduction is a library type carrying both halves of the result:

pub struct Deduction {
    /// Cells the pattern matched on — used by the renderer for
    /// highlighting and by traces for explanation. Doesn't affect
    /// solve correctness.
    pub observed: Vec<Coord>,
    /// The moves the technique wants applied.
    pub moves: Vec<Move>,
}

DeductionBuilder is the shared accumulator, library-defined and used identically across every technique in every genre. It handles three pieces of bookkeeping — deduplicating moves (multiple observers may target the same coord), recording an observer only when its moves were novel, and producing an Option<Deduction> based on whether anything was new:

pub struct DeductionBuilder { /* observers, moves, seen-coord set */ }

impl DeductionBuilder {
    pub fn new() -> Self;

    /// Record that `observer` produced these `moves`. Moves whose
    /// target coord was already added by an earlier observer are
    /// dropped. If at least one move actually landed, `observer` is
    /// appended to the observed list; otherwise this is a no-op.
    pub fn observe(
        &mut self,
        observer: Coord,
        moves: impl IntoIterator<Item = Move>,
    );

    /// Pair-observer variant — for techniques whose deduction comes
    /// from two interacting cells (e.g., paired clues, X-Wing's two
    /// rows). Both observers land in the list iff the pair produced
    /// novel moves.
    pub fn observe_pair(
        &mut self,
        a: Coord, b: Coord,
        moves: impl IntoIterator<Item = Move>,
    );

    /// `Some(Deduction)` if any move was recorded, else `None`. Called
    /// by the grader after `fire`, never by the technique.
    pub fn into_deduction(self) -> Option<Deduction>;
}

A canonical technique impl: just the pattern, nothing else.

fn fire<'g, G: Game<'g>>(
    puzzle: &Puzzle, game: &G, b: &mut DeductionBuilder,
) {
    for bulb in bulbs(game) {
        b.observe(bulb, los_cross_from(puzzle, bulb)
            .filter(|&c| game.candidates(c).any(|m| m == Mark::Binary(true)))
            .map(|c| Move::Eliminate { coord: c, mark: Mark::Binary(true) }));
    }
}

The grader calls each registered technique’s fire against a fresh DeductionBuilder, then drains it with into_deduction():

// Inside the grader's forward-logic loop, once per technique:
let mut b = DeductionBuilder::new();
TechniqueX::fire(puzzle, game, &mut b);
if let Some(d) = b.into_deduction() {
    /* apply d.moves, record one SolveStep per move */
}

When grading begins, the grader walks puzzle.constraints() once to compute the set of present RuleKinds, then filters the genre’s registered technique list down to those whose TARGETS are either empty or share at least one kind with that set. The filtered list is iterated in registration order until none fire. Each fire produces one or more SolveSteps in the trace, tagged with the technique name.

This is the “concatenating constraints” pitch from earlier paying its dues at runtime. A vanilla Sudoku grade never invokes any killer-cage technique — they’re filtered out before the inner loop. Adding Killer cages to a puzzle expands its constraint set, which expands the filter’s positive set, which lets the killer-cage techniques back in. The cost of an inapplicable technique is one HashSet::contains check at puzzle-load time, not one fire-and-bail per inner-loop iteration.

Backward logic: when forward stalls

When no technique fires, the grader escalates through fallbacks in order. Each fallback is itself a “technique” (named, recorded in the trace, scorable in the profile) — the difference between forward and backward is just which bucket the technique lives in.

  1. Clue-cover analysis (clue-cover). Walks unsatisfied clues and looks for minimum-cover deductions structural to the constraint shape: a clue needing N more bulbs whose remaining LOS has exactly N candidates forces all of them; two clues sharing candidates forcing each other; an ExactCount whose region’s remaining capacity equals its target. Polynomial-time backward reasoning — cheap, no search.

  2. Trial search at depth 1 (trial-1). Pick a candidate move, commit it, run forward + clue-cover. If it reaches contradiction, the opposite move is forced.

  3. Trial search at iterating depths (trial-2, trial-3, …). Iterative deepening up to a configurable wall-clock budget (default 10s). At depth d, the grader runs forward + clue-cover

    • recursive trial-search at depths < d. No artificial depth cap beyond the time budget.

When the budget expires, GradeResult::status = TimedOut and the trace is whatever was reached up to that point. The puzzle may still be solvable by deduction at deeper trial — these get bucketed for study but filtered out of curated sets.

Tiers and scores live in profiles, not on techniques

Techniques don’t carry their own tier or score. Both live in a grading profile — a runtime-loadable table mapping technique name → tier → score. The profile is the unit of tuning.

pub struct GradingProfile {
    pub name: String,
    pub entries: Vec<GradingEntry>,
}

pub struct GradingEntry {
    pub technique: String,        // kebab-case technique name
    pub tier: Tier,
    pub score: u64,
}

pub enum Tier { Trivial, Easy, Medium, Hard, Extreme }

GradingProfile is Serialize + Deserialize and loads from TOML. The thesis is “difficulty is relative; recalibration must be cheap” — compiling Rust to retune a curve would be the opposite of cheap.

Profiles live in two places, with different cadences:

  • Shipped defaults, embedded via include_str! at build time from penmark/core/src/puzzles/<genre>/profiles/<name>.toml. Editing one of these does require a rebuild — they’re a maintainer task, the curve we ship by default. Source-of-truth in the repo so they version with the techniques they reference.
  • User overrides, runtime-loaded from ~/.config/penmark/profiles/<genre>/<name>.toml (or --profile-file=<path> on the CLI). These take precedence over the embedded default of the same name. Editing one is a TOML save, no compile, and the next penmark grade picks it up.

The grader walks the user dir first, falling back to the embedded default. So the everyday calibration loop — tweak weights, re-grade the db, eyeball the distribution — runs entirely against user TOMLs. The embedded copies only change when we ship a new release with a different default curve.

penmark profile-init <genre> writes the embedded default to the user dir, giving you a starting TOML to edit instead of building one from scratch.

A profile TOML reads like:

name = "nikoli-strict"

[[entries]]
technique = "naked-singles"
tier      = "trivial"
score     = 1

[[entries]]
technique = "hidden-pairs"
tier      = "easy"
score     = 4

[[entries]]
technique = "x-wing"
tier      = "hard"
score     = 25

Each genre ships one or more default profiles in its source tree (under puzzles/<genre>/profiles/) so out-of-the-box runs work without any user setup. Editing a default means editing a TOML file in the repo, not Rust source.

Why outside the technique impl: difficulty is relative. Calling naked-singles “Trivial” only means something next to where x-wing lands. Pulling tier and score onto each technique would force every recalibration to ripple through impls; keeping them in a TOML file lets you re-cut the whole curve, save the file, re-grade, repeat — no compile in the loop.

penmark profile-eval <genre> --profile=<name> runs the active profile against the db and prints the difficulty distribution, so calibration has a tight feedback loop without any code changes.

Profile loading fails loudly on mismatch. TOML technique names and Rust Technique::NAME constants are decoupled, so a rename in code would silently invalidate stale TOML entries. The grader validates the profile against the genre’s registered techniques at load time and errors immediately with the offending name(s) and the list of valid names. Same check on a registered technique that has no entry. No silent fallbacks — fix the profile or the rename.

Adding a new technique means: write the Rust impl, register it, add an entry to the default profile TOML. Three edits.

GradeResult

pub struct GradeResult {
    /// How grading concluded. `Tier` is purely difficulty; operational
    /// state lives here.
    pub status: GradeStatus,
    /// Highest tier any step reached. Meaningful in every status —
    /// for `Solved` it's the puzzle's final difficulty; for `TimedOut`
    /// or `Unsolvable` it's the deepest tier reached during the
    /// attempt.
    pub difficulty: Tier,
    /// Sum of per-step scores under the active grading profile.
    pub hard_score: u64,
    /// Ordered trace of the steps the grader took.
    pub steps: Vec<SolveStep>,
}

pub enum GradeStatus {
    /// Forward + backward logic completed; puzzle solved.
    Solved,
    /// The grader's time budget expired before reaching a solution.
    /// The trace is whatever was reached.
    TimedOut,
    /// The grader proved no solution exists (backward logic forced a
    /// contradiction at the root).
    Unsolvable,
}

pub struct SolveStep {
    pub technique: &'static str, // technique name (forward or backward)
    pub observed: Vec<Coord>,    // cells the pattern matched on
    pub mv: Move,                // the deduction's move
    pub depth: u32,              // 0 forward, ≥1 trial search
}

Tier is purely a difficulty bucket; “we ran out of budget” is GradeStatus::TimedOut, “the puzzle is broken” is GradeStatus::Unsolvable. Both persist alongside puzzles in the db, indexed separately for curation queries — “show me solved Hard puzzles” and “show me puzzles that timed out at any tier” are two different filters.