Grader
Grader assigns a difficulty label to a puzzle by simulating a
human solver step by step.
pub trait Grader {
type Config: Default + Serialize + DeserializeOwned;
const NAME: &'static str;
fn new(config: Self::Config) -> Self;
fn grade(&self, puzzle: &Puzzle) -> GradeResult;
}
/// Cross-puzzle grader options. Per-impl Configs typically embed
/// this with `#[serde(flatten)]`.
pub struct GraderParams {
pub time_budget: Duration,
pub profile: String, // grading profile name
pub max_trial_depth: Option<u32>,
}
StandardGrader works for every genre by firing named human-pattern
techniques in priority order; the techniques are generic over rule
kind, so genres that share rule shapes share deductions. Genres
with bespoke patterns can append their own Technique impls to the
list.
Forward logic: techniques and families
A technique is a small named function over (puzzle, game) that
observes a DeductionBuilder — recording which cells the pattern
matched and what moves they force — and returns nothing. The grader
owns the builder’s lifecycle; the technique just describes what it
saw. Techniques are grouped into families by their underlying
rule shape — Sudoku families like Naked Singles, Hidden Pairs,
X-Wing; Akari families like LosCross-singles, clue-saturation,
mutual-visibility-elim. Each family is one module; each technique
is one function in it.
The Technique trait is defined above (see “Propagation engine”).
A technique declares PHASE: Propagation for cheap per-move
fixpoint inference, Pattern for the higher-cost matchers that
only run when propagation stalls. The Grading section here focuses
on Pattern-phase techniques and how the grader sequences them;
Propagation-phase techniques participate in the inner
propagation loop instead.
TARGETS examples (across both phases):
| Technique | PHASE | TARGETS | Fires on… |
|---|---|---|---|
| Naked Singles (Sudoku) | Propagation | &[RuleKind::Distinct] | Any Sudoku-shape puzzle |
| Bulb-clears-LOS (Akari) | Propagation | &[RuleKind::ExactCount] | Akari |
| Hidden Pairs / X-Wing (Sudoku) | Pattern | &[RuleKind::Distinct] | Sudoku, when propagation stalls |
| Hidden Cage Sum (Killer) | Pattern | &[RuleKind::Sum] | Killer / Sandwich / Frame — skipped on vanilla |
| Bulb-Lit Single (Akari) | Pattern | &[RuleKind::ExactCount] | Akari (clue or floor cells) |
| Trial-1 / Trial-2 / clue-cover | Pattern | &[] | Universal backward fallbacks |
The grader sets up the builder, calls fire, then converts the
builder into a Deduction (or skips the step if nothing was
recorded). Every technique skips the bookkeeping — no
DeductionBuilder::new() at the top, no into_deduction() at the
bottom, no Option<…> plumbing. Just observe.
Deduction is a library type carrying both halves of the result:
pub struct Deduction {
/// Cells the pattern matched on — used by the renderer for
/// highlighting and by traces for explanation. Doesn't affect
/// solve correctness.
pub observed: Vec<Coord>,
/// The moves the technique wants applied.
pub moves: Vec<Move>,
}
DeductionBuilder is the shared accumulator, library-defined and
used identically across every technique in every genre. It handles
three pieces of bookkeeping — deduplicating moves (multiple
observers may target the same coord), recording an observer only
when its moves were novel, and producing an Option<Deduction>
based on whether anything was new:
pub struct DeductionBuilder { /* observers, moves, seen-coord set */ }
impl DeductionBuilder {
pub fn new() -> Self;
/// Record that `observer` produced these `moves`. Moves whose
/// target coord was already added by an earlier observer are
/// dropped. If at least one move actually landed, `observer` is
/// appended to the observed list; otherwise this is a no-op.
pub fn observe(
&mut self,
observer: Coord,
moves: impl IntoIterator<Item = Move>,
);
/// Pair-observer variant — for techniques whose deduction comes
/// from two interacting cells (e.g., paired clues, X-Wing's two
/// rows). Both observers land in the list iff the pair produced
/// novel moves.
pub fn observe_pair(
&mut self,
a: Coord, b: Coord,
moves: impl IntoIterator<Item = Move>,
);
/// `Some(Deduction)` if any move was recorded, else `None`. Called
/// by the grader after `fire`, never by the technique.
pub fn into_deduction(self) -> Option<Deduction>;
}
A canonical technique impl: just the pattern, nothing else.
fn fire<'g, G: Game<'g>>(
puzzle: &Puzzle, game: &G, b: &mut DeductionBuilder,
) {
for bulb in bulbs(game) {
b.observe(bulb, los_cross_from(puzzle, bulb)
.filter(|&c| game.candidates(c).any(|m| m == Mark::Binary(true)))
.map(|c| Move::Eliminate { coord: c, mark: Mark::Binary(true) }));
}
}
The grader calls each registered technique’s fire against a fresh
DeductionBuilder, then drains it with into_deduction():
// Inside the grader's forward-logic loop, once per technique:
let mut b = DeductionBuilder::new();
TechniqueX::fire(puzzle, game, &mut b);
if let Some(d) = b.into_deduction() {
/* apply d.moves, record one SolveStep per move */
}
When grading begins, the grader walks puzzle.constraints() once
to compute the set of present RuleKinds, then filters the
genre’s registered technique list down to those whose TARGETS
are either empty or share at least one kind with that set. The
filtered list is iterated in registration order until none fire.
Each fire produces one or more SolveSteps in the trace, tagged
with the technique name.
This is the “concatenating constraints” pitch from earlier paying
its dues at runtime. A vanilla Sudoku grade never invokes any
killer-cage technique — they’re filtered out before the inner
loop. Adding Killer cages to a puzzle expands its constraint set,
which expands the filter’s positive set, which lets the
killer-cage techniques back in. The cost of an inapplicable
technique is one HashSet::contains check at puzzle-load time,
not one fire-and-bail per inner-loop iteration.
Backward logic: when forward stalls
When no technique fires, the grader escalates through fallbacks in order. Each fallback is itself a “technique” (named, recorded in the trace, scorable in the profile) — the difference between forward and backward is just which bucket the technique lives in.
-
Clue-cover analysis (
clue-cover). Walks unsatisfied clues and looks for minimum-cover deductions structural to the constraint shape: a clue needing N more bulbs whose remaining LOS has exactly N candidates forces all of them; two clues sharing candidates forcing each other; anExactCountwhose region’s remaining capacity equals its target. Polynomial-time backward reasoning — cheap, no search. -
Trial search at depth 1 (
trial-1). Pick a candidate move, commit it, run forward + clue-cover. If it reaches contradiction, the opposite move is forced. -
Trial search at iterating depths (
trial-2,trial-3, …). Iterative deepening up to a configurable wall-clock budget (default 10s). At depthd, the grader runs forward + clue-cover- recursive trial-search at depths
< d. No artificial depth cap beyond the time budget.
- recursive trial-search at depths
When the budget expires, GradeResult::status = TimedOut and the
trace is whatever was reached up to that point. The puzzle may still
be solvable by deduction at deeper trial — these get bucketed for
study but filtered out of curated sets.
Tiers and scores live in profiles, not on techniques
Techniques don’t carry their own tier or score. Both live in a grading profile — a runtime-loadable table mapping technique name → tier → score. The profile is the unit of tuning.
pub struct GradingProfile {
pub name: String,
pub entries: Vec<GradingEntry>,
}
pub struct GradingEntry {
pub technique: String, // kebab-case technique name
pub tier: Tier,
pub score: u64,
}
pub enum Tier { Trivial, Easy, Medium, Hard, Extreme }
GradingProfile is Serialize + Deserialize and loads from TOML.
The thesis is “difficulty is relative; recalibration must be cheap”
— compiling Rust to retune a curve would be the opposite of cheap.
Profiles live in two places, with different cadences:
- Shipped defaults, embedded via
include_str!at build time frompenmark/core/src/puzzles/<genre>/profiles/<name>.toml. Editing one of these does require a rebuild — they’re a maintainer task, the curve we ship by default. Source-of-truth in the repo so they version with the techniques they reference. - User overrides, runtime-loaded from
~/.config/penmark/profiles/<genre>/<name>.toml(or--profile-file=<path>on the CLI). These take precedence over the embedded default of the same name. Editing one is a TOML save, no compile, and the nextpenmark gradepicks it up.
The grader walks the user dir first, falling back to the embedded default. So the everyday calibration loop — tweak weights, re-grade the db, eyeball the distribution — runs entirely against user TOMLs. The embedded copies only change when we ship a new release with a different default curve.
penmark profile-init <genre> writes the embedded default to the
user dir, giving you a starting TOML to edit instead of building
one from scratch.
A profile TOML reads like:
name = "nikoli-strict"
[[entries]]
technique = "naked-singles"
tier = "trivial"
score = 1
[[entries]]
technique = "hidden-pairs"
tier = "easy"
score = 4
[[entries]]
technique = "x-wing"
tier = "hard"
score = 25
Each genre ships one or more default profiles in its source tree
(under puzzles/<genre>/profiles/) so out-of-the-box runs work
without any user setup. Editing a default means editing a TOML file
in the repo, not Rust source.
Why outside the technique impl: difficulty is relative. Calling
naked-singles “Trivial” only means something next to where x-wing
lands. Pulling tier and score onto each technique would force every
recalibration to ripple through impls; keeping them in a TOML file
lets you re-cut the whole curve, save the file, re-grade, repeat —
no compile in the loop.
penmark profile-eval <genre> --profile=<name> runs the active
profile against the db and prints the difficulty distribution, so
calibration has a tight feedback loop without any code changes.
Profile loading fails loudly on mismatch. TOML technique names
and Rust Technique::NAME constants are decoupled, so a rename in
code would silently invalidate stale TOML entries. The grader
validates the profile against the genre’s registered techniques at
load time and errors immediately with the offending name(s) and the
list of valid names. Same check on a registered technique that has
no entry. No silent fallbacks — fix the profile or the rename.
Adding a new technique means: write the Rust impl, register it, add an entry to the default profile TOML. Three edits.
GradeResult
pub struct GradeResult {
/// How grading concluded. `Tier` is purely difficulty; operational
/// state lives here.
pub status: GradeStatus,
/// Highest tier any step reached. Meaningful in every status —
/// for `Solved` it's the puzzle's final difficulty; for `TimedOut`
/// or `Unsolvable` it's the deepest tier reached during the
/// attempt.
pub difficulty: Tier,
/// Sum of per-step scores under the active grading profile.
pub hard_score: u64,
/// Ordered trace of the steps the grader took.
pub steps: Vec<SolveStep>,
}
pub enum GradeStatus {
/// Forward + backward logic completed; puzzle solved.
Solved,
/// The grader's time budget expired before reaching a solution.
/// The trace is whatever was reached.
TimedOut,
/// The grader proved no solution exists (backward logic forced a
/// contradiction at the root).
Unsolvable,
}
pub struct SolveStep {
pub technique: &'static str, // technique name (forward or backward)
pub observed: Vec<Coord>, // cells the pattern matched on
pub mv: Move, // the deduction's move
pub depth: u32, // 0 forward, ≥1 trial search
}
Tier is purely a difficulty bucket; “we ran out of budget” is
GradeStatus::TimedOut, “the puzzle is broken” is
GradeStatus::Unsolvable. Both persist alongside puzzles in the
db, indexed separately for curation queries — “show me solved
Hard puzzles” and “show me puzzles that timed out at any tier” are
two different filters.