External solvers

penmark ships its own FastSolver (see the Fast solver chapter for internals) and treats external solvers as feature-gated oracles — useful for cross-validation, useful as fallbacks for genres where writing a domain-specific propagator is hard, but not on the production solve path.

This doc captures what we tried and what we learned, since most of it isn’t obvious from the source.

Headline numbers

100×100 Akari (release build, single thread, scaffold-derived clued puzzles at 0.42 wall density, cached at /tmp/penmark_compare_100x100.json):

| solver | wall-clock | gap to FastSolver |
|---|---|---|
| FastSolver (production) | ~10 ms | — |
| OrToolsSolver (CP-SAT, tuned) | ~28 ms | 2.8× |
| OrToolsSolver (CP-SAT, defaults) | ~52 ms | 5.2× |
| OrToolsSolver (initial port) | ~115 ms | 11.4× |

compare_ortools (in core/examples/) reproduces this table; profile_ortools breaks the OR-tools call into model-build vs CP-SAT-solve and runs a 30+ parameter sweep so the next round of tuning has a measurement harness ready.

The takeaway is structural: FastSolver is a hand-rolled Akari propagator (per-cell lit_by counter, LOS-cross-aware AC-3, trial-1 with MRV, bitmask state). A general CP solver can’t replicate that without knowing about the puzzle’s lighting structure; it just sees clauses and cardinality constraints. The “10-100×” pitch from CP-SAT marketing assumes a naive backtracker as the baseline, not a tuned propagator.

OR-tools journey

Initial port: 115 ms at 100×100

The first cut used a one-hot encoding: one BoolVar per (coord, mark) pair with add_exactly_one over each coord. For Akari (binary domain) that’s 2× the variables it needs. CP-SAT’s default parameters were also fighting us — the portfolio of subsolvers, presolve, probing, and linearization all charge fixed setup costs that don’t pay off on a 7,000-constraint Boolean satisfaction model.

Win 1: single-bool encoding for binary domains (115 ms → 45 ms, 2.5×)

When every coord shares a 2-mark domain, allocate one BoolVar per coord and use !v as the indicator for the negative mark. BoolVar implements Not so the negation is free. No exactly_one constraints, half the variables. OneHot stays as a fallback for wider domains (Sudoku-style numeric marks).
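The variable-count arithmetic behind the win can be sketched without the cp_sat API at all. The `Lit`, `var_count`, and `binary_indicator` names below are hypothetical, not penmark's real types — the point is only that a binary domain needs one variable per coord, with the negation coming for free:

```rust
/// A literal: a variable index plus a sign. `!v` models the free negation
/// that CP-SAT's BoolVar gives us via its `Not` impl.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Lit {
    var: usize,
    negated: bool,
}

impl std::ops::Not for Lit {
    type Output = Lit;
    fn not(self) -> Lit {
        Lit { var: self.var, negated: !self.negated }
    }
}

/// Variables needed for `coords` cells with a `domain`-mark domain.
/// Binary domains get one var per coord; wider domains fall back to
/// one-hot (one var per (coord, mark) pair, plus an exactly_one each).
fn var_count(coords: usize, domain: usize) -> usize {
    if domain == 2 { coords } else { coords * domain }
}

/// Indicator for "coord `c` carries mark `m`" in the binary encoding:
/// mark 1 is the variable itself, mark 0 its negation.
fn binary_indicator(c: usize, m: usize) -> Lit {
    Lit { var: c, negated: m == 0 }
}
```

So a 100×100 board costs 10,000 variables instead of 20,000, and zero exactly_one constraints.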

Win 2: SatParameters tuning (45 ms → 33 ms)

Methodically benchmarked each top-level parameter against the 100×100 corpus. The CP-SAT primer is emphatic about not modifying top-level params — the portfolio of subsolvers is what makes CP-SAT fast — but that advice is tuned for optimization workloads. For pure satisfaction at our scale:

| param | savings |
|---|---|
| cp_model_presolve = false | ~7 ms |
| cp_model_probing_level = 0 | ~3 ms |
| linearization_level = 0 | ~2 ms |
| symmetry_level = 0 | ~3 ms |
| use_sat_inprocessing = false | ~1 ms |
| use_phase_saving = false | small |
| stop_after_first_solution = true | small |
| num_search_workers = 1 | (avoids 8-worker overhead) |

These are deliberately against the primer’s guidance because at our scale the portfolio’s fixed setup cost outweighs its wins.
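Collected in one place, the tuned set looks like the sketch below. The struct itself is illustrative — penmark feeds these values through cp_sat's SatParameters proto, and only the field names mirror the real parameters:

```rust
/// The tuned parameter set, gathered for reference. Field names mirror
/// CP-SAT's SatParameters proto; the struct is a documentation sketch,
/// not penmark's actual config type.
struct TunedParams {
    cp_model_presolve: bool,
    cp_model_probing_level: i32,
    linearization_level: i32,
    symmetry_level: i32,
    use_sat_inprocessing: bool,
    use_phase_saving: bool,
    stop_after_first_solution: bool,
    num_search_workers: i32,
}

/// Values we settled on for pure-satisfaction models at our scale.
fn satisfaction_tuned() -> TunedParams {
    TunedParams {
        cp_model_presolve: false,        // ~7 ms: biggest single win
        cp_model_probing_level: 0,       // ~3 ms
        linearization_level: 0,          // ~2 ms: boolean model, no LP help
        symmetry_level: 0,               // ~3 ms
        use_sat_inprocessing: false,     // ~1 ms
        use_phase_saving: false,         // small
        stop_after_first_solution: true, // satisfaction, not optimization
        num_search_workers: 1,           // skip the 8-worker portfolio
    }
}
```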

Win 3: specialised constraint encodings (33 ms → 28 ms)

The CP-SAT modelling page points out that boolean constraints (add_and, add_or, add_at_most_one, add_exactly_one) are recognised as native clause forms by the SAT presolve, while add_eq on a linear sum routes through the LP relaxation — costlier and less helpful for boolean inputs. Specialising ExactCount { n } over k literals:

  • n = 0 → add_and(negations)
  • n = 1 → add_exactly_one
  • n = k - 1 → add_exactly_one(negations)
  • n = k → add_and(literals)
  • otherwise → linear add_eq (the prior path)

For Akari this hits the 0/1/2/3/4 clue cases that dominate the constraint count.
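The dispatch is small enough to show in full. This is a sketch of the specialisation logic, not the actual encode_constraint code; `CountEncoding` and `specialise_exact_count` are hypothetical names:

```rust
/// Which CP-SAT primitive an ExactCount { n } over k literals lowers to.
#[derive(Debug, PartialEq, Eq)]
enum CountEncoding {
    AndNegations,      // n = 0: every literal false
    ExactlyOne,        // n = 1
    ExactlyOneNegated, // n = k - 1: exactly one literal false
    AndLiterals,       // n = k: every literal true
    LinearEq,          // fallback: add_eq on the linear sum
}

/// Pick a native clause form when one exists; only the general case
/// pays for the linear path.
fn specialise_exact_count(n: usize, k: usize) -> CountEncoding {
    match n {
        0 => CountEncoding::AndNegations,
        1 => CountEncoding::ExactlyOne,
        _ if n == k => CountEncoding::AndLiterals,
        _ if n + 1 == k => CountEncoding::ExactlyOneNegated,
        _ => CountEncoding::LinearEq,
    }
}
```

An Akari clue of 2 on a 3-neighbour cell is the one case in the 0..=4 clue range that still takes the linear path.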

Things that didn’t work

subsolvers = ["fixed"] — went down to 21 ms but broke 1×1 (returned empty solution sets). Single-strategy fixed-order DFS skips edge cases the default subsolver handles.

Warm-start via penmark’s Propagator — left as opt-in OrToolsSolverConfig::warm_start (default false). The propagator covers ~99.8% of cells in 5 ms at 100×100. Hinting those to CP-SAT helps at small sizes (5×5–17×17, ~10-20% faster) but tanks at 50×50+ (5× regression): the missing 0.2% (10 cells out of 5832) trips up CP-SAT’s hint-driven search badly. Verified the propagator’s deductions are correct (0/5822 disagreements with FastSolver), so it’s a CP-SAT behaviour issue with high-coverage-but-incomplete hints. Tried add_and pin, fix_variables_to_their_hinted_value, presolve toggling — none recovered the regression. Useful at small sizes; opt-in.

Pinning warm-start cells via add_and instead of hint — 200 ms at 100×100, 6× regression. Without presolve to simplify them, the unit clauses from a single add_and([v1, v2, ..., v5822]) slow down CP-SAT’s search loop. With presolve on, still regressed because the model becomes disconnected fragments that the LP doesn’t help with.

Disabling cp_model_use_sat_presolve only — wash. The other presolve phases dominate.

disable_constraint_expansion — crashed with SolutionIsFeasible assertion failures. Constraint expansion is required for our encoding.

Changing search branching — LP_SEARCH, PSEUDO_COST_SEARCH, and PORTFOLIO_WITH_QUICK_RESTART_SEARCH were all within 0.5 ms of each other. AUTOMATIC_SEARCH (the default) wins by being marginally faster on average and having no edge cases.

linearization_level = 0 + lighting=add_ge — 33 ms vs 28 ms with add_or. add_or is recognised as a clause; add_ge(sum, 1) goes through the linear path even with linearization off.
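For context on that last experiment: the lighting rule is one clause per empty cell, an add_or over the bulb indicators of every cell that could light it. A self-contained sketch of computing that line-of-sight set (a hypothetical helper, not FastSolver's lit_by machinery):

```rust
/// Cells whose bulb would light (r, c): the cell itself plus everything
/// visible along the four axes until a black cell blocks the ray.
/// `black` is the wall mask. Sketch of the clause-building step only;
/// the lighting constraint is add_or over the returned cells' indicators.
fn lit_by(black: &[Vec<bool>], r: usize, c: usize) -> Vec<(usize, usize)> {
    let (h, w) = (black.len(), black[0].len());
    let mut cells = vec![(r, c)];
    for (dr, dc) in [(-1i64, 0i64), (1, 0), (0, -1), (0, 1)] {
        let (mut rr, mut cc) = (r as i64 + dr, c as i64 + dc);
        // Walk the ray until the edge of the board or a black cell.
        while rr >= 0 && rr < h as i64 && cc >= 0 && cc < w as i64
            && !black[rr as usize][cc as usize]
        {
            cells.push((rr as usize, cc as usize));
            rr += dr;
            cc += dc;
        }
    }
    cells
}
```

Encoded as add_or, this clause is handed straight to the SAT core; encoded as add_ge(sum, 1), the same fact detours through linear-constraint handling, which is where the 5 ms went.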

What about pure SAT?

Briefly explored: a SatSolver that translated the same constraint list to CNF + cardinality constraints via the rustsat library and fed it to BatSat / CaDiCaL / Kissat / MiniSat. Numbers at 100×100:

| backend | wall-clock | gap to FastSolver |
|---|---|---|
| Kissat | ~16 ms | 1.6× |
| BatSat | ~18 ms | 1.8× |
| MiniSat | ~18 ms | 1.8× |
| CaDiCaL | ~25 ms | 2.4× |

Pure SAT genuinely beat OR-tools’ general-purpose CP solver on this problem class — Akari is essentially Boolean satisfaction with cardinality, and a modern CDCL solver (Kissat especially) plus rustsat’s totalizer cardinality encoding landed within 60% of the hand-rolled engine. CaDiCaL underperformed despite its reputation — the problem is too small for its strengths (incremental clause learning across many calls, lazy clause minimisation).

The SatSolver lived in the codebase for a few commits before being removed to keep the maintenance surface small; the experiment validated the “SAT-as-IR” approach for genres where we’d otherwise have to write a domain-specific propagator (Slitherlink’s “single closed loop”, Nurikabe’s connectivity, etc.). If we ever bring it back, the encoding is documented in git history (search for algorithm::rustsat).
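For flavour, the clause-level shape of the translation: exactly-one over k literals becomes one at-least-one clause plus pairwise at-most-one clauses. This is the naive O(k²) encoding, shown only because it fits in a few lines — the real experiment used rustsat's totalizer encoding for cardinality constraints:

```rust
/// CNF literals as non-zero DIMACS-style integers: v or -v.
/// Exactly-one over `lits` as clauses: one at-least-one clause plus
/// pairwise at-most-one ("not both") clauses.
fn exactly_one_cnf(lits: &[i32]) -> Vec<Vec<i32>> {
    let mut clauses = vec![lits.to_vec()]; // at least one literal true
    for i in 0..lits.len() {
        for j in (i + 1)..lits.len() {
            clauses.push(vec![-lits[i], -lits[j]]); // not both true
        }
    }
    clauses
}
```

Pairwise is fine for the tiny exactly-one groups Akari produces; a totalizer wins once counts and group sizes grow.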

When external solvers would actually win

At our scale they don’t. Scenarios where they would:

  • Optimization variants. CP-SAT's branch-and-bound shines when there is an objective (minimise unlit cells, maximise bulbs), and the LP relaxation actually earns its keep. FastSolver has no objective machinery.
  • Genres with hard-to-propagate constraints. Slitherlink’s “single closed loop”, Nurikabe’s “all white cells connected”, Yajilin’s path rules. Writing a CP propagator for these is months of work; CP-SAT handles them via lazy clauses with reasonable performance even on first try. Pure SAT (Kissat etc.) was particularly strong here since these genres reduce naturally to CDCL with cardinality constraints — if the SAT path becomes load-bearing, that’s the point to bring rustsat back.
  • Really hard hand-crafted puzzles. Where FastSolver’s trial-1 stalls and DFS branching explodes. CDCL clause learning starts paying off. We haven’t actually hit one in the corpus; if we do, escalation to the external solver becomes useful.
  • Multi-genre infrastructure that doesn’t justify per-genre tuning. If the goal is “solve any reasonable CSP in ≤ 1 s”, CP-SAT or rustsat are the right call. penmark’s bet is the opposite: domain knowledge per genre, with the external solvers as backstops.

Maintenance posture

  • OrToolsSolver lives behind --features ortools. Default builds don’t pull in cp_sat (which links a system OR-tools install plus libprotobuf).
  • encode_constraint is exhaustive on Rule — adding a new variant is a compile error here, not a panic at run time. Every variant in the enum has either an eager encoding (boolean-friendly rules) or a lazy cut (global rules).
  • The criterion akari/solver and slitherlink/solver groups under the consolidated penmark bench include OR-tools as a sibling series when the ortools feature is on, so head-to-head ratios show up in the report alongside FastSolver.
  • Per-puzzle cross-check tests (ortools_agrees_with_fast_* in each genre’s tests.rs) make divergence a unit-test failure, not a silent corpus drift.
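The shape of such a cross-check is just set comparison over the two solvers' outputs. A sketch with hypothetical types — solutions here are flat mark vectors, while the real tests compare genre-specific solution structs:

```rust
/// Cross-check core: both engines' solution sets, sorted so ordering
/// differences don't matter, must match exactly. Any divergence is a
/// unit-test failure rather than silent corpus drift.
fn solvers_agree(mut fast: Vec<Vec<u8>>, mut ortools: Vec<Vec<u8>>) -> bool {
    fast.sort();
    ortools.sort();
    fast == ortools
}
```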

Build setup

cargo … --features ortools Just Works on Apple Silicon with Homebrew:

brew install or-tools
cargo bench -p penmark-core --features ortools --bench penmark -- akari/solver
cargo bench -p penmark-core --features ortools --bench penmark -- slitherlink/solver

The wiring:

  • penmark/.cargo/config.toml sets ORTOOLS_PREFIX (read by cp_sat’s build.rs) and CXXFLAGS (read by cc-rs when cp_sat compiles its C++ wrapper, so abseil + protobuf headers are reachable).
  • core/build.rs adds the protobuf link line that cp_sat doesn’t emit itself, plus rpaths for both dylibs so binaries don’t need DYLD_LIBRARY_PATH at run time.

Both auto-detect via brew --prefix <pkg> if the env var isn’t set, so the hardcoded /opt/homebrew/... defaults are just a fallback. To use a non-Homebrew install (Linux, Intel Mac, source build), export ORTOOLS_PREFIX / PROTOBUF_PREFIX / CXXFLAGS in your shell — those override the [env] defaults.
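Illustratively, the [env] block has roughly this shape. The paths shown are typical Apple Silicon Homebrew locations, not necessarily the checked-in values; Cargo's [env] semantics (entries don't override variables already set in the environment unless force = true) are what let a shell export win over these defaults:

```toml
# Sketch of penmark/.cargo/config.toml — paths are illustrative fallbacks.
[env]
# Read by cp_sat's build.rs; a shell-exported ORTOOLS_PREFIX wins over this.
ORTOOLS_PREFIX = "/opt/homebrew/opt/or-tools"
# Read by cc-rs when cp_sat compiles its C++ wrapper, so abseil and
# protobuf headers are reachable.
CXXFLAGS = "-I/opt/homebrew/opt/abseil/include -I/opt/homebrew/opt/protobuf/include"
```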

If cargo build --features ortools fails with 'absl/...' file not found, your CXXFLAGS doesn’t include the abseil headers. If it fails with Undefined symbols ... google::protobuf::..., the build script didn’t find protobuf — set PROTOBUF_PREFIX.

Rule encoding strategy

CP-SAT 0.4’s Rust bindings expose boolean primitives (add_or, add_at_most_one, add_exactly_one), linear constraints over multi-interval domains (add_linear_constraint), and reified constraints via only_enforce_if. They do not expose add_circuit, add_table, or add_inverse — which leaves the “single closed loop” / “all marked cells connected” family of rules without a clean direct encoding.

The compromise: eager + lazy cuts.

Eager encodings (CP-SAT enforces directly):

| Rule variant | Encoding |
|---|---|
| Distinct | per-mark add_at_most_one over region |
| Sum | add_eq over Σ value · indicator |
| Increasing | pairwise add_lt on per-coord value expressions |
| MinDifference | reified add_or([pos, neg]) with pos → a − b ≥ n, neg → b − a ≥ n |
| AtLeastOne | add_or |
| AtMost { n } | add_at_most_one (n = 1) or add_le |
| ExactCount { n } | specialised: add_and (n ∈ {0, k}), add_exactly_one (n ∈ {1, k − 1}), add_eq otherwise |
| DegreeIn { allowed } | add_linear_constraint(sum, allowed-as-singleton-intervals) |
| NoTwoByTwo | per 2×2 block, add_or of 4 negated indicators |
| Polyomino | per coord with mark, add_or([!self, neighbor_indicators…]) |
| Decided | no-op (Binary singleton / OneHot exactly_one already enforces) |
| CellMin | per region cell, add_ge over reach-set indicators (or add_or for min = 1) |
| LoopFreezesCounts | per (cells, target), eager ExactCount over cells |

Lazy cuts (CP-SAT solves, post-check, add add_or no-good clause if violated, re-solve until satisfied or infeasible):

  • Connected { mark } — find connected components of marked coords; if more than 1, cut the smallest component as “this exact set must not all be marked together.”
  • Path { mark, closed } — partition drawn-edge subgraph into connected components; for closed, each must be a degree-2 cycle and there must be exactly one; for open, exactly one component with two degree-1 endpoints. Cut violating shapes.
  • BoundedSize { mark, max } — cut any component larger than max.
  • ValueGroupedSize — for each numerically-marked coord, BFS over same-value neighbors; cut if size ≠ value.

The lazy-cut loop terminates because each cut removes at least one boolean assignment from the feasible space. Worst case is exponential in the number of cuts, but in practice for well-constrained puzzles convergence takes a handful of iterations.
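The Connected cut is the easiest to show end-to-end: find components of the marked cells and, if there is more than one, return the smallest as the no-good set (the add_or cut says "not all of these marked together"). A sketch with a hypothetical helper, 4-neighbour flood fill:

```rust
/// Connected components of `marked` grid cells (4-neighbour adjacency).
/// Returns the smallest component if there are at least two — the set to
/// cut with an add_or over its negated indicators. None means the
/// Connected constraint already holds, so no cut is needed.
fn connectivity_cut(marked: &[Vec<bool>]) -> Option<Vec<(usize, usize)>> {
    let (h, w) = (marked.len(), marked[0].len());
    let mut seen = vec![vec![false; w]; h];
    let mut components: Vec<Vec<(usize, usize)>> = Vec::new();
    for r in 0..h {
        for c in 0..w {
            if !marked[r][c] || seen[r][c] {
                continue;
            }
            // Flood-fill one component with an explicit stack.
            let mut comp = Vec::new();
            let mut stack = vec![(r, c)];
            seen[r][c] = true;
            while let Some((cr, cc)) = stack.pop() {
                comp.push((cr, cc));
                // wrapping_sub turns 0-1 into usize::MAX, which the
                // bounds check below rejects.
                for (nr, nc) in [(cr.wrapping_sub(1), cc), (cr + 1, cc),
                                 (cr, cc.wrapping_sub(1)), (cr, cc + 1)] {
                    if nr < h && nc < w && marked[nr][nc] && !seen[nr][nc] {
                        seen[nr][nc] = true;
                        stack.push((nr, nc));
                    }
                }
            }
            components.push(comp);
        }
    }
    if components.len() < 2 {
        return None;
    }
    components.into_iter().min_by_key(|comp| comp.len())
}
```

Cutting the smallest component keeps the no-good clause short, which is why the loop does it rather than cutting an arbitrary one.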

The slitherlink bench shows the lazy path is workable: 4×4 puzzles solve in ~600 µs–2 ms, 6×6 in ~3–7 ms (vs FastSolver's ~100 µs–1 ms on 4×4 and ~1–13 ms on 6×6 — comparable, and OR-tools even beats FastSolver on a hard 6×6 case).

When a future genre needs a rule whose lazy-check encoding is prohibitively slow at the target sizes, that’s the signal to reach for a different external solver (rustsat with Kissat, say) rather than to push the lazy-cut harder.