# External solvers
penmark ships its own `FastSolver` (see the Fast solver chapter for
internals) and treats external solvers as feature-gated oracles — useful for
cross-validation, and useful as fallbacks for genres where writing a
domain-specific propagator is hard, but not on the production solve path.
This doc captures what we tried and what we learned, since most of it isn't
obvious from the source.
## Headline numbers
100×100 Akari (release build, single thread, scaffold-derived clued puzzles
at 0.42 wall density, cached at `/tmp/penmark_compare_100x100.json`):
| solver | wall-clock | gap to FastSolver |
|---|---|---|
| FastSolver (production) | ~10 ms | 1× |
| OrToolsSolver (CP-SAT, tuned) | ~28 ms | 2.8× |
| OrToolsSolver (CP-SAT, defaults) | ~52 ms | 5.2× |
| OrToolsSolver (initial port) | ~115 ms | 11.4× |
`compare_ortools` (in `core/examples/`) reproduces this table; `profile_ortools`
breaks the OR-tools call into model-build vs CP-SAT-solve and runs a 30+
parameter sweep, so the next round of tuning has a measurement harness ready.
The takeaway is structural: `FastSolver` is a hand-rolled Akari propagator
(per-cell `lit_by` counter, LOS-cross-aware AC-3, trial-1 with MRV, bitmask
state). A general CP solver can't replicate that without knowing about the
puzzle's lighting structure; it just sees clauses and cardinality constraints.
The "10–100×" pitch from CP-SAT marketing assumes a naive backtracker as the
baseline, not a tuned propagator.
## OR-tools journey
### Initial port: 115 ms at 100×100
The first cut used a one-hot encoding: one `BoolVar` per (coord, mark) pair
with `add_exactly_one` over each coord. For Akari (a binary domain) that's 2×
the variables it needs. CP-SAT's default parameters were also fighting us —
the portfolio of subsolvers, presolve, probing, and linearization all charge
fixed setup costs that don't pay off on a 7,000-constraint Boolean
satisfaction model.
### Win 1: single-bool encoding for binary domains (115 ms → 45 ms, 2.5×)
When every coord shares a 2-mark domain, allocate one `BoolVar` per coord and
use `!v` as the indicator for the negative mark. `BoolVar` implements `Not`,
so the negation is free. No `exactly_one` constraints, half the variables.
`OneHot` stays as a fallback for wider domains (Sudoku-style numeric marks).
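The variable-count saving can be sketched with DIMACS-style signed literals (positive `i32` = variable, negative = its negation). This is purely illustrative — the real code allocates cp_sat `BoolVar`s, and both function names below are hypothetical:

```rust
/// One-hot: one variable per (coord, mark), plus an exactly-one
/// constraint per coord. For k marks over c coords: k*c variables.
fn one_hot_var(coord: usize, mark: usize, num_marks: usize) -> i32 {
    (coord * num_marks + mark + 1) as i32
}

/// Single-bool: for a binary domain, one variable per coord; mark 0
/// is the positive literal, mark 1 its free negation. No exactly-one
/// constraints needed, and half the variables.
fn single_bool_lit(coord: usize, mark: usize) -> i32 {
    let v = (coord + 1) as i32;
    if mark == 0 { v } else { -v }
}

fn main() {
    // At 100×100 with a binary domain: 20,000 one-hot vars vs 10,000.
    let coords = 100 * 100;
    println!("one-hot vars: {}", coords * 2);
    println!("single-bool vars: {}", coords);
    // The two marks of one coord share a single underlying variable:
    assert_eq!(single_bool_lit(7, 0), -single_bool_lit(7, 1));
}
```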
### Win 2: `SatParameters` tuning (45 ms → 33 ms)
We methodically benchmarked each top-level parameter against the 100×100 corpus. The CP-SAT primer is emphatic about not modifying top-level params — the portfolio of subsolvers is what makes CP-SAT fast — but that advice is tuned for optimization workloads. For pure satisfaction at our scale:
| param | savings |
|---|---|
| `cp_model_presolve = false` | ~7 ms |
| `cp_model_probing_level = 0` | ~3 ms |
| `linearization_level = 0` | ~2 ms |
| `symmetry_level = 0` | ~3 ms |
| `use_sat_inprocessing = false` | ~1 ms |
| `use_phase_saving = false` | small |
| `stop_after_first_solution = true` | small |
| `num_search_workers = 1` | (avoids 8-worker overhead) |
These are deliberately against the primer’s guidance because at our scale the portfolio’s fixed setup cost outweighs its wins.
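Collected as a config sketch — the field names mirror OR-tools' `SatParameters` proto, but the struct below is illustrative, not the cp_sat binding's actual API:

```rust
// Illustrative bundle of the satisfaction-mode tuning. Field names
// follow OR-tools' SatParameters proto; the struct is NOT the cp_sat
// binding's API, just a record of which knobs we turn and where.
#[derive(Debug, PartialEq)]
struct SatTuning {
    cp_model_presolve: bool,
    cp_model_probing_level: i32,
    linearization_level: i32,
    symmetry_level: i32,
    use_sat_inprocessing: bool,
    use_phase_saving: bool,
    stop_after_first_solution: bool,
    num_search_workers: i32,
}

/// The portfolio machinery is aimed at optimization workloads; for a
/// pure satisfaction model at our scale we switch it off: one worker,
/// stop at the first solution, no presolve/probing/linearization.
fn satisfaction_tuning() -> SatTuning {
    SatTuning {
        cp_model_presolve: false,
        cp_model_probing_level: 0,
        linearization_level: 0,
        symmetry_level: 0,
        use_sat_inprocessing: false,
        use_phase_saving: false,
        stop_after_first_solution: true,
        num_search_workers: 1,
    }
}

fn main() {
    println!("{:?}", satisfaction_tuning());
}
```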
### Win 3: specialised constraint encodings (33 ms → 28 ms)
The CP-SAT modelling page points out that boolean constraints (`add_and`,
`add_or`, `add_at_most_one`, `add_exactly_one`) are recognised as native
clause forms by the SAT presolve, while `add_eq` on a linear sum routes
through the LP relaxation — costlier and less helpful for boolean inputs.
Specialising `ExactCount { n }` over k literals:

- `n = 0` → `add_and(negations)`
- `n = 1` → `add_exactly_one`
- `n = k - 1` → `add_exactly_one(negations)`
- `n = k` → `add_and(literals)`
- otherwise → linear `add_eq` (the prior path)
For Akari this hits the 0/1/2/3/4 clue cases that dominate the constraint count.
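The dispatch can be sketched as follows (the enum and function names are illustrative, not the real `encode_constraint` internals):

```rust
// Which encoding an ExactCount { n } over k literals lowers to.
// Illustrative names; the real code emits cp_sat constraints directly.
#[derive(Debug, PartialEq)]
enum CountEncoding {
    AndNegations,        // n == 0: every literal false
    ExactlyOne,          // n == 1
    ExactlyOneNegations, // n == k - 1: exactly one literal false
    AndLiterals,         // n == k: every literal true
    LinearEq,            // general case: linear add_eq (the prior path)
}

fn encode_exact_count(n: usize, k: usize) -> CountEncoding {
    match n {
        0 => CountEncoding::AndNegations,
        1 => CountEncoding::ExactlyOne,
        _ if n == k - 1 => CountEncoding::ExactlyOneNegations,
        _ if n == k => CountEncoding::AndLiterals,
        _ => CountEncoding::LinearEq,
    }
}

fn main() {
    // Akari clue values 0..=4 over the (up to) four orthogonal neighbours:
    // only the middle case falls through to the linear path.
    assert_eq!(encode_exact_count(0, 4), CountEncoding::AndNegations);
    assert_eq!(encode_exact_count(2, 4), CountEncoding::LinearEq);
    assert_eq!(encode_exact_count(3, 4), CountEncoding::ExactlyOneNegations);
    assert_eq!(encode_exact_count(4, 4), CountEncoding::AndLiterals);
    println!("dispatch table checks out");
}
```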
## Things that didn't work
- `subsolvers = ["fixed"]` — went down to 21 ms but broke 1×1 (returned
  empty solution sets). Single-strategy fixed-order DFS skips edge cases the
  default subsolvers handle.
- Warm-start via penmark's `Propagator` — left as opt-in
  (`OrToolsSolverConfig::warm_start`, default `false`). The propagator covers
  ~99.8% of cells in 5 ms at 100×100. Hinting those to CP-SAT helps at small
  sizes (5×5–17×17, ~10–20% faster) but tanks at 50×50+ (a 5× regression):
  the missing 0.2% (10 cells out of 5832) trips up CP-SAT's hint-driven
  search badly. We verified the propagator's deductions are correct (0/5822
  disagreements with FastSolver), so it's a CP-SAT behaviour issue with
  high-coverage-but-incomplete hints. We tried `add_and` pinning,
  `fix_variables_to_their_hinted_value`, and presolve toggling — none
  recovered the regression. Useful at small sizes; opt-in.
- Pinning warm-start cells via `add_and` instead of hints — 200 ms at
  100×100, a 6× regression. Without presolve to simplify them, the unit
  clauses from a single `add_and([v1, v2, ..., v5822])` slow down CP-SAT's
  search loop. With presolve on, it still regressed because the model becomes
  disconnected fragments that the LP doesn't help with.
- Disabling `cp_model_use_sat_presolve` only — a wash. The other presolve
  phases dominate.
- `disable_constraint_expansion` — crashed with `SolutionIsFeasible`
  assertion failures. Constraint expansion is required for our encoding.
- Changing search branching — `LP_SEARCH`, `PSEUDO_COST_SEARCH`, and
  `PORTFOLIO_WITH_QUICK_RESTART_SEARCH` were all within 0.5 ms of each other.
  `AUTOMATIC_SEARCH` (the default) wins by being marginally faster on average
  and having no edge cases.
- `linearization_level = 0` + lighting via `add_ge` — 33 ms, vs 28 ms with
  `add_or`. `add_or` is recognised as a clause; `add_ge(sum, 1)` goes through
  the linear path even with linearization off.
## What about pure SAT?
Briefly explored: a `SatSolver` that translated the same constraint list to
CNF + cardinality constraints via the `rustsat` library and fed it to
BatSat / CaDiCaL / Kissat / MiniSat. Numbers at 100×100:
| backend | wall-clock | gap to FastSolver |
|---|---|---|
| Kissat | ~16 ms | 1.6× |
| BatSat | ~18 ms | 1.8× |
| MiniSat | ~18 ms | 1.8× |
| CaDiCaL | ~25 ms | 2.4× |
Pure SAT genuinely beat OR-tools’ general-purpose CP solver on this problem class — Akari is essentially Boolean satisfaction with cardinality, and a modern CDCL solver (Kissat especially) plus rustsat’s totalizer cardinality encoding landed within 60% of the hand-rolled engine. CaDiCaL underperformed despite its reputation — the problem is too small for its strengths (incremental clause learning across many calls, lazy clause minimisation).
The `SatSolver` lived in the codebase for a few commits before being removed
to keep the maintenance surface small; the experiment validated the
"SAT-as-IR" approach for genres where we'd otherwise have to write a
domain-specific propagator (Slitherlink's "single closed loop", Nurikabe's
connectivity, etc.). If we ever bring it back, the encoding is documented
in git history (search for `algorithm::rustsat`).
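For flavour, the simplest cardinality encoding in that translation — at-most-one as pairwise binary clauses — can be sketched and brute-force-checked. This is the naive O(k²) encoding, not rustsat's totalizer, and the function names are illustrative:

```rust
// Pairwise at-most-one over DIMACS-style signed literals: for every
// pair (a, b), emit the clause (!a OR !b). Naive O(k^2) encoding;
// rustsat's totalizer does better for larger bounds and counts.
fn at_most_one_pairwise(lits: &[i32]) -> Vec<Vec<i32>> {
    let mut clauses = Vec::new();
    for i in 0..lits.len() {
        for j in i + 1..lits.len() {
            clauses.push(vec![-lits[i], -lits[j]]);
        }
    }
    clauses
}

/// Brute-force check: an assignment (bit i = variable i+1) satisfies
/// a CNF iff every clause has at least one true literal.
fn satisfies(clauses: &[Vec<i32>], assignment: u32) -> bool {
    clauses.iter().all(|c| {
        c.iter().any(|&l| {
            let bit = (assignment >> (l.unsigned_abs() - 1)) & 1 == 1;
            if l > 0 { bit } else { !bit }
        })
    })
}

fn main() {
    let clauses = at_most_one_pairwise(&[1, 2, 3]);
    assert_eq!(clauses.len(), 3);
    // Exactly the assignments with <= 1 true variable survive.
    for a in 0u32..8 {
        assert_eq!(satisfies(&clauses, a), a.count_ones() <= 1);
    }
    println!("pairwise AMO over 3 vars verified");
}
```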
## When external solvers would actually win
At our scale they don’t. Scenarios where they would:
- Optimization variants. CP-SAT's branch-and-bound shines with objectives:
  `min unlit cells`, `max bulbs` — the LP relaxation actually earns its keep.
  `FastSolver` has no objective machinery.
- Genres with hard-to-propagate constraints. Slitherlink's "single closed
  loop", Nurikabe's "all white cells connected", Yajilin's path rules.
  Writing a CP propagator for these is months of work; CP-SAT handles them
  via lazy clauses with reasonable performance even on the first try. Pure
  SAT (Kissat etc.) was particularly strong here, since these genres reduce
  naturally to CDCL with cardinality constraints — if the SAT path becomes
  load-bearing, that's the point to bring `rustsat` back.
- Really hard hand-crafted puzzles. Where `FastSolver`'s trial-1 stalls and
  DFS branching explodes, CDCL clause learning starts paying off. We haven't
  actually hit one in the corpus; if we do, escalation to the external
  solver becomes useful.
- Multi-genre infrastructure that doesn't justify per-genre tuning. If the
  goal is "solve any reasonable CSP in ≤ 1 s", CP-SAT or rustsat are the
  right call. penmark's bet is the opposite: domain knowledge per genre,
  with the external solvers as backstops.
## Maintenance posture
- `OrToolsSolver` lives behind `--features ortools`. Default builds don't
  pull in `cp_sat` (which links a system OR-tools install plus libprotobuf).
- `encode_constraint` is exhaustive on `Rule` — adding a new variant is a
  compile error here, not a panic at run time. Every variant in the enum has
  either an eager encoding (boolean-friendly rules) or a lazy cut (global
  rules).
- The criterion `akari/solver` and `slitherlink/solver` groups under the
  consolidated `penmark` bench include OR-tools as a sibling series when the
  `ortools` feature is on, so head-to-head ratios show up in the report
  alongside `FastSolver`.
- Per-puzzle cross-check tests (`ortools_agrees_with_fast_*` in each genre's
  `tests.rs`) make divergence a unit-test failure, not a silent corpus drift.
## Build setup
`cargo … --features ortools` Just Works on Apple Silicon Homebrew:

```sh
brew install or-tools
cargo bench -p penmark-core --features ortools --bench penmark -- akari/solver
cargo bench -p penmark-core --features ortools --bench penmark -- slitherlink/solver
```
The wiring:
- `penmark/.cargo/config.toml` sets `ORTOOLS_PREFIX` (read by cp_sat's
  `build.rs`) and `CXXFLAGS` (read by cc-rs when cp_sat compiles its C++
  wrapper, so abseil + protobuf headers are reachable).
- `core/build.rs` adds the protobuf link line that cp_sat doesn't emit
  itself, plus rpaths for both dylibs so binaries don't need
  `DYLD_LIBRARY_PATH` at run time.
Both auto-detect via `brew --prefix <pkg>` if the env var isn't set, so
the hardcoded `/opt/homebrew/...` defaults are just a fallback. To use
a non-Homebrew install (Linux, Intel Mac, source build), export
`ORTOOLS_PREFIX` / `PROTOBUF_PREFIX` / `CXXFLAGS` in your shell — those
override the `[env]` defaults.
If `cargo build --features ortools` fails with `'absl/...' file not found`,
your `CXXFLAGS` doesn't include the abseil headers. If it fails with
`Undefined symbols ... google::protobuf::...`, the build script didn't find
protobuf — set `PROTOBUF_PREFIX`.
## Rule encoding strategy
CP-SAT 0.4's Rust bindings expose boolean primitives (`add_or`,
`add_at_most_one`, `add_exactly_one`), linear constraints over
multi-interval domains (`add_linear_constraint`), and reified constraints
via `only_enforce_if`. They do not expose `add_circuit`, `add_table`, or
`add_inverse` — which leaves the "single closed loop" / "all marked cells
connected" family of rules without a clean direct encoding.
The compromise: eager + lazy cuts.
Eager encodings (CP-SAT enforces directly):
| Rule variant | Encoding |
|---|---|
| `Distinct` | per-mark `add_at_most_one` over region |
| `Sum` | `add_eq` over Σ value · indicator |
| `Increasing` | pairwise `add_lt` on per-coord value expressions |
| `MinDifference` | reified `add_or([pos, neg])` with `pos → a−b ≥ n`, `neg → b−a ≥ n` |
| `AtLeastOne` | `add_or` |
| `AtMost { n }` | `add_at_most_one` (n = 1) or `add_le` |
| `ExactCount { n }` | specialised: `add_and` (n ∈ {0, k}), `add_exactly_one` (n ∈ {1, k−1}), `add_eq` otherwise |
| `DegreeIn { allowed }` | `add_linear_constraint(sum, allowed-as-singleton-intervals)` |
| `NoTwoByTwo` | per 2×2 block, `add_or` of 4 negated indicators |
| `Polyomino` | per coord with mark, `add_or([!self, neighbor_indicators…])` |
| `Decided` | no-op (Binary singleton / OneHot `exactly_one` already enforces) |
| `CellMin` | per region cell, `add_ge` over reach-set indicators (or `add_or` for min = 1) |
| `LoopFreezesCounts` | per (cells, target), eager `ExactCount` over cells |
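The `NoTwoByTwo` row is the easiest to sanity-check: one clause per 2×2 block rejects exactly the all-marked block. A brute-force sketch (illustrative names only):

```rust
// NoTwoByTwo as one clause per 2×2 block: add_or over the four
// negated indicators, i.e. (!a OR !b OR !c OR !d). The clause is
// satisfied iff at least one of the four cells is unmarked.
fn no_two_by_two_ok(block: [bool; 4]) -> bool {
    block.iter().any(|&cell| !cell)
}

fn main() {
    // Check all 16 assignments: only the all-marked block is rejected.
    for bits in 0u32..16 {
        let block = [
            bits & 1 != 0,
            bits & 2 != 0,
            bits & 4 != 0,
            bits & 8 != 0,
        ];
        assert_eq!(no_two_by_two_ok(block), bits != 0b1111);
    }
    println!("all 16 cases check out");
}
```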
Lazy cuts (CP-SAT solves, we post-check, add an `add_or` no-good clause if
violated, and re-solve until satisfied or infeasible):

- `Connected { mark }` — find connected components of marked coords; if
  more than 1, cut the smallest component as "this exact set must not all
  be marked together."
- `Path { mark, closed }` — partition the drawn-edge subgraph into
  connected components; for closed, each must be a degree-2 cycle and there
  must be exactly one; for open, exactly one component with two degree-1
  endpoints. Cut violating shapes.
- `BoundedSize { mark, max }` — cut any component larger than `max`.
- `ValueGroupedSize` — for each numerically-marked coord, BFS over
  same-value neighbors; cut if size ≠ value.
The lazy-cut loop terminates because each cut removes at least one assignment from the finite boolean search space. The worst case is exponential in the number of cuts, but in practice, for well-constrained puzzles, convergence takes a handful of iterations.
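A toy version of the `Connected { mark }` cut loop, with a brute-force enumerator standing in for CP-SAT. The cut shape matches the description above (a clause requiring at least one cell of the offending component to be unmarked); everything else — names, the tiny grid, the stand-in solver — is illustrative:

```rust
// (cell index, required polarity): a no-good clause is satisfied when
// at least one listed cell has the required polarity.
type Clause = Vec<(usize, bool)>;

fn satisfies(cuts: &[Clause], assign: &[bool]) -> bool {
    cuts.iter().all(|c| c.iter().any(|&(i, pol)| assign[i] == pol))
}

/// Connected components of marked cells under 4-adjacency on a w×h grid.
fn components(assign: &[bool], w: usize, h: usize) -> Vec<Vec<usize>> {
    let mut seen = vec![false; assign.len()];
    let mut comps = Vec::new();
    for start in 0..assign.len() {
        if !assign[start] || seen[start] {
            continue;
        }
        seen[start] = true;
        let (mut comp, mut stack) = (vec![start], vec![start]);
        while let Some(i) = stack.pop() {
            let (x, y) = (i % w, i / w);
            let mut visit = |j: usize| {
                if assign[j] && !seen[j] {
                    seen[j] = true;
                    comp.push(j);
                    stack.push(j);
                }
            };
            if x > 0 { visit(i - 1); }
            if x + 1 < w { visit(i + 1); }
            if y > 0 { visit(i - w); }
            if y + 1 < h { visit(i + w); }
        }
        comps.push(comp);
    }
    comps
}

/// Find `count` marked cells forming one connected component, by lazy cuts.
fn solve_connected(w: usize, h: usize, count: usize) -> Option<Vec<bool>> {
    let n = w * h;
    let mut cuts: Vec<Clause> = Vec::new();
    loop {
        // Stand-in for the CP-SAT solve: first assignment with the right
        // cardinality that satisfies every cut added so far.
        let cand = (0u32..(1 << n))
            .map(|bits| (0..n).map(|i| bits >> i & 1 == 1).collect::<Vec<bool>>())
            .find(|a| a.iter().filter(|&&b| b).count() == count && satisfies(&cuts, a))?;
        let comps = components(&cand, w, h);
        if comps.len() <= 1 {
            return Some(cand); // feasible and connected: done
        }
        // Cut the smallest component: these cells must not all be marked.
        let smallest = comps.iter().min_by_key(|c| c.len()).unwrap();
        cuts.push(smallest.iter().map(|&i| (i, false)).collect());
    }
}

fn main() {
    let sol = solve_connected(3, 3, 4).expect("feasible");
    assert_eq!(components(&sol, 3, 3).len(), 1);
    println!("solution: {:?}", sol);
}
```

Each appended cut permanently excludes the assignment that produced it, which is the termination argument in miniature.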
The slitherlink bench shows the lazy path is workable: 4×4 puzzles solve in ~600 µs–2 ms and 6×6 in ~3–7 ms (vs `FastSolver`'s ~100 µs–1 ms on 4×4 and ~1–13 ms on 6×6) — comparable, and OR-tools even beats `FastSolver` on a hard 6×6 case.
When a future genre needs a rule whose lazy-check encoding is prohibitively slow at the target sizes, that's the signal to reach for a different external solver (`rustsat` with Kissat, say) rather than to push the lazy-cut machinery harder.