All writing

Cutting 98% of the search nodes in a chess engine

Five changes took a depth-6 search from 3.28M nodes to 45K. None of them were clever. All of them were measured.

Topic
Search · C++
Length
~9 min read
Companion
01 The problem

Depth is exponential, and that's the whole fight.

A chess engine plays by looking ahead: from the current position, try every legal move, then every reply, then every reply to that, as far down as time allows, and score the leaves. The search tree branches by roughly 35 moves at every ply. One move deep is ~35 positions. Six moves deep, in theory, is 356 over a billion. You do not get to depth 6 by being patient. You get there by not visiting most of the tree.

I built the engine in C++17, from scratch, to understand exactly where that time goes. The first working version used a plain minimax search to depth 6. On the opening position it visited 3.28 million nodes and took several seconds for a single move. That's unusable, a real game gives you seconds for the whole game, not one move of it.

What follows are the five changes that took that number down by 98.6%, in the order I made them, with the node count after each. The headline is the last one: 45,231 nodes, same depth, same position, same best move.

02 Alpha-beta

The first cut is the deepest: −92%.

Minimax explores branches it has already proven irrelevant. If you're looking for your best move and you've found one worth +0.5, and then you start examining a reply that lets the opponent win a rook, you can stop evaluating that line the moment you see the rook, it can only get worse for you, and you already have something better. That's alpha-beta pruning: carry a lower bound (alpha) and an upper bound (beta) down the tree, and cut any branch that falls outside them.

// The cutoff is the whole trick: a move so good the opponent // would never allow this line. Stop the rest can't matter. if (score >= beta) return beta; // fail-high, prune siblings if (score > alpha) alpha = score; // new best for us

Alpha-beta returns the exact same move as minimax. It just refuses to do the provably-pointless work. On the opening position it dropped the search from 3.28M nodes to 248K a 92% reduction from a change that doesn't touch the result at all. Everything after this is about helping alpha-beta cut sooner.

03 Move ordering

Pruning only works if you look at the best move first.

Alpha-beta's cuts depend entirely on order. If the best move is the first one you try, you set a tight bound immediately and prune everything else fast. If it's the last, you've already searched the whole list before the bound helps. Same position, same algorithm order decides how much you skip.

So I stopped searching moves in whatever order the generator produced them and started searching the most promising first: captures ordered by MVV-LVA (grab the biggest piece with the smallest attacker), the best move from a previous search at the very top, then killer moves and history heuristics for quiet moves that have caused cutoffs elsewhere. Better ordering means earlier cutoffs means fewer nodes. Combined with the next change, ordering took the count to 58K.

04 Transposition table

The same position, reached two ways, is the same position.

Chess transposes constantly: 1. e4 e5 2. Nf3 and 1. Nf3 e5 2. e4 reach an identical board. A naive search re-evaluates both from scratch. A transposition table stores the result of every position you search, keyed by a Zobrist hash of the board, and lets you skip an entire subtree on a hit.

Mine holds one million entries in a fixed 32-byte layout 32MB, allocated once. On the benchmark it ran a 40%+ hit rate: nearly half the positions the search reached had already been solved and were returned instantly. The transposition table pulled the count to 189K on its own, and made the move-ordering wins above compound.

05 Iterative deepening

Searching shallower first is faster than not.

This is the counterintuitive one. Instead of searching straight to depth 6, I search to depth 1, then 2, then 3, all the way up. It sounds like pure waste you're re-searching the early plies every time. But each shallow pass fills the transposition table and refines the move ordering for the next, deeper pass, so by the time you reach depth 6 you're almost always trying the best move first. The shallow searches are cheap, and they pay for themselves many times over. It also means you can stop anytime and still have a complete best move from the last finished depth which is exactly what a clock-bound game needs.

Iterative deepening took the count from 58K to 45,231. Here is the whole journey on one position:

BASELINE OPTIMIZATIONS, CUMULATIVE 3.28M 2M 1M 0.5M 0 3.28M 248K ↓ 92% 189K ↓ 24% 58K ↓ 69% 45K ↓ 23% search_ minimax search_ alphabeta search_ tt search_ ordered search_ iterative
98.6% fewer nodes — same best move (e2e4)
Node count across the five passes, same depth-6 search on the same starting position. Each bar is cumulative it includes every optimization to its left.

Note the shape: alpha-beta does almost all of the work in raw percentage terms, but the later changes matter more than they look. At depth 6 the absolute numbers are small; at depth 8 or 10, where the tree is thousands of times larger, a tight transposition table and good ordering are the difference between a move in 50ms and a move that never finishes.

06 What it cost, what it bought

Faster isn't the same as better.

Throughput held roughly constant through all of this around 820,000 nodes per second so wall-clock time fell almost exactly in step with the node count. The fully-optimized depth-6 search on the opening position settles in 55 milliseconds.

Search nodes (depth 6, opening) 3.28M → 45,231 (98.6% fewer)
Throughput ~820K nodes/sec
Search time (depth 6, opening) 55 ms
Puzzle accuracy (100 mate-in-3) 38% → 42%

That last row is the honest one. To check the engine was getting better and not just faster on the wrong moves, I ran it against 100 Lichess mate-in-3 puzzles and compared its move to the published solution. Going from 38% to 42% solved is real, but small at depth 6 the engine is still being out-calculated by the puzzle setters. The remaining wins aren't in the search anymore; they're in evaluation: king safety, pawn structure, passed pawns. Knowing which move to want is a different problem from finding it quickly, and it's the one I'm working on next.

If there's a lesson here, it's that almost none of this was clever. It was measurement. Every change was a commit with a node count before and after; the ones that didn't move the number got reverted. The chart above isn't a marketing artifact it's the commit history, reordered into a story.

More

The full build architecture, the bitboard representation, and the transposition-table layout is on the project page. Code is on GitHub.