0 · The big picture
The engine simulates a single NFL game as a pure state machine. A run drives that same machine across tens of thousands of parallel "worlds" — typically 10,000 to 100,000. Three ideas make that cheap: the engine owns and reuses all its working memory, parallel workers never share mutable state, and every result is written into shared, index-addressed storage.
See it run: the whole process, two ways at once
A fast-food kitchen and the real engine, stepped together — two cooks (workers) cook burgers (games) and box one meal per customer (the store), all from a single Back / Next.
This is the entire process in one place, shown two ways that step together. Picture a franchise where an order lands for four cheeseburgers (four worlds). With two cooks on shift, the work splits two and two — Cook A takes burgers 0–1, Cook B takes 2–3, and no burger is ever cooked by both (workers own disjoint world ranges). Each cook keeps one station and wipes it clean between burgers instead of grabbing new equipment — that is a worker reusing its pooled engine context. Both work from the same order ticket (the read-only invariant), but no two burgers come out identical: one is a touch under, one over, one just right — that variation is the per-world randomness. Each finished meal is boxed for its own customer and never shared — that is each world's result written to its own slot in the outcome store. The two cooks work their own tickets in parallel without reaching into each other's pans (no locks), and one can finish before the other.
| world | home_pts | away_pts |
|---|---|---|
| 0 | | |
| 1 | | |
| 2 | | |
| 3 | | |
The same station (context) is wiped clean and reused for every burger — never a new one built — which is memory pooling and zero per-game allocation (concepts 6, 2); the stn #A / ctx #A badge keeps its identity across both games to prove it. Doneness varies per burger — the per-world randomness (concept 3). Every meal is boxed for its own customer, never shared — each world owns its store slot (concept 5). Two cooks never share a pan, read the one ticket without ever locking it, and one finishes first — workers over disjoint ranges, no locks (concept 4). The chips under the caption keep these countable: readers climb but writers and locks stay 0, results fill their own slots with 0 collisions, and the context count never grows.
Invariants in, engine owns its state
Read-only facts cross the boundary one way; the engine drives a caller-supplied state object and returns nothing.
Two kinds of data cross the boundary into the engine, and the distinction is the foundation everything else builds on:
- Invariants (read-only). The roster, the matchup, the schedule, the random seed. The engine reads these as immutable facts — it never writes them. Many worlds can read the same invariants at once because reads never conflict.
- Working state (engine-owned). Everything that changes during a game — score, clock, possession, down and distance, per-play scratch, per-player stats — lives in a context object the caller creates and hands in. The engine mutates it in place.
The engine exposes one entry point: it takes the read-only invariants and the context it should drive, and returns nothing (the exact signature is incidental; any shape carrying those inputs works). It behaves like a pure function over caller-owned state: the same inputs produce the same in-place mutations, with no hidden globals or static buffers. Before each game it resets the context in place (wiping the durable state, re-stamping the random source, clearing the per-play scratch) rather than constructing a fresh one.
That single decision — the caller owns the state and the engine only drives it — is what unlocks everything downstream: the state can be pooled and reused, workers can each own one privately, and the engine itself stays a stateless function you can reason about in isolation.
A stateless engine over externally-owned state is testable, poolable, and trivially parallel. The state is a parameter, not a property.
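The shape described above can be sketched in a few lines. This is an illustration only, written in Python for brevity rather than in the engine's own language; `Invariants`, `GameContext`, `simulate_game`, and the toy scoring rule are all hypothetical names and logic, not the engine's actual API.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Invariants:
    """Read-only facts (hypothetical fields). frozen=True blocks writes."""
    home: str
    away: str
    seed: int


@dataclass
class GameContext:
    """Caller-owned working state that the engine mutates in place."""
    home_pts: int = 0
    away_pts: int = 0
    plays_run: int = 0
    rng: random.Random = field(default_factory=random.Random)
    scratch: list = field(default_factory=lambda: [0.0] * 4)

    def reset(self, seed: int) -> None:
        # Wipe durable state, re-stamp the random source, clear scratch.
        self.home_pts = 0
        self.away_pts = 0
        self.plays_run = 0
        self.rng.seed(seed)
        for i in range(len(self.scratch)):
            self.scratch[i] = 0.0


def simulate_game(inv: Invariants, ctx: GameContext) -> None:
    """Single entry point: reads inv, drives ctx, returns nothing."""
    ctx.reset(inv.seed)
    for possession in range(20):  # toy stand-in for the real play loop
        roll = ctx.rng.random()
        ctx.scratch[0] = roll
        points = 7 if roll < 0.20 else 3 if roll < 0.35 else 0
        if possession % 2 == 0:
            ctx.home_pts += points
        else:
            ctx.away_pts += points
        ctx.plays_run += 1


inv = Invariants(home="HOME", away="AWAY", seed=42)
ctx = GameContext()                       # created once by the caller
simulate_game(inv, ctx)
first = (ctx.home_pts, ctx.away_pts)
simulate_game(inv, ctx)                   # same context, reset and replayed
second = (ctx.home_pts, ctx.away_pts)
```

Because the context is a parameter, the caller decides how many exist and who owns each one. Here `first` and `second` match, since the same invariants replay the same game on the reused context.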
Zero heap allocation per game
The hot path allocates nothing on the managed heap — and a sentinel test fails the build if it ever does.
Running a matchup across tens of thousands of worlds means the per-game cost is multiplied tens of thousands of times over. The biggest lever is the garbage collector: if each game allocated even a little, the GC would dominate the run. So the design target is blunt — zero bytes allocated per game. The strategies that get there:
- Reuse, don't allocate. One context per worker is reset and replayed for every game it runs (see concept 1). The expensive objects exist once and live for the whole run.
- Mutate in place. State objects are cleared and refilled each game, never reconstructed.
- Stack, not heap. Short-lived per-decision buffers — like the probability vector a predictor fills before sampling — live on the stack, so they never touch the GC.
- Value types for records. Per-play log rows are value types copied into a pre-sized buffer rather than freshly allocated objects.
- No language features that allocate invisibly on the hot path — in C#, that rules out LINQ, closures, and params arrays. Each allocates quietly behind the scenes, so they're kept off the per-play and per-game path entirely.
- A recording hook that's free. Games optionally record play-by-play through a simple virtual hook; benchmarking showed it costs the same as the more complex generic alternative, so the simpler, still-zero-allocation form was chosen.
This is enforced, not hoped for. A sentinel test warms the engine up (so the runtime has fully optimised every hot method — on .NET, JIT-compiled to its top tier), then runs hundreds of games while measuring exactly how many bytes were allocated. The budget is a hard zero.
An allocation sentinel test with a hard 0 bytes/game budget. The invariant can't silently rot — a regression fails CI immediately.
Determinism, independent of the hardware
Each world's randomness is derived from stable identifiers, so the same world produces the same game on any machine.
Randomness and parallelism are usually in tension: if worlds pulled from a shared random source, the result of any given world would depend on the order workers happened to run — which depends on how many CPUs the host has. That makes runs irreproducible.
The engine sidesteps this entirely. Each world's random stream is seeded from stable identifiers — the run's seed combined with the matchup and the world index — rather than from a shared counter or the wall clock. Because the seed is computed purely from which matchup and which world, world 7 of matchup 3 always plays out the same game, no matter how many cores ran the season or in what order.
Reproducibility is therefore a property of the identifiers, not of the schedule. You can run the same season on a laptop and a 64-core server and get byte-identical results.
Tie randomness to identity, not to execution order. Reproducibility then survives any amount of parallelism.
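One way to derive a per-world seed from stable identifiers is to hash them together. The sketch below assumes a hash-based derivation (BLAKE2b truncated to 64 bits); the text does not say how the engine actually combines the run seed, matchup, and world index, so `derive_seed`, `play_world`, and the toy score draw are hypothetical.

```python
import hashlib
import random


def derive_seed(run_seed: int, matchup_id: int, world_index: int) -> int:
    """Map (run, matchup, world) to a 64-bit seed; no shared counter, no clock."""
    payload = f"{run_seed}:{matchup_id}:{world_index}".encode()
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def play_world(run_seed: int, matchup_id: int, world_index: int) -> tuple:
    """Toy game: the result depends only on the derived seed."""
    rng = random.Random(derive_seed(run_seed, matchup_id, world_index))
    return rng.randint(0, 45), rng.randint(0, 45)


# Run the same eight worlds in two different orders.
forward = [play_world(2024, 3, w) for w in range(8)]
backward = [play_world(2024, 3, w) for w in reversed(range(8))]
backward.reverse()  # put the results back in world order for comparison
```

`forward` and `backward` hold the same scores: each world's stream depends on its identifiers alone, so the order in which worlds run cannot affect any result.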
Share everything, lock nothing
Workers own disjoint slices of the work, so no two threads ever touch the same mutable object — and locks become unnecessary.
A run spans tens of thousands of worlds. They're divided into contiguous ranges, one per worker, and each worker is single-threaded over its own range. The invariant that makes this safe is simple and absolute: no two threads ever modify the same object.
Each worker carries its own private working set — its own engine context, its own pool of player handles, its own memory pools. The only things shared across workers are either read-only (the invariants) or written at slots that belong exclusively to one worker (the next concept). There is no overlap by construction.
The payoff: there are no locks, no atomic operations, and no concurrent collections anywhere on the simulation path. They aren't optimised away — they were never needed. Eliminating the need for synchronisation is both faster and far simpler to reason about than synchronising correctly.
The partition has no overlap by construction, so the absence of synchronisation on the hot path — no lock, atomics, or concurrent collections — is a design guarantee, not a tuning choice.
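The ownership model can be sketched as a partition function plus workers that write only to their own slots. This is a minimal illustration: `partition`, `run_worker`, and the placeholder scores are hypothetical. Python threads also do not give real CPU parallelism, so the sketch shows the ownership structure rather than the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(world_count: int, worker_count: int) -> list:
    """Split [0, world_count) into contiguous, non-overlapping ranges."""
    base, extra = divmod(world_count, worker_count)
    ranges, start = [], 0
    for worker in range(worker_count):
        size = base + (1 if worker < extra else 0)  # spread the remainder
        ranges.append(range(start, start + size))
        start += size
    return ranges


WORLD_COUNT = 4
# Shared, index-addressed columns: one slot per world, sized up front.
home_pts = [0] * WORLD_COUNT
away_pts = [0] * WORLD_COUNT


def run_worker(worlds: range) -> int:
    """Each worker writes only the slots inside its own range; no locks."""
    for world in worlds:
        home_pts[world] = 10 + world  # placeholder for a simulated score
        away_pts[world] = 20 + world
    return len(worlds)


ranges = partition(WORLD_COUNT, 2)  # worker A: worlds 0-1, worker B: 2-3
with ThreadPoolExecutor(max_workers=2) as pool:
    games_run = list(pool.map(run_worker, ranges))
```

No slot index appears in two ranges, so no two workers can ever write the same element. The safety argument lives in `partition`, not in any synchronisation primitive.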
Memory pooling
Backing arrays are rented from per-worker pools and returned, so a run reuses a handful of buffers instead of allocating tens of thousands.
Concept 2 keeps the per-game path at zero allocation, but a season still needs sizeable backing arrays for the outcome store's columns (concept 5). Allocating them fresh for every matchup would reintroduce exactly the GC pressure we removed, so those arrays come from pools:
- Typed pools. Separate pools hand out the integer, boolean, and floating-point arrays the columns need.
- Per-worker and unsynchronised. Each worker gets its own pool. Because the worker is single-threaded, the pool needs no locking, and it can be exact-size: it hands back an array of precisely the requested length, with no power-of-two rounding waste.
- A simple lifecycle. Rent an array and clear it, fill it during the matchup, then return it to the pool when the matchup is done. The next matchup rents the same memory back.
- Players too. Per-player handles are pooled the same way — minted once per worker and reused across every game it simulates, with their stats reset in place rather than reallocated.
Pooling converts "tens of thousands of allocations" into "rent the same handful of buffers over and over." It's the season-scale counterpart to the per-game zero-allocation rule.
Single-threaded ownership (concept 4) is what lets the pools be lock-free and exact-size — the parallelism model and the memory model reinforce each other.
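The rent, clear, fill, return lifecycle can be sketched as a small exact-size pool. This is an assumption-laden illustration: `ExactSizePool`, `rent`, `give_back`, and the `created` counter are hypothetical names, and Python's `array` module stands in for the typed arrays the text describes. One instance would be owned by one worker, which is why it carries no locking.

```python
from array import array


class ExactSizePool:
    """Unsynchronised pool for one worker; buffers are keyed by exact length."""

    def __init__(self, typecode: str) -> None:
        self._typecode = typecode  # e.g. "i" for ints, "d" for floats
        self._idle = {}            # length -> stack of returned arrays
        self.created = 0           # how many arrays were ever allocated

    def rent(self, length: int) -> array:
        stack = self._idle.get(length)
        if stack:
            buf = stack.pop()
            for i in range(length):  # clear in place before reuse
                buf[i] = 0
            return buf
        self.created += 1
        return array(self._typecode, [0]) * length  # exactly `length` zeros

    def give_back(self, buf: array) -> None:
        self._idle.setdefault(len(buf), []).append(buf)


int_pool = ExactSizePool("i")
for matchup in range(100):
    column = int_pool.rent(1000)   # rent (arrives cleared)
    column[matchup] = 7            # fill during the matchup
    int_pool.give_back(column)     # return when the matchup is done
```

After 100 matchups `int_pool.created` is still 1: every matchup after the first rented back the same memory.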
Why it stays true: auditable invariants
The performance properties survive ongoing change because each one reduces to a cheap review rule.
A fast design is only valuable if it stays fast as the code evolves. What keeps these properties intact is that each one collapses into a rule a reviewer (or a test) can check at a glance:
- Any allocation on the per-game path is a red flag — in C#, a stray new X(). The zero-allocation sentinel catches it automatically; a reviewer catches the intent.
- Any lock or atomic on the simulation path means the ownership model was violated. If you reach for synchronisation, two threads are touching the same object — back up and re-partition instead.
- Randomness comes from derived seeds, never a shared source. A shared counter would couple a world's result to the schedule and break reproducibility.
These are review rules, not just good intentions — which is exactly why the engine's speed and determinism don't quietly erode over time.
Red flags: an allocation on the hot path (in C#, new) · synchronisation in the simulation core (a lock or atomic) · randomness from a shared counter. Spotting any one means an invariant is about to break.