Pseudo Simulation Model: Enabling Full Season Monte Carlo
What was built
A pseudo American Football simulation model that generates play-by-play data for a complete 18-week NFL season (32 teams, 288 games, ~35,000 plays per season). The model is intentionally simplified — it doesn't aim for statistical accuracy yet — but it produces structurally realistic output: drives, plays, penalties, turnovers, scoring, and personnel packages that mirror real NFL data shape.
The purpose is to validate the Monte Carlo approach: simulate the same fixed schedule N times with different random outcomes, producing a probability distribution over game results for every matchup.
Why the original implementation couldn't reach scale
The initial prototype had three structural problems that would have prevented reaching 100K simulations:
1. Data was rebuilt every world. Teams, rosters, and the schedule were regenerated with fresh random values for each world. This meant each world simulated a different season — cross-world comparison was meaningless. There was no way to ask "what are the probability-weighted outcomes of Game #147?" because Game #147 was a different matchup in every world. The Monte Carlo approach requires fixed inputs with only the game resolution varying.
2. GC pressure would have dominated runtime. At 30.21 MB allocated per season with 800 Gen2 collections, scaling to 100K worlds meant 2.95 TB through the allocator and 80M Gen2 collections. Gen2 collections are the most expensive — they pause all threads and scan the entire heap. At that volume, GC pauses would have consumed a significant portion of the 55-minute runtime, with increasingly unpredictable latency as heap fragmentation grew.
3. No game identity across worlds. Without a stable MatchId, there was no way to correlate the same scheduled game across worlds in the output data. The Parquet output couldn't answer "show me the distribution of outcomes for Chiefs vs Eagles in Week 12" because there was no identifier linking those rows across worlds.
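The fixed-input requirement and the role of a stable MatchId can be illustrated with a minimal sketch. It assumes a hypothetical Matchup record and uses a coin flip in place of the real game engine; the names MonteCarloSketch, HomeWinProbability, and baseSeed are illustrative and not taken from the codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape; the real schedule entries carry more fields.
public sealed record Matchup(int MatchId, string Home, string Away);

public static class MonteCarloSketch
{
    // Placeholder game resolution: a coin flip standing in for the real engine.
    private static bool HomeWins(Random rng) => rng.NextDouble() < 0.5;

    // Replays the same schedule in every world and returns, per MatchId,
    // the fraction of worlds in which the home team won.
    public static Dictionary<int, double> HomeWinProbability(
        IReadOnlyList<Matchup> schedule, int worlds, int baseSeed)
    {
        var homeWins = schedule.ToDictionary(m => m.MatchId, _ => 0);

        for (int world = 0; world < worlds; world++)
        {
            // Only the random stream differs between worlds; the inputs are fixed.
            var rng = new Random(baseSeed + world);
            foreach (var m in schedule)
            {
                if (HomeWins(rng)) homeWins[m.MatchId]++;
            }
        }

        return homeWins.ToDictionary(kv => kv.Key, kv => (double)kv.Value / worlds);
    }
}
```

Because every world sees the same MatchId values, the per-game counts can be accumulated directly. With a regenerated schedule, the dictionary keys would not line up across worlds.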
Structural changes
| Change | Impact |
|---|---|
| Separated data (teams, schedule) from simulation | Same 288 matchups replayed across all worlds — enables cross-world probability analysis |
| Added MatchId to every matchup | Unique game identity carried through to Parquet output, enabling per-game aggregation across worlds |
| Extracted contracts into standalone project | Models can be consumed by analytics/validation without pulling in the engine |
| Added interfaces (IGameEngine, ISeasonEngine, IDriveEngine) | Engine implementations can be swapped without changing consumers |
| Converted data shapes to records (Player, SeasonGame, OffensePackage, DefensePackage) | Immutable data types with built-in value equality, giving a cleaner data flow |
| Converted enums to string constants | Follows Foundry coding standards, serialization-safe |
| Split every type into its own file | Maintainable, navigable codebase |
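A rough sketch of how the record, interface, and string-constant changes might fit together is shown below. The member lists and signatures are assumptions made for illustration: the real SeasonGame and IGameEngine are not shown in this document, and GameOutcome, PlayTypes, and CoinFlipGameEngine are hypothetical names.

```csharp
using System;

// Assumed fields; the real SeasonGame record likely differs.
public sealed record SeasonGame(int MatchId, int Week, string HomeTeam, string AwayTeam);

// Hypothetical result shape used only by this sketch.
public sealed record GameOutcome(int MatchId, int HomeScore, int AwayScore);

// String constants in place of an enum (values are illustrative).
public static class PlayTypes
{
    public const string Run = "Run";
    public const string Pass = "Pass";
}

// Assumed signature; consumers depend on the interface, not the implementation.
public interface IGameEngine
{
    GameOutcome Simulate(SeasonGame game, Random rng);
}

// A trivial stand-in engine that can be swapped for a real one.
public sealed class CoinFlipGameEngine : IGameEngine
{
    public GameOutcome Simulate(SeasonGame game, Random rng)
    {
        bool homeWins = rng.Next(2) == 0;
        // MatchId is carried through so results can be joined across worlds.
        return new GameOutcome(game.MatchId, homeWins ? 7 : 0, homeWins ? 0 : 7);
    }
}
```

Records compare by value, so two SeasonGame instances built from the same fields are equal, which keeps lookups and test assertions straightforward.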
Performance optimisation
After structural changes, a BenchmarkDotNet micro-benchmark was established to measure the cost of simulating a single season (288 games). This single-world benchmark was then used to extrapolate compute and memory requirements for 10K and 100K simulation runs.
Pre-optimisation benchmark (single world)
| Metric | Value |
|---|---|
| Time | 32.80 ms |
| Allocated | 30.21 MB |
| Gen0 | 2,933 |
| Gen1 | 1,867 |
| Gen2 | 800 |
Optimisations applied
| Change | Detail |
|---|---|
| Eliminated per-play list allocations | AllSkillPlayers() and PickDefender() created a new List<Player> on every play (~35,000 times per season). Replaced with direct index arithmetic across sub-lists — zero allocations. |
| Span-based personnel selection | WeightedPick() called .ToList() on every personnel/formation/coverage selection. Changed to accept (T, int)[] directly, eliminating intermediate list allocations. |
| Pre-sized collections | Drive.Plays pre-sized to 16, GameResult.Drives pre-sized to 32, season games list pre-sized to schedule count. Eliminates resize copies. |
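The first two optimisations can be sketched as follows. This is an illustration of the general technique under assumed signatures, not the project's actual AllSkillPlayers(), PickDefender(), or WeightedPick() code; the names AllocationFreeSelection and PickAcross are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AllocationFreeSelection
{
    // Returns the element at `index` of the logical concatenation a + b + c
    // without materialising a combined list.
    public static T PickAcross<T>(
        IReadOnlyList<T> a, IReadOnlyList<T> b, IReadOnlyList<T> c, int index)
    {
        if (index < a.Count) return a[index];
        index -= a.Count;
        if (index < b.Count) return b[index];
        index -= b.Count;
        return c[index]; // throws if index is out of range
    }

    // Weighted choice over an array of (item, weight) pairs; no intermediate list.
    public static T WeightedPick<T>((T Item, int Weight)[] options, Random rng)
    {
        int total = 0;
        foreach (var (_, weight) in options) total += weight;

        int roll = rng.Next(total); // 0 <= roll < total when total > 0
        foreach (var (item, weight) in options)
        {
            if (roll < weight) return item;
            roll -= weight;
        }
        throw new InvalidOperationException("Weights must be non-negative and sum to more than zero.");
    }
}
```

A caller would draw a random index in the range 0 to a.Count + b.Count + c.Count and pass it to PickAcross, which replaces building a temporary List<Player> on every play.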
Post-optimisation benchmark (single world)
| Metric | Value | Change |
|---|---|---|
| Time | 12.52 ms | 2.6x faster |
| Allocated | 14.35 MB | 2.1x less |
| Gen0 | 1,188 | -59% |
| Gen1 | 1,031 | -45% |
| Gen2 | 47 | -94% |
Extrapolated runtime at scale
These projections are extrapolated from the single-world benchmark. Actual performance at scale will depend on GC behaviour under sustained load, memory bandwidth, and I/O throughput for Parquet output. Parallelised estimates assume linear scaling — real-world scaling will be sub-linear due to GC contention and memory bus saturation.
Pre-optimisation (30.21 MB, 32.80 ms per season)
| Metric | 10K Worlds | 100K Worlds |
|---|---|---|
| Time (1 core) | 5.5 min | 55 min |
| Time (8 cores) | 41 sec | 6.9 min |
| Time (32 cores) | 10 sec | 1.7 min |
| Peak memory (1 core) | ~30 MB | ~30 MB |
| Peak memory (8 cores) | ~240 MB | ~240 MB |
| Total GC allocated | 295 GB | 2.95 TB |
| Gen2 collections | 8M | 80M |
Post-optimisation (14.35 MB, 12.52 ms per season)
| Metric | 10K Worlds | 100K Worlds |
|---|---|---|
| Time (1 core) | 2.1 min | 21 min |
| Time (8 cores) | 16 sec | 2.6 min |
| Time (32 cores) | 4 sec | 39 sec |
| Peak memory (1 core) | ~14 MB | ~14 MB |
| Peak memory (8 cores) | ~115 MB | ~115 MB |
| Total GC allocated | 140 GB | 1.4 TB |
| Gen2 collections | 470K | 4.7M |
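The time and allocation rows in these tables follow from straightforward scaling of the single-world benchmark. A small helper makes the arithmetic explicit. The names ScaleProjection, EstimateRuntime, and TotalAllocatedGb are hypothetical, and the helper assumes perfectly linear scaling across cores with 1 GB taken as 1024 MB.

```csharp
using System;

public static class ScaleProjection
{
    // Wall-clock estimate assuming perfectly linear scaling across cores.
    public static TimeSpan EstimateRuntime(double msPerSeason, int worlds, int cores) =>
        TimeSpan.FromMilliseconds(msPerSeason * worlds / cores);

    // Total data pushed through the allocator, in GB (1 GB = 1024 MB here).
    public static double TotalAllocatedGb(double mbPerSeason, int worlds) =>
        mbPerSeason * worlds / 1024.0;
}
```

For example, EstimateRuntime(32.80, 100_000, 1) gives roughly 54.7 minutes, and TotalAllocatedGb(14.35, 10_000) gives roughly 140 GB.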
Output size
| Metric | 10K Worlds | 100K Worlds |
|---|---|---|
| Rows | ~370M | ~3.7B |
| Estimated size (compressed Parquet) | ~100 GB | ~1 TB |
Next steps
The benchmark establishes that 10K and 100K simulations are computationally feasible. The next steps are to make the code actually run at these scales:
- Thread-safe RNG: Static Random instances are not thread-safe and will produce corrupted results under parallelism. They must be replaced with per-thread instances before scaling beyond a single core.
- Parallel world execution: Once RNG is thread-safe, implement Parallel.ForEach or similar over the world loop to utilise multiple cores.
- Output strategy: At 100K worlds, writing 3.7B play rows to a single Parquet file produces ~1 TB. Need to either aggregate in-flight (per-game summaries during simulation) or partition output (one file per world/chunk).
- Fix known simulation bugs: Double field-position flip on turnovers, kneel causing turnover on downs. Failing tests document both, skipped pending DS fix.
- Play struct conversion: The remaining 14 MB per season is dominated by ~35,000 Play class instances. Converting to a struct or pooled object would halve allocations further.
- Statistical validation: Tune simulation parameters against real NFL distributions (completion rates, yards per carry, scoring, etc.).
- Real data integration: Replace RosterFactory with real team/player data loaded from an external source.
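One possible shape for the RNG and parallel-execution steps is sketched below. It uses a separate Random instance per world, seeded from the world index, rather than per-thread instances. This is a variation chosen here because it also makes each world reproducible. ParallelWorlds, Run, baseSeed, and simulateWorld are hypothetical names, and Parallel.For is used in place of Parallel.ForEach since the worlds are simply indexed.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelWorlds
{
    // Runs `worlds` independent simulations in parallel. Each world owns its
    // own Random instance, so no instance is shared between threads.
    public static int[] Run(int worlds, int baseSeed, Func<Random, int> simulateWorld)
    {
        var results = new int[worlds];

        Parallel.For(0, worlds, world =>
        {
            // Seed derived from the world index: the same world always
            // replays the same random stream, regardless of scheduling.
            var rng = new Random(unchecked(baseSeed + world));

            // Each array slot is written by exactly one iteration.
            results[world] = simulateWorld(rng);
        });

        return results;
    }
}
```

Adjacent integer seeds are adequate for a sketch. Whether they give sufficiently independent streams for statistical work is a separate question that would need checking during validation.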