
Pseudo Simulation Model — Enabling Full Season Monte Carlo

What was built

A pseudo American football simulation model that generates play-by-play data for a complete 18-week NFL season: 32 teams, 288 games (16 per week, with every team playing every week), and ~35,000 plays per season. The model is intentionally simplified and does not yet aim for statistical accuracy, but it produces structurally realistic output: drives, plays, penalties, turnovers, scoring, and personnel packages that mirror the shape of real NFL data.

The purpose is to validate the Monte Carlo approach: simulate the same fixed schedule N times with different random outcomes, producing a probability distribution over game results for every matchup.
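A minimal C# sketch of that idea, assuming a toy model: the schedule is built once from a fixed seed, and every world resolves the same 288 games with its own `Random`. The tuple fields, the per-world seeding scheme, and the win-probability formula are hypothetical stand-ins for the real SeasonGame records and IGameEngine, not the project's implementation.

```csharp
using System;

const int Games = 288;
const int Worlds = 1_000;

// 1. Build the inputs ONCE, from a fixed seed, so every world replays the same season.
var setupRng = new Random(42);
var schedule = new (int MatchId, double HomeRating, double AwayRating)[Games];
for (int i = 0; i < Games; i++)
    schedule[i] = (i, setupRng.NextDouble(), setupRng.NextDouble());

// 2. Resolve the same schedule in every world; only the random outcomes differ.
var homeWins = new int[Games];
for (int world = 0; world < Worlds; world++)
{
    var rng = new Random(world); // hypothetical seeding scheme: one seed per world
    foreach (var game in schedule)
    {
        // Toy outcome model: home win probability rises with the rating gap (0.1 to 0.9).
        double pHome = 0.5 + 0.4 * (game.HomeRating - game.AwayRating);
        if (rng.NextDouble() < pHome)
            homeWins[game.MatchId]++;
    }
}

// 3. Each scheduled game now has an empirical home-win probability across worlds.
double[] pHomeWin = Array.ConvertAll(homeWins, w => (double)w / Worlds);
Console.WriteLine($"Match 147: P(home win) ~ {pHomeWin[147]:F3}");
```

Because the inputs never change between worlds, `homeWins[147]` counts outcomes of one specific matchup, which is the question the first prototype could not answer.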

Why the original implementation couldn't reach scale

The initial prototype had three structural problems that would have prevented reaching 100K simulations:

1. Data was rebuilt every world. Teams, rosters, and the schedule were regenerated with fresh random values for each world. This meant each world simulated a different season — cross-world comparison was meaningless. There was no way to ask "what are the probability-weighted outcomes of Game #147?" because Game #147 was a different matchup in every world. The Monte Carlo approach requires fixed inputs with only the game resolution varying.

2. GC pressure would have grown into a major cost. Each season allocated 30.21 MB, and BenchmarkDotNet reported 800 Gen2 collections per 1,000 seasons (about 0.8 per season). Scaling to 100K worlds meant ~2.88 TB through the allocator and ~80,000 Gen2 collections. Gen2 collections are the most expensive kind: they scan the entire managed heap, and blocking Gen2 collections pause all managed threads. GC time is already part of the measured per-season cost, but under sustained parallel load it would likely consume a growing share of runtime, with increasingly unpredictable latency as heap fragmentation grew.

3. No game identity across worlds. Without a stable MatchId, there was no way to correlate the same scheduled game across worlds in the output data. The Parquet output couldn't answer "show me the distribution of outcomes for Chiefs vs Eagles in Week 12" because there was no identifier linking those rows across worlds.
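With a stable MatchId, per-game distributions become a plain group-by. The sketch below assumes a hypothetical flattened row shape (World, MatchId, HomeScore, AwayScore) rather than the actual Parquet schema, and computes a home-win rate and mean margin per scheduled game across worlds.

```csharp
using System;
using System.Linq;

// Hypothetical output rows: one per (world, game). Column names are illustrative only.
var rows = new (int World, int MatchId, int HomeScore, int AwayScore)[]
{
    (0, 147, 24, 17), (1, 147, 10, 27), (2, 147, 31, 28),
    (0, 148, 13, 20), (1, 148, 21, 14), (2, 148, 17, 17),
};

// MatchId is identical in every world, so grouping on it collects every
// simulated outcome of the same scheduled game.
var perGame = rows
    .GroupBy(r => r.MatchId)
    .Select(g => (
        MatchId: g.Key,
        Worlds: g.Count(),
        HomeWinRate: g.Count(r => r.HomeScore > r.AwayScore) / (double)g.Count(),
        MeanMargin: g.Average(r => r.HomeScore - r.AwayScore)))
    .OrderBy(s => s.MatchId)
    .ToArray();

foreach (var s in perGame)
    Console.WriteLine($"{s.MatchId}: n={s.Worlds}, P(home win)={s.HomeWinRate:F2}, mean margin={s.MeanMargin:F2}");
```

Without MatchId, rows from different worlds could only be matched on team names and week, which breaks as soon as the schedule differs between worlds.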

Structural changes

Change | Impact
Separated data (teams, schedule) from simulation | Same 288 matchups replayed across all worlds, enabling cross-world probability analysis
Added MatchId to every matchup | Unique game identity carried through to Parquet output, enabling per-game aggregation across worlds
Extracted contracts into standalone project | Models can be consumed by analytics/validation without pulling in the engine
Added interfaces (IGameEngine, ISeasonEngine, IDriveEngine) | Engine implementations can be swapped without changing consumers
Converted data shapes to records (Player, SeasonGame, OffensePackage, DefensePackage) | Immutable types with built-in value-based equality; cleaner data flow
Converted enums to string constants | Follows Foundry coding standards; serialization-safe
Split every type into its own file | Maintainable, navigable codebase

Performance optimisation

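After the structural changes, a BenchmarkDotNet micro-benchmark was set up to measure the cost of simulating a single season (288 games). This single-world benchmark was then used to extrapolate compute and memory requirements for 10K and 100K simulation runs. Following BenchmarkDotNet's convention, time and allocated memory are per operation (one season), while Gen0/Gen1/Gen2 are collection counts per 1,000 operations.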

Pre-optimisation benchmark (single world)

Metric | Value
Time | 32.80 ms
Allocated | 30.21 MB
Gen0 (per 1,000 seasons) | 2,933
Gen1 (per 1,000 seasons) | 1,867
Gen2 (per 1,000 seasons) | 800
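To sanity-check what these columns mean without BenchmarkDotNet, a rough harness can read the runtime's own counters (`GC.GetAllocatedBytesForCurrentThread`, `GC.CollectionCount`). This sketch assumes a hypothetical `SimulateSeason` stand-in that mimics the per-play `List` allocations, and normalises GC counts per 1,000 seasons to match BenchmarkDotNet's per-1,000-operation convention. It has no statistical treatment, so its numbers are indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for one season: a new list per play, like the
// pre-optimisation per-play allocations. Swap in the real season engine.
static int SimulateSeason(Random random)
{
    int total = 0;
    for (int play = 0; play < 35_000; play++)
    {
        var players = new List<int> { random.Next(), random.Next() };
        total += players.Count;
    }
    return total;
}

const int Seasons = 200;
var rng = new Random(1);
SimulateSeason(rng); // warm-up so JIT compilation is not measured

long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
int gen0Before = GC.CollectionCount(0), gen2Before = GC.CollectionCount(2);
var sw = Stopwatch.StartNew();
for (int i = 0; i < Seasons; i++) SimulateSeason(rng);
sw.Stop();

double bytesPerSeason = (double)(GC.GetAllocatedBytesForCurrentThread() - bytesBefore) / Seasons;
int gen0 = GC.CollectionCount(0) - gen0Before;
int gen2 = GC.CollectionCount(2) - gen2Before;

// Normalise GC counts per 1,000 seasons so they line up with the benchmark columns.
Console.WriteLine($"{sw.Elapsed.TotalMilliseconds / Seasons:F2} ms/season, " +
                  $"{bytesPerSeason / (1024 * 1024):F2} MB/season, " +
                  $"Gen0/1k = {gen0 * 1000.0 / Seasons:F0}, Gen2/1k = {gen2 * 1000.0 / Seasons:F0}");
```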

Optimisations applied

Change | Detail
Eliminated per-play list allocations | AllSkillPlayers() and PickDefender() created a new List<Player> on every play (~35,000 times per season). Replaced with direct index arithmetic across the sub-lists, with zero allocations.
Span-based personnel selection | WeightedPick() called .ToList() on every personnel/formation/coverage selection. Changed to accept (T, int)[] directly, eliminating the intermediate list allocations.
Pre-sized collections | Drive.Plays pre-sized to 16, GameResult.Drives pre-sized to 32, and the season games list pre-sized to the schedule count. Eliminates resize copies.
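The three changes can be sketched in C# as follows. This assumes hypothetical shapes (skill players as three `List<string>` position groups, formations as a `(string, int)[]` weight table); the real AllSkillPlayers(), PickDefender(), and WeightedPick() will differ in detail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical roster shape: skill players held in separate position lists.
var wrs = new List<string> { "WR1", "WR2", "WR3" };
var tes = new List<string> { "TE1" };
var rbs = new List<string> { "RB1", "RB2" };

// 1. Index arithmetic: treat the sub-lists as one virtual list instead of
//    concatenating them into a new List on every play.
static string PickSkillPlayer(List<string> a, List<string> b, List<string> c, Random random)
{
    int i = random.Next(a.Count + b.Count + c.Count);
    if (i < a.Count) return a[i];
    i -= a.Count;
    if (i < b.Count) return b[i];
    return c[i - b.Count];
}

// 2. Weighted selection straight from a (value, weight) array: no .ToList() per call.
static T WeightedPick<T>((T Value, int Weight)[] options, Random random)
{
    int total = 0;
    foreach (var o in options) total += o.Weight;
    int roll = random.Next(total); // 0 <= roll < total
    foreach (var o in options)
    {
        if (roll < o.Weight) return o.Value;
        roll -= o.Weight;
    }
    throw new InvalidOperationException("Weights must be non-negative with a positive total.");
}

// The weight table is allocated once (e.g. as a static readonly field) and reused.
var formations = new (string Value, int Weight)[] { ("Shotgun", 60), ("Singleback", 30), ("I-Form", 10) };

// 3. Pre-sizing: one up-front array of 16 instead of growing 4 -> 8 -> 16 as plays are added.
var drivePlays = new List<string>(capacity: 16);

var rng = new Random(7);
for (int play = 0; play < 10; play++)
    drivePlays.Add($"{WeightedPick(formations, rng)}: {PickSkillPlayer(wrs, tes, rbs, rng)}");
Console.WriteLine(string.Join(Environment.NewLine, drivePlays));
```

Picking a player costs one random draw and no allocation, and the weight table is shared across every call rather than rebuilt per selection.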

Post-optimisation benchmark (single world)

Metric | Value | Change
Time | 12.52 ms | 2.6x faster
Allocated | 14.35 MB | 2.1x lower
Gen0 (per 1,000 seasons) | 1,188 | -59%
Gen1 (per 1,000 seasons) | 1,031 | -45%
Gen2 (per 1,000 seasons) | 47 | -94%

Extrapolated runtime at scale

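These projections are extrapolated from the single-world benchmark. Actual performance at scale will depend on GC behaviour under sustained load, memory bandwidth, and I/O throughput for Parquet output. Parallelised estimates assume linear scaling; real-world scaling will be sub-linear due to GC contention and memory-bus saturation. Peak-memory figures assume each in-flight world holds at most its per-season allocation (per-season MB × cores) and exclude runtime overhead and output buffers, so the real working set may be higher. Gen2 figures scale the per-1,000-season benchmark rate by the number of worlds.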
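The scaling arithmetic behind the tables can be reproduced with a few lines. This sketch assumes perfect linear scaling across cores and allocation and GC counts proportional to the number of worlds, treats BenchmarkDotNet's MB as 1024 × 1024 bytes, and treats the Gen2 figure as a per-1,000-season rate. `Extrapolate` is a hypothetical helper, not part of the codebase.

```csharp
using System;

// Scale single-season benchmark figures to N worlds on C cores.
static (double Minutes, double TotalGiB, double Gen2) Extrapolate(
    double msPerSeason, double mbPerSeason, double gen2Per1000, int worlds, int cores)
{
    double minutes = msPerSeason * worlds / cores / 60_000.0; // assumes perfect linear scaling
    double totalGiB = mbPerSeason * worlds / 1024.0;          // bytes through the allocator
    double gen2 = gen2Per1000 * worlds / 1000.0;              // per-1,000-season rate x seasons
    return (minutes, totalGiB, gen2);
}

foreach (var (label, ms, mb, gen2) in new[] { ("pre", 32.80, 30.21, 800.0), ("post", 12.52, 14.35, 47.0) })
    foreach (int worlds in new[] { 10_000, 100_000 })
        foreach (int cores in new[] { 1, 8, 32 })
        {
            var r = Extrapolate(ms, mb, gen2, worlds, cores);
            Console.WriteLine($"{label,-4} {worlds,7:N0} worlds {cores,2} cores: " +
                              $"{r.Minutes,6:F1} min, {r.TotalGiB,7:N0} GB allocated, {r.Gen2,6:N0} Gen2");
        }
```

For example, 100K post-optimisation worlds on 32 cores come to 12.52 ms × 100,000 / 32 ≈ 39 s, and 100K pre-optimisation worlds allocate 30.21 MB × 100,000 ≈ 2,950 GB (≈ 2.88 TB).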

Pre-optimisation (30.21 MB, 32.80 ms per season)

Metric | 10K Worlds | 100K Worlds
Time (1 core) | 5.5 min | 55 min
Time (8 cores) | 41 sec | 6.8 min
Time (32 cores) | 10 sec | 1.7 min
Peak memory (1 core) | ~30 MB | ~30 MB
Peak memory (8 cores) | ~240 MB | ~240 MB
Total GC allocated | 295 GB | 2.88 TB
Gen2 collections | ~8,000 | ~80,000

Post-optimisation (14.35 MB, 12.52 ms per season)

Metric | 10K Worlds | 100K Worlds
Time (1 core) | 2.1 min | 21 min
Time (8 cores) | 16 sec | 2.6 min
Time (32 cores) | 4 sec | 39 sec
Peak memory (1 core) | ~14 MB | ~14 MB
Peak memory (8 cores) | ~115 MB | ~115 MB
Total GC allocated | 140 GB | 1.37 TB
Gen2 collections | ~470 | ~4,700

Output size

Metric | 10K Worlds | 100K Worlds
Rows | ~370M | ~3.7B
Estimated size (compressed Parquet) | ~100 GB | ~1 TB

Next steps

The benchmark establishes that 10K and 100K simulations are computationally feasible. The next steps are to make the code actually run at these scales:

  • Thread-safe RNG: System.Random is not thread-safe, and sharing a static instance across threads can corrupt its internal state and silently skew results. It must be replaced with per-thread instances, or per-world instances seeded from the world index (which also keeps results reproducible regardless of thread scheduling), before scaling beyond a single core.
  • Parallel world execution: once RNG is thread-safe, run the world loop with Parallel.ForEach or similar to use multiple cores.
  • Output strategy: at 100K worlds, writing 3.7B play rows to a single Parquet file produces ~1 TB. Output needs to be either aggregated in flight (per-game summaries during simulation) or partitioned (one file per world or chunk of worlds).
  • Fix known simulation bugs: field position is flipped twice on turnovers, and a kneel can cause a turnover on downs. Failing tests document both and are skipped pending a DS fix.
  • Play struct conversion: the remaining 14 MB per season is dominated by ~35,000 Play class instances. Converting Play to a struct or pooling instances is expected to roughly halve allocations again.
  • Statistical validation: tune simulation parameters against real NFL distributions (completion rates, yards per carry, scoring, etc.).
  • Real data integration: replace RosterFactory with real team and player data loaded from an external source.
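The first three next steps can be combined in one sketch: a `Random` per world seeded from the world index, `Parallel.For` over worlds (one of the "or similar" options), and in-flight per-game aggregation with worker-local accumulators merged once per worker. The seeding scheme and the toy score model are assumptions for illustration; a real engine would plug in where the two `rng.Next` calls are.

```csharp
using System;
using System.Threading.Tasks;

const int Games = 288;
const int Worlds = 10_000;

// Global per-game totals: in-flight aggregation instead of writing play rows.
var homeWins = new long[Games];
var totalMargin = new long[Games];
object mergeLock = new();

Parallel.For(0, Worlds,
    // localInit: each worker gets private accumulators, so the hot loop takes no locks.
    () => (Wins: new long[Games], Margin: new long[Games]),
    (world, _, acc) =>
    {
        // One Random per world, seeded from the world index (hypothetical scheme):
        // never shared between threads, and independent of how worlds are scheduled.
        var rng = new Random(world);
        for (int matchId = 0; matchId < Games; matchId++)
        {
            // Toy stand-in for the game engine: two independent scores.
            int home = rng.Next(0, 45), away = rng.Next(0, 45);
            if (home > away) acc.Wins[matchId]++;
            acc.Margin[matchId] += home - away;
        }
        return acc;
    },
    // localFinally: merge each worker's totals once, under a lock.
    acc =>
    {
        lock (mergeLock)
        {
            for (int g = 0; g < Games; g++)
            {
                homeWins[g] += acc.Wins[g];
                totalMargin[g] += acc.Margin[g];
            }
        }
    });

Console.WriteLine($"Match 0: P(home win) = {homeWins[0] / (double)Worlds:F3}, " +
                  $"mean margin = {totalMargin[0] / (double)Worlds:F2}");
```

Because each world's randomness depends only on its index and the merge is an integer sum, the totals come out identical across runs and core counts on the same runtime, and the per-game arrays occupy a few kilobytes instead of billions of play rows.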