Pseudo Simulation Model: Enabling Full-Season Monte Carlo

What was built

A pseudo American football simulation model that generates play-by-play data for a complete 18-week NFL season (32 teams, 288 games, ~35,000 plays per season). The model is intentionally simplified and does not yet aim for statistical accuracy, but it produces structurally realistic output: drives, plays, penalties, turnovers, scoring, and personnel packages that mirror the shape of real NFL data.

The purpose is to validate the Monte Carlo approach: simulate the same fixed schedule N times with different random outcomes, producing a probability distribution over game results for every matchup.

Why the original implementation couldn't reach scale

The initial prototype had three structural problems that would have prevented reaching 100K simulations:

1. Data was rebuilt for every world. Teams, rosters, and the schedule were regenerated with fresh random values for each world, so each world simulated a different season and cross-world comparison was meaningless. There was no way to ask "what are the probability-weighted outcomes of Game #147?" because Game #147 was a different matchup in every world. The Monte Carlo approach requires fixed inputs, with only the game resolution varying.

2. GC pressure would have dominated runtime. The benchmark reported 30.21 MB allocated per season and a Gen2 figure of 800. BenchmarkDotNet reports GC counts per 1,000 operations, so that is roughly 0.8 Gen2 collections per season. Scaling to 100K worlds meant 2.95 TB through the allocator and about 80,000 Gen2 collections. Blocking Gen2 collections are the most expensive kind: they pause all managed threads and scan the entire heap. At that volume, GC pauses could have consumed a significant portion of the 55-minute runtime, with increasingly unpredictable latency as heap fragmentation grew.

3. No game identity across worlds. Without a stable MatchId, there was no way to correlate the same scheduled game across worlds in the output data. The Parquet output couldn't answer "show me the distribution of outcomes for Chiefs vs Eagles in Week 12" because there was no identifier linking those rows across worlds.
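The first and third points can be illustrated with a minimal sketch, assuming a hypothetical `Matchup` record and a coin-flip stand-in (`HomeWins`, with an arbitrary 0.55 home-win chance) in place of the real game engine. The schedule is built once, each world gets its own seeded `Random`, and outcomes are aggregated by `MatchId`. The names, the seed scheme, and the probability are illustrative assumptions, not the project's actual contracts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for illustration; the real contracts may differ.
public sealed record Matchup(int MatchId, string Home, string Away);

public static class MonteCarloSketch
{
    // Stand-in for game resolution: a coin flip weighted towards the home team.
    // The real engine simulates drives and plays; here only the RNG-driven outcome matters.
    private static bool HomeWins(Random rng) => rng.NextDouble() < 0.55;

    // Replays the SAME schedule in every world and returns the home-win share per MatchId.
    public static Dictionary<int, double> HomeWinProbability(
        IReadOnlyList<Matchup> schedule, int worlds, int baseSeed)
    {
        var homeWins = new Dictionary<int, int>(schedule.Count);
        foreach (var m in schedule) homeWins[m.MatchId] = 0;

        for (int world = 0; world < worlds; world++)
        {
            // One RNG per world, seeded from the world index (an assumed scheme),
            // so any single world can be replayed later.
            var rng = new Random(baseSeed + world);
            foreach (var m in schedule)
            {
                if (HomeWins(rng)) homeWins[m.MatchId]++;
            }
        }

        var result = new Dictionary<int, double>(schedule.Count);
        foreach (var (matchId, wins) in homeWins)
        {
            result[matchId] = (double)wins / worlds;
        }
        return result;
    }
}
```

Because the inputs are fixed and `MatchId` is stable, the value stored under a given `MatchId` is an estimate of that specific game's home-win probability across worlds. With regenerated schedules or no stable identifier, there would be nothing to key the aggregation on.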

Structural changes

| Change | Impact |
| --- | --- |
| Separated data (teams, schedule) from simulation | Same 288 matchups replayed across all worlds, which enables cross-world probability analysis |
| Added `MatchId` to every matchup | Unique game identity carried through to Parquet output, which enables per-game aggregation across worlds |
| Extracted contracts into standalone project | Models can be consumed by analytics/validation without pulling in the engine |
| Added interfaces (`IGameEngine`, `ISeasonEngine`, `IDriveEngine`) | Engine implementations can be swapped without changing consumers |
| Converted data shapes to records (`Player`, `SeasonGame`, `OffensePackage`, `DefensePackage`) | Immutable data types with built-in value equality, giving a cleaner data flow |
| Converted enums to string constants | Follows Foundry coding standards; serialization-safe |
| Split every type into its own file | Maintainable, navigable codebase |

Performance optimisation

After structural changes, a BenchmarkDotNet micro-benchmark was established to measure the cost of simulating a single season (288 games). This single-world benchmark was then used to extrapolate compute and memory requirements for 10K and 100K simulation runs.

Pre-optimisation benchmark (single world)

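| Metric | Value |
| --- | --- |
| Time | 32.80 ms |
| Allocated | 30.21 MB |
| Gen0 (collections per 1,000 seasons) | 2,933 |
| Gen1 (collections per 1,000 seasons) | 1,867 |
| Gen2 (collections per 1,000 seasons) | 800 |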

Optimisations applied

| Change | Detail |
| --- | --- |
| Eliminated per-play list allocations | `AllSkillPlayers()` and `PickDefender()` created a `new List<Player>` on every play (~35,000 times per season). Replaced with direct index arithmetic across sub-lists, giving zero allocations. |
| Span-based personnel selection | `WeightedPick()` called `.ToList()` on every personnel/formation/coverage selection. Changed to accept `(T, int)[]` directly, eliminating intermediate list allocations. |
| Pre-sized collections | `Drive.Plays` pre-sized to 16, `GameResult.Drives` pre-sized to 32, season games list pre-sized to schedule count. Eliminates resize copies. |
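The first two optimisations can be sketched as follows. This is an illustrative reconstruction, not the project's code: `PickAcross` and the three-list layout are hypothetical, and `WeightedPick` here only mirrors the `(T, int)[]` signature described in the table.

```csharp
using System;
using System.Collections.Generic;

public static class AllocationFreePicks
{
    // Returns the element at a flat index across several sub-lists without
    // concatenating them into a temporary list first.
    public static T PickAcross<T>(
        int index, IReadOnlyList<T> a, IReadOnlyList<T> b, IReadOnlyList<T> c)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));
        if (index < a.Count) return a[index];
        index -= a.Count;
        if (index < b.Count) return b[index];
        index -= b.Count;
        if (index < c.Count) return c[index];
        throw new ArgumentOutOfRangeException(nameof(index));
    }

    // Weighted choice over (item, weight) pairs. Taking the array directly means
    // callers do not need an intermediate list. Weights are assumed non-negative.
    public static T WeightedPick<T>((T Item, int Weight)[] options, Random rng)
    {
        int total = 0;
        foreach (var o in options) total += o.Weight;
        if (total <= 0) throw new ArgumentException("Weights must sum to a positive value.");

        int roll = rng.Next(total); // 0 <= roll < total
        foreach (var o in options)
        {
            if (roll < o.Weight) return o.Item;
            roll -= o.Weight;
        }
        throw new InvalidOperationException("Unreachable when weights are non-negative.");
    }
}
```

A caller would pick a random skill player with something like `PickAcross(rng.Next(rbs.Count + wrs.Count + tes.Count), rbs, wrs, tes)`, where the three lists are hypothetical position groups. Neither method allocates on the heap, so the per-play cost no longer feeds the garbage collector.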

Post-optimisation benchmark (single world)

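| Metric | Value | Change |
| --- | --- | --- |
| Time | 12.52 ms | 2.6x faster |
| Allocated | 14.35 MB | 2.1x less |
| Gen0 (collections per 1,000 seasons) | 1,188 | -59% |
| Gen1 (collections per 1,000 seasons) | 1,031 | -45% |
| Gen2 (collections per 1,000 seasons) | 47 | -94% |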

Extrapolated runtime at scale

These projections are extrapolated from the single-world benchmark. Actual performance at scale will depend on GC behaviour under sustained load, memory bandwidth, and I/O throughput for Parquet output. Parallelised estimates assume linear scaling — real-world scaling will be sub-linear due to GC contention and memory bus saturation.
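The arithmetic behind the projections is simple enough to sketch. The helper names below are hypothetical, and the runtime estimate bakes in the same linear-scaling assumption, so it is an optimistic bound rather than a prediction.

```csharp
using System;

public static class ScaleProjection
{
    // Wall-clock estimate assuming perfectly linear scaling across cores.
    public static TimeSpan EstimateRuntime(double msPerSeason, int worlds, int cores) =>
        TimeSpan.FromMilliseconds(msPerSeason * worlds / cores);

    // Total memory pushed through the allocator, converting MB to GB with 1 GB = 1024 MB.
    public static double TotalAllocatedGb(double mbPerSeason, int worlds) =>
        mbPerSeason * worlds / 1024.0;
}
```

For example, `EstimateRuntime(12.52, 100_000, 1)` comes to roughly 20.9 minutes and `TotalAllocatedGb(30.21, 10_000)` to roughly 295 GB, in line with the tables in this section.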

Pre-optimisation (30.21 MB, 32.80 ms per season)

| Metric | 10K Worlds | 100K Worlds |
| --- | --- | --- |
| Time (1 core) | 5.5 min | 55 min |
| Time (8 cores) | 41 sec | 6.8 min |
| Time (32 cores) | 10 sec | 1.7 min |
| Peak memory (1 core) | ~30 MB | ~30 MB |
| Peak memory (8 cores) | ~240 MB | ~240 MB |
| Total GC allocated | 295 GB | 2.95 TB |
| Gen2 collections (~0.8 per season) | ~8,000 | ~80,000 |

Post-optimisation (14.35 MB, 12.52 ms per season)

| Metric | 10K Worlds | 100K Worlds |
| --- | --- | --- |
| Time (1 core) | 2.1 min | 21 min |
| Time (8 cores) | 16 sec | 2.6 min |
| Time (32 cores) | 4 sec | 39 sec |
| Peak memory (1 core) | ~14 MB | ~14 MB |
| Peak memory (8 cores) | ~115 MB | ~115 MB |
| Total GC allocated | 140 GB | 1.4 TB |
| Gen2 collections (~0.047 per season) | ~470 | ~4,700 |

Output size

| Metric | 10K Worlds | 100K Worlds |
| --- | --- | --- |
| Rows | ~370M | ~3.7B |
| Estimated size (compressed Parquet) | ~100 GB | ~1 TB |

Next steps

The benchmark establishes that 10K and 100K simulations are computationally feasible. The next steps are to make the code actually run at these scales:

1. Thread-safe RNG: A shared static `Random` instance is not thread-safe and can return corrupted values when called from several threads at once. It must be replaced with per-thread instances before scaling beyond a single core.
2. Parallel world execution: Once RNG is thread-safe, implement `Parallel.ForEach` or similar over the world loop to utilise multiple cores.
3. Output strategy: At 100K worlds, writing 3.7B play rows to a single Parquet file produces ~1 TB. Need to either aggregate in-flight (per-game summaries during simulation) or partition output (one file per world/chunk).
4. Fix known simulation bugs: Double field-position flip on turnovers, and kneel causing turnover on downs. Failing tests document both and are skipped pending DS fix.
5. Play struct conversion: The remaining 14 MB per season is dominated by ~35,000 `Play` class instances. Converting to a struct or pooled object could roughly halve allocations again.
6. Statistical validation: Tune simulation parameters against real NFL distributions (completion rates, yards per carry, scoring, etc.).
7. Real data integration: Replace `RosterFactory` with real team/player data loaded from an external source.
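The RNG and parallelism steps can be sketched together. This is a minimal illustration under stated assumptions: `SimulateSeason` is a hypothetical stand-in that returns one number per season, and the sketch uses one `Random` per world rather than per thread. That is a variation on the per-thread approach named above, chosen here so a given world produces the same result regardless of which thread runs it. Seeding from `baseSeed + world` is a simplification; a real run might derive seeds differently.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelWorlds
{
    // Hypothetical stand-in for simulating one season: counts home wins over
    // 288 coin-flip games. The real engine would produce play-by-play data.
    private static int SimulateSeason(Random rng)
    {
        int homeWins = 0;
        for (int game = 0; game < 288; game++)
        {
            if (rng.NextDouble() < 0.55) homeWins++;
        }
        return homeWins;
    }

    // Each world owns its Random, so no RNG state is shared between threads
    // and results do not depend on scheduling order.
    public static int[] Run(int worlds, int baseSeed)
    {
        var results = new int[worlds];
        Parallel.For(0, worlds, world =>
        {
            var rng = new Random(baseSeed + world);
            results[world] = SimulateSeason(rng); // each iteration writes only its own slot
        });
        return results;
    }
}
```

Calling `ParallelWorlds.Run(10_000, 1)` twice should return identical arrays, which gives a cheap regression check that no shared RNG state has crept back in. The same world-indexed structure also lends itself to partitioned output, since each world or chunk of worlds can be written independently.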