Environment: Azure Container Apps Job (D32-benchmark, 32 vCPU / 64 GiB), westus3 ClickHouse Cloud Production tier (3 replicas × 16 vCPU × 64 GiB = 48 vCPU / 192 GiB)
Connection: HTTPS, Compression=true, set_async_insert=1, set_wait_for_async_insert=0
PBP sink: clickhouse (direct bulk-copy, 4-way parallel per chunk)
Season context: ENABLED with refactored AmericanFootballSeasonAccumulator (per-outcome double[worldCount] arrays, was per-world nested dicts)
Job execution: oc-exp-1k-p32-j55xtk1
Image: commit 41ab4c90 (season accumulator refactor)

Backend: clickhouse, Scale: 100000 worlds, Environment: local
Streaming mode: totalWorlds=100000, chunkSize=500 [parallel=32]
PBP sink: clickhouse at ./pbp-parquet

Streaming Run Report
--------------------
Total wall time:      10696.41 s
Simulation time:      2169.31 s
OC write time:        387.29 s
PBP write time:       7983.42 s
Merge time:           121.24 s
Season write time:    22.90 s
Peak working set:     46598.1 MB
Chunks completed:     200
OC rows (merged):     244,800
PBP rows (written):   4,198,981,505
Season rows written:  1,728

Notes:
- First complete 100K full-pipeline run covering all three outcome tables: game_outcome_context, play_by_play, season_outcome_context. Total wall 2h 58m, cost ~$27.
- Memory: 46.6 GB peak vs 46.8 GB on the no-season 100K run, essentially unchanged. The refactor holds the season accumulator's footprint to ~1.3 GB at 100K worlds (1,728 rows × 100,000 doubles × 8 bytes ≈ 1.38 GB, so it grows linearly with world count). The original nested-dict implementation was projected at 270 GB and OOM'd at 100K.
- Sim phase +48% vs no-season (1,462s → 2,169s). Cause: the season accumulator's AccumulateGame + FinalizeWorld calls run under a global lock because the shared instance's internal dictionaries aren't thread-safe. 288 games × 100K worlds = 28.8M locked calls. A follow-up could use per-worker accumulators + a merge step (same pattern the game accumulator uses), but 12 min of added wall time at 100K is well within budget for this gate.
- Row counts: 244,800 OC ✓ • 4,198,981,505 PBP ✓ (matches 10K ×10 within stochastic variance) • 1,728 season ✓.
- Season write: 22.90s for 1,728 rows with 100,000-element arrays each. That is about 3.2× the 10K run's 7.08s for 10× the array length, so the write scales sublinearly rather than linearly.
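
The per-worker accumulator + merge follow-up mentioned in the notes could look roughly like the sketch below. It is written in Python purely for illustration and is not the project's code: `SliceAccumulator`, `simulate_slice`, `merge`, and `run` are hypothetical names, the single "wins" outcome is a placeholder, and the sketch assumes each worker owns a disjoint range of worlds (as the chunked streaming mode suggests), so per-worker arrays only need to cover that range.

```python
from concurrent.futures import ThreadPoolExecutor


class SliceAccumulator:
    """Hypothetical per-worker accumulator covering worlds [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        # outcome key -> list of floats, one entry per world in this slice
        self.outcomes = {}

    def accumulate_game(self, world, outcome_key, value):
        # No lock needed: only the owning worker touches this instance.
        values = self.outcomes.get(outcome_key)
        if values is None:
            values = [0.0] * (self.stop - self.start)
            self.outcomes[outcome_key] = values
        values[world - self.start] += value


def simulate_slice(start, stop, games_per_world):
    """Stand-in for the simulation: each game adds 1.0 to a 'wins' outcome."""
    acc = SliceAccumulator(start, stop)
    for world in range(start, stop):
        for _ in range(games_per_world):
            acc.accumulate_game(world, "wins", 1.0)
    return acc


def merge(world_count, parts):
    """Single-threaded merge: copy each slice into full-length arrays."""
    merged = {}
    for part in parts:
        for key, values in part.outcomes.items():
            if key not in merged:
                merged[key] = [0.0] * world_count
            merged[key][part.start:part.stop] = values
    return merged


def run(world_count=1000, chunk_size=250, games_per_world=288, workers=4):
    # Partition the worlds into disjoint [start, stop) ranges, one per task.
    slices = [(s, min(s + chunk_size, world_count))
              for s in range(0, world_count, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda r: simulate_slice(r[0], r[1], games_per_world), slices))
    return merge(world_count, parts)
```

Because the world ranges are disjoint, the merge is a slice copy rather than an element-wise sum, and the combined arrays stay the same total size as the shared-instance layout. The sketch shows the ownership pattern only; Python threads will not reproduce the timing behaviour of the real run.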
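
The figures quoted in the notes can be cross-checked against the run report with a few lines of arithmetic. The sketch below is only a convenience, using the numbers from this report plus the 1,462s no-season simulation time and the 7.08s 10K season write quoted in the notes; the helper names are made up for illustration, and the phase shares are indicative only, since the report does not say whether phases overlap.

```python
# Phase timings in seconds, copied from the Streaming Run Report.
phases = {
    "Simulation": 2169.31,
    "OC write": 387.29,
    "PBP write": 7983.42,
    "Merge": 121.24,
    "Season write": 22.90,
}
total_wall = 10696.41
pbp_rows = 4_198_981_505


def phase_shares(phase_seconds, total_seconds):
    """Return each phase as a percentage of total wall time."""
    return {name: 100.0 * secs / total_seconds
            for name, secs in phase_seconds.items()}


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


shares = phase_shares(phases, total_wall)
unaccounted = total_wall - sum(phases.values())      # wall time outside the listed phases
sim_overhead = pct_change(1462.0, phases["Simulation"])  # vs the no-season 100K run
season_ratio = phases["Season write"] / 7.08         # vs the 10K run's season write
pbp_rate = pbp_rows / phases["PBP write"]            # PBP rows written per second

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name:<14}{share:6.1f} % of wall time")
    print(f"Unaccounted   {unaccounted:8.2f} s")
    print(f"Sim overhead  {sim_overhead:8.1f} %")
    print(f"Season ratio  {season_ratio:8.2f} x for 10x the worlds")
    print(f"PBP rate      {pbp_rate:,.0f} rows/s")
```

The PBP write dominates the run at roughly three quarters of wall time, which makes it the first place to look if total duration needs to come down.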