Environment: Azure Container Apps Job (D32-benchmark, 32 vCPU / 64 GiB), westus3
ClickHouse Cloud Production Scale (2 replicas × 4 CPU × ~8 GiB)
Connection: HTTPS, Compression=true, set_async_insert=1, set_wait_for_async_insert=0
PBP sink: clickhouse (direct bulk-copy, 4-way parallel per chunk)
Job execution: oc-exp-1k-p32-p0xsb6u
Replica timeout: 14,400s (4 hours)
Result: DeadlineExceeded — job terminated at 4h without completing

Backend: clickhouse, Scale: 100000 worlds, Environment: local
Streaming mode: totalWorlds=100000, chunkSize=500 [parallel=32]
PBP sink: clickhouse at ./pbp-parquet

(no Streaming Run Report — process terminated mid-run)

Analysis of the failure:
- Started 2026-04-19T18:20:12 UTC, terminated 2026-04-19T22:20:18 UTC
- Linear projection from 10K (818s PBP write, 513K rows/sec throughput)
  gave 100K PBP write = ~8,200s = 2.3h, total ~2.5h. Should have fit.
- Actual: still in the PBP write phase at 4h. That's super-linear: the 4h
  wall clock is ~1.75x the projected PBP write time (14,400s / ~8,200s)
  and 1.6x the projected total, with the write still unfinished.
- Most likely cause: the play_by_play table is now PARTITION BY season_id
  (single partition per season). At 100K scale a single partition has
  ~4.2B rows. ClickHouse MergeTree creates parts and eventually merges
  them in the background; at this volume the background-merge pressure
  on the single partition may cap foreground INSERT throughput in a way
  that doesn't show at 10K (where the partition is only ~420M rows).
- Secondary factor: 4 client workers × 32 parallel_workers = up to 128
  concurrent insert streams, potentially overwhelming the 2-replica Cloud
  tier's insert throughput ceiling when sustained for hours.
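
The projection arithmetic above can be re-checked with a short sketch. The
inputs are the figures quoted in these notes; the constant and function
names are illustrative only, and the linear-scaling assumption is the same
one the original projection made, not a measured property.

```python
# Figures copied from the notes above. Names are illustrative, not taken
# from the benchmark harness.
PBP_WRITE_10K_S = 818          # PBP write time at 10K worlds, seconds
THROUGHPUT_ROWS_S = 513_000    # observed rows/sec at 10K worlds
SCALE_FACTOR = 100_000 // 10_000
DEADLINE_S = 14_400            # replica timeout, seconds (4 h)
PROJECTED_TOTAL_S = 2.5 * 3600 # projected end-to-end time at 100K


def project_linear(write_s: float, factor: int) -> float:
    """Projected write time if cost grows linearly with world count."""
    return write_s * factor


# Row counts implied by the 10K run, scaled to 100K.
rows_10k = PBP_WRITE_10K_S * THROUGHPUT_ROWS_S
rows_100k = rows_10k * SCALE_FACTOR

projected_write_s = project_linear(PBP_WRITE_10K_S, SCALE_FACTOR)

# How far the 4 h wall clock already exceeds each projection.
overrun_vs_write = DEADLINE_S / projected_write_s
overrun_vs_total = DEADLINE_S / PROJECTED_TOTAL_S

# The write did not finish, so rows written < rows_100k. Averaged over
# the whole 4 h window, throughput was therefore below this ceiling.
avg_throughput_ceiling = rows_100k / DEADLINE_S

if __name__ == "__main__":
    print(f"rows at 10K:            {rows_10k:,}")
    print(f"rows at 100K:           {rows_100k:,}")
    print(f"projected PBP write:    {projected_write_s:,.0f}s")
    print(f"overrun vs write proj.: {overrun_vs_write:.2f}x")
    print(f"overrun vs total proj.: {overrun_vs_total:.2f}x")
    print(f"avg rows/sec ceiling:   {avg_throughput_ceiling:,.0f}")
```

The last figure is the useful one: averaged over the full 4h window, the
100K run sustained less than ~291K rows/sec, against 513K rows/sec at 10K.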

What to try next (any one should unblock 100K-with-PBP):
1. Bigger CH Cloud tier. Production Scale has 2 × 4 CPU replicas; bumping
   to 4 × 16 CPU is 8x the total CPU (8 → 64) and, if inserts scale with
   CPU, roughly 8x the insert-side capacity.
2. Partition PBP by (season_id, intDiv(world_id, 10000)). Still <100
   partitions per run (a hard CH Cloud setting), but spreads data across
   10 partitions so background merges happen in parallel.
3. Drop PBP from the critical-path 100K pipeline. Ship OC-only at
   49.85 min as the primary artifact; run PBP as a secondary bulk-insert
   pipeline when needed.
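
A quick sanity check on option 2's partition count, as a sketch. It mirrors
intDiv(world_id, 10000) with Python floor division (equivalent for
non-negative integers). The world_id numbering is not stated in these notes,
so both 0-based and 1-based ranges are tried; season_id=1 is a placeholder
value, and the helper names are hypothetical.

```python
from collections import Counter

BUCKET_SIZE = 10_000     # divisor from option 2
TOTAL_WORLDS = 100_000   # scale of the failed run


def int_div(a: int, b: int) -> int:
    """Integer division as intDiv would give for non-negative integers."""
    return a // b


def partition_histogram(world_ids, season_id: int = 1) -> Counter:
    """Count worlds per (season_id, intDiv(world_id, BUCKET_SIZE)) key.

    season_id=1 is a placeholder; a single season per run is assumed.
    """
    return Counter((season_id, int_div(w, BUCKET_SIZE)) for w in world_ids)


# Assumption A: world_id runs 0..99999.
zero_based = partition_histogram(range(TOTAL_WORLDS))

# Assumption B: world_id runs 1..100000.
one_based = partition_histogram(range(1, TOTAL_WORLDS + 1))

if __name__ == "__main__":
    for label, hist in (("0-based", zero_based), ("1-based", one_based)):
        sizes = sorted(hist.values())
        print(f"{label}: {len(hist)} partitions, "
              f"smallest={sizes[0]} worlds, largest={sizes[-1]} worlds")
```

With 0-based ids this gives exactly 10 partitions of 10,000 worlds each.
With 1-based ids it gives 11, the last holding a single world (100000).
Either way the count stays far below 100. If rows are spread evenly across
worlds, each full partition holds ~420M rows, about the size of the single
partition in the 10K run.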

Highest validated with-PBP scale: 10K (1,120s / 18.7 min). See the
10K run file for details.
