Environment: Azure Container Apps Job (D32-benchmark, 32 vCPU / 64 GiB), westus3 ClickHouse Cloud Production tier (3 replicas × 16 vCPU × 64 GiB) Connection: HTTPS, Compression=true, set_async_insert=1, set_wait_for_async_insert=0 PBP sink: clickhouse with parallel writes Season context: ENABLED (refactored accumulator) Job execution: oc-exp-1k-p32-84iykp5 Image: commit 34155f2d (schema changes from the CH docs review) Schema changes applied in this run: 1. play_by_play: PARTITION BY (season_id, intDiv(world_id, 10000)) (was: PARTITION BY season_id — 1 partition per season) 2. play_by_play: CODEC(Delta, ZSTD) on monotonic integer columns (world_id, play_sequence, clock_seconds, drive_*_clock_seconds) 3. season_outcome_context: added PARTITION BY season_id 4. experiment_results: added PARTITION BY toYYYYMM(run_timestamp) RESULT: FAILED at 4h replica deadline. Only ~2,000 of 100,000 worlds processed (2% complete). Baseline without these changes was 2h 58m complete. The "optimisations" caused a ~68× slowdown in per-world throughput. Diagnostic evidence from CH Cloud system.query_log (last 5h): - Insert queries: 387 (baseline ~30,000 for a complete run) - Select queries: 8,737 (ratio 22:1 select-to-insert) - Individual insert speed when they did run: healthy (21s for 5.2M rows = 250K rows/sec, matches historical) - No exceptions, no MEMORY_LIMIT errors, no server-side failures The 22:1 select-to-insert ratio is the smoking gun. ClickHouseBulkCopy calls InitAsync before every insert to fetch column metadata (the 'SELECT ... WHERE ?=?' pattern). With the new partition expression on play_by_play (intDiv(world_id, 10000)), each of those InitAsync calls became much slower — possibly because the driver has to parse and understand the computed partition expression for each call. The client then appeared stuck between inserts, making almost no progress. Action: reverted the schema changes in commit 150ec156. Primary baseline remains the 100K full-pipeline+season run at 2h 58m with PARTITION BY season_id on play_by_play. Lessons: - CH best-practice docs describe table-level design in isolation. They don't account for client-library behaviour (BulkCopy's per- insert metadata fetch) or our specific orchestration pattern. - Schema changes from 'review by spec' need validation runs BEFORE being committed as improvements. - Empirical > theoretical: the 2h 58m baseline with 'wrong-by-the- book' partitioning beats every proposed improvement we've measured.