ADR-009: Sports Content Query Service — Validation & Foundational Architecture¶

Status¶

Proposed — 2026-05-10

Pending review and sign-off by Engineering and Product. Closes the Validation of Sports Content Query Service project (LBS-1181).

Project objective assessment¶

Closing project statement:

Prove the foundational architecture and system's structural ability to process and return queryable, structured data derived from the prediction platform.

Assessment: PARTIALLY PROVEN.

Aspect	Proven?	Evidence
Foundational architecture is structurally sound	Yes	End-to-end pipeline shipped: simulator → accumulator → ClickHouse → GraphQL. Validated at 100,000 worlds × full NFL season + playoff bracket.
System processes data derived from the prediction platform	Yes	Outcome Context tables populated by `LBS.OutcomeContext.SimulationRunner`; queryable via `LBS.OutcomeContext.QueryApi`.
Returns queryable, structured data	Yes	GraphQL surface with `gameContext`, `seasonContext`, `outcomeDefinitions`, expression evaluator. Roster directory for GUID↔name resolution.
Same Game query path proven	Yes	`gameContext(gameId, outcomeIds)` with expression evaluator; verified against 100K-world dataset.
Same Comp query path proven	Yes	`seasonContext(seasonId, gameIds, outcomeIds)` with cross-game expressions; cross-context joins via `context: GAME` + `scopeId` per leaf.
Query language schema holds across both scenarios	Yes	Single `ExpressionInput` discriminated union covers Same Game and Same Comp without surface change.
Performance target ≤200ms met	Not yet	Current measured filtered-query latency is ~1–2 s end-to-end (cross-region from AU). In-region, with cache + bitvector optimisations, target is achievable but not yet demonstrated. See §3.2 and §5.
Outcome Context structure fit for downstream consumers	Mostly	Structure is correct and complete for canonical use cases; gaps documented in §3.3 and tracked as follow-up.

Net: the structural ability is proven; the performance contract at production scale is not yet demonstrated and is the sole remaining gate. A bounded, scoped performance phase (caching + bitvector evaluation engine) is required before the service can be commercially asserted as production-ready against the ≤200ms SLA.

1. Context¶

Problem¶

The platform produces probabilistic predictions across tens of thousands of parallel simulated seasons. Customers need to query that data: ask "what's the chance of X" and get back a calibrated probability with acceptable response time. The Validation of Sports Content Query Service project was scoped to prove that the foundational architecture supports this query workload — structurally first, performance second — across two query shapes:

Same Game — questions whose outcomes belong to a single game (e.g. "P(Mahomes throws ≥ 250 yards AND Chiefs win this game)").
Same Comp — questions whose outcomes span multiple games inside one competition / season (e.g. "P(Rams win their week 6 AND week 12 games)" or "P(team wins the Super Bowl)").

What was built¶

A working prototype across five preceding tickets, with the implementation layers documented at: docs/outcome-context/design.md.

Layer	Project	Responsibility
Pseudo simulation model	`LBS.Model.AmericanFootball(.Simulation)`	RosterFactory / ScheduleFactory / GameEngine / SeasonEngine / PlayoffEngine — produces in-memory game results across N parallel worlds.
Accumulation	`LBS.Model.AmericanFootball.Accumulation`	Per-game and per-season accumulators reduce stochastic per-world outputs into world-indexed outcome arrays. Templates + materialiser produce concrete outcome IDs.
Storage	`LBS.OutcomeContext.Contracts` + `.Storage`	ClickHouse schema (canonical + staging tables both scopes); chunked-streaming writer with per-worker shared-nothing accumulators; reads dedup with `SELECT … FINAL`.
Query	`LBS.OutcomeContext.Query`	Adapter ring; expression model; per-world evaluator producing `EvaluationResult` (probability, distribution stats).
GraphQL surface	`LBS.OutcomeContext.QueryApi`	HotChocolate v16 schema; roster directory for GUID-to-name resolution; deployed to Azure Container Apps in westus3.

Questions this project was designed to answer¶

Does the Outcome Query Language schema hold across both Same Game and Same Comp question shapes?
Is the chosen architecture viable at game speed — i.e. can the system serve queries within the ≤200ms SLA?
Is the Outcome Context result structure fit for purpose for the downstream consumers that will build on it?
Is the chosen tech stack appropriate for production scale-out, or are there structural blockers?

Predecessor ticket scope¶

Ticket	Scope (as inferred from project context)
LBS-1175	Initial scope / requirements / approach. TO VERIFY against source ticket.
LBS-1177	Storage experiment + tech-stack selection. Outcomes captured in `docs/outcome-context/evaluations/storage-experiment/rd-findings.md`, `docs/outcome-context/evaluations/storage-experiment/status.md`, `docs/clickhouse-schema-review.md`.
LBS-1178	Query the Outcome Context — Same Game. Implementation visible in `LBS.OutcomeContext.QueryApi` `gameContext` resolver.
LBS-1179	Query the Outcome Context — Same Comp. Implementation visible in `seasonContext` resolver with cross-game expression support.
LBS-1180	Query API performance baseline. Performance findings — TO VERIFY against source ticket. Baseline numbers from this branch's runs are summarised in §3.2.

Note for reviewers: the entries marked TO VERIFY above need confirmation against the source tickets before sign-off. The implementation outcomes are documented; the per-ticket scope statements may need wording adjustments to match what each ticket explicitly committed to.

2. Decisions Made¶

2.1 The Outcome Query Language — schema and approach¶

Decision: A typed-tree expression language exposed at the GraphQL surface as ExpressionInput. Recursive discriminated union of {outcome, constant, binary, unary} with closed enum operator sets.

Approach:

Wire form: ExpressionInput typed-tree (HotChocolate [OneOf] discriminated input). Suits direct authoring from a UI, persistence, and partial-evaluation caching keys.
Canonical form: postfix token stream (RPN) used internally for evaluation, hashing, and cache lookup. Round-trips with the wire form.
Outcome leaves carry (type, timePeriod, participantId) plus optional (context, scopeId) for cross-context references. The storage outcome_id is synthesised as {TYPE}_{TIME_PERIOD}_{participantId}.
Operators are closed sets (15 binary, 3 unary) bound to GraphQL enums. No custom function nodes; deliberately small to keep the evaluator simple and the cache surface tractable.
Result shape: EvaluationResult exposes probability, matchingWorlds, totalWorlds for boolean expressions and mean / median / min / max / stdDev / mode for numeric expressions, plus expressionHash and resolvedOutcomeIds for caching and observability.

Rationale: the typed tree is the simplest representation that's both machine-friendly (validatable, persistable, hashable) and human-friendly (direct correspondence with how a customer thinks about probability questions). The closed operator set is deliberate: we add operators when we need them, not before, so the language doesn't accumulate features that have no production use.

Alternatives considered and ruled out (captured during D-series decisions, see docs/outcome-context/query-layer/gap-analysis.md):

Free-form string DSL — rejected for parsing ambiguity, validation cost, and lack of UI authoring path.
Open function set with a registry — rejected for premature generalisation; the closed binary/unary set covers every use case identified to date.

2.2 Same Comp architectural approach¶

Decision: Same Comp queries are served by extending seasonContext with a gameIds: [String!] argument. Cross-game expressions reference specific games by binding each leaf to (context: GAME, scopeId: <gameId>).

Concretely:

Single GraphQL request; single resolver call.
Storage seam (IOutcomeContextStore.GetManyByScopeIdAsync) fetches the season scope plus all referenced game scopes in one round trip (two SQL queries — one per scope-type table).
The evaluator walks the expression tree resolving each leaf against its bound scope. Unbound leaves inherit the enclosing scope (the season). All referenced contexts are subject to the worldCount- alignment rule (Hard Rule #2).
Boolean reduction (probability) computed across the same world index for all referenced contexts — a per-world AND across game scopes is exactly the "joint over fixtures" semantic.

Alternatives explored and ruled out (captured during gap analysis D4 sub-design):

Federated approach — client orchestrates N gameContext calls. Rejected: N round trips, no single-shot cache key, no atomic worldCount consistency, harder to express cross-scope joints in the GraphQL surface.
Separate comparisonContext query. Rejected: would have required a second expression model and duplicate cache surface for what's semantically the same operation.
Single union scope ("CompContext") flattening multiple games into one virtual scope. Rejected: would have lost the per-game outcome identity and complicated outcome ID grammar with cross-game prefixing.

The chosen seasonContext + gameIds + per-leaf scopeId approach keeps the query language surface unchanged between Same Game and Same Comp.

2.3 Tech stack¶

Decision summary:

Layer	Choice	Rationale
Outcome Context storage	ClickHouse Cloud (Production 3×16 in westus3)	Selected via the storage experiment (LBS-1177). Full findings in `docs/outcome-context/evaluations/storage-experiment/rd-findings.md`; summary: best fit for the wide-array, world-indexed value shape; bulk-insert performance via the binary protocol; native `Array(Float64)` support; partition + ORDER BY surface enabled the chunked-streaming write path.
Schema engine	`ReplacingMergeTree(context_version)` with staging MergeTree pairs	Idempotent re-write semantics; staging tables decouple per-chunk parallelism from canonical-table merge. Trade-off: requires `SELECT … FINAL` on reads to dedup unmerged parts (documented in `ClickHouseOutcomeContextStore`).
Storage write driver	`ClickHouse.Driver` 1.2.0 — binary `InsertBinaryAsync`	Order-of-magnitude higher throughput than HTTP-row inserts at the volumes produced (100K-element `Array(Float64)`).
Query GraphQL server	HotChocolate v16 on ASP.NET Core / .NET 10	First-class fit for `[OneOf]` discriminated input types (the expression tree); native async resolvers; existing platform SDK story; consistent with rest of LBS Foundry.
Hosting — write-side	Azure Container Apps Job (`oc-exp-1k-p32`, westus3, 32 vCPU / 64 GiB)	Job (not service) shape matches the runner's batch nature. Same region as ClickHouse Cloud — eliminates cross-region tax on the write path.
Hosting — read-side	Azure Container App (`oc-query-api`, westus3, public ingress)	Always-on for the query surface; scale 0..1 (room to lift on production load). Same region as cluster.
Image registry	`ocexperimentacr.azurecr.io`	Single ACR for both images; GitHub Actions workflows on `main` push, `az acr build` for ad-hoc deploys from branches.
Roster directory	In-memory facade behind `IRosterDirectory` today	Production target: Marten-backed implementation reading from the same roster store the runner uses. One-line DI swap; interface is in place.

Rationale (overall): the stack keeps the entire pipeline in one region (westus3), uses the same .NET runtime and Azure platform as the rest of LBS Foundry, and sources its data infrastructure from a managed service (ClickHouse Cloud) rather than a self-hosted cluster. The only non-Foundry-standard component is ClickHouse itself, justified by the shape of the data (wide arrays of doubles per row, far better suited to a column store than to Postgres / Marten).

3. Findings¶

3.1 Query Language¶

Does the schema hold across Same Game and Same Comp? Yes.

The ExpressionInput tree is unchanged between the two scenarios. The same operators, the same leaf shape, the same result shape.
Same Comp introduces the optional (context, scopeId) per-leaf binding for disambiguating cross-game references — this is additive, not a different language.
All 15 binary + 3 unary operators are exercised across the canonical question shapes (threshold, AND/OR combinations, deltas as numeric distributions, "most X in league" via balanced AND-trees).

Gaps and limitations surfaced:

Gap	Detail	Impact	Recommendation
Field-type mix on `OutcomeRefInput`	`type` and `participantId` are GraphQL `String`; `timePeriod` and `context` are GraphQL enums. Easy authoring trap — quoted strings on the enum fields fail with `cannot coerce` errors.	Discoverable but recurring source of caller errors.	Documented in `docs/outcome-context/graphql-master-prompt.md`. Long-term: consider switching `type` to a closed enum bound to the catalogue.
Balanced AND/OR trees required for >~30 terms	HotChocolate's JSON parser depth limit is 64. Right-leaning chains over 30+ terms throw `Invalid JSON document` (HC0012).	Affects "most X in league" (32 QBs) shapes. Workaround: build a balanced binary tree client-side.	Document in caller guidance; consider offering an `andAll([...])` / `orAll([...])` syntactic-sugar variant.
No explicit league-leader operator	"Most X in league" must be expressed as N(N-1)/2 pairwise AND-tree. Verbose.	Performance fine (single CH query); ergonomics not.	Could add an `argMax` or `rank` operator if the use case becomes hot. Not blocking.
No historical-version querying	`ReplacingMergeTree(context_version)` + FINAL collapses historical versions. The "query OC as-of older `context_version`" contract is retired.	Affects future "compare last week's prediction to this week's" use cases.	Tracked: when the use case becomes real, addressed via separate read path or different merge tree variant — not by removing FINAL. Skipped test at `ClickHouseOutcomeContextStoreTests.GetByScopeIdAsync_HonoursExplicitContextVersion` documents the trade-off.

3.2 Performance vs ≤200ms target¶

Observed numbers (validated against the 100K-world dataset on the production cluster):

Query shape	Measured wall-time (AU client → westus3 server)	Notes
`gameContext` Same Game, single outcome filter	~1.1 s	Cross-region; ~565 ms RTT included.
`gameContext` Same Game, two outcomes filtered, expression evaluated	~1.7 s	Same as above plus expression evaluation on 100K-world arrays.
`seasonContext` Same Comp, joint over two fixtures with anchor	~2.0 s	Three game scopes + season anchor, per-leaf binding.
32-QB "league leader" expression (Mahomes vs all rivals + KC SB win)	~2.0 s	33 outcomes fetched, balanced AND tree of 31 comparisons.
`gameContext` unfiltered (no `outcomeIds` filter)	Times out at server (>30 s)	Full per-game payload ~480 MB at 100K worlds. Filter is required for cross-region reads.

Performance against the ≤200ms target — current state:

Not met as currently configured. AU→westus3 RTT alone consumes ~565 ms; in-region client measurements have not been run as part of this validation. TO VERIFY against LBS-1180 baseline if in-region numbers exist there.
The dominant wall-time costs are network RTT (cross-region only), ClickHouse query execution + array serialisation, and per-world expression evaluation. Caching, bitvector evaluation, and in-region client topology each address a distinct slice of the budget.
The architecture is not structurally blocked from hitting 200ms. The contributing factors are addressable through:
In-region clients — eliminates ~565ms of WAN tax
Bitvector evaluation engine — boolean expressions evaluated via packed bitwise ops instead of per-world arrays; expected order-of-magnitude reduction in evaluator cost on the common boolean-reduction shape
Result caching keyed on expressionHash — repeat queries served without ClickHouse round-trip; exact-match hits in single-digit ms
Pre-computed bitvector materialisation for hot outcomes — write-time pre-aggregation of common boolean outcomes
These optimisations are scoped to phases 1, 7, and 8 of the production roadmap (see docs/outcome-context/roadmap.md).

Same Game vs Same Comp performance characteristics:

Wall-time is dominated by payload size and expression evaluation cost per outcome. Both scale in the same way across query shapes.
Same Comp adds one round-trip dimension (season scope plus N game scopes), but the storage layer batches into two SQL queries (ReadManyGames + ReadManySeasons), so the additional latency is small compared to the per-query baseline.
Same Comp's expression evaluation is structurally identical to Same Game's once data is in process — no algorithmic difference.

Viability at game speed:

The architecture is structurally viable: bounded query budgets, filtered payloads, and a clear path to <200ms via the planned optimisation stack.
The architecture is not currently demonstrating <200ms — that demonstration is the explicit deliverable of the performance phase.

3.3 Outcome Context — fit for downstream consumers¶

Is the result structure fit for purpose? Yes for the canonical use cases. Documented gaps:

Aspect	State	Notes
Outcome ID grammar	Stable	`{TYPE}_{TIME_PERIOD}_{participantId}` is consistent across all current outcomes; participants are always stable GUIDs.
Game OC payload	Complete for current question shapes	Per-game outcomes cover passing / rushing / receiving / scoring / kicking / drives at game and intra-game (HALF1, HALF2, Q1-Q4) periods.
Season OC payload	Complete for current question shapes	Team standings (W/L/T, points-for/against), team season totals, individual season totals, postseason flags (`MADE_PLAYOFFS`, `WON_AFC`, `WON_NFC`, `WON_SUPER_BOWL`).
Roster / GUID resolution	Available via `IRosterDirectory`	`teams`, `team`, `player`, `players(filter)`, `fixtures(week)` reverse-lookup queries.
`EvaluationResult` shape	Covers all canonical reporting needs	`probability` + `matchingWorlds` for booleans; full distribution stats for numerics. `resolvedOutcomeIds` and `expressionHash` for cache + observability.
Historical-version queries	Not supported	Trade-off accepted in favour of read-time dedup. Tracked.
Defensive scoring outcomes	Partial coverage	Skipped test `SimulationDrivenAccumulationTests.TemplatesCoverRealSimulationOutput` flags three known coverage gaps in the current template registry. Not blocking for the validation gate but tracked for a future template-extension follow-up.
Cross-comp queries (multiple seasons)	Out of scope	Catalog is single-season today (`nfl_2025_26`). Extending requires multi-season SimulationCatalog + roster identity work. Tracked in roadmap phase 2 (historical data ingest).
Auth / authz on returned data	Not implemented	QueryApi is anonymous public ingress today. Acceptable for the validation phase; production would need scope-based authz (e.g. customer X can read certain leagues only).

Net: the structure is fit for the specific question shapes the project targeted. The known gaps are tracked, scoped, and have planned landing points.

4. Consequences¶

4.1 Scale options available with the chosen stack¶

Read-side (QueryApi): - Container App autoscale rules — current 0..1; trivially raisable to N..M based on QPS. - Multi-region read replicas — ClickHouse Cloud supports cross-region replication; QueryApi can deploy to additional regions and pin to the nearest replica. - Per-region custom domain (e.g. oc-query.luckbox.dev) — mitigates corporate-firewall blocks of *.azurecontainerapps.io. - Multiple QueryApi instances behind Azure Front Door for load balancing + failover.

Write-side (SimulationRunner): - Container App Job vCPU/RAM tier raise — current sizing is 32 vCPU / 64 GiB; doubles available without architectural change. - Horizontal: multiple parallel Jobs each handling disjoint world ranges, writing to the same canonical tables — supported by the staging-table merge SQL design. - Larger ClickHouse Cloud tier — Production 3×16 today; doubling replicas is a configuration change.

Storage: - ClickHouse Cloud tier is the primary scaling lever for query throughput and storage capacity. - ReplacingMergeTree partitioning by season_id already in place; per-season scale-out is a matter of adding more seasons.

4.2 Design limitations introduced by these decisions¶

Limitation	Source decision	Mitigation if it becomes blocking
Anonymous public ingress	Validation-phase deployment shape	Add API-key / JWT auth + rate limiting in production hardening (roadmap phase 9).
Single-region deployment	All-in-westus3 to keep latency tight to ClickHouse	Multi-region read replicas in roadmap phase 8 if cross-region client volume warrants.
`SELECT … FINAL` on reads	Replaces the post-load `OPTIMIZE` step the runner used to perform	Bitvector engine + cache layer (roadmap phase 1 + 8) reduce the per-query FINAL cost; not blocking at current load.
Historical-version queries unsupported	FINAL collapses tied versions	Documented; alternative read path or merge-tree variant when required.
Single-season catalog	Current `IOutcomeTemplateCatalog` keyed on a single sport's templates	Multi-season + multi-sport extension at template-registry level; no storage change needed.
In-process Roster directory	`IRosterDirectory` interface with in-memory facade	One-line DI swap to Marten-backed when roster store lands.
GraphQL parser depth limit	HotChocolate's JSON parser caps nesting at 64	Document balanced-tree authoring; consider syntactic-sugar `andAll`/`orAll` operators if it becomes a frequent foot-gun.

4.3 Cost implications¶

At experiment scale (validated):

ClickHouse Cloud Production 3×16, westus3 — TO VERIFY exact monthly figure against current invoice.
Container Apps Job — billed per execution (~25 min × the chosen vCPU/RAM tier).
Container App QueryApi — billed per replica-hour; current 0..1 scale-down means near-zero idle cost.
ACR Standard tier — small fixed monthly cost.

Projected at production scale (rough order-of-magnitude — needs formal sizing as part of phase 9):

ClickHouse Cloud — primary cost driver. Tier sized to query QPS and data retention. TO VERIFY against expected production load.
QueryApi — scale-up cost is per-replica-hour; with caching in place (phase 8) the multiplier is low.
Runner — fixed cost per simulation run; cadence-driven, not load-driven.
Cross-region transfer if multi-region read is added — meaningful at scale; should drive same-region client topology where possible.

Cost-control levers: - The QueryApi's scale-to-zero behaviour eliminates idle cost. - The bitvector + cache phase (roadmap phase 8) is also a cost-reduction measure: cached queries don't hit ClickHouse. - Storage retention policy (how many context_versions we keep) directly shapes ClickHouse tier sizing.

5. Recommendations & Next Steps¶

Required before "production-ready" can be asserted¶

#	Item	Owner	Tracked as
1	Demonstrate ≤200ms in-region for canonical Same Game and Same Comp queries	Engineering	LBS-1359 (Same Game) + LBS-1360 (Same Comp)
2	Implement bitvector evaluation engine for boolean expressions	Engineering	LBS-1361 — phase 1 of production roadmap
3	Implement result-caching layer keyed on `expressionHash` (FusionCache, in-process L1)	Engineering	LBS-1362 — phase 8 of production roadmap
4	Replace the pseudo simulator with the DS team's real model	Engineering + DS	Phase 5 of production roadmap — dependency-gated on DS, ticket to be raised when DS scope confirmed
5	Establish the data-version / epoch contract on storage to retire the read-time-FINAL trade-off where consumers need historical version querying	Engineering	LBS-1364 — phase 7 of production roadmap

Recommended follow-up tasks to raise¶

These should be raised before LBS-1181 is closed, per acceptance criteria. All seven now raised in Linear:

LBS-1359 — Demonstrate Same Game query ≤200ms (in-region client) — direct close-out of the LBS-1180 performance gap. Should produce a reproducible benchmark in LBS.Model.AmericanFootball.Tests-style harness (dependency on the bitvector engine + cache being in place).
LBS-1360 — Demonstrate Same Comp query ≤200ms (in-region client) — same shape as above, for the cross-game query path.
LBS-1361 — Implement bitvector evaluation engine (inner ring) — full scope in production roadmap phase 1.
LBS-1362 — Implement expression-result cache keyed on expressionHash — FusionCache with in-process L1; Redis L2 added later when distributed QueryApi scaling demands it.
LBS-1365 — Extend outcome template registry to close the SimulationDrivenAccumulationTests.TemplatesCoverRealSimulationOutput gaps (defensive scoring, scrambling QBs, kickoff team-drives).
LBS-1363 — Add observability foundation (structured logging, metrics, tracing) — production roadmap phase 7 prerequisite.
LBS-1364 — Establish data-version / epoch semantics on storage to retire FINAL where consumers need historical queries.

Out of scope for this ADR — captured for later¶

Real model integration with the DS team — dependency-gated.
Live data feed integration — its own design phase.
Progressive / partial-season simulation — depends on live state feed.
Multi-region scaling — premature; revisit when load justifies.
Auth / rate limiting on the public ingress — production hardening.

Project objective — final restatement¶

Prove the foundational architecture and system's structural ability to process and return queryable, structured data derived from the prediction platform.

Confirmed proven, with one explicit caveat: the system's structural ability is demonstrated end-to-end at validation scale (100,000 worlds, full NFL season, both query shapes). The performance contract at the ≤200ms target is not yet demonstrated and is identified as the single remaining gate, with a clear architectural path (bitvector engine + cache + in-region client topology) — none of which require structural re-architecting.

Sign-off¶

Role	Name	Date	Signature
Engineering	to be filled at sign-off
Product	to be filled at sign-off

References¶

Items to verify before sign-off¶

The draft above is built from this branch's implementation outcomes plus the cross-cutting docs in this repo. The following items need verification against the source predecessor tickets — none change the conclusion, but the wording should match what was actually committed in those tickets:

§1 — predecessor ticket scope statements (LBS-1175 / LBS-1177 / LBS-1178 / LBS-1179 / LBS-1180): confirm the per-ticket scope rows match what each ticket explicitly committed to.
§3.2 — performance numbers for in-region clients: if LBS-1180 captured in-region client latency figures (i.e. a client running in westus3 against the QueryApi in westus3), pull those in. The numbers in §3.2 today are cross-region (AU→westus3) measurements.
§4.3 — cost figures: the monthly figures for ClickHouse Cloud / Container Apps / ACR are not in this draft. Pull from current invoice or finance-side estimate before sign-off.
§2 — alternatives considered: the alternatives ruled out under Same Comp and Query Language are inferred from the gap-analysis doc rather than from the source tickets directly. Confirm wording.