Skip to content

ADR-009: Sports Content Query Service — Validation & Foundational Architecture

Status

Proposed — 2026-05-10

Pending review and sign-off by Engineering and Product. Closes the Validation of Sports Content Query Service project (LBS-1181).

Project objective assessment

Closing project statement:

Prove the foundational architecture and system's structural ability to process and return queryable, structured data derived from the prediction platform.

Assessment: PARTIALLY PROVEN.

Aspect Proven? Evidence
Foundational architecture is structurally sound Yes End-to-end pipeline shipped: simulator → accumulator → ClickHouse → GraphQL. Validated at 100,000 worlds × full NFL season + playoff bracket.
System processes data derived from the prediction platform Yes Outcome Context tables populated by LBS.OutcomeContext.SimulationRunner; queryable via LBS.OutcomeContext.QueryApi.
Returns queryable, structured data Yes GraphQL surface with gameContext, seasonContext, outcomeDefinitions, expression evaluator. Roster directory for GUID↔name resolution.
Same Game query path proven Yes gameContext(gameId, outcomeIds) with expression evaluator; verified against 100K-world dataset.
Same Comp query path proven Yes seasonContext(seasonId, gameIds, outcomeIds) with cross-game expressions; cross-context joins via context: GAME + scopeId per leaf.
Query language schema holds across both scenarios Yes Single ExpressionInput discriminated union covers Same Game and Same Comp without surface change.
Performance target ≤200ms met Not yet Current measured filtered-query latency is ~1–2 s end-to-end (cross-region from AU). In-region, with cache + bitvector optimisations, target is achievable but not yet demonstrated. See §3.2 and §5.
Outcome Context structure fit for downstream consumers Mostly Structure is correct and complete for canonical use cases; gaps documented in §3.3 and tracked as follow-up.

Net: the structural ability is proven; the performance contract at production scale is not yet demonstrated and is the sole remaining gate. A bounded, scoped performance phase (caching + bitvector evaluation engine) is required before the service can be commercially asserted as production-ready against the ≤200ms SLA.


1. Context

Problem

The platform produces probabilistic predictions across tens of thousands of parallel simulated seasons. Customers need to query that data: ask "what's the chance of X" and get back a calibrated probability with acceptable response time. The Validation of Sports Content Query Service project was scoped to prove that the foundational architecture supports this query workload — structurally first, performance second — across two query shapes:

  • Same Game — questions whose outcomes belong to a single game (e.g. "P(Mahomes throws ≥ 250 yards AND Chiefs win this game)").
  • Same Comp — questions whose outcomes span multiple games inside one competition / season (e.g. "P(Rams win their week 6 AND week 12 games)" or "P(team wins the Super Bowl)").

What was built

A working prototype across five preceding tickets, with the implementation layers documented at: docs/outcome-context/design.md.

Layer Project Responsibility
Pseudo simulation model LBS.Model.AmericanFootball(.Simulation) RosterFactory / ScheduleFactory / GameEngine / SeasonEngine / PlayoffEngine — produces in-memory game results across N parallel worlds.
Accumulation LBS.Model.AmericanFootball.Accumulation Per-game and per-season accumulators reduce stochastic per-world outputs into world-indexed outcome arrays. Templates + materialiser produce concrete outcome IDs.
Storage LBS.OutcomeContext.Contracts + .Storage ClickHouse schema (canonical + staging tables both scopes); chunked-streaming writer with per-worker shared-nothing accumulators; reads dedup with SELECT … FINAL.
Query LBS.OutcomeContext.Query Adapter ring; expression model; per-world evaluator producing EvaluationResult (probability, distribution stats).
GraphQL surface LBS.OutcomeContext.QueryApi HotChocolate v16 schema; roster directory for GUID-to-name resolution; deployed to Azure Container Apps in westus3.

Questions this project was designed to answer

  1. Does the Outcome Query Language schema hold across both Same Game and Same Comp question shapes?
  2. Is the chosen architecture viable at game speed — i.e. can the system serve queries within the ≤200ms SLA?
  3. Is the Outcome Context result structure fit for purpose for the downstream consumers that will build on it?
  4. Is the chosen tech stack appropriate for production scale-out, or are there structural blockers?

Predecessor ticket scope

Ticket Scope (as inferred from project context)
LBS-1175 Initial scope / requirements / approach. TO VERIFY against source ticket.
LBS-1177 Storage experiment + tech-stack selection. Outcomes captured in docs/outcome-context/evaluations/storage-experiment/rd-findings.md, docs/outcome-context/evaluations/storage-experiment/status.md, docs/clickhouse-schema-review.md.
LBS-1178 Query the Outcome Context — Same Game. Implementation visible in LBS.OutcomeContext.QueryApi gameContext resolver.
LBS-1179 Query the Outcome Context — Same Comp. Implementation visible in seasonContext resolver with cross-game expression support.
LBS-1180 Query API performance baseline. Performance findings — TO VERIFY against source ticket. Baseline numbers from this branch's runs are summarised in §3.2.

Note for reviewers: the entries marked TO VERIFY above need confirmation against the source tickets before sign-off. The implementation outcomes are documented; the per-ticket scope statements may need wording adjustments to match what each ticket explicitly committed to.


2. Decisions Made

2.1 The Outcome Query Language — schema and approach

Decision: A typed-tree expression language exposed at the GraphQL surface as ExpressionInput. Recursive discriminated union of {outcome, constant, binary, unary} with closed enum operator sets.

Approach:

  • Wire form: ExpressionInput typed-tree (HotChocolate [OneOf] discriminated input). Suits direct authoring from a UI, persistence, and partial-evaluation caching keys.
  • Canonical form: postfix token stream (RPN) used internally for evaluation, hashing, and cache lookup. Round-trips with the wire form.
  • Outcome leaves carry (type, timePeriod, participantId) plus optional (context, scopeId) for cross-context references. The storage outcome_id is synthesised as {TYPE}_{TIME_PERIOD}_{participantId}.
  • Operators are closed sets (15 binary, 3 unary) bound to GraphQL enums. No custom function nodes; deliberately small to keep the evaluator simple and the cache surface tractable.
  • Result shape: EvaluationResult exposes probability, matchingWorlds, totalWorlds for boolean expressions and mean / median / min / max / stdDev / mode for numeric expressions, plus expressionHash and resolvedOutcomeIds for caching and observability.

Rationale: the typed tree is the simplest representation that's both machine-friendly (validatable, persistable, hashable) and human-friendly (direct correspondence with how a customer thinks about probability questions). The closed operator set is deliberate: we add operators when we need them, not before, so the language doesn't accumulate features that have no production use.

Alternatives considered and ruled out (captured during D-series decisions, see docs/outcome-context/query-layer/gap-analysis.md):

  • Free-form string DSL — rejected for parsing ambiguity, validation cost, and lack of UI authoring path.
  • Open function set with a registry — rejected for premature generalisation; the closed binary/unary set covers every use case identified to date.

2.2 Same Comp architectural approach

Decision: Same Comp queries are served by extending seasonContext with a gameIds: [String!] argument. Cross-game expressions reference specific games by binding each leaf to (context: GAME, scopeId: <gameId>).

Concretely:

  • Single GraphQL request; single resolver call.
  • Storage seam (IOutcomeContextStore.GetManyByScopeIdAsync) fetches the season scope plus all referenced game scopes in one round trip (two SQL queries — one per scope-type table).
  • The evaluator walks the expression tree resolving each leaf against its bound scope. Unbound leaves inherit the enclosing scope (the season). All referenced contexts are subject to the worldCount- alignment rule (Hard Rule #2).
  • Boolean reduction (probability) computed across the same world index for all referenced contexts — a per-world AND across game scopes is exactly the "joint over fixtures" semantic.

Alternatives explored and ruled out (captured during gap analysis D4 sub-design):

  • Federated approach — client orchestrates N gameContext calls. Rejected: N round trips, no single-shot cache key, no atomic worldCount consistency, harder to express cross-scope joints in the GraphQL surface.
  • Separate comparisonContext query. Rejected: would have required a second expression model and duplicate cache surface for what's semantically the same operation.
  • Single union scope ("CompContext") flattening multiple games into one virtual scope. Rejected: would have lost the per-game outcome identity and complicated outcome ID grammar with cross-game prefixing.

The chosen seasonContext + gameIds + per-leaf scopeId approach keeps the query language surface unchanged between Same Game and Same Comp.

2.3 Tech stack

Decision summary:

Layer Choice Rationale
Outcome Context storage ClickHouse Cloud (Production 3×16 in westus3) Selected via the storage experiment (LBS-1177). Full findings in docs/outcome-context/evaluations/storage-experiment/rd-findings.md; summary: best fit for the wide-array, world-indexed value shape; bulk-insert performance via the binary protocol; native Array(Float64) support; partition + ORDER BY surface enabled the chunked-streaming write path.
Schema engine ReplacingMergeTree(context_version) with staging MergeTree pairs Idempotent re-write semantics; staging tables decouple per-chunk parallelism from canonical-table merge. Trade-off: requires SELECT … FINAL on reads to dedup unmerged parts (documented in ClickHouseOutcomeContextStore).
Storage write driver ClickHouse.Driver 1.2.0 — binary InsertBinaryAsync Order-of-magnitude higher throughput than HTTP-row inserts at the volumes produced (100K-element Array(Float64)).
Query GraphQL server HotChocolate v16 on ASP.NET Core / .NET 10 First-class fit for [OneOf] discriminated input types (the expression tree); native async resolvers; existing platform SDK story; consistent with rest of LBS Foundry.
Hosting — write-side Azure Container Apps Job (oc-exp-1k-p32, westus3, 32 vCPU / 64 GiB) Job (not service) shape matches the runner's batch nature. Same region as ClickHouse Cloud — eliminates cross-region tax on the write path.
Hosting — read-side Azure Container App (oc-query-api, westus3, public ingress) Always-on for the query surface; scale 0..1 (room to lift on production load). Same region as cluster.
Image registry ocexperimentacr.azurecr.io Single ACR for both images; GitHub Actions workflows on main push, az acr build for ad-hoc deploys from branches.
Roster directory In-memory facade behind IRosterDirectory today Production target: Marten-backed implementation reading from the same roster store the runner uses. One-line DI swap; interface is in place.

Rationale (overall): the stack keeps the entire pipeline in one region (westus3), uses the same .NET runtime and Azure platform as the rest of LBS Foundry, and sources its data infrastructure from a managed service (ClickHouse Cloud) rather than a self-hosted cluster. The only non-Foundry-standard component is ClickHouse itself, justified by the shape of the data (wide arrays of doubles per row, far better suited to a column store than to Postgres / Marten).


3. Findings

3.1 Query Language

Does the schema hold across Same Game and Same Comp? Yes.

  • The ExpressionInput tree is unchanged between the two scenarios. The same operators, the same leaf shape, the same result shape.
  • Same Comp introduces the optional (context, scopeId) per-leaf binding for disambiguating cross-game references — this is additive, not a different language.
  • All 15 binary + 3 unary operators are exercised across the canonical question shapes (threshold, AND/OR combinations, deltas as numeric distributions, "most X in league" via balanced AND-trees).

Gaps and limitations surfaced:

Gap Detail Impact Recommendation
Field-type mix on OutcomeRefInput type and participantId are GraphQL String; timePeriod and context are GraphQL enums. Easy authoring trap — quoted strings on the enum fields fail with cannot coerce errors. Discoverable but recurring source of caller errors. Documented in docs/outcome-context/graphql-master-prompt.md. Long-term: consider switching type to a closed enum bound to the catalogue.
Balanced AND/OR trees required for >~30 terms HotChocolate's JSON parser depth limit is 64. Right-leaning chains over 30+ terms throw Invalid JSON document (HC0012). Affects "most X in league" (32 QBs) shapes. Workaround: build a balanced binary tree client-side. Document in caller guidance; consider offering an andAll([...]) / orAll([...]) syntactic-sugar variant.
No explicit league-leader operator "Most X in league" must be expressed as N(N-1)/2 pairwise AND-tree. Verbose. Performance fine (single CH query); ergonomics not. Could add an argMax or rank operator if the use case becomes hot. Not blocking.
No historical-version querying ReplacingMergeTree(context_version) + FINAL collapses historical versions. The "query OC as-of older context_version" contract is retired. Affects future "compare last week's prediction to this week's" use cases. Tracked: when the use case becomes real, addressed via separate read path or different merge tree variant — not by removing FINAL. Skipped test at ClickHouseOutcomeContextStoreTests.GetByScopeIdAsync_HonoursExplicitContextVersion documents the trade-off.

3.2 Performance vs ≤200ms target

Observed numbers (validated against the 100K-world dataset on the production cluster):

Query shape Measured wall-time (AU client → westus3 server) Notes
gameContext Same Game, single outcome filter ~1.1 s Cross-region; ~565 ms RTT included.
gameContext Same Game, two outcomes filtered, expression evaluated ~1.7 s Same as above plus expression evaluation on 100K-world arrays.
seasonContext Same Comp, joint over two fixtures with anchor ~2.0 s Three game scopes + season anchor, per-leaf binding.
32-QB "league leader" expression (Mahomes vs all rivals + KC SB win) ~2.0 s 33 outcomes fetched, balanced AND tree of 31 comparisons.
gameContext unfiltered (no outcomeIds filter) Times out at server (>30 s) Full per-game payload ~480 MB at 100K worlds. Filter is required for cross-region reads.

Performance against the ≤200ms target — current state:

  • Not met as currently configured. AU→westus3 RTT alone consumes ~565 ms; in-region client measurements have not been run as part of this validation. TO VERIFY against LBS-1180 baseline if in-region numbers exist there.
  • The dominant wall-time costs are network RTT (cross-region only), ClickHouse query execution + array serialisation, and per-world expression evaluation. Caching, bitvector evaluation, and in-region client topology each address a distinct slice of the budget.
  • The architecture is not structurally blocked from hitting 200ms. The contributing factors are addressable through:
  • In-region clients — eliminates ~565ms of WAN tax
  • Bitvector evaluation engine — boolean expressions evaluated via packed bitwise ops instead of per-world arrays; expected order-of-magnitude reduction in evaluator cost on the common boolean-reduction shape
  • Result caching keyed on expressionHash — repeat queries served without ClickHouse round-trip; exact-match hits in single-digit ms
  • Pre-computed bitvector materialisation for hot outcomes — write-time pre-aggregation of common boolean outcomes
  • These optimisations are scoped to phases 1, 7, and 8 of the production roadmap (see docs/outcome-context/roadmap.md).

Same Game vs Same Comp performance characteristics:

  • Wall-time is dominated by payload size and expression evaluation cost per outcome. Both scale in the same way across query shapes.
  • Same Comp adds one round-trip dimension (season scope plus N game scopes), but the storage layer batches into two SQL queries (ReadManyGames + ReadManySeasons), so the additional latency is small compared to the per-query baseline.
  • Same Comp's expression evaluation is structurally identical to Same Game's once data is in process — no algorithmic difference.

Viability at game speed:

  • The architecture is structurally viable: bounded query budgets, filtered payloads, and a clear path to <200ms via the planned optimisation stack.
  • The architecture is not currently demonstrating <200ms — that demonstration is the explicit deliverable of the performance phase.

3.3 Outcome Context — fit for downstream consumers

Is the result structure fit for purpose? Yes for the canonical use cases. Documented gaps:

Aspect State Notes
Outcome ID grammar Stable {TYPE}_{TIME_PERIOD}_{participantId} is consistent across all current outcomes; participants are always stable GUIDs.
Game OC payload Complete for current question shapes Per-game outcomes cover passing / rushing / receiving / scoring / kicking / drives at game and intra-game (HALF1, HALF2, Q1-Q4) periods.
Season OC payload Complete for current question shapes Team standings (W/L/T, points-for/against), team season totals, individual season totals, postseason flags (MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL).
Roster / GUID resolution Available via IRosterDirectory teams, team, player, players(filter), fixtures(week) reverse-lookup queries.
EvaluationResult shape Covers all canonical reporting needs probability + matchingWorlds for booleans; full distribution stats for numerics. resolvedOutcomeIds and expressionHash for cache + observability.
Historical-version queries Not supported Trade-off accepted in favour of read-time dedup. Tracked.
Defensive scoring outcomes Partial coverage Skipped test SimulationDrivenAccumulationTests.TemplatesCoverRealSimulationOutput flags three known coverage gaps in the current template registry. Not blocking for the validation gate but tracked for a future template-extension follow-up.
Cross-comp queries (multiple seasons) Out of scope Catalog is single-season today (nfl_2025_26). Extending requires multi-season SimulationCatalog + roster identity work. Tracked in roadmap phase 2 (historical data ingest).
Auth / authz on returned data Not implemented QueryApi is anonymous public ingress today. Acceptable for the validation phase; production would need scope-based authz (e.g. customer X can read certain leagues only).

Net: the structure is fit for the specific question shapes the project targeted. The known gaps are tracked, scoped, and have planned landing points.


4. Consequences

4.1 Scale options available with the chosen stack

Read-side (QueryApi): - Container App autoscale rules — current 0..1; trivially raisable to N..M based on QPS. - Multi-region read replicas — ClickHouse Cloud supports cross-region replication; QueryApi can deploy to additional regions and pin to the nearest replica. - Per-region custom domain (e.g. oc-query.luckbox.dev) — mitigates corporate-firewall blocks of *.azurecontainerapps.io. - Multiple QueryApi instances behind Azure Front Door for load balancing + failover.

Write-side (SimulationRunner): - Container App Job vCPU/RAM tier raise — current sizing is 32 vCPU / 64 GiB; doubles available without architectural change. - Horizontal: multiple parallel Jobs each handling disjoint world ranges, writing to the same canonical tables — supported by the staging-table merge SQL design. - Larger ClickHouse Cloud tier — Production 3×16 today; doubling replicas is a configuration change.

Storage: - ClickHouse Cloud tier is the primary scaling lever for query throughput and storage capacity. - ReplacingMergeTree partitioning by season_id already in place; per-season scale-out is a matter of adding more seasons.

4.2 Design limitations introduced by these decisions

Limitation Source decision Mitigation if it becomes blocking
Anonymous public ingress Validation-phase deployment shape Add API-key / JWT auth + rate limiting in production hardening (roadmap phase 9).
Single-region deployment All-in-westus3 to keep latency tight to ClickHouse Multi-region read replicas in roadmap phase 8 if cross-region client volume warrants.
SELECT … FINAL on reads Replaces the post-load OPTIMIZE step the runner used to perform Bitvector engine + cache layer (roadmap phase 1 + 8) reduce the per-query FINAL cost; not blocking at current load.
Historical-version queries unsupported FINAL collapses tied versions Documented; alternative read path or merge-tree variant when required.
Single-season catalog Current IOutcomeTemplateCatalog keyed on a single sport's templates Multi-season + multi-sport extension at template-registry level; no storage change needed.
In-process Roster directory IRosterDirectory interface with in-memory facade One-line DI swap to Marten-backed when roster store lands.
GraphQL parser depth limit HotChocolate's JSON parser caps nesting at 64 Document balanced-tree authoring; consider syntactic-sugar andAll/orAll operators if it becomes a frequent foot-gun.

4.3 Cost implications

At experiment scale (validated):

  • ClickHouse Cloud Production 3×16, westus3 — TO VERIFY exact monthly figure against current invoice.
  • Container Apps Job — billed per execution (~25 min × the chosen vCPU/RAM tier).
  • Container App QueryApi — billed per replica-hour; current 0..1 scale-down means near-zero idle cost.
  • ACR Standard tier — small fixed monthly cost.

Projected at production scale (rough order-of-magnitude — needs formal sizing as part of phase 9):

  • ClickHouse Cloud — primary cost driver. Tier sized to query QPS and data retention. TO VERIFY against expected production load.
  • QueryApi — scale-up cost is per-replica-hour; with caching in place (phase 8) the multiplier is low.
  • Runner — fixed cost per simulation run; cadence-driven, not load-driven.
  • Cross-region transfer if multi-region read is added — meaningful at scale; should drive same-region client topology where possible.

Cost-control levers: - The QueryApi's scale-to-zero behaviour eliminates idle cost. - The bitvector + cache phase (roadmap phase 8) is also a cost-reduction measure: cached queries don't hit ClickHouse. - Storage retention policy (how many context_versions we keep) directly shapes ClickHouse tier sizing.


5. Recommendations & Next Steps

Required before "production-ready" can be asserted

# Item Owner Tracked as
1 Demonstrate ≤200ms in-region for canonical Same Game and Same Comp queries Engineering LBS-1359 (Same Game) + LBS-1360 (Same Comp)
2 Implement bitvector evaluation engine for boolean expressions Engineering LBS-1361 — phase 1 of production roadmap
3 Implement result-caching layer keyed on expressionHash (FusionCache, in-process L1) Engineering LBS-1362 — phase 8 of production roadmap
4 Replace the pseudo simulator with the DS team's real model Engineering + DS Phase 5 of production roadmap — dependency-gated on DS, ticket to be raised when DS scope confirmed
5 Establish the data-version / epoch contract on storage to retire the read-time-FINAL trade-off where consumers need historical version querying Engineering LBS-1364 — phase 7 of production roadmap

These should be raised before LBS-1181 is closed, per acceptance criteria. All seven now raised in Linear:

  1. LBS-1359 — Demonstrate Same Game query ≤200ms (in-region client) — direct close-out of the LBS-1180 performance gap. Should produce a reproducible benchmark in LBS.Model.AmericanFootball.Tests-style harness (dependency on the bitvector engine + cache being in place).
  2. LBS-1360 — Demonstrate Same Comp query ≤200ms (in-region client) — same shape as above, for the cross-game query path.
  3. LBS-1361 — Implement bitvector evaluation engine (inner ring) — full scope in production roadmap phase 1.
  4. LBS-1362 — Implement expression-result cache keyed on expressionHash — FusionCache with in-process L1; Redis L2 added later when distributed QueryApi scaling demands it.
  5. LBS-1365 — Extend outcome template registry to close the SimulationDrivenAccumulationTests.TemplatesCoverRealSimulationOutput gaps (defensive scoring, scrambling QBs, kickoff team-drives).
  6. LBS-1363 — Add observability foundation (structured logging, metrics, tracing) — production roadmap phase 7 prerequisite.
  7. LBS-1364 — Establish data-version / epoch semantics on storage to retire FINAL where consumers need historical queries.

Out of scope for this ADR — captured for later

  • Real model integration with the DS team — dependency-gated.
  • Live data feed integration — its own design phase.
  • Progressive / partial-season simulation — depends on live state feed.
  • Multi-region scaling — premature; revisit when load justifies.
  • Auth / rate limiting on the public ingress — production hardening.

Project objective — final restatement

Prove the foundational architecture and system's structural ability to process and return queryable, structured data derived from the prediction platform.

Confirmed proven, with one explicit caveat: the system's structural ability is demonstrated end-to-end at validation scale (100,000 worlds, full NFL season, both query shapes). The performance contract at the ≤200ms target is not yet demonstrated and is identified as the single remaining gate, with a clear architectural path (bitvector engine + cache + in-region client topology) — none of which require structural re-architecting.


Sign-off

Role Name Date Signature
Engineering to be filled at sign-off
Product to be filled at sign-off

References

Items to verify before sign-off

The draft above is built from this branch's implementation outcomes plus the cross-cutting docs in this repo. The following items need verification against the source predecessor tickets — none change the conclusion, but the wording should match what was actually committed in those tickets:

  • §1 — predecessor ticket scope statements (LBS-1175 / LBS-1177 / LBS-1178 / LBS-1179 / LBS-1180): confirm the per-ticket scope rows match what each ticket explicitly committed to.
  • §3.2 — performance numbers for in-region clients: if LBS-1180 captured in-region client latency figures (i.e. a client running in westus3 against the QueryApi in westus3), pull those in. The numbers in §3.2 today are cross-region (AU→westus3) measurements.
  • §4.3 — cost figures: the monthly figures for ClickHouse Cloud / Container Apps / ACR are not in this draft. Pull from current invoice or finance-side estimate before sign-off.
  • §2 — alternatives considered: the alternatives ruled out under Same Comp and Query Language are inferred from the gap-analysis doc rather than from the source tickets directly. Confirm wording.