OutcomeContext — High-Level Design
Three layers of code stacked over one shared data model. Every layer is deliberately sport-agnostic at the contract boundary so the AmericanFootball-specific work can be replaced with another sport later without rewriting the storage or query stack.
For the build sequence and current state of each phase, see the build plan. For the post-D1-D4 query layer reference, see the convergence reference.
Layered architecture
flowchart TB
subgraph Pres["Presentation · LBS.OutcomeContext.QueryApi"]
gql["GraphQL surface
gameContext · seasonContext · outcomeDefinitions
roster directory (teams, players, fixtures)"]
end
subgraph Query["Query layer · LBS.OutcomeContext.Query"]
repo["ContextRepository
adapter ring · OutcomeCatalog · per-world evaluator"]
expr["ExpressionInput model
binary / unary / outcome / constant nodes
postfix canonical form"]
materializer["OutcomeCatalogMaterializer
templates × SimulationCatalog → concrete OutcomeIds"]
end
subgraph Storage["Storage · LBS.OutcomeContext.Storage"]
writer["ClickHouseOutcomeContextWriter
binary bulk insert · staging→canonical merge SQL
schema bootstrap · OPTIMIZE allowlist"]
store["ClickHouseOutcomeContextStore
SELECT … FINAL on reads · outcomeIds filter"]
end
subgraph Accum["Accumulation · LBS.Model.AmericanFootball.Accumulation"]
gameAcc["AmericanFootballOutcomeAccumulator
per-game outcome arrays indexed by world"]
seasonAcc["AmericanFootballSeasonAccumulator
per-season totals + postseason flags
(MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL)"]
templates["AmericanFootballOutcomeTemplates"]
end
subgraph Sim["Pseudo Simulation · LBS.Model.AmericanFootball(.Simulation)"]
engines["GameEngine · SeasonEngine
StandingsEngine · PlayoffEngine"]
factories["RosterFactory · ScheduleFactory
ESPN-backed snapshot of teams + 272 fixtures"]
end
Pres --> Query
Query --> Storage
Storage --> Accum
Accum --> Sim
Two runtimes ride on top
flowchart LR
runner["SimulationRunner
(Container Apps Job)
westus3 · 32 vCPU / 64 GiB
chunked streaming write"]
api["QueryApi
(Container App, ingress)
westus3 · 2 vCPU / 4 GiB
HotChocolate v16 GraphQL"]
ch[("ClickHouse Cloud
Production 3×16
westus3")]
runner -- "binary bulk insert
+ staging→canonical merge" --> ch
ch -- "SELECT … FINAL · outcomeIds filter" --> api
1. Pseudo simulation model
The simulator is sport-specific (AmericanFootball today) and lives in
src/Models/AmericanFootball. Stage abstractions:
| Component | Responsibility |
|---|---|
| RosterFactory.BuildAllTeams() | Produces 32 Team records from a committed ESPN snapshot: rosters, GUIDs for team and player IDs, depth-chart ordering. |
| ScheduleFactory.Build(teams) | Emits 272 regular-season Matchups with stable FixtureId GUIDs. Postseason fixtures are intentionally not emitted here because playoff brackets diverge per world. |
| GameEngine.SimulateGame(home, away) | Runs one game, returns a GameResult (per-quarter scores + full play list with player attributions). Pure RNG-driven, stateless. |
| SeasonEngine.SimulateSeason(schedule) | Composes the above: regular season via GameEngine, then StandingsEngine derives seedings, then PlayoffEngine plays the bracket. Returns a flat List<SeasonGame> with GameType ∈ {RegularSeason, WildCard, Divisional, ConferenceChampionship, SuperBowl}. |
The simulator has no opinion about how its output is stored. It produces in-memory game results; downstream layers decide what to keep.
2. Accumulation layer
This is where stochastic per-world outputs are reduced into addressable
arrays. The unit of currency is an outcome row — one int per world
(values are rounded to Int32 for storage),
identified by an outcome_id of the form
{TYPE}_{TIME_PERIOD}_{participantId}.
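As an illustration of the outcome-row shape described above, here is a minimal Python sketch (the real code is C#). The `OutcomeRow` field names, the `make_outcome_id` and `to_int32` helpers, the placeholder GUID, and the rounding mode are all assumptions made for the example; only the ID format and the one-int-per-world layout come from the text.

```python
from dataclasses import dataclass

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def make_outcome_id(outcome_type: str, time_period: str, participant_id: str) -> str:
    """Compose an ID of the form {TYPE}_{TIME_PERIOD}_{participantId}."""
    return f"{outcome_type}_{time_period}_{participant_id}"


def to_int32(value: float) -> int:
    """Round a simulated value for storage; the rounding mode is an assumption."""
    rounded = round(value)
    if not INT32_MIN <= rounded <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in Int32")
    return rounded


@dataclass
class OutcomeRow:
    outcome_id: str
    values: list[int]  # one entry per world; the list index is the world index


# Placeholder participant GUID, not a real player ID.
player_id = "00000000-0000-0000-0000-000000000001"
row = OutcomeRow(
    outcome_id=make_outcome_id("PASSING_YARDS", "GAME", player_id),
    values=[to_int32(v) for v in (287.6, 301.2, 154.0)],  # three worlds
)
```

Because TYPE itself can contain underscores (PASSING_YARDS, WON_SUPER_BOWL), the sketch only composes IDs and does not attempt to parse them back.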
Two accumulators:
- AmericanFootballOutcomeAccumulator: per-game. Accumulates one game's plays into ~250-600 game-scope outcomes (player passing yards, team points by quarter, etc.). It receives AccumulateGame(gameId, worldIndex, gameResult) calls one world at a time and emits a GameOutcomeContext with arrays of length worldCount.
- AmericanFootballSeasonAccumulator: per-season. Aggregates per-game stats into season totals (wins, losses, points-for, season passing yards). Adds postseason outcomes (MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL) via a separate AccumulatePlayoffGame entry point, so playoff games never produce per-game OC: their bracket diverges per world, so the per-game shape is unstable.
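The per-game accumulation pattern can be sketched roughly as follows. This is a simplified Python illustration, not the C# accumulator: the class name, the `(outcome_id, amount)` play representation, and the `emit` method are hypothetical. What it tries to show is the core idea from the list above, namely that each call writes into slot `worldIndex` of a per-outcome array of length `worldCount`.

```python
from collections import defaultdict


class GameAccumulatorSketch:
    """Hypothetical stand-in for a per-game accumulator."""

    def __init__(self, world_count: int):
        self.world_count = world_count
        # (game_id, outcome_id) -> array with one slot per world
        self._arrays = defaultdict(lambda: [0] * world_count)

    def accumulate_game(self, game_id, world_index, plays):
        """Fold one world's plays for one game into the per-world arrays.

        `plays` is an iterable of (outcome_id, amount) pairs, a simplification
        of a full game result.
        """
        for outcome_id, amount in plays:
            self._arrays[(game_id, outcome_id)][world_index] += amount

    def emit(self, game_id):
        """Return {outcome_id: per-world array} for one game."""
        return {
            outcome_id: values
            for (gid, outcome_id), values in self._arrays.items()
            if gid == game_id
        }


acc = GameAccumulatorSketch(world_count=3)
acc.accumulate_game("g1", 0, [("PASSING_YARDS_GAME_p1", 12), ("PASSING_YARDS_GAME_p1", 30)])
acc.accumulate_game("g1", 2, [("PASSING_YARDS_GAME_p1", 7)])
game_context = acc.emit("g1")
```

World 1 was never simulated in this toy run, so its slot stays at the default of zero.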
A separate template registry (AmericanFootballOutcomeTemplates.All)
declares parametric outcome shapes (PASSING_YARDS_GAME_{playerId},
WON_SUPER_BOWL_SEASON_{teamId}, etc.). The
OutcomeCatalogMaterializer (in the Query layer —
LBS.OutcomeContext.Query.Discovery) crosses these templates with a
SimulationCatalog (rosters + fixtures) to produce the concrete
OutcomeCatalog of all materialised IDs the system might emit.
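A rough Python sketch of the template-by-catalog cross follows. The `materialize` function and the dictionary-shaped catalog are assumptions for illustration; the actual OutcomeCatalogMaterializer is C# and also takes fixtures into account, which this sketch leaves out.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def materialize(templates, participants_by_kind):
    """Expand each template once per participant of the kind it names.

    `participants_by_kind` maps a placeholder name such as "playerId" or
    "teamId" to the IDs available in the (hypothetical) catalog.
    """
    concrete_ids = []
    for template in templates:
        match = PLACEHOLDER.search(template)
        if match is None:
            concrete_ids.append(template)  # nothing to substitute
            continue
        for participant_id in participants_by_kind[match.group(1)]:
            concrete_ids.append(
                template[: match.start()] + participant_id + template[match.end():]
            )
    return concrete_ids


catalog = {"playerId": ["p1", "p2"], "teamId": ["t1"]}
templates = ["PASSING_YARDS_GAME_{playerId}", "WON_SUPER_BOWL_SEASON_{teamId}"]
outcome_catalog = materialize(templates, catalog)
```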
3. Storage layer
Pure I/O. Sport-agnostic — the contracts in LBS.OutcomeContext.Contracts
(GameOutcomeContext, SeasonOutcomeContext, OutcomeRow) hold no
AmericanFootball references.
ClickHouse schema
| Table | Purpose | Engine + ORDER BY |
|---|---|---|
| game_outcome_context | Canonical per-game outcomes | ReplacingMergeTree(context_version), ORDER BY (game_id, outcome_id), partitioned by season_id |
| season_outcome_context | Canonical per-season outcomes | ReplacingMergeTree(context_version), ORDER BY (season_id, outcome_id) |
| game_outcome_context_staging | Per-chunk pre-merged game OC | MergeTree, ORDER BY (game_id, outcome_id, batch_index) |
| season_outcome_context_staging | Per-worker per-chunk season OC | MergeTree, ORDER BY (season_id, outcome_id, batch_index) |
Two seams
- ClickHouseOutcomeContextWriter: EnsureSchemaAsync, Write*StagingAsync, MergeStagingFor*Async, TruncateStagingAsync, OptimizeAsync (allowlisted targets). Bulk inserts via the ClickHouse.Driver binary protocol. Staging→canonical merge is one INSERT … SELECT per scope using arraySort-over-groupArray so worker slices concatenate in absolute world-index order.
- ClickHouseOutcomeContextStore: read-only. GetByScopeIdAsync + GetManyByScopeIdAsync, both filterable by outcomeIds so callers can pull just the outcomes they need. All SELECTs use FINAL to dedup unmerged ReplacingMergeTree parts at read time. The trade-off (FINAL collapses historical context_versions) is documented in code; the matching test that previously exercised historical-version querying is left in the suite with a skip reason.
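To make the FINAL trade-off concrete, the following Python sketch models what a read with FINAL over a ReplacingMergeTree(context_version) table effectively returns: one row per sorting key, keeping the highest version (and, on a version tie, the most recently inserted row). The row dictionaries and the `select_final` helper are illustrative assumptions; this is a model of the semantics, not how ClickHouse implements them.

```python
def select_final(rows):
    """Keep one row per (scope_id, outcome_id): highest context_version wins.

    `rows` is assumed to be in insertion order, so `>=` lets a later insert
    replace an earlier one with the same version.
    """
    latest = {}
    for row in rows:
        key = (row["scope_id"], row["outcome_id"])
        kept = latest.get(key)
        if kept is None or row["context_version"] >= kept["context_version"]:
            latest[key] = row
    return list(latest.values())


unmerged_parts = [
    {"scope_id": "g1", "outcome_id": "A", "context_version": 1, "values": [1, 2]},
    {"scope_id": "g1", "outcome_id": "A", "context_version": 2, "values": [3, 4]},
    {"scope_id": "g1", "outcome_id": "A", "context_version": 2, "values": [5, 6]},  # re-run
    {"scope_id": "g1", "outcome_id": "B", "context_version": 1, "values": [7, 8]},
]
visible = select_final(unmerged_parts)
```

The version 1 row for outcome A is no longer visible after the collapse, which is the historical-version limitation noted above.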
Runner write path
The runner uses a chunked-streaming pattern with per-worker shared-nothing accumulators:
sequenceDiagram
participant R as Runner
participant Wkr as Workers
participant Stg as ClickHouseStaging
participant Can as ClickHouseCanonical
loop for each chunk of worlds
R->>Wkr: spin up and partition chunk worlds across workers
Wkr->>Wkr: each worker simulates its slice with its own season and game accumulators
Wkr-->>R: parallel block returns
R->>R: per fixture, merge worker partials in process (game OC fan-out is 272)
R->>Stg: bulk insert game partial, batchIndex equals chunkIndex
R->>Stg: per worker bulk insert season partial, batchIndex equals chunkIndex times workerCount plus workerIndex
R-->>R: drop worker accumulators, GC reclaims chunk memory
end
R->>Can: per fixture parallel MergeStagingForGameAsync
R->>Can: single MergeStagingForSeasonAsync
Note over R,Can: No OPTIMIZE - reads use FINAL.
Why this shape:
- Chunking bounds memory at O(chunkSize × fixtures × outcomes) regardless of total worldCount. The earlier in-memory-only design OOM'd at ~32K worlds; chunked streaming clears 100K cleanly.
- Per-worker shared-nothing accumulators lifted N=16 efficiency from ~52 % (shared accumulator + lock) to ~85 %. The lock on the season accumulator was the cross-worker contention point we explicitly removed.
- Asymmetric fan-out handling. Game OC has 272 fixture keys, so we pre-merge worker partials in C# to keep staging rowcount at chunks × fixtures. Season OC has fan-out 1 (one season scope), so we let ClickHouse do the merge: staging gets chunks × workers rows but the absolute count is small (~215 K at 100 K worlds × N=16) and we avoid a 1 GB process-memory buffer.
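The batch-index scheme and the sort-then-concatenate merge can be sketched in Python as below. The function names are hypothetical, and the sketch assumes each worker handles a contiguous slice of its chunk in worker-index order; under that assumption, ordering staging rows by batch index and concatenating their arrays reproduces absolute world-index order, which is the effect the arraySort-over-groupArray merge is described as achieving.

```python
def season_batch_index(chunk_index: int, worker_count: int, worker_index: int) -> int:
    """batchIndex for a season partial: chunkIndex * workerCount + workerIndex."""
    return chunk_index * worker_count + worker_index


def merge_staging(staging_rows):
    """Concatenate per-batch value slices in ascending batch-index order.

    `staging_rows` is a list of (batch_index, values) pairs for one
    (scope, outcome) key, in arbitrary insertion order.
    """
    merged = []
    for _, values in sorted(staging_rows, key=lambda row: row[0]):
        merged.extend(values)
    return merged


# Toy run: 2 chunks x 2 workers, 2 worlds per worker; each value is its world index.
worker_count, worlds_per_worker = 2, 2
staging = []
for chunk_index in range(2):
    for worker_index in range(worker_count):
        batch = season_batch_index(chunk_index, worker_count, worker_index)
        first_world = batch * worlds_per_worker
        staging.append((batch, list(range(first_world, first_world + worlds_per_worker))))

staging.reverse()  # simulate inserts landing out of order
canonical = merge_staging(staging)
```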
4. Query layer
LBS.OutcomeContext.Query is sport-agnostic. Three things it owns:
Adapter ring
IOutcomeContext is the surface the evaluator consumes;
GameOutcomeContextAdapter and SeasonOutcomeContextAdapter adapt the
storage-side GameOutcomeContext / SeasonOutcomeContext records into
it. The ContextRepository joins OutcomeCatalog (which outcomes exist)
with IOutcomeContextStore (where their values live) and is the single
point of dependency injection for the GraphQL resolvers.
Expression model
ExpressionInput is a [OneOf] discriminated union of
{outcome, constant, binary, unary} nodes. Operators are string constants
(BinaryOperatorConstants.Add, etc.) bound to GraphQL enums at the
surface. The shape is small (no functions, no aggregations) by design —
the evaluator runs per-world over the value arrays directly.
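The four node kinds and a postfix canonical form can be illustrated with a small Python sketch. The dataclass names, the operator strings ("ADD", "GT"), and the token format are assumptions made for the example; the actual ExpressionInput is a C# [OneOf] input type and its canonical encoding may differ.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Outcome:
    outcome_id: str


@dataclass(frozen=True)
class Constant:
    value: float


@dataclass(frozen=True)
class Unary:
    op: str
    operand: "Node"


@dataclass(frozen=True)
class Binary:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Outcome, Constant, Unary, Binary]


def to_postfix(node: Node) -> list[str]:
    """Post-order walk: operands first, then the operator that consumes them."""
    if isinstance(node, Outcome):
        return [f"outcome:{node.outcome_id}"]
    if isinstance(node, Constant):
        return [f"const:{node.value}"]
    if isinstance(node, Unary):
        return to_postfix(node.operand) + [f"unary:{node.op}"]
    if isinstance(node, Binary):
        return to_postfix(node.left) + to_postfix(node.right) + [f"binary:{node.op}"]
    raise TypeError(f"unsupported node: {node!r}")


# (A + B) > 300, with hypothetical outcome IDs A and B
expression = Binary("GT", Binary("ADD", Outcome("A"), Outcome("B")), Constant(300))
tokens = to_postfix(expression)
```

A flat token sequence like this gives structurally identical trees the same representation, which is one plausible basis for a stable expression hash.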
Per-world evaluator
Walks the expression tree, applying operators element-wise across the
world dimension. Comparisons + logical ops produce booleans →
probability = matchingWorlds / totalWorlds. Pure arithmetic produces a
numeric distribution → mean, median, min, max, stdDev, mode
are computed lazily on first access (HotChocolate only invokes the
property getters for fields the caller actually selected). Result objects
also expose resolvedOutcomeIds and expressionHash for caching and
observability.
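A minimal Python sketch of element-wise evaluation follows. The tuple-based node encoding, the operator table, and the `Distribution` class are illustrative assumptions (unary nodes are omitted for brevity); `cached_property` stands in for the lazy-on-first-access behaviour described above.

```python
import operator
import statistics
from functools import cached_property

BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "GT": operator.gt,
    "LT": operator.lt,
    "AND": lambda a, b: bool(a) and bool(b),
}


def evaluate(node, context, world_count):
    """Return one value per world for the given expression node."""
    kind = node[0]
    if kind == "outcome":
        return list(context[node[1]])
    if kind == "constant":
        return [node[1]] * world_count  # broadcast across worlds
    if kind == "binary":
        _, op_name, left, right = node
        op = BINARY_OPS[op_name]
        lhs = evaluate(left, context, world_count)
        rhs = evaluate(right, context, world_count)
        return [op(a, b) for a, b in zip(lhs, rhs)]
    raise ValueError(f"unknown node kind: {kind}")


def probability(matches):
    """matchingWorlds / totalWorlds for a boolean per-world result."""
    return sum(1 for m in matches if m) / len(matches)


class Distribution:
    """Summary statistics computed only when first accessed."""

    def __init__(self, values):
        self.values = values

    @cached_property
    def mean(self):
        return statistics.mean(self.values)

    @cached_property
    def median(self):
        return statistics.median(self.values)


# Four worlds, two hypothetical outcomes for the same player.
context = {
    "PASSING_YARDS_GAME_p1": [250, 310, 290, 330],
    "RUSHING_YARDS_GAME_p1": [40, 10, 5, 20],
}
total = ("binary", "ADD", ("outcome", "PASSING_YARDS_GAME_p1"), ("outcome", "RUSHING_YARDS_GAME_p1"))
over_300 = ("binary", "GT", total, ("constant", 300))

p = probability(evaluate(over_300, context, world_count=4))
dist = Distribution(evaluate(total, context, world_count=4))
```

Here the per-world totals are 290, 320, 295 and 350, so two of four worlds exceed 300 and `p` is 0.5.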
5. Presentation: GraphQL surface
LBS.OutcomeContext.QueryApi is a small ASP.NET Core / HotChocolate v16
app. Its DI graph:
IRosterDirectory (in-memory facade today, Marten-backed later)
IOutcomeContextStore (ClickHouseOutcomeContextStore against westus3)
ISimulationCatalogStore + IOutcomeTemplateCatalog (in-memory from RosterFactory)
ContextRepository (composes catalog + store)
RosterQueryExtensions (extends Query with teams/players/fixtures lookups)
Top-level GraphQL queries:
| Query | Returns |
|---|---|
| gameContext(gameId, outcomeIds) | A GameContext with evaluate(expression), outcomes(filter), and the metadata fields. |
| seasonContext(seasonId, gameIds, outcomeIds) | A SeasonContext with the same shape, plus the option to pull cross-game expressions. |
| outcomeDefinitions(filter) | Discovery surface: does not hit ClickHouse; returns the materialised OutcomeCatalog. |
| teams / team(teamId) / player(playerId) / players(filter) / fixtures(week) | GUID ↔ name reverse-lookup via IRosterDirectory. |
The outcomeIds arg on gameContext / seasonContext is the critical
perf knob — it pushes filtering all the way to the ClickHouse
outcome_id IN (…) predicate, so cross-region clients don't pay for the
unfiltered ~480 MB-per-game payload.
For LLM-driven querying of this surface, see
graphql-master-prompt.md.
Key cross-cutting decisions
| Decision | Why |
|---|---|
| ReplacingMergeTree(context_version) + SELECT FINAL on reads | Operational: re-running the same (season, contextVersion) produces duplicate (scope, outcome) rows until background merges complete. FINAL forces the dedup at read time. Trade-off: historical-version queries at the same key become unreachable (collapsed by FINAL). The single test that exercised that contract is documented + skipped. |
| Sport-agnostic Storage + Query layers | Lets us swap in another sport's accumulators / templates without touching CH or the evaluator. The AmericanFootball-specific code is isolated to LBS.Model.AmericanFootball.* and the RosterDirectory. |
| Chunked streaming + per-worker shared-nothing accumulators (write path) | Memory bounded at O(chunkSize) regardless of total world count → unblocks 100 K. Lock-free per-worker state → ~85 % parallel efficiency at N=16, vs ~52 % for the shared+lock alternative. |
| In-C# pre-merge for game OC, but CH-side merge for season OC | Asymmetric fan-outs. Game has 272 fixture keys → pre-merging in C# keeps staging rowcount at chunks × fixtures. Season has 1 key → no fan-out to amortise, so we let CH do the merge. Same arraySort-over-groupArray SQL pattern in both. |
| OutcomeRefInput mixes String (for type, participantId) with Enum (for timePeriod, context) | Driven by which fields have closed-set values vs open-ended. type is open-ended (every sport's stat dictionary is different), so it stays a string and is membership-checked against the catalog at query canonicalisation. timePeriod and context are closed enums. |
| IRosterDirectory interface + in-memory facade | The roster store will eventually live in Marten alongside the other read models. The interface is in place so the swap is one services.AddSingleton<IRosterDirectory, MartenRosterDirectory>() line later. |
Operational shape
| Component | Where it runs | Scaling |
|---|---|---|
| SimulationRunner | Azure Container Apps Job (oc-exp-1k-p32, westus3, 32 vCPU / 64 GiB) | Triggered manually. Runs once per data refresh. ~24 min for 100 K worlds × 285 games × full season + bracket. |
| QueryApi | Azure Container App (oc-query-api, westus3, 2 vCPU / 4 GiB, scale 0..1, public ingress) | Cold-starts in 5-8 s; per-query latency ~1-2 s end-to-end with the outcomeIds filter. |
| ClickHouse Cloud | Production 3×16 tier (3 replicas × 16 vCPU × 64 GiB) in westus3. Same region as both runtimes. | Holds canonical + staging tables; background merges run continuously. |
| ACR | ocexperimentacr.azurecr.io. Two images: simulation-runner, query-api. | GitHub Actions builds + pushes on main; az acr build from a branch for ad-hoc deploys. |