OutcomeContext: High-Level Design

Five layers of code (simulation, accumulation, storage, query, presentation) stacked over one shared data model. Every layer is deliberately sport-agnostic at the contract boundary so the AmericanFootball-specific work can be replaced with another sport later without rewriting the storage or query stack.

For the build sequence and current state of each phase, see the build plan. For the post-D1-D4 query layer reference, see the convergence reference.

Layered architecture

flowchart TB
    subgraph Pres["Presentation · LBS.OutcomeContext.QueryApi"]
        gql["GraphQL surface
gameContext · seasonContext · outcomeDefinitions
roster directory (teams, players, fixtures)"]
    end

    subgraph Query["Query layer · LBS.OutcomeContext.Query"]
        repo["ContextRepository
adapter ring · OutcomeCatalog · per-world evaluator"]
        expr["ExpressionInput model
binary / unary / outcome / constant nodes
postfix canonical form"]
        materializer["OutcomeCatalogMaterializer
templates × SimulationCatalog → concrete OutcomeIds"]
    end

    subgraph Storage["Storage · LBS.OutcomeContext.Storage"]
        writer["ClickHouseOutcomeContextWriter
binary bulk insert · staging→canonical merge SQL
schema bootstrap · OPTIMIZE allowlist"]
        store["ClickHouseOutcomeContextStore
SELECT … FINAL on reads · outcomeIds filter"]
    end

    subgraph Accum["Accumulation · LBS.Model.AmericanFootball.Accumulation"]
        gameAcc["AmericanFootballOutcomeAccumulator
per-game outcome arrays indexed by world"]
        seasonAcc["AmericanFootballSeasonAccumulator
per-season totals + postseason flags
(MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL)"]
        templates["AmericanFootballOutcomeTemplates"]
    end

    subgraph Sim["Pseudo Simulation · LBS.Model.AmericanFootball(.Simulation)"]
        engines["GameEngine · SeasonEngine
StandingsEngine · PlayoffEngine"]
        factories["RosterFactory · ScheduleFactory
ESPN-backed snapshot of teams + 272 fixtures"]
    end

    Pres --> Query
    Query --> Storage
    Storage --> Accum
    Accum --> Sim

Two runtimes ride on top

flowchart LR
    runner["SimulationRunner
(Container Apps Job)
westus3 · 32 vCPU / 64 GiB
chunked streaming write"]
    api["QueryApi
(Container App, ingress)
westus3 · 2 vCPU / 4 GiB
HotChocolate v16 GraphQL"]
    ch[("ClickHouse Cloud
Production 3×16
westus3")]
    runner -- "binary bulk insert
+ staging→canonical merge" --> ch
    ch -- "SELECT … FINAL · outcomeIds filter" --> api

1. Pseudo simulation model

The simulator is sport-specific (AmericanFootball today) and lives in src/Models/AmericanFootball. Stage abstractions:

Component	Responsibility
`RosterFactory.BuildAllTeams()`	Produces 32 `Team` records from a committed ESPN snapshot — rosters, GUIDs for team and player IDs, depth-chart ordering.
`ScheduleFactory.Build(teams)`	Emits 272 regular-season `Matchup`s with stable `FixtureId` GUIDs. Postseason fixtures intentionally not emitted here — playoff brackets diverge per world.
`GameEngine.SimulateGame(home, away)`	Runs one game, returns a `GameResult` (per-quarter scores + full play list with player attributions). Pure RNG-driven, stateless.
`SeasonEngine.SimulateSeason(schedule)`	Composes the above: regular season via `GameEngine`, then `StandingsEngine` derives seedings, then `PlayoffEngine` plays the bracket. Returns a flat `List<SeasonGame>` with `GameType ∈ {RegularSeason, WildCard, Divisional, ConferenceChampionship, SuperBowl}`.

The simulator has no opinion about how its output is stored. It produces in-memory game results; downstream layers decide what to keep.
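
The fixture counts quoted in this document can be sanity-checked with a little arithmetic. The sketch below assumes the current NFL format (17 regular-season games per team and a 6/4/2/1 playoff bracket), which this document does not state; the constant names are illustrative and are not read from RosterFactory or ScheduleFactory.

```python
# Illustrative arithmetic only. The constants are assumptions about the
# current NFL format, not values taken from the simulator code.
TEAMS = 32
GAMES_PER_TEAM = 17  # assumed regular-season length

# Each game involves two teams, so halve the team-game count.
regular_season_fixtures = TEAMS * GAMES_PER_TEAM // 2

# Assumed number of games played in each postseason round.
PLAYOFF_GAMES = {
    "WildCard": 6,
    "Divisional": 4,
    "ConferenceChampionship": 2,
    "SuperBowl": 1,
}
games_per_world = regular_season_fixtures + sum(PLAYOFF_GAMES.values())

print(regular_season_fixtures)  # 272
print(games_per_world)          # 285
```

Under those assumptions the totals match the 272 regular-season fixtures emitted by ScheduleFactory and the 285 games per world quoted for the runner.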

2. Accumulation layer

This is where stochastic per-world outputs are reduced into addressable arrays. The unit of currency is an outcome row: an array holding one integer per world (values are rounded to Int32 for storage), identified by an outcome_id of the form {TYPE}_{TIME_PERIOD}_{participantId}.

Two accumulators:

AmericanFootballOutcomeAccumulator: per-game. Accumulates one game's plays into ~250-600 game-scope outcomes (player passing yards, team points by quarter, etc.). It receives one AccumulateGame(gameId, worldIndex, gameResult) call per world and emits a GameOutcomeContext whose arrays have length worldCount.
AmericanFootballSeasonAccumulator: per-season. Aggregates per-game stats into season totals (wins, losses, points-for, season passing yards). Postseason outcomes (MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL) enter through a separate AccumulatePlayoffGame entry point, so playoff games never produce per-game OC: the bracket diverges per world, which makes the per-game shape unstable.
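
As a rough illustration of the "array indexed by world" idea, here is a minimal sketch of a per-game accumulator. It is not the real AmericanFootballOutcomeAccumulator: the class name, the shape of the stats dictionary, and the participant IDs are all hypothetical.

```python
from dataclasses import dataclass, field


def outcome_id(stat_type: str, time_period: str, participant_id: str) -> str:
    """Build an ID of the form {TYPE}_{TIME_PERIOD}_{participantId}."""
    return f"{stat_type}_{time_period}_{participant_id}"


@dataclass
class GameAccumulatorSketch:
    """Hypothetical stand-in for the per-game accumulator."""

    world_count: int
    # outcome_id -> one integer slot per world, zero-filled on first use
    rows: dict = field(default_factory=dict)

    def accumulate_game(self, world_index: int, stats: dict) -> None:
        """Fold one world's stats ({(type, participant): value}) into the rows."""
        for (stat_type, participant_id), value in stats.items():
            oid = outcome_id(stat_type, "GAME", participant_id)
            row = self.rows.setdefault(oid, [0] * self.world_count)
            row[world_index] += round(value)  # values are stored as integers


acc = GameAccumulatorSketch(world_count=3)
acc.accumulate_game(0, {("PASSING_YARDS", "qb-1"): 287.0})
acc.accumulate_game(2, {("PASSING_YARDS", "qb-1"): 301.4})
print(acc.rows["PASSING_YARDS_GAME_qb-1"])  # [287, 0, 301]
```

World 1 never reported the stat, so its slot stays at zero; every row keeps the same length regardless of which worlds contributed.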

A separate template registry (AmericanFootballOutcomeTemplates.All) declares parametric outcome shapes (PASSING_YARDS_GAME_{playerId}, WON_SUPER_BOWL_SEASON_{teamId}, etc.). The OutcomeCatalogMaterializer (in the Query layer — LBS.OutcomeContext.Query.Discovery) crosses these templates with a SimulationCatalog (rosters + fixtures) to produce the concrete OutcomeCatalog of all materialised IDs the system might emit.
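
The template-times-catalog cross product can be pictured as follows. This is a sketch under assumptions: the OutcomeTemplate fields and the dictionary-shaped catalog are invented for illustration and do not mirror the real AmericanFootballOutcomeTemplates or SimulationCatalog types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeTemplate:
    stat_type: str    # e.g. "PASSING_YARDS"
    time_period: str  # e.g. "GAME" or "SEASON"
    participant: str  # "player" or "team": which catalog list to cross with


def materialize(templates, catalog):
    """Cross every template with the matching participant IDs in the catalog."""
    ids = []
    for t in templates:
        for participant_id in catalog[t.participant]:
            ids.append(f"{t.stat_type}_{t.time_period}_{participant_id}")
    return sorted(ids)


templates = [
    OutcomeTemplate("PASSING_YARDS", "GAME", "player"),
    OutcomeTemplate("WON_SUPER_BOWL", "SEASON", "team"),
]
catalog = {"player": ["p1", "p2"], "team": ["t1"]}  # hypothetical IDs
concrete_ids = materialize(templates, catalog)
print(concrete_ids)
# ['PASSING_YARDS_GAME_p1', 'PASSING_YARDS_GAME_p2', 'WON_SUPER_BOWL_SEASON_t1']
```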

3. Storage layer

Pure I/O. Sport-agnostic — the contracts in LBS.OutcomeContext.Contracts (GameOutcomeContext, SeasonOutcomeContext, OutcomeRow) hold no AmericanFootball references.

ClickHouse schema

Table	Purpose	Engine + ORDER BY
`game_outcome_context`	Canonical per-game outcomes	`ReplacingMergeTree(context_version)`, ORDER BY `(game_id, outcome_id)`, partitioned by `season_id`
`season_outcome_context`	Canonical per-season outcomes	`ReplacingMergeTree(context_version)`, ORDER BY `(season_id, outcome_id)`
`game_outcome_context_staging`	Per-chunk pre-merged game OC	`MergeTree`, ORDER BY `(game_id, outcome_id, batch_index)`
`season_outcome_context_staging`	Per-worker per-chunk season OC	`MergeTree`, ORDER BY `(season_id, outcome_id, batch_index)`
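
To make the read-time behaviour of the canonical tables concrete, the sketch below emulates in plain Python what a ReplacingMergeTree(context_version) table returns under FINAL: one row per sorting key, highest version wins, and on equal versions the most recently inserted row wins. The row dictionaries and the values field are hypothetical; only the key and version columns come from the schema above.

```python
def select_final(rows):
    """Emulate read-time dedup: keep one row per (scope_id, outcome_id).

    The row with the highest context_version wins. On a tie the row that was
    inserted later wins, which is why the comparison uses >= rather than >.
    """
    winners = {}
    for row in rows:  # rows are assumed to be in insertion order
        key = (row["scope_id"], row["outcome_id"])
        current = winners.get(key)
        if current is None or row["context_version"] >= current["context_version"]:
            winners[key] = row
    return list(winners.values())


rows = [
    {"scope_id": "g1", "outcome_id": "POINTS_GAME_t1", "context_version": 1, "values": [21, 17]},
    {"scope_id": "g1", "outcome_id": "POINTS_GAME_t1", "context_version": 2, "values": [24, 20]},
    # A re-run at the same version: a duplicate until background merges finish.
    {"scope_id": "g1", "outcome_id": "POINTS_GAME_t1", "context_version": 2, "values": [24, 21]},
]
visible = select_final(rows)
print(len(visible), visible[0]["values"])  # 1 [24, 21]
```

The version 1 row is no longer reachable through this read path, which is the trade-off the document records for FINAL.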

Two seams

ClickHouseOutcomeContextWriter: EnsureSchemaAsync, Write*StagingAsync, MergeStagingFor*Async, TruncateStagingAsync, OptimizeAsync (allowlisted targets). Bulk inserts go through the ClickHouse.Driver binary protocol. The staging→canonical merge is one INSERT … SELECT per scope, using arraySort-over-groupArray so worker slices concatenate in absolute world-index order.
ClickHouseOutcomeContextStore: read-only. Exposes GetByScopeIdAsync and GetManyByScopeIdAsync, both filterable by outcomeIds so callers can pull just the outcomes they need. All SELECTs use FINAL to dedup unmerged ReplacingMergeTree parts at read time. The trade-off (FINAL collapses historical context_versions) is documented in code; the test that previously exercised historical-version querying remains in the suite with a skip reason.
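
The staging→canonical merge can be emulated outside ClickHouse to show why ordering by batch_index matters. The sketch below is a Python analogue of the arraySort-over-groupArray pattern, not the actual merge SQL; it assumes that a higher batch_index always holds later worlds, and the row shape is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def merge_staging(staging_rows):
    """Group slices per (scope_id, outcome_id), order them by batch_index,
    then concatenate so values land in absolute world-index order."""
    groups = defaultdict(list)
    for row in staging_rows:
        # Collecting (batch_index, values) pairs plays the role of groupArray.
        groups[(row["scope_id"], row["outcome_id"])].append(
            (row["batch_index"], row["values"])
        )
    merged = {}
    for key, slices in groups.items():
        # Sorting the pairs by batch_index plays the role of arraySort.
        ordered = sorted(slices, key=lambda s: s[0])
        merged[key] = list(chain.from_iterable(values for _, values in ordered))
    return merged


staging = [
    # Rows can arrive in any order; batch_index restores the world order.
    {"scope_id": "s2025", "outcome_id": "WINS_SEASON_t1", "batch_index": 1, "values": [9, 11]},
    {"scope_id": "s2025", "outcome_id": "WINS_SEASON_t1", "batch_index": 0, "values": [10, 8]},
]
canonical = merge_staging(staging)
print(canonical[("s2025", "WINS_SEASON_t1")])  # [10, 8, 9, 11]
```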

Runner write path

The runner uses a chunked-streaming pattern with per-worker shared-nothing accumulators:

sequenceDiagram
  participant R as Runner
  participant Wkr as Workers
  participant Stg as ClickHouseStaging
  participant Can as ClickHouseCanonical

  loop for each chunk of worlds
    R->>Wkr: spin up and partition chunk worlds across workers
    Wkr->>Wkr: each worker simulates its slice with its own season and game accumulators
    Wkr-->>R: parallel block returns
    R->>R: per fixture, merge worker partials in process (game OC fan-out is 272)
    R->>Stg: bulk insert game partial, batchIndex equals chunkIndex
    R->>Stg: per worker bulk insert season partial, batchIndex equals chunkIndex times workerCount plus workerIndex
    R-->>R: drop worker accumulators, GC reclaims chunk memory
  end
  R->>Can: per fixture parallel MergeStagingForGameAsync
  R->>Can: single MergeStagingForSeasonAsync
  Note over R,Can: No OPTIMIZE - reads use FINAL.

Why this shape:

Chunking bounds memory at O(chunkSize × fixtures × outcomes) regardless of total worldCount. The earlier in-memory-only design OOM'd at ~32K worlds; chunked streaming clears 100K cleanly.
Per-worker shared-nothing accumulators lifted N=16 efficiency from ~52 % (shared accumulator + lock) to ~85 %. The lock on the season accumulator was the cross-worker contention point we explicitly removed.
Asymmetric fan-out handling. Game OC has 272 fixture keys, so we pre-merge worker partials in C# to keep staging rowcount at chunks × fixtures. Season OC has fan-out 1 (one season scope), so we let ClickHouse do the merge — staging gets chunks × workers rows but the absolute count is small (~215 K at 100 K worlds × N=16) and we avoid a 1 GB process-memory buffer.
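
The batch-index scheme from the sequence diagram can be checked with a small planning sketch. It shows that chunkIndex × workerCount + workerIndex yields indexes that increase with world order, provided each worker receives a contiguous slice of its chunk. That contiguous partitioning policy is an assumption; the function names are illustrative.

```python
def season_batch_index(chunk_index: int, worker_index: int, worker_count: int) -> int:
    """batchIndex for one worker's season partial within one chunk."""
    return chunk_index * worker_count + worker_index


def partition(world_indices, worker_count):
    """Split a chunk's worlds into contiguous, near-equal slices (assumed policy)."""
    base, extra = divmod(len(world_indices), worker_count)
    slices, start = [], 0
    for worker in range(worker_count):
        size = base + (1 if worker < extra else 0)
        slices.append(world_indices[start:start + size])
        start += size
    return slices


def plan_run(world_count: int, chunk_size: int, worker_count: int):
    """Yield (batch_index, worlds) for every worker in every chunk."""
    for chunk_index, chunk_start in enumerate(range(0, world_count, chunk_size)):
        chunk_end = min(chunk_start + chunk_size, world_count)
        chunk = list(range(chunk_start, chunk_end))
        for worker_index, worlds in enumerate(partition(chunk, worker_count)):
            yield season_batch_index(chunk_index, worker_index, worker_count), worlds


plan = list(plan_run(world_count=10, chunk_size=4, worker_count=2))
print(plan)
# [(0, [0, 1]), (1, [2, 3]), (2, [4, 5]), (3, [6, 7]), (4, [8]), (5, [9])]
```

Only one chunk's worth of worlds is live at a time in this plan, which is the property that keeps memory bounded by chunk size rather than by total world count.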

4. Query layer

LBS.OutcomeContext.Query is sport-agnostic. Three things it owns:

Adapter ring

IOutcomeContext is the surface the evaluator consumes; GameOutcomeContextAdapter and SeasonOutcomeContextAdapter adapt the storage-side GameOutcomeContext / SeasonOutcomeContext records into it. The ContextRepository joins OutcomeCatalog (which outcomes exist) with IOutcomeContextStore (where their values live) and is the single point of dependency injection for the GraphQL resolvers.
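
A minimal sketch of the adapter idea, assuming a much simpler surface than the real IOutcomeContext: the evaluator only needs "values for an outcome ID", adapters wrap storage records to provide it, and a repository checks the catalog before touching the store. Every class and method name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class OutcomeContextLike(Protocol):
    """Stand-in for the evaluator-facing surface."""

    def values_for(self, outcome_id: str) -> Sequence[int]: ...


@dataclass
class GameRecordSketch:
    """Hypothetical storage-side record: outcome_id -> per-world values."""

    game_id: str
    rows: dict


class GameAdapterSketch:
    """Adapts a storage record to the evaluator-facing surface."""

    def __init__(self, record: GameRecordSketch):
        self._record = record

    def values_for(self, outcome_id: str) -> Sequence[int]:
        return self._record.rows[outcome_id]


class RepositorySketch:
    """Joins a catalog (which IDs exist) with a store (where values live)."""

    def __init__(self, catalog: set, store: dict):
        self._catalog = catalog
        self._store = store

    def game_context(self, game_id: str, outcome_ids) -> OutcomeContextLike:
        unknown = [o for o in outcome_ids if o not in self._catalog]
        if unknown:
            raise KeyError(f"not in catalog: {unknown}")
        record = self._store[game_id]
        filtered = {o: record.rows[o] for o in outcome_ids}
        return GameAdapterSketch(GameRecordSketch(game_id, filtered))


store = {"g1": GameRecordSketch("g1", {"A": [1, 2], "B": [3, 4]})}
repo = RepositorySketch(catalog={"A", "B"}, store=store)
print(repo.game_context("g1", ["A"]).values_for("A"))  # [1, 2]
```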

Expression model

ExpressionInput is a [OneOf] discriminated union of {outcome, constant, binary, unary} nodes. Operators are string constants (BinaryOperatorConstants.Add, etc.) bound to GraphQL enums at the surface. The shape is small (no functions, no aggregations) by design — the evaluator runs per-world over the value arrays directly.
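
One way to picture the postfix canonical form and the expression hash is the sketch below. It is an illustration under assumptions: the dictionary node shape, the token prefixes, the operator name "GT", and the use of SHA-256 are all invented here and may differ from the real canonicaliser.

```python
import hashlib


def to_postfix(node):
    """Flatten an expression tree into postfix tokens (operands before operator)."""
    kind = node["kind"]
    if kind == "outcome":
        return [f"outcome:{node['id']}"]
    if kind == "constant":
        return [f"const:{node['value']}"]
    if kind == "unary":
        return to_postfix(node["operand"]) + [f"op1:{node['op']}"]
    if kind == "binary":
        return to_postfix(node["left"]) + to_postfix(node["right"]) + [f"op2:{node['op']}"]
    raise ValueError(f"unknown node kind: {kind}")


def expression_hash(node) -> str:
    """Hash the canonical token string; SHA-256 is an assumed choice."""
    canonical = " ".join(to_postfix(node))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


expr = {
    "kind": "binary",
    "op": "GT",  # hypothetical operator name
    "left": {"kind": "outcome", "id": "PASSING_YARDS_GAME_p1"},
    "right": {"kind": "constant", "value": 300},
}
print(to_postfix(expr))
# ['outcome:PASSING_YARDS_GAME_p1', 'const:300', 'op2:GT']
```

Because the token list is fully determined by the tree, two structurally identical expressions hash to the same value, which is what makes the hash usable as a cache key.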

Per-world evaluator

Walks the expression tree, applying operators element-wise across the world dimension. Comparisons + logical ops produce booleans → probability = matchingWorlds / totalWorlds. Pure arithmetic produces a numeric distribution → mean, median, min, max, stdDev, mode are computed lazily on first access (HotChocolate only invokes the property getters for fields the caller actually selected). Result objects also expose resolvedOutcomeIds and expressionHash for caching and observability.
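
The element-wise evaluation described above can be sketched as follows. This is a toy evaluator, not the production one: the operator names, the dictionary node shape, and the Distribution class are assumptions, and only two of the summary statistics are shown.

```python
import operator
import statistics
from functools import cached_property

# Operator names are placeholders, not the real BinaryOperatorConstants.
BINARY_OPS = {
    "ADD": operator.add,
    "GT": operator.gt,
    "AND": lambda a, b: bool(a) and bool(b),
}
UNARY_OPS = {"NOT": operator.not_, "NEG": operator.neg}


def evaluate(node, context, world_count):
    """Return one value per world for the given expression node."""
    kind = node["kind"]
    if kind == "outcome":
        return list(context[node["id"]])
    if kind == "constant":
        return [node["value"]] * world_count
    if kind == "unary":
        fn = UNARY_OPS[node["op"]]
        return [fn(v) for v in evaluate(node["operand"], context, world_count)]
    if kind == "binary":
        fn = BINARY_OPS[node["op"]]
        left = evaluate(node["left"], context, world_count)
        right = evaluate(node["right"], context, world_count)
        return [fn(a, b) for a, b in zip(left, right)]
    raise ValueError(f"unknown node kind: {kind}")


def probability(flags):
    """Share of worlds in which a boolean expression holds."""
    return sum(1 for f in flags if f) / len(flags)


class Distribution:
    """Summary statistics computed on first access, then cached."""

    def __init__(self, values):
        self._values = values

    @cached_property
    def mean(self):
        return statistics.mean(self._values)

    @cached_property
    def median(self):
        return statistics.median(self._values)


context = {"YDS_p1": [250, 310, 330, 290], "YDS_p2": [40, 10, 20, 30]}
p1 = {"kind": "outcome", "id": "YDS_p1"}
p2 = {"kind": "outcome", "id": "YDS_p2"}
over_300 = {"kind": "binary", "op": "GT", "left": p1,
            "right": {"kind": "constant", "value": 300}}
total = {"kind": "binary", "op": "ADD", "left": p1, "right": p2}

print(probability(evaluate(over_300, context, 4)))  # 0.5
dist = Distribution(evaluate(total, context, 4))
print(dist.mean, dist.median)
```

The cached_property decorator stands in for the lazy getters: a statistic that is never read is never computed.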

5. Presentation: GraphQL surface

LBS.OutcomeContext.QueryApi is a small ASP.NET Core / HotChocolate v16 app. Its DI graph:

IRosterDirectory (in-memory facade today, Marten-backed later)
IOutcomeContextStore (ClickHouseOutcomeContextStore against westus3)
ISimulationCatalogStore + IOutcomeTemplateCatalog (in-memory from RosterFactory)
ContextRepository (composes catalog + store)
RosterQueryExtensions (extends Query with teams/players/fixtures lookups)

Top-level GraphQL queries:

Query	Returns
`gameContext(gameId, outcomeIds)`	A `GameContext` with `evaluate(expression)`, `outcomes(filter)`, and the metadata fields.
`seasonContext(seasonId, gameIds, outcomeIds)`	A `SeasonContext` with the same shape, plus the option to evaluate expressions that span several games.
`outcomeDefinitions(filter)`	Discovery surface — does not hit ClickHouse, returns the materialised `OutcomeCatalog`.
`teams` / `team(teamId)` / `player(playerId)` / `players(filter)` / `fixtures(week)`	GUID ↔ name reverse-lookup via `IRosterDirectory`.

The outcomeIds arg on gameContext / seasonContext is the critical perf knob — it pushes filtering all the way to the ClickHouse outcome_id IN (…) predicate, so cross-region clients don't pay for the unfiltered ~480 MB-per-game payload.
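
The effect of the filter on the generated read can be sketched as a small query builder. This is not the store's actual SQL: the world_values column name and the String type for game_id are assumptions, and the placeholders follow ClickHouse's {name:Type} query-parameter form.

```python
def build_game_query(game_id, outcome_ids=None):
    """Return (sql, params); the IN predicate is added only when a filter is given."""
    sql = (
        "SELECT outcome_id, world_values "  # world_values is a hypothetical column
        "FROM game_outcome_context FINAL "
        "WHERE game_id = {game_id:String}"
    )
    params = {"game_id": game_id}
    if outcome_ids:
        # Push the filter down to the server so unrequested rows never leave it.
        sql += " AND outcome_id IN {outcome_ids:Array(String)}"
        params["outcome_ids"] = sorted(set(outcome_ids))
    return sql, params


sql, params = build_game_query("g1", ["PASSING_YARDS_GAME_p1", "PASSING_YARDS_GAME_p1"])
print(sql)
print(params["outcome_ids"])  # ['PASSING_YARDS_GAME_p1']
```

Without the filter the same builder returns a query that scans every outcome for the game, which is the expensive case the paragraph above warns about.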

For LLM-driven querying of this surface, see graphql-master-prompt.md.

Key cross-cutting decisions

Decision	Why
`ReplacingMergeTree(context_version)` + `SELECT FINAL` on reads	Operational: re-running the same `(season, contextVersion)` produces duplicate `(scope, outcome)` rows until background merges complete. FINAL forces the dedup at read time. Trade-off: historical-version queries at the same key become unreachable (collapsed by FINAL). The single test that exercised that contract is documented + skipped.
Sport-agnostic Storage + Query layers	Lets us swap in another sport's accumulators / templates without touching CH or the evaluator. The AmericanFootball-specific code is isolated to `LBS.Model.AmericanFootball.*` and the `RosterDirectory`.
Chunked streaming + per-worker shared-nothing accumulators (write path)	Memory bounded at `O(chunkSize)` regardless of total world count → unblocks 100 K. Lock-free per-worker state → ~85 % parallel efficiency at N=16, vs ~52 % for the shared+lock alternative.
In-C# pre-merge for game OC, but CH-side merge for season OC	Asymmetric fan-outs. Game has 272 fixture keys → pre-merging in C# keeps staging rowcount at `chunks × fixtures`. Season has 1 key → no fan-out to amortise, so we let CH do the merge. Same `arraySort`-over-`groupArray` SQL pattern in both.
`OutcomeRefInput` mixes `String` (for `type`, `participantId`) with `Enum` (for `timePeriod`, `context`)	Driven by which fields have closed-set values vs open-ended. `type` is open-ended (every sport's stat dictionary is different), so it stays a string and is membership-checked against the catalog at query canonicalisation. `timePeriod` and `context` are closed enums.
`IRosterDirectory` interface + in-memory facade	The roster store will eventually live in Marten alongside the other read models. The interface is in place so the swap is one `services.AddSingleton<IRosterDirectory, MartenRosterDirectory>()` line later.
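
The split between string and enum fields on OutcomeRefInput can be illustrated with a small canonicalisation sketch. It assumes only the two time periods that appear in this document's example IDs (GAME and SEASON), omits the context field, and uses a hypothetical set of known types in place of the real catalog.

```python
from enum import Enum


class TimePeriod(Enum):
    """Closed set; only the values seen in this document's examples are listed."""

    GAME = "GAME"
    SEASON = "SEASON"


def canonicalise(ref: dict, known_types: set) -> str:
    """Validate an outcome reference and return its concrete outcome ID."""
    stat_type = ref["type"]
    if stat_type not in known_types:  # open-ended string: checked against the catalog
        raise ValueError(f"unknown outcome type: {stat_type}")
    period = TimePeriod(ref["timePeriod"])  # closed enum: raises ValueError if invalid
    return f"{stat_type}_{period.value}_{ref['participantId']}"


known = {"PASSING_YARDS", "WON_SUPER_BOWL"}  # hypothetical catalog contents
ref = {"type": "PASSING_YARDS", "timePeriod": "GAME", "participantId": "p1"}
print(canonicalise(ref, known))  # PASSING_YARDS_GAME_p1
```

A new sport adds entries to the catalog without touching the enum, which is the reason the document gives for keeping type as a string.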

Operational shape

Component	Where it runs	Scaling
`SimulationRunner`	Azure Container Apps Job (`oc-exp-1k-p32`, westus3, 32 vCPU / 64 GiB)	Triggered manually. Runs once per data refresh. ~24 min for 100 K worlds × 285 games per world (272 regular-season fixtures plus the playoff bracket).
`QueryApi`	Azure Container App (`oc-query-api`, westus3, 2 vCPU / 4 GiB, scale 0..1, public ingress)	Cold-starts in 5-8 s; per-query latency ~1-2 s end-to-end with the `outcomeIds` filter.
ClickHouse Cloud	Production 3×16 tier (3 replicas, each 16 vCPU / 64 GiB) in westus3. Same region as both runtimes.	Holds canonical + staging tables; background merges run continuously.
ACR	`ocexperimentacr.azurecr.io`. Two images: `simulation-runner`, `query-api`.	GitHub Actions builds + pushes on `main`; `az acr build` from a branch for ad-hoc deploys.