OutcomeContext — High-Level Design

Three layers of code stacked over one shared data model. Every layer is deliberately sport-agnostic at the contract boundary so the AmericanFootball-specific work can be replaced with another sport later without rewriting the storage or query stack.

For the build sequence and current state of each phase, see the build plan. For the post-D1-D4 query layer reference, see the convergence reference.

Layered architecture

flowchart TB
    subgraph Pres["Presentation · LBS.OutcomeContext.QueryApi"]
        gql["GraphQL surface<br/>gameContext · seasonContext · outcomeDefinitions<br/>roster directory (teams, players, fixtures)"]
    end

    subgraph Query["Query layer · LBS.OutcomeContext.Query"]
        repo["ContextRepository<br/>adapter ring · OutcomeCatalog · per-world evaluator"]
        expr["ExpressionInput model<br/>binary / unary / outcome / constant nodes<br/>postfix canonical form"]
        materializer["OutcomeCatalogMaterializer<br/>templates × SimulationCatalog → concrete OutcomeIds"]
    end

    subgraph Storage["Storage · LBS.OutcomeContext.Storage"]
        writer["ClickHouseOutcomeContextWriter<br/>binary bulk insert · staging→canonical merge SQL<br/>schema bootstrap · OPTIMIZE allowlist"]
        store["ClickHouseOutcomeContextStore<br/>SELECT … FINAL on reads · outcomeIds filter"]
    end

    subgraph Accum["Accumulation · LBS.Model.AmericanFootball.Accumulation"]
        gameAcc["AmericanFootballOutcomeAccumulator<br/>per-game outcome arrays indexed by world"]
        seasonAcc["AmericanFootballSeasonAccumulator<br/>per-season totals + postseason flags<br/>(MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL)"]
        templates["AmericanFootballOutcomeTemplates"]
    end

    subgraph Sim["Pseudo Simulation · LBS.Model.AmericanFootball(.Simulation)"]
        engines["GameEngine · SeasonEngine<br/>StandingsEngine · PlayoffEngine"]
        factories["RosterFactory · ScheduleFactory<br/>ESPN-backed snapshot of teams + 272 fixtures"]
    end

    Pres --> Query
    Query --> Storage
    Storage --> Accum
    Accum --> Sim

Two runtimes ride on top

flowchart LR
    runner["SimulationRunner<br/>(Container Apps Job)<br/>westus3 · 32 vCPU / 64 GiB<br/>chunked streaming write"]
    api["QueryApi<br/>(Container App, ingress)<br/>westus3 · 2 vCPU / 4 GiB<br/>HotChocolate v16 GraphQL"]
    ch[("ClickHouse Cloud<br/>Production 3×16<br/>westus3")]
    runner -- "binary bulk insert<br/>+ staging→canonical merge" --> ch
    ch -- "SELECT … FINAL · outcomeIds filter" --> api

1. Pseudo simulation model

The simulator is sport-specific (AmericanFootball today) and lives in src/Models/AmericanFootball. Stage abstractions:

| Component | Responsibility |
| --- | --- |
| RosterFactory.BuildAllTeams() | Produces 32 Team records from a committed ESPN snapshot: rosters, GUIDs for team and player IDs, depth-chart ordering. |
| ScheduleFactory.Build(teams) | Emits 272 regular-season Matchups with stable FixtureId GUIDs. Postseason fixtures are intentionally not emitted here because playoff brackets diverge per world. |
| GameEngine.SimulateGame(home, away) | Runs one game, returns a GameResult (per-quarter scores + full play list with player attributions). Pure RNG-driven, stateless. |
| SeasonEngine.SimulateSeason(schedule) | Composes the above: regular season via GameEngine, then StandingsEngine derives seedings, then PlayoffEngine plays the bracket. Returns a flat List<SeasonGame> with GameType ∈ {RegularSeason, WildCard, Divisional, ConferenceChampionship, SuperBowl}. |

The simulator has no opinion about how its output is stored. It produces in-memory game results; downstream layers decide what to keep.
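The 272 fixtures emitted by ScheduleFactory and the 285 games per world quoted under Operational shape can be cross-checked with a little arithmetic. The sketch below assumes a 17-game regular season and a 6/4/2/1 playoff bracket; those figures are assumptions about the league format, not values read from the simulator.

```python
# Cross-check of the game counts quoted in this document.
# Assumed league format (not taken from the simulator source):
TEAMS = 32
GAMES_PER_TEAM = 17  # assumption: 17-game regular season

# Each regular-season game involves two teams, so halve the product.
regular_season_fixtures = TEAMS * GAMES_PER_TEAM // 2

# Assumed bracket size per postseason GameType.
playoff_games = {
    "WildCard": 6,
    "Divisional": 4,
    "ConferenceChampionship": 2,
    "SuperBowl": 1,
}

games_per_world = regular_season_fixtures + sum(playoff_games.values())
print(regular_season_fixtures, games_per_world)
```

Under these assumptions, only the regular-season fixtures have stable IDs across worlds; the remaining games belong to per-world brackets.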

2. Accumulation layer

This is where stochastic per-world outputs are reduced into addressable arrays. The unit of currency is an outcome row — one int per world (values are rounded to Int32 for storage), identified by an outcome_id of the form {TYPE}_{TIME_PERIOD}_{participantId}.
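A minimal Python sketch of the outcome_id shape follows. It assumes that TIME_PERIOD is a single token without underscores and that the participant ID is a GUID in hyphenated form, so the string can be split from the right even when TYPE itself contains underscores. The helper names are hypothetical and are not part of the codebase.

```python
from typing import NamedTuple
from uuid import UUID


class OutcomeId(NamedTuple):
    stat_type: str        # e.g. "PASSING_YARDS" (may contain underscores)
    time_period: str      # e.g. "GAME" or "SEASON"
    participant_id: UUID  # team or player GUID


def format_outcome_id(oid: OutcomeId) -> str:
    """Build {TYPE}_{TIME_PERIOD}_{participantId}."""
    return f"{oid.stat_type}_{oid.time_period}_{oid.participant_id}"


def parse_outcome_id(raw: str) -> OutcomeId:
    """Split from the right so underscores inside TYPE are preserved.

    Assumes the last two segments never contain an underscore.
    """
    stat_type, time_period, participant = raw.rsplit("_", 2)
    return OutcomeId(stat_type, time_period, UUID(participant))


team = UUID("12345678-1234-5678-1234-567812345678")
raw_id = format_outcome_id(OutcomeId("WON_SUPER_BOWL", "SEASON", team))
print(parse_outcome_id(raw_id))
```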

Two accumulators:

  • AmericanFootballOutcomeAccumulator: per-game. Reduces one game's plays into ~250-600 game-scope outcomes (player passing yards, team points by quarter, etc.). AccumulateGame(gameId, worldIndex, gameResult) is called for one world at a time, and the accumulator then emits a GameOutcomeContext whose arrays have length worldCount.

  • AmericanFootballSeasonAccumulator: per-season. Aggregates per-game stats into season totals (wins, losses, points-for, season passing yards). Postseason outcomes (MADE_PLAYOFFS, WON_AFC, WON_NFC, WON_SUPER_BOWL) are added through a separate AccumulatePlayoffGame entry point, so playoff games never produce per-game OC: the bracket diverges per world, which makes a per-game shape unstable.
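The per-game accumulation pattern described above can be sketched as a toy Python class. The class and method names are hypothetical stand-ins for the C# accumulator, and the `stats` dictionary stands in for the already-reduced play list of one simulated game.

```python
from collections import defaultdict


class GameAccumulatorSketch:
    """Toy per-game accumulator: one int array per outcome, indexed by world."""

    def __init__(self, game_id, world_count):
        self.game_id = game_id
        self.world_count = world_count
        # outcome_id -> values, one slot per world; unseen worlds stay 0.
        self._rows = defaultdict(lambda: [0] * world_count)

    def accumulate_game(self, world_index, stats):
        """Record one world's result for this game."""
        for outcome_id, value in stats.items():
            # Values are rounded to integers before storage.
            self._rows[outcome_id][world_index] = round(value)

    def build(self):
        """Return outcome_id -> array of length world_count."""
        return dict(self._rows)


acc = GameAccumulatorSketch("fixture-1", world_count=3)
acc.accumulate_game(0, {"A": 10.4})
acc.accumulate_game(2, {"A": 7.6, "B": 1})
print(acc.build())
```

Every emitted array has the same length, so downstream code can treat the world index as a shared positional axis across outcomes.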

A separate template registry (AmericanFootballOutcomeTemplates.All) declares parametric outcome shapes (PASSING_YARDS_GAME_{playerId}, WON_SUPER_BOWL_SEASON_{teamId}, etc.). The OutcomeCatalogMaterializer (in the Query layer — LBS.OutcomeContext.Query.Discovery) crosses these templates with a SimulationCatalog (rosters + fixtures) to produce the concrete OutcomeCatalog of all materialised IDs the system might emit.
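The template-times-catalog cross product can be illustrated with a short Python sketch. It is deliberately simplified: it crosses templates with participants only and leaves out the fixture dimension, and the `participant_kind` discriminator is a hypothetical field introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OutcomeTemplate:
    stat_type: str         # e.g. "PASSING_YARDS"
    time_period: str       # e.g. "GAME" or "SEASON"
    participant_kind: str  # "player" or "team" (hypothetical discriminator)


@dataclass(frozen=True)
class SimulationCatalog:
    team_ids: Tuple[str, ...]
    player_ids: Tuple[str, ...]


def materialize(templates, catalog) -> List[str]:
    """Cross every template with the participants it applies to."""
    ids = []
    for t in templates:
        if t.participant_kind == "player":
            participants = catalog.player_ids
        else:
            participants = catalog.team_ids
        ids.extend(f"{t.stat_type}_{t.time_period}_{p}" for p in participants)
    return ids


templates = [
    OutcomeTemplate("PASSING_YARDS", "GAME", "player"),
    OutcomeTemplate("WON_SUPER_BOWL", "SEASON", "team"),
]
catalog = SimulationCatalog(team_ids=("t1", "t2"), player_ids=("p1",))
print(materialize(templates, catalog))
```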

3. Storage layer

Pure I/O. Sport-agnostic — the contracts in LBS.OutcomeContext.Contracts (GameOutcomeContext, SeasonOutcomeContext, OutcomeRow) hold no AmericanFootball references.

ClickHouse schema

| Table | Purpose | Engine + ORDER BY |
| --- | --- | --- |
| game_outcome_context | Canonical per-game outcomes | ReplacingMergeTree(context_version), ORDER BY (game_id, outcome_id), partitioned by season_id |
| season_outcome_context | Canonical per-season outcomes | ReplacingMergeTree(context_version), ORDER BY (season_id, outcome_id) |
| game_outcome_context_staging | Per-chunk pre-merged game OC | MergeTree, ORDER BY (game_id, outcome_id, batch_index) |
| season_outcome_context_staging | Per-worker per-chunk season OC | MergeTree, ORDER BY (season_id, outcome_id, batch_index) |

Two seams

  • ClickHouseOutcomeContextWriter: EnsureSchemaAsync, Write*StagingAsync, MergeStagingFor*Async, TruncateStagingAsync, OptimizeAsync (allowlisted targets). Bulk inserts go through the ClickHouse.Driver binary protocol. The staging→canonical merge is one INSERT … SELECT per scope using arraySort-over-groupArray, so worker slices concatenate in absolute world-index order.

  • ClickHouseOutcomeContextStore — read-only. GetByScopeIdAsync + GetManyByScopeIdAsync, both filterable by outcomeIds so callers can pull just the outcomes they need. All SELECTs use FINAL to dedup unmerged ReplacingMergeTree parts at read time. Trade-off (FINAL collapses historical context_versions) documented in code; the matching test that previously exercised historical-version querying is left in the suite with a skip reason.
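The read-time dedup that FINAL performs can be modelled in a few lines. The sketch below is a Python stand-in for the semantics only (the highest context_version per sorting key wins, and the later insert wins a tie); it is not how ClickHouse implements merges, and the row tuple layout is an assumption made for illustration.

```python
def read_final(rows):
    """Collapse rows to one per sorting key, as FINAL does at read time.

    Each row is (scope_id, outcome_id, context_version, values), listed in
    insertion order. The highest context_version wins; on a tie the row
    inserted later wins.
    """
    latest = {}
    for scope_id, outcome_id, version, values in rows:
        key = (scope_id, outcome_id)
        if key not in latest or version >= latest[key][0]:
            latest[key] = (version, values)
    return {key: values for key, (_, values) in latest.items()}


rows = [
    ("g1", "A", 1, [1, 2]),
    ("g1", "A", 2, [3, 4]),
    ("g1", "A", 2, [5, 6]),  # re-run of the same version: duplicate key
    ("g1", "B", 1, [7, 8]),
]
print(read_final(rows))
```

The example also shows the trade-off noted above: once version 2 exists for a key, the version 1 row is no longer reachable through this read path.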

Runner write path

The runner uses chunked streaming with a per-worker shared-nothing pattern:

sequenceDiagram
  participant R as Runner
  participant Wkr as Workers
  participant Stg as ClickHouseStaging
  participant Can as ClickHouseCanonical

  loop for each chunk of worlds
    R->>Wkr: spin up and partition chunk worlds across workers
    Wkr->>Wkr: each worker simulates its slice with its own season and game accumulators
    Wkr-->>R: parallel block returns
    R->>R: per fixture, merge worker partials in process (game OC fan-out is 272)
    R->>Stg: bulk insert game partial, batchIndex equals chunkIndex
    R->>Stg: per worker bulk insert season partial, batchIndex equals chunkIndex times workerCount plus workerIndex
    R-->>R: drop worker accumulators, GC reclaims chunk memory
  end
  R->>Can: per fixture parallel MergeStagingForGameAsync
  R->>Can: single MergeStagingForSeasonAsync
  Note over R,Can: No OPTIMIZE - reads use FINAL.

Why this shape:

  • Chunking bounds memory at O(chunkSize × fixtures × outcomes) regardless of total worldCount. The earlier in-memory-only design OOM'd at ~32K worlds; chunked streaming clears 100K cleanly.
  • Per-worker shared-nothing accumulators lifted N=16 efficiency from ~52 % (shared accumulator + lock) to ~85 %. The lock on the season accumulator was the cross-worker contention point we explicitly removed.
  • Asymmetric fan-out handling. Game OC has 272 fixture keys, so we pre-merge worker partials in C# to keep staging rowcount at chunks × fixtures. Season OC has fan-out 1 (one season scope), so we let ClickHouse do the merge — staging gets chunks × workers rows but the absolute count is small (~215 K at 100 K worlds × N=16) and we avoid a 1 GB process-memory buffer.
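The batch-index scheme from the sequence diagram, and the way sorting by it restores absolute world order, can be sketched in Python. The sketch assumes that each worker simulates a contiguous slice of its chunk and that slices are assigned in worker-index order; the function names are hypothetical, and the merge function only mirrors the arraySort-over-groupArray idea rather than reproducing the SQL.

```python
def game_batch_index(chunk_index):
    # Game partials are pre-merged per chunk, so the chunk index suffices.
    return chunk_index


def season_batch_index(chunk_index, worker_count, worker_index):
    # Season partials are written per worker, so the index must order
    # workers within a chunk as well as chunks themselves.
    return chunk_index * worker_count + worker_index


def merge_staging(staging_rows):
    """Concatenate partial arrays in batch_index order.

    staging_rows holds (batch_index, values) pairs for one (scope, outcome)
    key: collect, sort by batch_index, then flatten.
    """
    merged = []
    for _, values in sorted(staging_rows, key=lambda row: row[0]):
        merged.extend(values)
    return merged


# Two chunks of four worlds, two workers per chunk, arriving out of order.
worker_count = 2
staging = [
    (season_batch_index(1, worker_count, 1), [6, 7]),
    (season_batch_index(0, worker_count, 0), [0, 1]),
    (season_batch_index(1, worker_count, 0), [4, 5]),
    (season_batch_index(0, worker_count, 1), [2, 3]),
]
print(merge_staging(staging))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because the batch index is unique per (chunk, worker) pair, insert order into staging does not matter; only the sort at merge time does.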

4. Query layer

LBS.OutcomeContext.Query is sport-agnostic. Three things it owns:

Adapter ring

IOutcomeContext is the surface the evaluator consumes; GameOutcomeContextAdapter and SeasonOutcomeContextAdapter adapt the storage-side GameOutcomeContext / SeasonOutcomeContext records into it. The ContextRepository joins OutcomeCatalog (which outcomes exist) with IOutcomeContextStore (where their values live) and is the single point of dependency injection for the GraphQL resolvers.
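As a rough illustration of the adapter idea, the Python sketch below uses a Protocol as a stand-in for IOutcomeContext. The member names (`world_count`, `values`) and the record layout are hypothetical; the actual interface members are not listed in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Sequence


class OutcomeContext(Protocol):
    """Stand-in for IOutcomeContext; the member names are hypothetical."""

    @property
    def world_count(self) -> int: ...

    def values(self, outcome_id: str) -> Optional[Sequence[int]]: ...


@dataclass(frozen=True)
class GameRecord:
    """Stand-in for the storage-side GameOutcomeContext record."""

    game_id: str
    world_count: int
    rows: Dict[str, List[int]]


class GameAdapter:
    """Presents a storage record through the evaluator-facing surface."""

    def __init__(self, record: GameRecord) -> None:
        self._record = record

    @property
    def world_count(self) -> int:
        return self._record.world_count

    def values(self, outcome_id: str) -> Optional[Sequence[int]]:
        return self._record.rows.get(outcome_id)


def mean_of(context: OutcomeContext, outcome_id: str) -> float:
    # Consumer code depends only on the protocol, not on the storage record.
    values = context.values(outcome_id)
    if values is None:
        raise KeyError(outcome_id)
    return sum(values) / context.world_count


ctx = GameAdapter(GameRecord("g1", 4, {"A": [1, 2, 3, 4]}))
print(mean_of(ctx, "A"))
```

A season-scope adapter would follow the same pattern, which is what lets the evaluator stay unaware of the scope it is reading.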

Expression model

ExpressionInput is a [OneOf] discriminated union of {outcome, constant, binary, unary} nodes. Operators are string constants (BinaryOperatorConstants.Add, etc.) bound to GraphQL enums at the surface. The shape is small (no functions, no aggregations) by design — the evaluator runs per-world over the value arrays directly.
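The four node kinds and the postfix canonical form named in the architecture diagram can be sketched in Python as follows. The operator strings ("ADD", "GT"), the token format, and the use of SHA-256 for the hash are assumptions made for illustration; the document does not specify how expressionHash is actually computed.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Outcome:
    outcome_id: str


@dataclass(frozen=True)
class Constant:
    value: float


@dataclass(frozen=True)
class Unary:
    op: str
    operand: "Expr"


@dataclass(frozen=True)
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Outcome, Constant, Unary, Binary]


def to_postfix(node: Expr) -> List[str]:
    """Flatten the tree so operands precede their operator."""
    if isinstance(node, Outcome):
        return [f"outcome:{node.outcome_id}"]
    if isinstance(node, Constant):
        return [f"const:{node.value}"]
    if isinstance(node, Unary):
        return to_postfix(node.operand) + [f"unary:{node.op}"]
    return to_postfix(node.left) + to_postfix(node.right) + [f"binary:{node.op}"]


def expression_hash(node: Expr) -> str:
    """Hash the canonical form (hypothetical scheme: SHA-256 of the tokens)."""
    canonical = " ".join(to_postfix(node))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


expr = Binary("GT", Binary("ADD", Outcome("A"), Outcome("B")), Constant(300))
print(to_postfix(expr))
```

A canonical linear form like this gives structurally identical expressions the same hash, which is the property a cache key needs.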

Per-world evaluator

Walks the expression tree, applying operators element-wise across the world dimension. Comparisons + logical ops produce booleans → probability = matchingWorlds / totalWorlds. Pure arithmetic produces a numeric distribution → mean, median, min, max, stdDev, mode are computed lazily on first access (HotChocolate only invokes the property getters for fields the caller actually selected). Result objects also expose resolvedOutcomeIds and expressionHash for caching and observability.
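A compact Python sketch of the per-world evaluation follows. Expressions are plain tuples here to keep the example short, the operator names are hypothetical, and `cached_property` stands in for the lazy statistic getters; only a few statistics are shown.

```python
import operator
import statistics
from functools import cached_property

BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "GT": operator.gt,
    "LT": operator.lt,
    "AND": lambda a, b: bool(a) and bool(b),
}
UNARY_OPS = {"NOT": lambda a: not a, "NEG": operator.neg}


def evaluate(node, rows, world_count):
    """Return one value per world for the given expression tree."""
    kind = node[0]
    if kind == "outcome":
        return list(rows[node[1]])
    if kind == "const":
        return [node[1]] * world_count
    if kind == "unary":
        operand = evaluate(node[2], rows, world_count)
        return [UNARY_OPS[node[1]](v) for v in operand]
    left = evaluate(node[2], rows, world_count)
    right = evaluate(node[3], rows, world_count)
    return [BINARY_OPS[node[1]](a, b) for a, b in zip(left, right)]


class Result:
    """Statistics are computed on first access and then cached."""

    def __init__(self, per_world):
        self._per_world = per_world

    @cached_property
    def probability(self):
        # For boolean results: matchingWorlds / totalWorlds.
        return sum(1 for v in self._per_world if v) / len(self._per_world)

    @cached_property
    def mean(self):
        return statistics.fmean(self._per_world)

    @cached_property
    def median(self):
        return statistics.median(self._per_world)


rows = {"A": [100, 200, 300, 400], "B": [50, 50, 50, 50]}
total = ("binary", "ADD", ("outcome", "A"), ("outcome", "B"))
over_300 = ("binary", "GT", total, ("const", 300))
print(Result(evaluate(over_300, rows, 4)).probability)
```

The same tree walker serves both result kinds: a comparison at the root yields a probability, while a purely arithmetic root yields a distribution to summarise.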

5. Presentation — GraphQL surface

LBS.OutcomeContext.QueryApi is a small ASP.NET Core / HotChocolate v16 app. Its DI graph:

IRosterDirectory (in-memory facade today, Marten-backed later)
IOutcomeContextStore (ClickHouseOutcomeContextStore against westus3)
ISimulationCatalogStore + IOutcomeTemplateCatalog (in-memory from RosterFactory)
ContextRepository (composes catalog + store)
RosterQueryExtensions (extends Query with teams/players/fixtures lookups)

Top-level GraphQL queries:

| Query | Returns |
| --- | --- |
| gameContext(gameId, outcomeIds) | A GameContext with evaluate(expression), outcomes(filter), and the metadata fields. |
| seasonContext(seasonId, gameIds, outcomeIds) | A SeasonContext with the same shape, plus the option to pull cross-game expressions. |
| outcomeDefinitions(filter) | Discovery surface: does not hit ClickHouse; returns the materialised OutcomeCatalog. |
| teams / team(teamId) / player(playerId) / players(filter) / fixtures(week) | GUID ↔ name reverse-lookup via IRosterDirectory. |

The outcomeIds arg on gameContext / seasonContext is the critical perf knob — it pushes filtering all the way to the ClickHouse outcome_id IN (…) predicate, so cross-region clients don't pay for the unfiltered ~480 MB-per-game payload.
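The filter pushdown can be pictured as a query builder that adds the IN predicate only when the caller supplies outcome IDs. The Python sketch below is illustrative: `outcome_values` is a hypothetical column name, the UUID type for game_id is an assumption, and the `{name:Type}` placeholders follow ClickHouse's query-parameter syntax rather than anything quoted from the store's actual SQL.

```python
def build_game_query(game_id, outcome_ids=None):
    """Return (sql, parameters) for a game-scope read.

    Table and key column names come from the schema table above;
    `outcome_values` is a hypothetical name for the per-world array column.
    """
    sql = (
        "SELECT outcome_id, outcome_values "
        "FROM game_outcome_context FINAL "
        "WHERE game_id = {game_id:UUID}"
    )
    params = {"game_id": game_id}
    if outcome_ids:
        # Push the filter into the database instead of filtering in process.
        sql += " AND outcome_id IN {outcome_ids:Array(String)}"
        params["outcome_ids"] = list(outcome_ids)
    return sql, params


filtered_sql, filtered_params = build_game_query(
    "12345678-1234-5678-1234-567812345678",
    ["PASSING_YARDS_GAME_p1", "POINTS_GAME_t1"],
)
unfiltered_sql, unfiltered_params = build_game_query(
    "12345678-1234-5678-1234-567812345678"
)
print(filtered_sql)
```

Passing the IDs as a bound parameter rather than splicing them into the SQL text keeps the statement shape fixed regardless of how many outcomes are requested.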

For LLM-driven querying of this surface, see graphql-master-prompt.md.

Key cross-cutting decisions

| Decision | Why |
| --- | --- |
| ReplacingMergeTree(context_version) + SELECT FINAL on reads | Operational: re-running the same (season, contextVersion) produces duplicate (scope, outcome) rows until background merges complete. FINAL forces the dedup at read time. Trade-off: historical-version queries at the same key become unreachable (collapsed by FINAL). The single test that exercised that contract is documented + skipped. |
| Sport-agnostic Storage + Query layers | Lets us swap in another sport's accumulators / templates without touching CH or the evaluator. The AmericanFootball-specific code is isolated to LBS.Model.AmericanFootball.* and the RosterDirectory. |
| Chunked streaming + per-worker shared-nothing accumulators (write path) | Memory bounded at O(chunkSize) regardless of total world count → unblocks 100 K. Lock-free per-worker state → ~85 % parallel efficiency at N=16, vs ~52 % for the shared+lock alternative. |
| In-C# pre-merge for game OC, but CH-side merge for season OC | Asymmetric fan-outs. Game has 272 fixture keys → pre-merging in C# keeps staging rowcount at chunks × fixtures. Season has 1 key → no fan-out to amortise, so we let CH do the merge. Same arraySort-over-groupArray SQL pattern in both. |
| OutcomeRefInput mixes String (for type, participantId) with Enum (for timePeriod, context) | Driven by which fields have closed-set values vs open-ended. type is open-ended (every sport's stat dictionary is different), so it stays a string and is membership-checked against the catalog at query canonicalisation. timePeriod and context are closed enums. |
| IRosterDirectory interface + in-memory facade | The roster store will eventually live in Marten alongside the other read models. The interface is in place so the swap is one services.AddSingleton<IRosterDirectory, MartenRosterDirectory>() line later. |

Operational shape

| Component | Where it runs | Scaling |
| --- | --- | --- |
| SimulationRunner | Azure Container Apps Job (oc-exp-1k-p32, westus3, 32 vCPU / 64 GiB) | Triggered manually. Runs once per data refresh. ~24 min for 100 K worlds × 285 games × full season + bracket. |
| QueryApi | Azure Container App (oc-query-api, westus3, 2 vCPU / 4 GiB, scale 0..1, public ingress) | Cold-starts in 5-8 s; per-query latency ~1-2 s end-to-end with the outcomeIds filter. |
| ClickHouse Cloud | Production 3×16 tier (3 replicas × 16 vCPU × 64 GiB) in westus3. Same region as both runtimes. | Holds canonical + staging tables; background merges run continuously. |
| ACR | ocexperimentacr.azurecr.io. Two images: simulation-runner, query-api. | GitHub Actions builds + pushes on main; az acr build from a branch for ad-hoc deploys. |