D4 sub-design — IOutcomeContextStore method shape

Status: Implemented on branch LBS-1183 (PR #1389). Live: IOutcomeContextStore (LBS.OutcomeContext.Storage), InMemoryOutcomeContextStore, and ClickHouseOutcomeContextStore (LBS.OutcomeContext.Storage/ClickHouse). ContextRepository is async and consumes the typed interface; the GraphQL query layer never sees ClickHouse directly. The body below is preserved as the as-designed reference. Read alongside: gap analysis, build plan implementation status.

D4 extracts the typed read interface storage exposes to the query layer. This sub-design fills in the method surface, picking up where the gap analysis left off.

What's settled

From the gap analysis D4 resolution:

  • A typed IOutcomeContextStore interface lives between storage and query. The query layer never sees ClickHouse.
  • ContextRepository's real consumption pattern drives the method shape: single-context fetch, multi-context batch fetch, catalog access.
  • The interface is designed as part of the broader storage-layer rework - the timing aligns so the contract can be set without retrofitting.

The existing IStorageBackend (in StorageExperiment) already has a TODO (lines 17-22) explicitly anticipating this split:

IStorageBackend (schema, metrics, OPTIMIZE)
IOutcomeContextStore : IStorageBackend (OC read/write/streaming)
IPlayByPlayStore : IStorageBackend (PBP read/write)
IAnalyticalQueryable : IStorageBackend (QueryAggregationAsync)

D4 lives in the OC interface; the split is the work that delivers it.

Open questions

Q1 — Run-version semantics when omitted

When the caller doesn't specify contextVersion, what gets returned? Latest? Latest-before-cutoff? All?

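Recommendation: latest, made explicit on the interface as policy rather than an implementation accident. ReplacingMergeTree(context_version) keeps the row with the highest context_version once parts have merged, so the engine already converges on MAX server-side; before a merge completes, a plain SELECT can still see older versions unless the read uses FINAL or an explicit max. Documenting the policy on the method signature makes it a contract the implementation must honour regardless of merge state. A future cutoff parameter (latest-before-X) is an additive change.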

Sketch:

/// <summary>
/// Loads a single context. When <paramref name="contextVersion"/> is null, returns the
/// latest version available for the scope (semantics: MAX(context_version) on the
/// underlying MergeTree).
/// </summary>
Task<IOutcomeContext?> GetByScopeIdAsync(
    string scopeId,
    int? contextVersion = null,
    CancellationToken ct = default);
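
The "latest when omitted" policy can be pinned down independently of ClickHouse. The block below is a minimal in-memory sketch of that rule only; StoredContext and VersionPolicy are hypothetical names introduced for illustration and are not part of the codebase described here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored context; the real IOutcomeContext shape is not shown in this document.
public sealed record StoredContext(string ScopeId, int ContextVersion);

public static class VersionPolicy
{
    // Returns the pinned version when one is requested, otherwise the highest stored version.
    public static StoredContext? Resolve(
        IEnumerable<StoredContext> stored,
        string scopeId,
        int? contextVersion = null)
    {
        var candidates = stored.Where(c => c.ScopeId == scopeId);

        if (contextVersion is int pinned)
        {
            // Explicit version: exact match or nothing.
            return candidates.FirstOrDefault(c => c.ContextVersion == pinned);
        }

        // Policy: "latest" means MAX(context_version), regardless of storage merge state.
        return candidates.OrderByDescending(c => c.ContextVersion).FirstOrDefault();
    }
}
```

A later latest-before-X cutoff would add one more optional parameter and one more filter on candidates, which is why it stays an additive change.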

Q2 — Bulk read shape

A seasonContext(seasonId, gameIds:) query loads 1 + N contexts. N=16 for an NFL regular-season-only week, N=17+ for NFL playoffs.

Recommendation: explicit GetManyByScopeIdAsync(IReadOnlyList<string> scopeIds, ...) on the interface. Implementation does one SELECT ... WHERE scope_id IN (...) per scope-type table (one for game OC, one for season OC) and materialises the full set. Avoids N+1 entirely.

Worth noting: scope IDs are heterogeneous (a season ID and game IDs). The implementation needs to fan out by scope type internally - the interface caller doesn't care.

Task<IReadOnlyList<IOutcomeContext>> GetManyByScopeIdAsync(
    IReadOnlyList<string> scopeIds,
    int? contextVersion = null,
    CancellationToken ct = default);
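
The internal fan-out by scope type might look like the sketch below. It assumes the implementation has some way to classify a scope ID; here that is a caller-supplied function, and ScopeType, BulkReadPlanner and the classifier are all hypothetical. Each resulting group would back one SELECT ... WHERE scope_id IN (...) against its table.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scope kinds, one per scope-type table.
public enum ScopeType { Game, Season }

public static class BulkReadPlanner
{
    // Groups the caller's heterogeneous scope IDs so that each group maps to a single IN (...) query.
    public static IReadOnlyDictionary<ScopeType, IReadOnlyList<string>> GroupByScopeType(
        IReadOnlyList<string> scopeIds,
        Func<string, ScopeType> classify)
    {
        return scopeIds
            .Distinct() // a repeated ID should not be sent to storage twice
            .GroupBy(classify)
            .ToDictionary(g => g.Key, g => (IReadOnlyList<string>)g.ToList());
    }
}
```

For one season ID plus N game IDs this yields two groups, so the round-trip count stays at two however large N grows.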

Q3 — Projection (subset of outcome IDs)

A query that only references three outcome IDs shouldn't fetch all ~500 rows. The existing experiment already has ReadSelectiveGameOutcomeContextAsync(gameId, IReadOnlyList<string> outcomeIds, ...) - validated at scale, basket reads measured at 8-20 ms regardless of world count.

Recommendation: lift it to the interface as an optional parameter on GetByScopeIdAsync and GetManyByScopeIdAsync:

Task<IOutcomeContext?> GetByScopeIdAsync(
    string scopeId,
    IReadOnlyList<string>? outcomeIds = null,
    int? contextVersion = null,
    CancellationToken ct = default);

When outcomeIds is null, fetch all rows. When non-null, fetch only that subset. The returned IOutcomeContext's OutcomeIds reflects the projection (not the full universe of stored outcomes).

The query layer can decide: discovery queries pass null; expression evaluation walks the canonical tree's leaves first to collect the leaf set, then passes it as the projection.

Open: does projection apply to the bulk path? Yes - GetManyByScopeIdAsync should accept the same outcomeIds parameter, applied per scope. A seasonContext(seasonId, gameIds:) query referencing 5 outcomes loads (1 + N) contexts × 5 rows each, not (1 + N) × 500.
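
Collecting the leaf set before the read could be sketched as follows. The expression node types (Expr, Leaf, Node) and LeafCollector are hypothetical placeholders; the canonical tree's real shape is defined elsewhere and is not assumed here beyond "leaves carry outcome IDs".

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical expression tree: leaves reference outcome IDs, inner nodes combine children.
public abstract record Expr;
public sealed record Leaf(string OutcomeId) : Expr;
public sealed record Node(string Op, IReadOnlyList<Expr> Children) : Expr;

public static class LeafCollector
{
    // Walks the tree iteratively and returns the distinct outcome IDs in ordinal order.
    public static IReadOnlyList<string> CollectOutcomeIds(Expr root)
    {
        var seen = new SortedSet<string>(StringComparer.Ordinal);
        var stack = new Stack<Expr>();
        stack.Push(root);

        while (stack.Count > 0)
        {
            switch (stack.Pop())
            {
                case Leaf leaf:
                    seen.Add(leaf.OutcomeId); // duplicates collapse in the set
                    break;
                case Node node:
                    foreach (var child in node.Children) stack.Push(child);
                    break;
            }
        }

        return seen.ToList();
    }
}
```

The returned list is what the query layer would pass as outcomeIds. With N = 16 and roughly 500 stored outcomes per context, a five-leaf projection reads 17 × 5 = 85 rows instead of 17 × 500 = 8,500.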

Q4 — One store or two (or three)?

Does the same IOutcomeContextStore serve the contexts and the catalogues, or do they split?

Recommendation: three interfaces, all in the Storage assembly.

IOutcomeContextStore       - per-context reads (this sub-design)
IOutcomeTemplateCatalog    - template lookup (D3 sub-design)
ISimulationCatalogStore    - run-scoped catalog read (D3 sub-design)

The three are coupled at composition time, not at the interface level. The Query project's ContextRepository takes all three as constructor dependencies. Storage can internally implement all three on one class or fan out; that's an implementation detail.

Why split rather than collapse: per Q3, IOutcomeContextStore is the basket-read surface (latency-sensitive, scoped by scope ID). Catalog reads follow different patterns: templates are roster-agnostic and rarely change, while the simulation catalog is a per-run document read. Different SLAs, different cache strategies.
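
A minimal sketch of "coupled at composition time": the interfaces are reduced to empty placeholders so that only the wiring is visible, and CombinedStore is a hypothetical class showing that one storage object may satisfy all three roles. The property names on ContextRepository are likewise illustrative.

```csharp
#nullable enable
using System;

// Empty placeholders; the method surfaces are omitted so only the composition is shown.
public interface IOutcomeContextStore { }
public interface IOutcomeTemplateCatalog { }
public interface ISimulationCatalogStore { }

// Hypothetical: storage is free to implement all three on one class.
public sealed class CombinedStore : IOutcomeContextStore, IOutcomeTemplateCatalog, ISimulationCatalogStore { }

public sealed class ContextRepository
{
    public IOutcomeContextStore Contexts { get; }
    public IOutcomeTemplateCatalog Templates { get; }
    public ISimulationCatalogStore Simulations { get; }

    // The repository sees three separate contracts, whatever backs them.
    public ContextRepository(
        IOutcomeContextStore contexts,
        IOutcomeTemplateCatalog templates,
        ISimulationCatalogStore simulations)
    {
        Contexts = contexts ?? throw new ArgumentNullException(nameof(contexts));
        Templates = templates ?? throw new ArgumentNullException(nameof(templates));
        Simulations = simulations ?? throw new ArgumentNullException(nameof(simulations));
    }
}
```

Passing the same CombinedStore instance three times or three separate objects are both valid compositions; the repository cannot tell the difference, which leaves room for per-interface caching later.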

Final method surface (recommendation)

public interface IOutcomeContextStore
{
    /// <summary>
    /// Loads a single context. When <paramref name="contextVersion"/> is null, returns the
    /// latest version available for the scope.
    /// </summary>
    Task<IOutcomeContext?> GetByScopeIdAsync(
        string scopeId,
        IReadOnlyList<string>? outcomeIds = null,
        int? contextVersion = null,
        CancellationToken ct = default);

    /// <summary>
    /// Loads multiple contexts in a single round-trip per scope-type.
    /// </summary>
    Task<IReadOnlyList<IOutcomeContext>> GetManyByScopeIdAsync(
        IReadOnlyList<string> scopeIds,
        IReadOnlyList<string>? outcomeIds = null,
        int? contextVersion = null,
        CancellationToken ct = default);
}

public interface IOutcomeTemplateCatalog
{
    Task<IReadOnlyList<OutcomeTemplate>> GetAllAsync(CancellationToken ct = default);

    Task<OutcomeTemplate?> TryGetAsync(string outcomeIdTemplate, CancellationToken ct = default);
}

public interface ISimulationCatalogStore
{
    Task<SimulationCatalog?> GetAsync(
        string seasonId,
        int contextVersion,
        CancellationToken ct = default);
}

Response types: IOutcomeContext lives in LBS.OutcomeContext.Query; OutcomeTemplate and SimulationCatalog per D3 land in LBS.OutcomeContext.Contracts (sport-agnostic). The Query project's ContextRepository consumes the three interfaces as constructor dependencies.
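
To show how the version policy and projection combine behind the recommended surface, here is a small in-memory sketch. It is not the shipped InMemoryOutcomeContextStore. The IOutcomeContext members other than OutcomeIds, the SketchContext record, and the choice to omit missing scopes from bulk results are all assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Assumed minimal shape; only OutcomeIds is named in this document.
public interface IOutcomeContext
{
    string ScopeId { get; }
    int ContextVersion { get; }
    IReadOnlyList<string> OutcomeIds { get; }
}

public sealed record SketchContext(
    string ScopeId, int ContextVersion, IReadOnlyList<string> OutcomeIds) : IOutcomeContext;

public interface IOutcomeContextStore
{
    Task<IOutcomeContext?> GetByScopeIdAsync(
        string scopeId, IReadOnlyList<string>? outcomeIds = null,
        int? contextVersion = null, CancellationToken ct = default);

    Task<IReadOnlyList<IOutcomeContext>> GetManyByScopeIdAsync(
        IReadOnlyList<string> scopeIds, IReadOnlyList<string>? outcomeIds = null,
        int? contextVersion = null, CancellationToken ct = default);
}

public sealed class SketchInMemoryStore : IOutcomeContextStore
{
    private readonly List<SketchContext> _contexts;

    public SketchInMemoryStore(IEnumerable<SketchContext> contexts) => _contexts = contexts.ToList();

    public Task<IOutcomeContext?> GetByScopeIdAsync(
        string scopeId, IReadOnlyList<string>? outcomeIds = null,
        int? contextVersion = null, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();

        // Null version means latest: take the highest version stored for the scope.
        var match = _contexts
            .Where(c => c.ScopeId == scopeId
                        && (contextVersion is null || c.ContextVersion == contextVersion))
            .OrderByDescending(c => c.ContextVersion)
            .FirstOrDefault();

        return Task.FromResult<IOutcomeContext?>(match is null ? null : Project(match, outcomeIds));
    }

    public async Task<IReadOnlyList<IOutcomeContext>> GetManyByScopeIdAsync(
        IReadOnlyList<string> scopeIds, IReadOnlyList<string>? outcomeIds = null,
        int? contextVersion = null, CancellationToken ct = default)
    {
        var results = new List<IOutcomeContext>();
        foreach (var scopeId in scopeIds.Distinct())
        {
            var context = await GetByScopeIdAsync(scopeId, outcomeIds, contextVersion, ct);
            if (context is not null) results.Add(context); // assumption: missing scopes are omitted
        }
        return results;
    }

    // Null projection returns the full context; otherwise OutcomeIds reflects only the requested subset.
    private static SketchContext Project(SketchContext full, IReadOnlyList<string>? outcomeIds)
    {
        if (outcomeIds is null) return full;
        var wanted = new HashSet<string>(outcomeIds, StringComparer.Ordinal);
        return full with { OutcomeIds = full.OutcomeIds.Where(id => wanted.Contains(id)).ToList() };
    }
}
```

Two behaviours are left open by the signatures and are decided arbitrarily here: what the bulk call returns for an unknown scope ID, and whether a non-null contextVersion in the bulk call must match every scope.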

Phase 4 implementation notes

Once this sub-design and the storage rework are in flight, Phase 4 (D4) should:

  1. Split IStorageBackend per the existing TODO comment in IStorageBackend.cs lines 17-22. The OC-read methods move to IOutcomeContextStore; PBP read/write to IPlayByPlayStore; analytical query to IAnalyticalQueryable. Schema/metrics methods stay on IStorageBackend.
  2. Implement IOutcomeContextStore in LBS.OutcomeContext.Storage (sport-agnostic; already houses IBlobSink after Phase 0r-2). The implementation lifts ClickHouseBackend.ReadGameOutcomeContextAsync etc. into the new interface.
  3. Implement projection in the bulk path if the experiment doesn't already do this (the existing ReadSelectiveGameOutcomeContextAsync is single-context only).
  4. Wire ContextRepository in the Query project to depend on IOutcomeContextStore instead of building in-memory fixtures. Phase 0e introduces the in-memory ContextRepository; Phase 4 swaps the constructor.
  5. Migrate StorageExperiment.Tests to test against the new interface boundary. Some tests are inherently storage-implementation tests (Testcontainers ClickHouse round-trips) and stay there; others become pure-interface tests in Query.Tests.
  6. Retire the experiment harness's read paths as the canonical surface. The experiment Exe + experiments stay around as a benchmark/diagnostic harness, but the production code path lives in Storage.

Cross-references

  • D2 sub-design Q5 (catalog versioning): the catalogVersion integer rides on IOutcomeTemplateCatalog reads. The cache key triple (expression_hash, worldSetRef, catalogVersion) requires this dimension.
  • D3 sub-design Q3 (identification): (seasonId, contextVersion) is the pre-sim catalog key. ISimulationCatalogStore.GetAsync takes that pair.
  • Storage-layer rework (build plan §3): D4 is one facet of the rework. Signing this off in isolation requires the rework to be in flight.
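
The cache key triple from the D2 cross-reference can be modelled as a value-equal record, as in the sketch below. The type name EvaluationCacheKey and the string types chosen for the expression hash and worldSetRef are assumptions; only catalogVersion is stated to be an integer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key type: records compare by value, so equal triples hit the same cache entry.
public sealed record EvaluationCacheKey(string ExpressionHash, string WorldSetRef, int CatalogVersion);

public static class CacheKeyDemo
{
    // Returns true when a lookup with the given key finds the entry stored under storedKey.
    public static bool Hits(EvaluationCacheKey storedKey, EvaluationCacheKey lookupKey)
    {
        var cache = new Dictionary<EvaluationCacheKey, string> { [storedKey] = "cached-result" };
        return cache.ContainsKey(lookupKey);
    }
}
```

Bumping catalogVersion alone produces a different key, so results computed against an older template catalog are never served for a newer one.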

References

  • Gap analysis §"D4 - Read API" + §"D4 follow-up"
  • Convergence reference §3.3 (the read interface)
  • Build plan §2 Phase 4 + §3 + §5.3
  • IStorageBackend.cs lines 17-22 (the existing split TODO)
  • Spec §10.1 (response shape - drives the method shape from the consumer side)