D2 sub-design — Derivation build pipeline

Status: Implemented on branch LBS-1183 (PR #1389). OutcomeDefinition.Definition is a string? carrying postfix-template syntax (pipe-separated tokens with {slotName} placeholders). ExpandDerived (src/OutcomeContext/LBS.OutcomeContext.Query/Expressions/ExpressionCanonical.cs) recursively expands derived outcomes down to raw stored counters; cycles are caught with a depth limit. The body below is preserved as the as-designed reference. Read alongside: gap analysis, build plan implementation status.

D2 lands a postfix-template Definition field on storage's OutcomeDefinition. This sub-design covers how derivations get authored, validated, canonicalised to postfix, and persisted into the catalog. Until this is settled, Phase 2 cannot land cleanly.

What's settled

From the gap analysis D2 resolution:

  • OutcomeDefinition.Definition is string? (postfix template). Empty / null = raw outcome.
  • The template carries named placeholders, e.g. TOTAL_TDS_GAME_{participantId}|1|GTE.
  • The query layer's ExpressionCanonical.ExpandDerived consumes templates directly: substitute placeholders with consumer leaf parameters, then evaluate as ordinary postfix.
  • Authors may write derivations in infix; a build step canonicalises to postfix before populating storage.
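As an illustration of the substitute-then-evaluate flow, here is a minimal Python sketch. It is not the C# ExpandDerived implementation: the function names, the operator table (only GTE appears in this document; GT and AND are assumed), and the counter lookup are hypothetical, and recursive expansion of nested derivations is left out.

```python
import re

SLOT = re.compile(r"\{(\w+)\}")

# Assumed operator set; only GTE is named in the design itself.
OPS = {
    "GTE": lambda a, b: a >= b,
    "GT": lambda a, b: a > b,
    "AND": lambda a, b: bool(a) and bool(b),
}


def substitute(template: str, bindings: dict[str, str]) -> str:
    """Replace every {slotName} with its bound value; fail on unbound slots."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"unbound slot: {name}")
        return bindings[name]
    return SLOT.sub(repl, template)


def evaluate(postfix: str, counters: dict[str, float]):
    """Evaluate a pipe-separated postfix string against raw counter values."""
    stack = []
    for tok in postfix.split("|"):
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok in counters:
            stack.append(counters[tok])
        else:
            stack.append(float(tok))  # anything else is treated as a numeric literal
    if len(stack) != 1:
        raise ValueError("postfix did not reduce to a single value")
    return stack[0]


expanded = substitute("TOTAL_TDS_GAME_{participantId}|1|GTE", {"participantId": "P42"})
result = evaluate(expanded, {"TOTAL_TDS_GAME_P42": 2})
```

Here `expanded` is an ordinary postfix string with no slots left, so the evaluator needs no knowledge of templates.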

Open questions

Q1 — Authoring source of truth

Where do template authors edit derivations?

| Option | Description | Pros | Cons |
| --- | --- | --- | --- |
| A | derived-outcomes.yaml checked into lbs.foundry/ | Domain-author friendly; YAML is widely tooled; reviewable diffs | Needs a build step + parser; YAML schema versioning |
| B | Code-generated from a small DSL | Strongly typed; refactor-safe | New tooling surface; team has to learn it |
| C | Hand-written postfix in the catalog table | Lowest tooling overhead; no build step | Easy to write wrong postfix; no infix authoring path; TOTAL_TDS_GAME_{p} 2 GTE is fine but anything cross-participant is hostile |

Recommendation: A. YAML keeps derivations under code review, lets non-engineering domain authors edit them, and the build-step infix-to-postfix canonicaliser is a one-time investment that benefits every future derivation.

Sketch.

# derived-outcomes.yaml - template registry
- id: ANYTIME_TD_GAME_{participantId}
  category: TDOrdinal
  valueType: Boolean
  definition: TOTAL_TDS_GAME_{participantId} >= 1
- id: TOP_SCORER_SEASON_{participantId}
  category: SeasonRanking
  valueType: Boolean
  definition: TOTAL_TDS_SEASON_{participantId} > TOTAL_TDS_SEASON_{otherParticipantId}

The build step: parse definition as infix → typecheck → canonicalise → emit postfix → upsert into storage's catalog table.
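The canonicalise step can be sketched with the standard shunting-yard algorithm. This Python sketch is illustrative only: the operator table and precedences are assumptions (the document names only GTE and shows >=, > and AND in examples), the type-check and upsert stages are omitted, and slot-bearing leaves are passed through verbatim.

```python
import re

# Hypothetical operator table: infix symbol -> (postfix token, precedence).
OPERATORS = {
    "OR": ("OR", 1),
    "AND": ("AND", 2),
    ">=": ("GTE", 3),
    ">": ("GT", 3),
}

TOKEN = re.compile(r"\s*(>=|>|\(|\)|[A-Za-z0-9_{}.]+)")


def tokenize(infix: str) -> list[str]:
    """Split an infix definition into operators, parentheses and leaves."""
    infix = infix.strip()
    tokens, pos = [], 0
    while pos < len(infix):
        match = TOKEN.match(infix, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {infix[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_postfix(infix: str) -> str:
    """Shunting-yard: convert infix to a pipe-separated postfix string."""
    output, stack = [], []
    for tok in tokenize(infix):
        if tok in OPERATORS:
            prec = OPERATORS[tok][1]
            # All operators are treated as left-associative.
            while stack and stack[-1] != "(" and OPERATORS[stack[-1]][1] >= prec:
                output.append(OPERATORS[stack.pop()][0])
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(OPERATORS[stack.pop()][0])
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # leaf or literal, slots kept verbatim
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("unbalanced '('")
        output.append(OPERATORS[top][0])
    return "|".join(output)


anytime_td = to_postfix("TOTAL_TDS_GAME_{participantId} >= 1")
```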

Q2 — Postfix template grammar

How are placeholders represented in the postfix string?

Recommendation: {slotName} as a literal token in the postfix stream. The tokeniser splits on | (existing prototype convention) and recognises {...} as a slot.

TOTAL_TDS_GAME_{participantId}|1|GTE

Slots are emitted verbatim by the build-step canonicaliser; substitution at expansion time replaces every {slotName} with the corresponding value from the consumer's leaf reference.

Open: how do slots referencing different roles distinguish themselves at substitution? Two participants in one expression need separate slot names — {participantId} and {otherParticipantId} — and the catalog must declare what each slot binds to. This bleeds into D3's slot grammar (§5.2 of the build plan). Coordinate with D3 sub-design: D2's template registry references slot names; D3's slot grammar defines what those names mean.
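To make the slot-declaration question concrete, the following Python sketch collects the slot names a template uses and reports any that a catalog entry has not declared. The shape of the declaration map (slot name to role) is purely hypothetical, since D3 owns that grammar; only the {slotName} syntax and the | separator come from this document.

```python
import re

SLOT = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def slot_names(template: str) -> list[str]:
    """Return distinct slot names in first-seen order."""
    seen: list[str] = []
    for tok in template.split("|"):
        for name in SLOT.findall(tok):
            if name not in seen:
                seen.append(name)
    return seen


def undeclared_slots(template: str, declared: dict[str, str]) -> list[str]:
    """Slots used by the template that the (hypothetical) declaration map lacks."""
    return [name for name in slot_names(template) if name not in declared]


template = "TOTAL_TDS_SEASON_{participantId}|TOTAL_TDS_SEASON_{otherParticipantId}|GT"
missing = undeclared_slots(template, {"participantId": "participant"})
```

In this example `missing` contains only `otherParticipantId`, which is the kind of gap the D2/D3 coordination would need to close.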

Q3 — Build-time validation

Does the build step verify each definition before persisting it?

Recommendation: yes, three checks:

  1. Postfix structural well-formedness. The token stream must reduce to a single value via the existing PostfixEvaluator's stack-walk semantics. Fails fast if the canonicaliser produced a malformed stream (which would be a canonicaliser bug, but cheap to verify).
  2. Leaf resolution. Every non-slot leaf must be a known raw outcome template in the registry. Catches typos like TTOTAL_TDS_GAME_{p} at build time rather than at the consumer's first query.
  3. Type-check. Run the canonical tree through TypeChecker with DefaultTypeRules.All. Errors block the build. Warnings surface as build warnings; they are expected during development and can be suppressed per rule via an annotation in derived-outcomes.yaml.
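The first two checks above can be sketched in a few lines of Python. This is not PostfixEvaluator or TypeChecker: the operator set is assumed, every operator is treated as binary, and `known_leaves` stands in for whatever the registry exposes. The type-check is omitted because it depends on the project's own rules.

```python
import re

BINARY_OPS = {"GTE", "GT", "AND", "OR"}  # assumed operator set
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def validate(postfix: str, known_leaves: set[str]) -> list[str]:
    """Return a list of error messages; an empty list means the template passed."""
    errors: list[str] = []
    depth = 0  # simulated stack depth
    for tok in postfix.split("|"):
        if tok in BINARY_OPS:
            if depth < 2:
                errors.append(f"operator {tok} has fewer than two operands")
                return errors  # stack is unusable past this point
            depth -= 1  # two operands consumed, one result pushed
        else:
            depth += 1
            if not NUMBER.fullmatch(tok) and tok not in known_leaves:
                errors.append(f"unknown leaf: {tok}")
    if depth != 1:
        errors.append(f"expression leaves {depth} values on the stack, expected 1")
    return errors


registry = {"TOTAL_TDS_GAME_{p}"}
typo_errors = validate("TTOTAL_TDS_GAME_{p}|1|GTE", registry)
```

Collecting errors instead of raising on the first one lets a build report every bad leaf in a single run.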

Without build-time validation, a bad definition only surfaces as UNKNOWN_OUTCOME_ID or TYPE_MISMATCH at the consumer's first query, which delays discovery for the team writing derivations.

Q4 — Cross-template references and cycle detection

A derivation may reference another derivation. Example: KC_LIKELY_WINS = KC_WIN AND KC_FAVORED. The runtime ExpressionCanonical.ExpandDerived already recurses up to MaxDepth=20 and detects cycles. Should the build step pre-compute the cycle check?

Recommendation: yes. Build a directed graph of template_id -> referenced_template_ids and run a topological sort. Cycles fail the build with a "cycle: A -> B -> A" message. Bounded recursion at runtime remains as a belt-and-braces defence, but the build-time check gives a better error.

Q5 — Catalog versioning

Spec §11.8.6 #3 flags this: when a derivation changes, the cache invalidation surface needs a catalogVersion dimension. The §11.5 cache key would become (expression_hash, worldSetRef, catalogVersion).

Recommendation: introduce a single monotonic catalogVersion integer. The build step bumps it whenever any template is added, modified, or removed. The version is part of the catalog read and rides on every EvaluationResult. A runtime cache keyed on the triple invalidates naturally when the catalog changes, because entries written under an older version are never looked up again.
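A small Python sketch shows why keying on the triple is enough. The SHA-256 hashing of the canonical postfix, the in-memory dictionary, and all names here are assumptions for illustration; the spec's actual expression_hash and cache are not described in this document.

```python
import hashlib
from typing import Callable, NamedTuple


class CacheKey(NamedTuple):
    expression_hash: str
    world_set_ref: str
    catalog_version: int


def make_key(canonical_postfix: str, world_set_ref: str, catalog_version: int) -> CacheKey:
    """Build the (expression_hash, worldSetRef, catalogVersion) triple."""
    digest = hashlib.sha256(canonical_postfix.encode("utf-8")).hexdigest()
    return CacheKey(digest, world_set_ref, catalog_version)


cache: dict[CacheKey, object] = {}


def get_or_compute(key: CacheKey, compute: Callable[[], object]) -> object:
    """Return the cached value for key, computing and storing it on a miss."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]


calls = []


def expensive() -> bool:
    calls.append(1)  # records how many times evaluation actually ran
    return True


get_or_compute(make_key("A|1|GTE", "ws-1", 7), expensive)  # miss
get_or_compute(make_key("A|1|GTE", "ws-1", 7), expensive)  # hit
get_or_compute(make_key("A|1|GTE", "ws-1", 8), expensive)  # miss after a version bump
```

Entries stored under version 7 are simply never requested again after the bump; evicting them is a separate concern that this sketch does not address.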

Open: is catalogVersion global across all sports, or scoped per sport? Initially per-sport is enough (American Football has its own catalog). When a second sport materialises this becomes a real question.

Q6 — Where does the persisted template catalog live?

This is also open in D3 (per the build plan §6.4 Question 4) - tied here because D2 needs to know where to write the postfix templates.

Recommendation (defaults until D3 settles it): generated config file (derived-outcomes.yaml + raw-outcomes.yaml) loaded into memory at startup. ClickHouse persistence is a future migration once authoring tooling exists.

Phase 2 implementation notes

Once this sub-design is signed off, Phase 2 (D2) should:

  1. Add string? Definition to LBS.OutcomeContext.Contracts/OutcomeDefinition.cs. Update the Mermaid ERD (per claude.md non-negotiables).
  2. Update the existing AmericanFootballOutcomeCatalogue call sites to pass Definition: null everywhere (no derived templates yet).
  3. Migrate OutcomeCatalogEntry (Query side, Phase 0c interim) to consume the new OutcomeDefinition directly, with a postfix-template parser that converts Definition string into the existing token stream.
  4. Rewrite ExpressionCanonical.ExpandDerived to consume postfix templates with slot substitution. The infix-tree path goes away on the runtime side.
  5. Stand up the build-step tool that reads derived-outcomes.yaml, validates per Q3, and emits canonicalised postfix into the catalog source.
  6. Migrate the existing read-side derivations from imperative C# into derived-outcomes.yaml templates (§"D2 Resolution - Consequences" in gap analysis lists candidates: ANYTIME_TD, TWO_PLUS_TDS, THREE_PLUS_TDS, TOP_SCORER_SEASON).
  7. Inventory pass: identify which existing derivations are escape-hatch sim-produced booleans (per §11.8.6) that cannot migrate cleanly.

References