OutcomeContext: Cross-Cutting Design Decisions
Companion to the executive roadmap and the per-phase breakdown.
The roadmap names nine phases. Several technical and product choices cut across all nine and won't fit cleanly inside any single phase's scope. This doc captures those choices, the reasoning, and what they imply for downstream work.
If a decision in here turns out to be wrong, revisit it deliberately and update this doc. These are the load-bearing assumptions that every phase plan inherits.
1. Caching abstraction is FusionCache, not Redis directly
Decision. The expression-result cache (phase 8, LBS-1362) is built on FusionCache with an in-process L1, with the option to plug in Redis as L2 later.
Reasoning.
- FusionCache solves the cache-stampede problem (request coalescing,
factory deduplication) that hand-rolled IMemoryCache doesn't,
and isn't tied to any specific backing store.
- L1-only is enough for the foreseeable QueryApi deployment shape
(scale 0..1 today; even at 2..4 replicas the cache miss cost on a
cold replica is small relative to ClickHouse round-trip).
- Adding Redis later is a configuration change, not a re-architecture;
the FusionCache API is the same with or without an L2 provider.
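Request coalescing is the property that separates this from a hand-rolled IMemoryCache. Concurrent callers that miss on the same key share one factory call instead of each hitting ClickHouse. The sketch below illustrates the idea in Python rather than the service's C#. `CoalescingCache`, `get_or_set`, and `demo` are hypothetical names, not FusionCache's API; FusionCache's built-in stampede protection is the production equivalent.

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

V = TypeVar("V")


class CoalescingCache(Generic[V]):
    """Hypothetical in-process cache with per-key request coalescing.

    Concurrent misses on the same key run the factory once; the other
    callers wait on a per-key lock and then read the stored value.
    """

    def __init__(self) -> None:
        self._values: Dict[str, V] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get_or_set(self, key: str, factory: Callable[[], V]) -> V:
        if key in self._values:  # fast path: this sketch never evicts entries
            return self._values[key]
        with self._lock_for(key):
            # Re-check: another caller may have filled the entry while we waited.
            if key not in self._values:
                self._values[key] = factory()
            return self._values[key]


def demo(threads: int = 8) -> Tuple[int, List[int]]:
    """Fire concurrent misses at one key and count factory invocations."""
    cache: CoalescingCache[int] = CoalescingCache()
    calls = 0
    calls_lock = threading.Lock()

    def slow_factory() -> int:
        nonlocal calls
        with calls_lock:
            calls += 1
        time.sleep(0.05)  # stand-in for a ClickHouse round-trip
        return 42

    results: List[int] = []
    results_lock = threading.Lock()

    def worker() -> None:
        value = cache.get_or_set("expr:abc123", slow_factory)
        with results_lock:
            results.append(value)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return calls, results
```

`demo()` returns a factory call count of 1 and eight results of 42. Without the per-key lock and re-check, all eight workers could run the factory at once, and that pile-up is the stampede.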
Implications.
- Don't introduce a hard Redis dependency now.
- Cache keys are content-addressed by expressionHash (already
produced by the evaluator); the key contract is stable across
L1-only and L1+L2 deployments.
- Cache invalidation is data-version-driven, not time-driven (see
decision §4 below).
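The doc fixes expressionHash as the content address but leaves open how the data version reaches the cache. One option is to keep the key as expressionHash alone and purge on the "live epoch changed" event. Another is to scope the key by the live epoch as well. The sketch below assumes the second option. `cache_key`, the `oc:` prefix, and the run-id and hash strings are hypothetical. An epoch flip from X to Y changes every computed key, so stale results are never read and can age out under normal expiry.

```python
from typing import Dict


def cache_key(live_run_id: str, expression_hash: str) -> str:
    """Hypothetical key format: prefix, data version, then content address.

    The same expression under the same live epoch always yields the same
    string, so the key works identically in L1-only and L1+L2 setups.
    """
    return f"oc:{live_run_id}:{expression_hash}"


# Illustrative values: "run-X"/"run-Y" stand in for RunIds and
# "e3b0c442" for an evaluator-produced expressionHash.
cache: Dict[str, float] = {}
cache[cache_key("run-X", "e3b0c442")] = 0.61  # computed while epoch X was live

hit_while_x_live = cache_key("run-X", "e3b0c442") in cache   # True
hit_after_cutover = cache_key("run-Y", "e3b0c442") in cache  # False: new epoch, new key
```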
2. "Historical" has two distinct meanings
Decision. When the roadmap says "historical data ingest" (phase 2), two unrelated workstreams hide behind the same word. Treat them as two designs, not one.
The two meanings.
| Term used here | What it actually is | When it lands |
|---|---|---|
| Historical facts | Real-world past outcomes — actual final scores, actual player stat lines, actual standings — from a feed like ESPN. The thing customers compare predictions against. | Phase 2 |
| Historical predictions | Past simulation runs preserved as a versioned snapshot — "what did our model predict for week 4 as of week 3?" | Phase 7 (via the data-version / epoch semantics in LBS-1364) |
Why this matters.
- They have entirely different storage shapes (historical facts are scalar values per game; historical predictions are 100K-element arrays).
- They have entirely different ingest cadences (historical facts back-fill once, then trickle; historical predictions accumulate every time the simulator runs).
- The query that asks "how well did we predict?" needs both, so the API needs to expose both without the two getting tangled.
Implications.
- Phase 2 only delivers historical facts. Historical predictions are a phase 7 deliverable.
- The data-version semantics in LBS-1364 are the unblocking design for historical predictions, not a separate thing.
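Keeping the two meanings as separate types shows how little they share. A fact is one scalar per game and outcome. A prediction is a per-world array tied to a run. They meet only in the "how well did we predict?" comparison. Everything here is hypothetical: the type and field names, the 0/1 encoding of the outcome, and the fraction-of-worlds reduction. Ten worlds stand in for the 100K the simulator produces.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HistoricalFact:
    """A real-world past outcome (e.g. from an ESPN-style feed): one scalar."""
    game_id: str
    outcome_id: str
    value: int  # assumed encoding: 1 if the outcome happened, 0 otherwise


@dataclass(frozen=True)
class HistoricalPrediction:
    """A preserved simulation result tied to a run: one entry per world."""
    run_id: str
    outcome_id: str
    worlds: List[int]  # 100K entries in production; assumed 1 = occurred in that world


def predicted_probability(pred: HistoricalPrediction) -> float:
    """Fraction of simulated worlds in which the outcome occurred."""
    return sum(pred.worlds) / len(pred.worlds)


def prediction_error(fact: HistoricalFact, pred: HistoricalPrediction) -> float:
    """The only place the two shapes meet: compare a prediction with reality."""
    if fact.outcome_id != pred.outcome_id:
        raise ValueError("fact and prediction refer to different outcomes")
    return abs(predicted_probability(pred) - fact.value)


fact = HistoricalFact("2024-wk4-game-1", "home_win", 1)
pred = HistoricalPrediction("run-wk3", "home_win", [1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
error = prediction_error(fact, pred)  # about 0.3: 7 of 10 worlds predicted a home win
```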
3. "Accuracy" is undefined against a placeholder simulator
Decision. Accuracy validation (phase 7) is structurally gated on real-model integration (phase 5). Running calibration metrics against the current pseudo-simulator produces numbers, but those numbers mean nothing.
Reasoning.
The current LBS.Model.AmericanFootball(.Simulation) is a placeholder
chosen to be structurally representative — same world-indexed
probability shape, same per-game / per-season outcome arrays — not
predictively meaningful. Its calibration is whatever its hardcoded
distributions happen to produce. There's no underlying signal to be
right or wrong about.
Implications.
- Don't burn cycles on calibration metric tooling before phase 5 lands. The tooling will be the easy part once a real model exists.
- Don't position the platform externally on calibration claims until phase 5 and phase 7 are both done. Use soft language until then: "the platform produces calibrated probabilities" reads as a system claim, not a model claim, but the model is what determines whether the claim holds.
- Year-over-year stability claims need at least two historical seasons in phase 2, not one.
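The tooling really is the easy part, and its output is meaningless without a real model. A standard calibration metric such as the Brier score is a few lines: the mean squared difference between forecast probability and the 0/1 outcome. It returns a number for any forecaster, including one that ignores the games entirely. The function names, the constant 0.5 forecaster, and the outcome list below are illustrative only, not planned tooling.

```python
from typing import List, Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 would be a perfect forecaster.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def constant_forecaster(n_games: int, p: float = 0.5) -> List[float]:
    """A stand-in for a model with no signal: same probability for every game."""
    return [p] * n_games


outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
score = brier_score(constant_forecaster(len(outcomes)), outcomes)  # 0.25
```

The number is perfectly well-defined. It reflects the forecaster's hardcoded choice, though, not any signal about the games, and the same holds for metrics run against the placeholder's hardcoded distributions.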
4. Versioning is the contract; SELECT FINAL is not
Decision. Data-version / epoch semantics (LBS-1364)
become the production read contract. The current SELECT FINAL
pattern in ClickHouseOutcomeContextStore is a temporary measure that
gets retired in phase 7.
Why versioning, not FINAL.
- SELECT FINAL collapses each key to the row with the highest
context_version. That means you cannot ask "what did we predict last
week?", only "what do we predict now?".
- Concurrent re-simulation against a live cluster currently overlaps
with reads in undefined ways. Versioning makes cutover atomic from
the reader's perspective.
- Cache invalidation needs a contract: "the live epoch changed from
X to Y" is a clean event; "FINAL might be returning different rows
now because background merges finished" is not.
Implications.
- RunId (per-run identity: simulator version + model version + input
snapshot version + seed) becomes a first-class column.
- Epoch (pointer to the currently-live RunId) becomes a small
control table — likely Marten, not ClickHouse.
- Reads default to the live epoch; "as-of" reads against historical
epochs are a planned, audited capability.
- The skipped HistoricalVersionsTest gets re-enabled in phase 7.
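The read contract above can be sketched with in-memory stand-ins:
- Result rows carry a RunId.
- A small control table holds the live epoch pointer.
- Default reads follow that pointer.
- As-of reads name a RunId explicitly and are logged.

Cutover is a single pointer swap, so each read sees either the old run or the new one. `RunId`'s fields mirror the four components listed above. Everything else is hypothetical: the class and method names, the audit list, and the dictionaries standing in for ClickHouse and the (likely Marten) control table.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RunId:
    """Per-run identity: every input that could change the simulated output."""
    simulator_version: str
    model_version: str
    input_snapshot_version: str
    seed: int


class VersionedOutcomeStore:
    """Hypothetical stand-in: rows keyed by (RunId, outcomeId) plus an epoch pointer."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[RunId, str], List[int]] = {}  # the ClickHouse side
        self._live: Optional[RunId] = None                    # the control-table side
        self._lock = threading.Lock()
        self.audit_log: List[Tuple[str, RunId]] = []

    def write(self, run: RunId, outcome_id: str, worlds: List[int]) -> None:
        # A new run's rows land beside the live run's; default reads can't see them yet.
        self._rows[(run, outcome_id)] = worlds

    def publish(self, run: RunId) -> None:
        # Cutover is one pointer swap.
        with self._lock:
            self._live = run

    def read(self, outcome_id: str, as_of: Optional[RunId] = None) -> List[int]:
        if as_of is not None:
            self.audit_log.append((outcome_id, as_of))  # as-of reads are audited
            run = as_of
        else:
            with self._lock:
                run = self._live  # resolve the pointer once per read
            if run is None:
                raise LookupError("no live epoch published yet")
        return self._rows[(run, outcome_id)]


week3 = RunId("sim-1", "model-1", "snapshot-wk3", 7)
week4 = RunId("sim-1", "model-1", "snapshot-wk4", 7)
store = VersionedOutcomeStore()
store.write(week3, "home_win", [1, 0, 1])
store.publish(week3)
store.write(week4, "home_win", [0, 0, 1])          # invisible until published
before = store.read("home_win")                    # [1, 0, 1]
store.publish(week4)
after = store.read("home_win")                     # [0, 0, 1]
last_week = store.read("home_win", as_of=week3)    # [1, 0, 1], and logged
```

Under SELECT FINAL only the equivalent of `after` is answerable. The `last_week` read is the capability that versioning adds.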
5. Sport-agnosticism is a contract boundary, not a runtime check
Decision. The Storage and Query layers stay sport-agnostic by ensuring nothing inside them references AmericanFootball-specific types, names, or assumptions. There's no runtime sport-detection logic — sport-specificity ends at the Accumulation boundary.
Reasoning.
- The work in docs/outcome-context/sub-designs/d3-pre-simulation-catalog.md
established that templates + materialisation push sport-specific
outcome generation into the model layer. Below that line, the
storage tables, query resolvers, and expression evaluator only
know about outcomeId strings + Array(Int32) payloads.
- Phase 5's sport-agnostic generalisation work doesn't need to
rewrite the contract; it needs to exercise the contract by adding a
second sport and confirming that nothing in Storage / Query has to change.
Implications.
- Any PR that adds an AmericanFootball-specific type to
LBS.OutcomeContext.Storage or LBS.OutcomeContext.Query is a
review red flag.
- The catalog format already supports per-sport materialisation; new
sports add a new template registry, not a new storage path.
- Cohort / participant identity (decision §6)
is the only cross-sport coupling — handled at the model layer.
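The review red flag could also be backed by an automated check. The sketch below assumes source files can be handed to a script as (path, text) pairs. It flags any line under the Storage or Query projects that mentions a sport-specific identifier. The path prefixes, the forbidden-token pattern, and the function name are hypothetical. A real check might inspect project references instead, and this text scan is only the simplest form of the rule.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical: project paths that must stay sport-agnostic.
GUARDED_PREFIXES = ("src/LBS.OutcomeContext.Storage/", "src/LBS.OutcomeContext.Query/")
# Hypothetical: identifiers that mark sport-specific code.
FORBIDDEN = re.compile(r"\bAmericanFootball\b")


def find_sport_leaks(files: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for each forbidden reference in guarded paths."""
    leaks = []
    for path, text in files:
        if not path.startswith(GUARDED_PREFIXES):
            continue  # the model layer is allowed to be sport-specific
        for number, line in enumerate(text.splitlines(), start=1):
            if FORBIDDEN.search(line):
                leaks.append((path, number, line.strip()))
    return leaks


sample = [
    ("src/LBS.OutcomeContext.Storage/OutcomeRow.cs",
     "public record OutcomeRow(string OutcomeId, int[] Worlds);"),
    ("src/LBS.OutcomeContext.Query/Resolver.cs",
     "using LBS.Model.AmericanFootball;\nclass Resolver {}"),
    ("src/LBS.Model.AmericanFootball/Templates.cs",
     "namespace LBS.Model.AmericanFootball;"),
]
leaks = find_sport_leaks(sample)  # only the Query resolver's using line is flagged
```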
6. Participant identity is stable across seasons and sports
Decision. A participantId is a stable GUID for the lifetime of
the platform. Same Mahomes in 2023 and 2025. Same team across rebrands
(e.g. Washington → Commanders).
Reasoning.
- Phase 2's historical ingest spans multiple seasons. Phase 5's multi-sport work spans multiple roster registries.
- Without stable identity, "Mahomes' career passing TDs vs all active QBs" needs joining across IDs, which surfaces as a query-time join and breaks the simple per-world array model.
Implications.
- Roster ingest in phase 2 and phase 4 has to resolve identity, not
generate fresh GUIDs per season.
- Rebrand / trade events need their own data model (player A on team
X in 2024, team Y in 2025) — but the participantId stays the
same.
- Cross-season queries are out of scope for now (none appear in current
demos), but if stable identity isn't established during phase 2, every
later multi-season query falls off this cliff.
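The rule is "resolve, don't generate". Ingest looks up an existing participantId by a feed-specific key and mints a new GUID only for a participant it has never seen. A rebrand updates attributes, not identity. `IdentityResolver`, the (feed, external id) key shape, `alias`, and the feed names are all hypothetical. Real cross-feed matching will need more than an exact key lookup.

```python
import uuid
from typing import Dict, Tuple

ExternalKey = Tuple[str, str]  # assumed shape: (feed name, that feed's own id)


class IdentityResolver:
    """Hypothetical registry mapping feed-specific keys to platform-stable GUIDs."""

    def __init__(self) -> None:
        self._ids: Dict[ExternalKey, uuid.UUID] = {}
        self.display_names: Dict[uuid.UUID, str] = {}

    def resolve(self, key: ExternalKey, display_name: str) -> uuid.UUID:
        """Return the existing participantId, minting one only on first sight."""
        pid = self._ids.get(key)
        if pid is None:
            pid = uuid.uuid4()
            self._ids[key] = pid
        # Names can change (rebrands); the id never does.
        self.display_names[pid] = display_name
        return pid

    def alias(self, existing: ExternalKey, new_key: ExternalKey) -> None:
        """Point another feed's key at an already-known participant (KeyError if unknown)."""
        self._ids[new_key] = self._ids[existing]


resolver = IdentityResolver()
before_rebrand = resolver.resolve(("feed-a", "WAS"), "Washington Football Team")
after_rebrand = resolver.resolve(("feed-a", "WAS"), "Washington Commanders")
# before_rebrand == after_rebrand: one participantId across the rebrand
```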
7. Query budget, not query timeout, is the abuse defence
Decision. Production protection against abusive queries is expressed as query budget (basket size limit, expression depth limit, evaluator cycle cap), not as a wall-clock timeout. Timeouts are a backstop, not the contract.
Reasoning.
- A wall-clock timeout under load is non-deterministic: the same query passes at 2am and fails at 8pm. That's not a contract you can build a customer integration on.
- A query budget rejects predictably: "your basket has 47 outcomes; the limit is 30" is a clear error that the caller can act on.
- The bitvector engine (phase 1) and result cache (phase 8) lower the cost of typical queries; query budgets keep the worst-case ceiling bounded.
Implications.
- Phase 8 needs to land budget rules before phase 9 hardening opens the public ingress.
- Specific limits will be decided from real traffic shapes once observability (phase 7) is in place, but the design must allow them to be configured per customer if commercial needs demand it.
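Budget checks can reject predictably and say why. The three limits below mirror the ones named above. Basket size and expression depth are checked before evaluation. The cycle cap is a meter the evaluator would charge as it works, so it trips after a fixed amount of work rather than a variable amount of wall-clock time. All names, the nested-tuple expression shape, and the numeric defaults are hypothetical, since the real limits are still to be set from traffic data. `QueryBudget` is a plain value, so a per-customer override is just a different instance.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical expression shape: an outcomeId leaf, or (operator, child, child, ...).
Expr = Union[str, Tuple]


@dataclass(frozen=True)
class QueryBudget:
    max_basket_size: int = 30          # illustrative defaults only
    max_expression_depth: int = 8
    max_evaluator_cycles: int = 1_000_000


class BudgetExceeded(Exception):
    pass


def depth(expr: Expr) -> int:
    """Nesting depth: a leaf is 1, an operator node is 1 + its deepest child."""
    if isinstance(expr, str):
        return 1
    return 1 + max((depth(child) for child in expr[1:]), default=0)


def check_static_budget(basket: List[str], expr: Expr, budget: QueryBudget) -> None:
    """Reject before evaluation, with an error the caller can act on."""
    if len(basket) > budget.max_basket_size:
        raise BudgetExceeded(
            f"your basket has {len(basket)} outcomes; the limit is {budget.max_basket_size}")
    d = depth(expr)
    if d > budget.max_expression_depth:
        raise BudgetExceeded(
            f"your expression is nested {d} levels deep; the limit is {budget.max_expression_depth}")


class CycleMeter:
    """Charged by the evaluator as it works; trips after a fixed amount of work."""

    def __init__(self, budget: QueryBudget) -> None:
        self.remaining = budget.max_evaluator_cycles

    def charge(self, cycles: int) -> None:
        self.remaining -= cycles
        if self.remaining < 0:
            raise BudgetExceeded("query exceeded its evaluator cycle budget")


basket = [f"outcome-{i}" for i in range(47)]
try:
    check_static_budget(basket, "outcome-0", QueryBudget())
    message = "accepted"
except BudgetExceeded as err:
    message = str(err)  # "your basket has 47 outcomes; the limit is 30"
```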
8. The pseudo-simulator stays in the repo until phase 5
Decision. Keep LBS.Model.AmericanFootball(.Simulation) running
in production until phase 5 swaps in the real DS model. Don't try
to retire it earlier.
Reasoning.
- Phases 1–4 all need something producing OC data shaped like a real model would produce. Retiring the placeholder early starves every parallel workstream of test data.
- The placeholder's structural correctness (right shape, right scale, right outcome catalog coverage) is what's load-bearing; its predictive value isn't, but no one needs that until phase 5.
- It's not blocking anyone: the platform serves real queries against real-shape data today.
Implications.
- Phase 1 (bitvector) is validated against the placeholder.
- Phase 7's observability is wired against the placeholder.
- Phase 8's perf benchmarks are baselined against the placeholder.
- The day the real model lands, all of the above keep working: the thing that changes is what the numbers mean, not how they're produced.
9. Multi-region is a phase 8/9 decision, not a phase 1 architectural one
Decision. Production deployment stays single-region (westus3) until phase 8 / 9 has observability data showing cross-region client volume is meaningful. No multi-region architectural work in the intervening phases.
Reasoning.
- Our ClickHouse Cloud service is single-region today; co-locating the runner and QueryApi with the cluster is what keeps the write path fast and the in-region read path under the latency target.
- Multi-region adds replica-lag semantics, cross-region cost, and a cache-coherency surface to a system that doesn't yet have evidence it needs them.
- The roadmap already lists multi-region under phase 8 as conditional ("if cross-region client volume warrants").
Implications.
- Cross-region clients (e.g. AU users hitting westus3 today) accept the WAN tax for now. This is documented in ADR-009 §3.2.
- Any architecture proposal for multi-region needs to be motivated by observability data, not assumption.
How to use this document
- Reading it: each decision has a Decision, Reasoning, and Implications block. Skim the Decision lines first.
- Changing a decision: don't do it quietly. Edit this doc with the new decision, the new reasoning, and a (was: …) note showing what was replaced.
- Inheriting a decision: per-phase planning docs (and ADRs) should link back to the relevant decision here rather than re-justifying it.