OutcomeContext: Sequencing Rationale (ADR-lite)
Companion to the executive roadmap, the per-phase breakdown, and the cross-cutting design decisions.
The roadmap lays out nine phases in a specific order. This doc captures why that order is right (or, more accurately, the best shape we have today), what we considered and ruled out, and what the phase-3 prototype is expected to settle.
This is an ADR-lite rather than a full ADR because the sequencing decision is reversible at any phase boundary — if signal comes in that changes our thinking, we re-sequence. The numbered phases are a plan, not a contract.
Sequencing principle
Earlier phases must unblock later phases. Earlier phases must not depend on later phases. Phases that share no dependency should be able to run in parallel.
Concretely:
- Phase N can only begin if every phase it strictly depends on has landed.
- Phase N's definition of done must give later phases something they can rely on, not just promise to deliver later.
- Two phases that don't share a dependency run in parallel; there is no value in serialising work for the sake of a tidy plan.
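The principle can be checked mechanically by grouping phases into waves, where every phase in a wave has all of its dependencies in earlier waves. The sketch below is illustrative only: the edge set is one reading of the rationale in this doc, not an authoritative graph, and the external gates (the historical scope decision for phase 2, DS model readiness for phase 5) are deliberately left out, so a wave shows the earliest a phase could start if its gate were already open.

```python
from graphlib import TopologicalSorter

# Assumed edge set: phase -> phases it strictly depends on.
# External gates (scope decision, DS model readiness) are not modelled.
DEPENDS_ON = {
    "phase-1": set(),
    "phase-2": set(),
    "phase-3": {"phase-2"},
    "phase-4": {"phase-2"},
    "phase-5": set(),
    "phase-6": {"phase-5"},
    "phase-7": {"phase-2", "phase-3", "phase-5"},
    "phase-8": {"phase-1", "phase-7"},
    "phase-9": {"phase-8"},
}


def parallel_waves(depends_on):
    """Group phases into waves; phases in the same wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if an earlier phase depends on a later one
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(DEPENDS_ON), start=1):
    print(number, wave)
```

Under these assumed edges, phases 1 and 2 land in the first wave together, and phase 5 joins them only because its external gate is not modelled. A cycle in the graph would mean an earlier phase depends on a later one, which the principle forbids.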
Why phase 1 and phase 2 start first, in parallel
Phase 1 (bitvector engine) is started first because:
- It has no external dependency. Engineering can begin today.
- The existing benchmarks and placeholder simulator are enough to validate the perf claim — we don't need a real model for the bitvector engine to demonstrate ≥5× on the canonical boolean shape.
- It unblocks phase 8's ≤200ms latency claim. Without bitvector evaluation, the larger basket sizes don't fit in the budget even in-region.
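This doc does not describe the engine's internals, so the following is only a sketch of one common bitvector approach, shown to make the perf argument concrete. It assumes each outcome is stored as a bitmask with one bit per simulation run, so a compound boolean question becomes a bitwise operation and its probability is a popcount divided by the number of runs. Every name and value below is hypothetical.

```python
def to_bitvector(hits):
    """Pack per-run booleans into one int; bit i is set when run i hit the outcome."""
    bits = 0
    for i, hit in enumerate(hits):
        if hit:
            bits |= 1 << i
    return bits


def probability(bits, n_runs):
    """Share of simulation runs whose bit is set."""
    return bin(bits).count("1") / n_runs


# Eight made-up simulation runs for two made-up outcomes.
N_RUNS = 8
team_wins = to_bitvector([1, 1, 0, 1, 0, 1, 1, 0])
over_points = to_bitvector([1, 0, 0, 1, 1, 1, 0, 0])

both = team_wins & over_points    # runs where both outcomes hit
either = team_wins | over_points  # runs where at least one hit

print(probability(both, N_RUNS), probability(either, N_RUNS))
```

The appeal of this shape is that combining outcomes costs one machine-level operation per word of runs rather than one comparison per run, which is why it would help every query and not only repeated ones.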
Phase 2 (historical ingest) is started in parallel because:
- It has its own external dependency — the historical scope decision — which is product + engineering, not engineering alone. That decision blocks ingest work, so we shouldn't wait until phase 1 is done to ask the question.
- Historical scope size is the biggest sizing input for ClickHouse storage cost (decision §2 of the design-decisions doc). The earlier this is settled, the earlier infra sizing becomes credible.
- Phases 3, 4, and 7 all depend on phase 2 in different ways. Pulling it forward de-risks more of the plan than any other choice.
Alternatives ruled out:
- Phase 1 only first, then phase 2. Rejected: serialises engineering on a decision that's not engineering-blocked. Running two independent tracks is strictly better.
- Phase 2 first, phase 1 second. Rejected: phase 1 ships value faster (perf wins land in days; ingest takes weeks) and unblocks phase 8.
- Skip phase 1, lean on cache. Rejected: cache hit rate against unique customer questions is unknown. Bitvector engine helps every query, not just repeats.
Why phases 3 and 4 follow phase 2
Both phase 3 (prediction-vs-actual prototype) and phase 4 (live data ingest) need historical data shaped to match the simulated shape. Without phase 2 having landed at least one season's worth:
- Phase 3 has nothing real to compare prediction against. It would become a prototype-of-a-prototype.
- Phase 4 has no historical baseline to anchor "this roster change matters" / "this score is consistent with our forecast" against.
Phase 3 specifically follows phase 2 because the prototype's job is to settle the prediction-vs-actual contract. It needs both sides of the comparison.
Phase 4 specifically follows phase 2 because:
- The same identity model used for historical ingest is needed for live roster sync.
- The data-version semantics that phase 4 needs (LBS-1364) are partially scoped during phase 2's table-shape decisions.
Alternatives ruled out:
- Phase 3 before phase 2. Rejected: the prototype's whole point is to validate API shape against a real comparison. Without phase 2, the prototype is fiction.
- Phase 4 in parallel with phase 2. Discussed; rejected because the table-shape and identity-model decisions phase 2 surfaces are inputs to phase 4. Doing them in parallel risks two workstreams making conflicting decisions.
Why phase 5 is dependency-gated, not numbered earlier
Phase 5 (real model integration) is sequenced after phases 1–4 in the roadmap but is genuinely gated on the DS team's readiness rather than on prior engineering phases.
If the DS team's model becomes available tomorrow:
- Phase 5 can slot in alongside whatever else is in flight.
- The pseudo-simulator decision (§8 of design-decisions doc) means swapping the model in doesn't break running infrastructure.

If the DS team takes another six months:
- Phases 1–4 still deliver value (perf, historical data, prototype, live feed), so none of them is wasted.
- Phase 7 specifically would be wasted (accuracy is undefined against a placeholder; design-decisions §3), so it stays paused.
This is why the roadmap text calls phase 5 "dependency-gated" rather than placing it strictly between phases 4 and 6.
Alternatives ruled out:
- Numbering phase 5 as phase 1 because everything else depends on the model. Rejected: not everything else does. Phase 1 (bitvector), phase 2 (historical), phase 3 (prototype), and phase 4 (live feed) all deliver against the placeholder. The real model is the catalyst for accuracy claims, not the catalyst for the platform existing.
- Numbering phase 5 last. Rejected: phase 6 (progressive simulation) and phase 7 (accuracy validation) are dependent on it; placing 5 after them mis-states the dependency.
Why phases 6 and 7 require phase 5
Phase 6 (progressive season simulation) and phase 7 (foundation + accuracy validation) both depend on a real model — but for different reasons.
- Phase 6 is "predict from week N onwards conditional on actuals through week N-1". Conditional prediction with a placeholder produces conditional nonsense. The whole point of the phase is the conditioning behaviour of the model — which the placeholder doesn't have.
- Phase 7's accuracy validation is structurally meaningless against the placeholder (design-decisions §3). The observability portion of phase 7 could be done earlier, but it is bundled because all three phase-7 sub-streams (observability, versioning, template registry) benefit from being delivered together: they're the "platform is grown-up now" milestone.
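To make the phase-6 point concrete, here is a minimal sketch of "predict from week N onwards conditional on actuals through week N-1". The function name, the model signature, and both toy models are assumptions for illustration; neither stands in for the real placeholder simulator or the DS team's model.

```python
from typing import Callable, List, Sequence

# Assumed model signature: (actuals so far, week to predict) -> predicted value.
Model = Callable[[Sequence[float], int], float]


def progressive_forecast(actuals: Sequence[float], start_week: int,
                         season_weeks: int, model: Model) -> List[float]:
    """Actuals for weeks 1..N-1 followed by predictions for weeks N..season end."""
    if len(actuals) < start_week - 1:
        raise ValueError("need actuals through week N-1")
    history = list(actuals[: start_week - 1])
    predictions = [model(history, week)
                   for week in range(start_week, season_weeks + 1)]
    return history + predictions


def placeholder(history: Sequence[float], week: int) -> float:
    """Ignores the actuals entirely, so it has no conditioning behaviour."""
    return 0.5


def running_mean(history: Sequence[float], week: int) -> float:
    """Toy conditioned model: predicts the mean of the actuals seen so far."""
    return sum(history) / len(history) if history else 0.5


print(progressive_forecast([1.0, 0.0, 1.0], 4, 6, placeholder))
print(progressive_forecast([1.0, 0.0, 1.0], 4, 6, running_mean))
```

Feeding the unconditioned model a winless start or an unbeaten start yields the same forecast for the remaining weeks, which is the "conditional nonsense" described above. Only a model that actually reads the history makes the phase worth building.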
Alternatives ruled out:
- Phase 7 observability before phase 5. Considered. Rejected because observability without something meaningful to observe generates noise, not signal. Once phase 5 lands, observability starts answering real questions immediately.
- Phase 6 before phase 5 using the placeholder. Rejected: produces numbers no one believes, and bakes in conditioning behaviour that the real model will replace anyway.
Why phases 8 and 9 come last
Phase 8 (performance) comes after correctness is locked because optimising the wrong thing is the most expensive mistake on the roadmap. Specifically:
- Caching against a placeholder simulator caches placeholder data. Doable, but every cache decision (what to cache, for how long, with what invalidation) is informed by what's actually hot — knowable only after phase 7's observability is live.
- Pre-computed bitvectors target hot outcomes. "Hot" is observed, not assumed.
Phase 9 (production hardening) comes after phase 8 because:
- Auth + rate limiting + IaC + DR + runbooks are all about the system being ready to be public. Public means a defined SLA, which means phase 8's perf work needs to have landed.
- Schema-versioning policy locks in once external consumers integrate. Doing this before the public surface stabilises (which happens late in phase 8 with query-budget rules) locks in the wrong contract.
Alternatives ruled out:
- Hardening earlier. Considered for phase 1/2 era. Rejected because hardening tasks (auth, IaC, DR) all assume a stable surface to harden against, and the surface keeps changing through phases 3–8.
- Perf earlier. The bitvector work is perf-earlier — that's phase 1. The phase 8 perf work specifically is cache + tuning, which benefits from observability data.
What the phase-3 prototype is expected to settle
The prototype is deliberately scoped narrow because it exists to settle one decision: the API contract for prediction-vs-actual.
Specific questions the prototype must answer:
- Shape of the comparison. Does `evaluate` return a single numeric beside the prediction? Does the comparison live in a separate `historicalContext` query? Is it a third top-level `compare(prediction, actual)` resolver?
- Granularity. Per-outcome (prediction vs single actual scalar)? Per-game (full game prediction vs full game actual)? Per-season?
- Calibration surface. Is calibration data part of the same query response, or a separate "model performance" surface that customers query independently?
- Edge cases. What does the API say when the actual hasn't happened yet? When the prediction was made after the actual? When the prediction was made multiple times (which version do we compare to)?
The prototype is allowed to be ugly — but it has to produce a written decision on each of the four questions above. That decision becomes the input to the eventual production design in phase 7.
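To show the kind of answer the edge-case question is asking for, the sketch below picks one possible set of rules: a pending actual yields an explicit status, predictions made at or after the actual are ignored, and the latest prior prediction is the one compared. These rules, the type names, and the signed-error field are all assumptions made for illustration; settling them is the prototype's job, not this snippet's.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Sequence


class ComparisonStatus(Enum):
    COMPARED = "compared"
    ACTUAL_PENDING = "actual_pending"            # the actual hasn't happened yet
    NO_PRIOR_PREDICTION = "no_prior_prediction"  # every prediction postdates the actual


@dataclass(frozen=True)
class Prediction:
    value: float
    made_at: datetime


@dataclass(frozen=True)
class Actual:
    value: float
    observed_at: datetime


@dataclass(frozen=True)
class Comparison:
    status: ComparisonStatus
    prediction: Optional[Prediction] = None
    error: Optional[float] = None  # prediction minus actual


def compare(predictions: Sequence[Prediction], actual: Optional[Actual]) -> Comparison:
    """Compare the latest prediction made before the actual was observed."""
    if actual is None:
        return Comparison(ComparisonStatus.ACTUAL_PENDING)
    prior = [p for p in predictions if p.made_at < actual.observed_at]
    if not prior:
        return Comparison(ComparisonStatus.NO_PRIOR_PREDICTION)
    latest = max(prior, key=lambda p: p.made_at)
    return Comparison(ComparisonStatus.COMPARED, latest, latest.value - actual.value)
```

A different rule set (for example, comparing against the first prediction, or returning every version) would change only the body of `compare`, which is part of why trying the options in a prototype is cheap.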
Why a prototype and not a design doc? Because the comparison surface is exactly the kind of thing where reading the doc is unconvincing and trying it sells (or kills) it in 15 minutes.
What would cause us to re-sequence
Trigger points where we'd revisit the order:
| Trigger | What changes |
|---|---|
| DS team's model becomes production-ready before phase 4 | Phase 5 jumps the queue — slots in alongside phase 3 or 4 |
| Customer integration demands a specific consumer pattern now | Phase 9's consumer-shape decision pulls forward |
| Cross-region client volume turns out to be material | Phase 8's multi-region work moves earlier inside phase 8 |
| Historical scope decision can't be made within 2 weeks | Phase 2 pauses; phase 1 + phase 4-prep continue |
| Phase-3 prototype reveals the comparison surface needs a different storage shape | Phase 2's storage-shape decision is revisited before mass ingest |
The plan is not precious. The principle (earlier-unblocks-later) is.