
Outcome Context Platform — Strategic Roadmap

Current strategic objective — Block 2

Validate the platform architecture with a real model working end to end through the season.

Timeframe: Block 2 (18/5 - 26/6)

Strategic hypothesis

If our platform can query and produce game and season outcomes within latency targets, and triggers correctly propagate outcome changes, then our architecture is validated and ready to support the full breadth of product goals through the season.

Success metrics — primary

Game-scoped query

  • Single game, outcomes only: < 1 second
  • Single game with GameState filter: < 1 second

Season-scoped query

  • Full season, outcomes only: < 60 seconds
  • Partial season with GameState filter: in scope (latency target set once baseline measured)
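The targets above could be checked mechanically. The following is a minimal sketch only: the labels in `LATENCY_TARGETS_S` and the `check_latency` helper are hypothetical names, and the stand-in query is a placeholder for a real call to the query surface.

```python
import time
from typing import Callable

# Latency targets in seconds, taken from the success metrics above.
# The keys are hypothetical labels, not identifiers from the platform.
LATENCY_TARGETS_S = {
    "game_outcomes_only": 1.0,
    "game_with_gamestate_filter": 1.0,
    "season_outcomes_only": 60.0,
    # Partial season with GameState filter: no target until a baseline is measured.
}

def check_latency(label: str, run_query: Callable[[], object]) -> tuple[float, bool]:
    """Time one call to run_query and compare it with the target for label."""
    start = time.perf_counter()
    run_query()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed < LATENCY_TARGETS_S[label]

# Usage with a stand-in query; a real check would call the GraphQL endpoint.
elapsed, ok = check_latency("game_outcomes_only", lambda: sum(range(1000)))
print(f"{elapsed:.4f}s within target: {ok}")
```

A single timed call is only a smoke test; a real validation run would presumably repeat each query and look at a percentile rather than one sample.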

Assumptions

  • Powered by Foundry entities
  • GameState can trigger an outcome change
  • A model input can trigger an outcome change
  • Prototype can connect to this when complete
  • Version 1 of the NFL model is integrated
  • Consistency is not within scope this block

Core problems being solved

  • Platform architecture is unvalidated against real model data
  • No proven end-to-end pipeline from event trigger to outcome
  • GameState integration is untested

Explicitly out of scope (future blocks)

  • Performance optimisation — phase 8 of the long-horizon roadmap (caching + scaling for production load). Block 2's latency targets apply to the architecture-validation shape, not to production load.
  • Production hardening — phase 9 (auth, alerting, runbooks, IaC).
  • Consistency guarantees — deliberately deferred so this block can land the validation shape without paying for stronger guarantees we may not need everywhere.

The product

A probabilistic sports-prediction service. Customer asks a question ("what's the chance Patrick Mahomes throws 30+ TDs and the Chiefs win the Super Bowl?"); the platform returns a calibrated probability backed by tens of thousands of simulated seasons.
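As a rough illustration of how a probability can be read off simulated seasons, the sketch below counts the share of simulated worlds in which both conditions hold. Everything in it is an assumption for illustration: `simulate_season` is a toy stand-in with made-up distributions, not the platform's simulator or the data-science team's model.

```python
import random

def simulate_season(rng: random.Random) -> dict:
    """Toy stand-in for one simulated season; the distributions are made up."""
    return {
        "qb_passing_tds": rng.randint(15, 45),
        "team_wins_title": rng.random() < 0.12,
    }

def estimate_probability(n_worlds: int, seed: int = 0) -> float:
    """Fraction of simulated seasons in which both conditions hold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_worlds):
        world = simulate_season(rng)
        if world["qb_passing_tds"] >= 30 and world["team_wins_title"]:
            hits += 1
    return hits / n_worlds

p = estimate_probability(10_000)
print(f"estimated joint probability: {p:.3f}")
```

In this toy the two events are drawn independently. A real simulator would capture the correlation between them, which is why the joint probability is counted per world rather than multiplied from two separate estimates.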

Where we are today

  • Working end-to-end pipeline. Simulator → outcome accumulators → storage → GraphQL query surface. Hosted in Azure West US 3, queryable from anywhere.
  • Validated at scale. 100,000 parallel simulations of a full NFL regular season + playoff bracket complete in ~24 minutes; targeted queries return in 1-2 seconds.
  • Working prototype, not yet a production service. Built on a placeholder simulator with a snapshot of 2025 NFL data. The shape is right; the contents need swapping out.

Long-horizon path to production

The full path to production is nine work phases, sequenced so each is unblocked by what comes before. The Block 2 objective above is a contained slice of this — landing the validation shape before perf and hardening absorb attention.

| # | Phase | Business outcome | Block 2 |
|---|-------|------------------|---------|
| 1 | Bitvector evaluation engine | Boolean queries (most common shape) get an order of magnitude faster | |
| 2 | Historical data ingest | Past actuals, required for any "how accurate is this?" claim | |
| 3 | Prediction-vs-actual prototype | Settles how customers compare prediction to reality | |
| 4 | Live data ingest | Roster, injury, score feeds keep simulations current | partial (GameState trigger path) |
| 5 | Real model integration | Replaces the placeholder with the data-science team's model. Sport-agnostic generalisation (proving the architecture against a second sport) sits inside this phase; it's the catalyst for catching any contract gaps that hide while there's only one sport in scope. | in scope (NFL v1) |
| 6 | Progressive season simulation | Mid-season prediction, the core product use case | partial (validation shape) |
| 7 | Foundation + accuracy validation | Proves predictions are stable and calibrated against reality | |
| 8 | Performance | Caching + scaling for production load | deferred |
| 9 | Production hardening | Auth, alerting, runbooks, infrastructure-as-code | deferred |
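Phase 1 names a bitvector evaluation engine for Boolean queries. The source does not describe its design, so the following is only a sketch of the general idea, assuming one bit per simulated world: each outcome becomes a bitvector, Boolean combinations become bitwise operations, and a probability is a bit count divided by the number of worlds. The function names and sample data are hypothetical.

```python
def to_bitvector(flags: list[bool]) -> int:
    """Pack one boolean per simulated world into a Python int (bit i = world i)."""
    bits = 0
    for i, flag in enumerate(flags):
        if flag:
            bits |= 1 << i
    return bits

def probability(bits: int, n_worlds: int) -> float:
    """Share of worlds whose bit is set."""
    return bin(bits).count("1") / n_worlds

# Hypothetical outcomes over 8 simulated worlds.
qb_30_plus_tds = to_bitvector([True, True, False, True, False, True, True, False])
team_wins_title = to_bitvector([True, False, False, True, False, False, True, True])

both = qb_30_plus_tds & team_wins_title    # AND across all worlds in one operation
either = qb_30_plus_tds | team_wins_title  # OR works the same way

print(probability(both, 8))    # 0.375
print(probability(either, 8))  # 0.75
```

The appeal of this shape is that a query over many worlds reduces to a handful of word-level operations instead of a per-world loop.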

Sequencing

  • Phases 1 and 2 can run in parallel — independent foundational tracks.
  • Phases 3 and 4 follow phase 2 — both depend on knowing the shape of historical data.
  • Phase 5 is dependency-gated on the data-science team's model. It slots in when ready, alongside any of phases 1-4.
  • Phases 6 and 7 require a working real model before they're meaningful — accuracy is undefined against a placeholder simulator.
  • Phases 8 and 9 follow once correctness is locked in. Hardening interleaves with phase 5 onwards.
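The sequencing notes above can be read as a dependency graph. The sketch below encodes only the edges stated in those bullets and groups phases into waves that could start together. Two assumptions are labelled in the code: phases 8 and 9 are modelled as following phase 7, and phase 5 has no phase-level dependency because its gate is the external model.

```python
from graphlib import TopologicalSorter

# phase -> set of phases it depends on, read from the sequencing notes above.
# Assumption: phases 8 and 9 are modelled as following phase 7
# ("once correctness is locked in"); the source does not name exact edges.
PHASE_DEPENDENCIES = {
    1: set(),
    2: set(),
    3: {2},
    4: {2},
    5: set(),  # gated on the external model, not on other phases
    6: {5},
    7: {5},
    8: {7},
    9: {7},
}

sorter = TopologicalSorter(PHASE_DEPENDENCIES)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # phases whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)

for number, phases in enumerate(waves, start=1):
    print(f"wave {number}: phases {phases}")
```

The waves are an upper bound on parallelism, not a schedule: the note that hardening interleaves with phase 5 onwards is not captured here.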

Critical dependencies

| Dependency | Owner | Risk if delayed |
|------------|-------|-----------------|
| Data-science team's predictive model | Data science | Phase 5 slips; phases 6 + 7 can't be validated |
| Historical data scope decision (one season vs five vs all-time) | Engineering + product | Phase 2 scope cannot be sized |
| Production Azure environment (separate from sandbox subscription) | Platform / ops | Phase 9 hardening can't fully complete |
| Live data feed provider(s) | Engineering + commercial | Phase 4 (and phase 6 by extension) blocked |

What "production-ready" means

  • A customer asks any well-formed probabilistic question and gets a fast, calibrated answer.
  • Predictions can be measured against reality; the calibration story is documented and defensible.
  • The system runs unattended; alerts trigger on real failures, not noise.
  • Data is versioned; consumers can pin to known-stable snapshots without surprise updates.
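The source does not say which metric will back the calibration story. As one hedged example, the Brier score is a common scoring rule for probabilistic predictions: the mean squared difference between each predicted probability and the 0/1 outcome, where lower is better. The data below is made up for illustration.

```python
def brier_score(predictions: list[float], actuals: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(predictions)

# Made-up predictions and outcomes, for illustration only.
predicted = [0.9, 0.2, 0.7, 0.5]
happened = [1, 0, 1, 0]
print(round(brier_score(predicted, happened), 4))
```

A constant prediction of 0.5 scores 0.25 on any data set, which gives a simple baseline that a useful model should beat.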

Known gaps in this plan

Items the planning conversation has surfaced but the nine phases above don't yet name explicitly. Worth pinning during per-phase detail-out so they don't fall through the cracks.

  • Customer / consumer integration shape. "Production-ready" assumes a customer is talking to the service, but how — direct GraphQL, through a wrapper API, via SDK in TypeScript / Python, federated into a parent schema — is not yet decided. Needs a call before phase 9 hardening locks the public surface.
  • Query cost / fair-use concept. A 100K-world unfiltered query shifts ~480 MB through the cluster; deep AND-trees and large evaluate expressions can pin compute. Phase 9 lists "rate limiting" as a one-liner, but the specific abuse shapes this surface enables (fan-out queries, deep expressions, expensive evaluations) need a query-cost / query-budget design before external exposure.
  • Outcome template / catalog evolution. Three known coverage gaps in the registry are tracked as a skipped test today; live data feeds and new sports will introduce more outcome shapes. Schema-migration story for the canonical OC tables (adding outcomes without breaking consumers) needs a discipline, not a phase.
  • Determinism / RNG seed strategy. Reproducible runs given the same input snapshot + model version + seed are a precondition for the accuracy claims phase 7 will make. Lives logically inside phase 5 but is worth flagging — it's not currently called out as a deliverable.
  • GraphQL schema-versioning policy. Once external consumers integrate, breaking changes become expensive. Listed under phase 9 but worth documenting the deprecation discipline early so that changes made from phase 5 onwards don't accidentally lock us out of changes we need later.
  • Backup / disaster recovery for ClickHouse Cloud. Has both a procurement and a config dimension; deserves its own item rather than one bullet inside phase 9.
  • Cost-per-query observability. Distinct metric and audience from cost-per-run (commercial vs ops). Phase 7 / 8 question, not phase 9.
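For the determinism item above, one possible approach is to derive each world's seed from the run inputs with a stable hash, so the same input snapshot, model version, and base seed always reproduce the same worlds. This is a sketch under assumptions: the identifiers, the `world_seed` helper, and the toy `simulate_world` are all hypothetical, and the source does not specify a seed strategy.

```python
import hashlib
import random

def world_seed(snapshot_id: str, model_version: str, base_seed: int, world_index: int) -> int:
    """Derive a stable 64-bit seed for one simulated world from the run inputs."""
    key = f"{snapshot_id}|{model_version}|{base_seed}|{world_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big")

def simulate_world(seed: int) -> float:
    """Toy simulation: a single draw from a seeded generator."""
    return random.Random(seed).random()

# Hypothetical identifiers; the same inputs always give the same results.
run_a = [simulate_world(world_seed("nfl-2025-snapshot", "v1", 42, i)) for i in range(3)]
run_b = [simulate_world(world_seed("nfl-2025-snapshot", "v1", 42, i)) for i in range(3)]
print(run_a == run_b)  # True
```

`hashlib` is used rather than the built-in `hash()` because string hashing in Python is randomised per process by default, which would break reproducibility across runs. Seeding per world also means a world's result does not depend on how the 100,000 simulations are split across workers.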

Companion planning documents

This roadmap is the executive summary. The supporting detail lives in three companion docs:

  • Per-phase breakdown — scope, definition of done, dependencies, rough sizing, and linked Linear tickets for each of the nine phases. Plus a "what can start this week" pointer at the bottom.
  • Cross-cutting design decisions: the nine load-bearing choices every phase inherits from (caching abstraction, two meanings of "historical", "accuracy needs a real model", versioning, sport-agnostic boundary, participant identity, query budget over timeout, when the placeholder retires, when multi-region is decided).
  • Sequencing rationale (ADR-lite) — why the phases run in this order, what alternatives we ruled out, what the phase-3 prototype is meant to settle, and the triggers that would cause us to re-sequence.

Surface those when the planning conversation needs them; they don't need to be carried through every executive review.