Canonical Entity Mapping — Presentation Outline

A 5–6 slide deck with a live demo in the middle, aimed at a mixed audience: execs, managers, end users, and developers. The structure leads with the problem in business terms, shows the solution working, then circles back to value, roadmap, and asks.

This is a working draft to take to coworkers — bullets are talking-point density, not final copy.


Slide 1 — "Today, every new data provider costs us weeks"

Audience focus: Execs, managers
Key message: Every new provider was a weeks-of-dev tax. Worse, the mapping work happened too late and in the wrong place, so ops couldn't see it and data science couldn't trust it.

  • Each provider (FoxSports, StatsPerform, SuperCoach, NRL Web…) had its own bespoke C# importer — adding ESPN for NFL would have been weeks of work
  • We stored the same team / player / fixture multiple times, once per provider, and stitched them together with event-sourced "mapping" commands
  • Mapping was opaque to operators: the mapping layer didn't show whether an entity had been mapped, when it was mapped, or by whom. That created a steady stream of "is this linked yet?" questions and downstream delays whenever a mapping was missing or wrong.
  • Mapping happened too late in the pipeline: data science was effectively re-doing the canonicalisation work themselves on raw provider feeds, because our canonical ids weren't available at the point they needed them. Their training data couldn't reliably reference our internal ids — so we were paying for the same work twice.
  • Net result: slow onboarding, blocked operators, and a data science team building on identifiers they couldn't fully trust

Visual: "before" tangle diagram — six provider boxes each with their own pipe into duplicate aggregates, with a mapping arrow tangle. Annotate two pain points: "engineer-only mapping layer" and "DS re-derives canonical ids downstream".


Slide 2 — "What we built: one canonical entity, one crosswalk, no per-provider code"

Audience focus: Devs (will love this), managers (need to follow the shape)
Key message: We replaced bespoke importers with a generic, data-driven pipeline. ClickHouse holds the truth; Foundry consumes it.

  • Standardised provider tables in ClickHouse — same columns across every provider for a given entity type ({provider}_teams, {provider}_participants, etc.). Data science populates these.
  • One normalised crosswalk (entity_mapping_crosswalk) — (canonical_id ↔ provider, provider_id) for every entity type and every provider, in one table
  • Generic importers — one importer per entity type reads from any provider's standardised table + the crosswalk → emits a CreateXCommand against the canonical id. Adding a new provider = zero C# code.
  • Operator UI at /admin/crosswalk plus a per-entity link-provider dialog for managing mappings without engineering involvement

Visual: "after" diagram — ClickHouse on the left (provider tables + crosswalk), generic importers in the middle, single canonical aggregates on the right.
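
Backup sketch for the dev contingent: a minimal illustration of the "generic importer" idea, written in Python purely for brevity (the real importers are C#). Every name below is an assumption made for illustration, including the row fields, the CreateTeamCommand shape, and the sample ids; none of it is taken from the actual schema. The point it shows is that the provider is a parameter, not a code path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosswalkRow:
    # Hypothetical shape of one crosswalk row: canonical_id <-> (provider, provider_id).
    entity_type: str
    canonical_id: str
    provider: str
    provider_id: str


@dataclass(frozen=True)
class CreateTeamCommand:
    # Stand-in for the CreateXCommand emitted against the canonical id (fields assumed).
    canonical_id: str
    name: str
    source_provider: str


def import_teams(provider, provider_rows, crosswalk):
    """Generic team importer: works for any provider whose rows share the same columns."""
    lookup = {
        (row.provider, row.provider_id): row.canonical_id
        for row in crosswalk
        if row.entity_type == "team"
    }
    commands, unmapped = [], []
    for row in provider_rows:
        canonical_id = lookup.get((provider, row["provider_id"]))
        if canonical_id is None:
            # No crosswalk entry yet: leave it for an operator to link.
            unmapped.append(row["provider_id"])
            continue
        commands.append(CreateTeamCommand(canonical_id, row["name"], provider))
    return commands, unmapped


# Illustrative data only; these ids and names are made up.
crosswalk = [
    CrosswalkRow("team", "team-001", "foxsports", "FS-17"),
    CrosswalkRow("team", "team-001", "espn", "E-204"),
]
espn_teams = [
    {"provider_id": "E-204", "name": "Example Team"},
    {"provider_id": "E-999", "name": "Unlinked Team"},
]
commands, unmapped = import_teams("espn", espn_teams, crosswalk)
```

In this sketch, onboarding a provider means adding rows to the provider table and the crosswalk; import_teams itself does not change.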


DEMO (5–7 minutes)

Show the dialog doing the work — far more persuasive than describing it.

Suggested demo path:

  1. Navigate to /teams/{some-team-id} → click Provider Mappings.
  2. Open the link-provider dialog. Show the three columns: source (LuckBox team), candidates from all providers, preview.
  3. Demonstrate fuzzy match → narrow with provider filter → link a candidate. Note the live "Mapped to …" pill, conflict detection, diagnostics panel.
  4. Repeat for a fixture (/fixtures/{id}) to show the structured match: it matches on home team + away team + start time, not name.
  5. Close with a quick visit to /admin/crosswalk showing the unified view across providers.

Tip: have a recognisable example loaded (e.g. a well-known team across three providers) — abstract test data is forgettable. Have a dev console open in case anyone asks "what's actually happening?" but don't lead with it.
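
If someone asks "what's actually happening?" during steps 3 and 4, the two matching styles can be sketched roughly as below. This is an illustration under stated assumptions, not the production matcher: the similarity measure (Python's difflib), the 30-minute start-time tolerance, the field names, and the sample teams are all hypothetical, and the real score-band cutoffs are not reproduced here.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher


def name_score(source_name, candidate_name):
    """Fuzzy similarity in [0, 1] between two names, case-insensitive."""
    return SequenceMatcher(None, source_name.lower(), candidate_name.lower()).ratio()


def rank_team_candidates(source_name, candidates, provider=None):
    """Rank candidates by name similarity, optionally narrowed to one provider."""
    pool = [c for c in candidates if provider is None or c["provider"] == provider]
    return sorted(pool, key=lambda c: name_score(source_name, c["name"]), reverse=True)


def fixture_matches(source, candidate, tolerance=timedelta(minutes=30)):
    """Structured match: same home and away team ids, start times within tolerance.

    Assumes both fixtures already carry canonical team ids; the tolerance is a guess.
    """
    return (
        source["home_team_id"] == candidate["home_team_id"]
        and source["away_team_id"] == candidate["away_team_id"]
        and abs(source["start"] - candidate["start"]) <= tolerance
    )


# Made-up candidates for illustration.
candidates = [
    {"provider": "foxsports", "name": "Northside Rovers FC"},
    {"provider": "espn", "name": "Northside Rovers"},
    {"provider": "espn", "name": "Southbank United"},
]
ranked = rank_team_candidates("Northside Rovers", candidates)
fox_only = rank_team_candidates("Northside Rovers", candidates, provider="foxsports")

source_fixture = {
    "home_team_id": "team-001",
    "away_team_id": "team-002",
    "start": datetime(2024, 3, 1, 19, 0),
}
same_game = {**source_fixture, "start": datetime(2024, 3, 1, 19, 10)}
reverse_game = {**source_fixture, "home_team_id": "team-002", "away_team_id": "team-001"}
```

Names are a weak signal for fixtures because the same two teams meet repeatedly, which is why the structured match keys on both teams plus the start time.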


Slide 3 — "What this unlocks"

Audience focus: Execs, managers
Key message: We've turned a weeks-of-dev problem into a days-of-data problem, and stopped paying for the same identifier work twice.

  • New provider onboarding: data eng creates the standardised tables + populates crosswalk → done. No deploy, no PR, no engineer. Weeks → days.
  • Operators self-serve: bad or missing mapping? Fix it in the link-provider dialog. Status is visible (who linked this, when, and to what), with no ticket required.
  • Canonical ids upstream of data science: standardised provider tables + crosswalk are populated before DS consumes the data, so training pipelines reference our canonical ids natively. No more shadow canonicalisation.
  • One source of truth: one team is one team. Reporting, analytics, and ML training all see consistent canonical ids.
  • Lower coupling: data engineering and app engineering can move independently — DS owns the provider tables, app eng owns the canonical model.

Visual: side-by-side comparison.

                      | Old                | New
Provider onboarding   | Weeks of dev       | Days, data-only
Canonical ids for DS  | Post-hoc, DS-built | From day one, infrastructure-provided
Operator visibility   | Engineer-only      | Self-serve UI with audit trail

Slide 4 — "What's next"

Audience focus: Managers (planning), execs (timing)
Key message: The mapping rebuild is the foundation; here's what we'll stack on top.

  • Prove it: onboard ESPN for NFL with zero C# changes — that's the headline validation
  • Per-provider import wizard — admin UI to kick off + monitor data loads, replacing the current developer-driven runbook
  • Decommission legacy mapping infrastructure: AggregateRelationshipBuilder, the dual-command path, and the per-provider importer code can all come out (saves ~20 files, simplifies the domain layer)
  • Migrate existing consumers — Ballr, scoreboard, projections currently read AggregateRelations; switch them to the crosswalk
  • Composite-id support for SuperCoach Teams + Fixtures (their identifiers don't fit the simple model)

Visual: timeline ribbon of the items above, with one already crossed off ("ClickHouse infra + per-entity dialogs").


Slide 5 — "Risks and asks"

Audience focus: Execs (decisions), managers (resourcing)
Key message: The hard work is done; here's what we need from the room.

  • Data migration: existing dual aggregates get deleted on cut-over. Already aligned, but worth re-confirming with stakeholders.
  • ClickHouse becomes critical path: if the crosswalk is down, importers stall. Need to confirm SLA + monitoring from infra.
  • Decision needed: do we want a soft cut-over (legacy importers stay alongside generic ones during validation) or hard (legacy gets deleted on day one)?
  • Resourcing: an owner from data engineering for the standardised provider tables — this is now their interface, not a side concern.

Visual: minimal — three traffic-light boxes (green: shipped; amber: in-flight; red: needs decision).


Optional Slide 6 — "Q&A / Backup material"

Keep one slide of dense detail in case someone asks: the score-band cutoffs, the FastEndpoints route list, the migration sequence. Don't show it unless asked — having it prevents you from getting derailed.


Speaker tips

  • Open with the problem, not the architecture. The exec half of the room needs to feel the pain before they care about the fix.
  • Land the "DS shadow canonicalisation" point hard. It reframes the work from "engineering cleanup" to "we've stopped paying for the same identifier problem twice." Worth saying out loud:

    "What this means in practice: data science was solving the same identifier problem we were, in parallel, with their own logic. That's now infrastructure — they get our canonical ids the moment data lands, and their models train against the same ids the platform serves."

  • Land the "opaque to users" point explicitly on slide 1. It sets up both the demo and slide 3's self-serve bullet:

    "Event sourcing wasn't the wrong technical choice — but it kept the mapping decisions inside the event log, where only engineers could see them. That's what we've fixed."

  • For the dev contingent, drop a one-liner like "we deleted ~3,000 lines of bespoke importer code and replaced it with one generic importer per entity type" — they'll perk up.
  • Don't get sucked into architecture before you've earned it with the problem statement and the demo.
  • Have a real, recognisable demo example loaded — abstract test data is forgettable.