Documentation Audit & OKF Alignment Roadmap

A review of the docs/ tree against Google's Open Knowledge Format (OKF v0.1), with a prioritized roadmap and a file-by-file remediation appendix. This document recommends; it changes nothing on its own — and the YAML header above deliberately models the frontmatter convention it proposes in §5.


1. Executive summary

LBS Foundry's documentation is already ~70% of the way to OKF philosophically: it is plain markdown, version-controlled in git, vendor-neutral, and even has an agent-facing index (the CLAUDE.md "Where to find things" table). What it lacks is machine-readability and consistency.

The single highest-leverage move is to adopt a lightweight YAML frontmatter convention with a required type field, then generate the index from it instead of hand-maintaining two competing indexes. Everything else (naming, placement, durable-vs-ephemeral separation) follows naturally once each document declares what it is.

Three numbers frame the work:

  • 131 durable docs (excluding the auto-generated superpowers/specs/ tree).
  • 0 of them currently carry YAML frontmatter — the heart of OKF.
  • ~35 need relocation, renaming, re-filing, or reclassification (enumerated in the Appendix).

2. What OKF is, and why it is relevant to us

OKF is deliberately tiny. Knowledge is represented as markdown files with YAML frontmatter, stored in git, with no SDK. The only required frontmatter field is type; title, description, resource, tags, and timestamp are encouraged. Files are organised hierarchically (domain → concept type), cross-linked with ordinary markdown links, with optional index.md (progressive disclosure) and log.md (history).

The point of OKF is not the file format — it is convergence. Stop scattering knowledge across wikis, metadata catalogs, code comments, and people's heads; put it in one format that both humans and AI agents can read. Because the format is just markdown + a tiny metadata header, Google's reference tooling (a static HTML graph visualiser and an enrichment agent) works over any conformant bundle with no backend.

Why this matters for this repo specifically:

  • We already feed docs to an agent. CLAUDE.md is a hand-curated agent index. Frontmatter would let that index be generated and let agents filter by type (e.g. "show me every runbook") and status (e.g. "ignore anything deprecated").
  • We ship typed SDKs and an MCP server. Our knowledge is meant to be machine-consumed. OKF is the same instinct applied to prose.
  • We have a clear domain hierarchy already (architecture, importers, outcome-context, models, sport). OKF's domain→type layout is a small step from where we are, not a rewrite.

OKF is "minimally opinionated" — extra frontmatter fields are explicitly allowed — so we can adopt its required type and encouraged fields and add our own (e.g. status) without breaking conformance.
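Once each doc declares a type and a status, the filtering described above becomes a one-line query. The following is a minimal TypeScript sketch, not existing tooling: the DocMeta shape and the selectDocs helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape: the subset of frontmatter an agent needs for filtering.
interface DocMeta {
  path: string;
  type: string;   // e.g. "runbook", "guide", "plan"
  status: string; // e.g. "current", "deprecated", "archived"
}

// Returns docs of the requested type, skipping any whose status is excluded.
// The default exclusion list is an assumption, not an OKF rule.
function selectDocs(
  docs: DocMeta[],
  type: string,
  excludedStatuses: string[] = ["deprecated", "archived"],
): DocMeta[] {
  return docs.filter(
    (doc) => doc.type === type && excludedStatuses.indexOf(doc.status) === -1,
  );
}
```

Calling selectDocs(allDocs, "runbook") would answer "show me every runbook" while ignoring deprecated and archived material; passing an empty exclusion list returns everything of that type.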


3. Current-state inventory

What is already good

  • Markdown-in-git, vendor-neutral, human + agent readable — the OKF baseline, already met.
  • Sensible top-level taxonomy with per-folder README.md indexes in many sections (architecture/, developer-guide/, adr/, outcome-context/, etc.).
  • A curated agent index — the CLAUDE.md "Where to find things" table is, in effect, a partial OKF index already.
  • Deliberate audience splits — e.g. event-history-viewer.md (backend) vs user-guides/event-history-viewer.md (end-user), and generic-query-pattern.md (design rationale) vs developer-guide/generic-queries.md (hands-on). These already cross-link each other. This is good information architecture; it just needs consistent placement and a type/audience tag.

Distribution (durable docs, excluding superpowers/)

| Area | Files | Notes |
|---|---|---|
| outcome-context/ | 21 | Self-contained, well-organised sub-tree (design + evaluations + sub-designs). |
| developer-guide/ | 20 | The daily-development core. |
| architecture/ | 12 | Concept + design docs. |
| adr/ | 11 | Mixed: 5 true ADRs, the rest are design/guide/ops docs (see gap 6). |
| api/, models/, integrations/, importers/ | 8 / 7 / 7 / 6 | Domain references. |
| sport/, testing/, samples/, getting-started/ | 5 / 4 / 4 / 4 | |
| plans/, user-guides/ | 3 / 2 | |
| usecases/, runbooks/, end-user-guides/, data-engineering/, configuration/ | 1 each | Several near-empty folders. |
| docs/ root (loose) | 12 | The largest single problem area (see gap 4). |

4. Gap analysis vs OKF

Each finding is tagged High / Medium / Low by leverage (impact ÷ effort).

Gap 1 — No frontmatter anywhere · High

Evidence: 0 of 131 docs begin with a YAML --- block. Nothing is machine-classifiable: an agent or tool cannot tell a runbook from a plan from a reference, cannot filter out deprecated docs, and cannot graph relationships beyond raw link-following. OKF principle: frontmatter with a required type is the core of the format. Fix: define the schema in §5, backfill (Phase 1 for hot docs, Phase 2 for the rest).
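The "begins with a YAML --- block" test used for this finding is easy to automate. The sketch below is one possible check, written under the assumption that a frontmatter block is a first line of exactly --- followed by a later closing --- line; hasFrontmatter is a hypothetical helper name.

```typescript
// Hypothetical helper: reports whether a markdown document opens with a
// frontmatter block (first line is exactly `---`, and a closing `---` follows).
function hasFrontmatter(content: string): boolean {
  // Strip a UTF-8 BOM so it does not hide the opening delimiter.
  const text = content.replace(/^\uFEFF/, "");
  const lines = text.split(/\r?\n/);
  if (lines[0] !== "---") return false;
  // An unclosed block does not count as frontmatter.
  return lines.slice(1).some((line) => line === "---");
}
```

Counting the files for which this returns false gives the "0 of 131" style figure directly, and the same check can later back the CI rule proposed in §5.3.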

Gap 2 — Two indexes that drift · High

Evidence: docs/README.md and the CLAUDE.md "Where to find things" table are both hand-maintained, neither is complete, and they list different subsets. Neither references, e.g., runbooks/ or most of outcome-context/. OKF principle: index.md for progressive disclosure — but an index that must be hand-synced will always lag. Fix: generate the index from frontmatter (see §5.3). One source of truth, cannot drift.

Gap 3 — Durable docs mixed with ephemera · High

Evidence: point-in-time artifacts sit beside durable references with no signal distinguishing them — nfl-season-structure-pr-notes.md, clickhouse-schema-review.md, canonical-entity-mapping-deck-outline.md, integrations/clerk-webhook-implementation-plan.md, api/ballr/LBS-1051-TradeAssist-Implementation.md, the outcome-context/evaluations/storage-experiment/status*.md set. OKF principle: type (and a status extension) is exactly what disambiguates "this is a permanent reference" from "this was a working note in March." Fix: assign type: note | plan | evaluation and status: archived where appropriate; consider docs/archive/ for truly dead material.

Gap 4 — 12 homeless files at docs/ root · Medium

Evidence: see Appendix A.1. Only README.md (and arguably the pillar docs intro.md / security.md, both referenced by CLAUDE.md) belong at root. The ballr-*.md trio clearly belongs under api/ballr/ or sport/. Fix: relocate per the appendix; update the CLAUDE.md/README.md links that point at the two docs marked "move + re-link" there (the pillar docs stay at root, so links to them are unaffected).

Gap 5 — Naming drift · Medium

Evidence: the tree is mostly kebab-case, but architecture/IDENTITY_LINKING_PROBLEM_SUMMARY.md (SHOUTY_SNAKE), api/ballr/LBS-1051-TradeAssist-Implementation.md (ticket-prefixed PascalCase), sport/SuperCoachParticipantStats/CricketStatsRules.md (PascalCase dir + file), and models/.../nfl_sim_flowchart.md (snake_case) break it. Ticket numbers belong in frontmatter, not filenames. Fix: rename to kebab-case (Appendix A.2). Exception: nfl_sim_flowchart.md is a generated artifact referenced by CLAUDE.md and the regen-nfl-flowchart skill — leave it (or change the generator + references together, not the file alone). Conventional uppercase (README.md, CONTEXT.md, UPDATING.md) is fine and should stay.
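A naming check along these lines could flag the outliers automatically. This is a sketch under stated assumptions: namingViolations and its allowedPaths escape hatch are hypothetical, and the regular expression encodes one reasonable reading of "kebab-case" (lowercase alphanumeric words joined by single hyphens).

```typescript
// Conventional uppercase entry files that the audit exempts from kebab-case.
const EXEMPT_FILE_NAMES = ["README.md", "CONTEXT.md", "UPDATING.md"];

// Lowercase alphanumeric words joined by single hyphens.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns the path segments (directory names or the file name) that break
// kebab-case. `allowedPaths` is a hypothetical escape hatch for generated
// artifacts that must keep their current name.
function namingViolations(relativePath: string, allowedPaths: string[] = []): string[] {
  if (allowedPaths.indexOf(relativePath) !== -1) return [];
  const segments = relativePath.split("/").filter((segment) => segment.length > 0);
  return segments.filter((segment, index) => {
    const isFile = index === segments.length - 1;
    if (isFile && EXEMPT_FILE_NAMES.indexOf(segment) !== -1) return false;
    // Compare the stem only, so the ".md" extension does not count against a file.
    const stem = isFile ? segment.replace(/\.md$/, "") : segment;
    return !KEBAB_CASE.test(stem);
  });
}
```

Under this sketch, sport/SuperCoachParticipantStats/CricketStatsRules.md reports both the directory and the file, while the generated nfl_sim_flowchart.md would be reported unless its full path is passed in allowedPaths.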

Gap 6 — adr/ mixes types · Medium

Evidence: adr/ contains 5 genuine ADRs (adr-001/007/008/009, real-time-notification-system.md) alongside design docs (consumer-api-design.md, content-workflow-design.md, event-sourcing-content-implementation.md), an ops doc (deployment-operations.md), and a guide (strapi-integration-guide.md). An ADR is an immutable decision record; a design doc evolves. Conflating them weakens both. Fix: keep true ADRs in adr/; move the rest to design/, runbooks/, integrations/ per Appendix A.3.

Gap 7 — No freshness signal · Low

Evidence: nothing in-doc indicates currency. Git knows, but a reader (or agent) scanning the file does not. Stale docs read as authoritative. OKF principle: timestamp (and optional log.md). Fix: add updated + status to frontmatter; optionally a CI check that flags docs untouched for N months.
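The optional freshness check could be as small as the following. It is a sketch only: isStale is a hypothetical helper, it assumes the updated field holds a YYYY-MM-DD date, and it treats an unparseable date as stale so that bad metadata surfaces rather than hides.

```typescript
// Hypothetical helper: true when `updated` (YYYY-MM-DD) is more than
// `maxAgeMonths` calendar months before `now`.
function isStale(updated: string, now: Date, maxAgeMonths: number): boolean {
  const parsed = new Date(`${updated}T00:00:00Z`);
  // Assumption: an unparseable date is reported as stale rather than ignored.
  if (Number.isNaN(parsed.getTime())) return true;
  // Month arithmetic is approximate near month ends (JS dates roll over).
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - maxAgeMonths);
  return parsed.getTime() < cutoff.getTime();
}
```

A CI step would call this for every doc with N chosen by the team, and list the paths that come back true.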


5. Proposed conventions

5.1 The type taxonomy (for this repo)

A small, closed set. Every doc gets exactly one.

| type | Meaning | Lifecycle | Examples |
|---|---|---|---|
| index | Navigation / table of contents | living | every README.md, docs/README.md |
| concept | Explains how/why something works | living | architecture/event-sourcing.md, intro.md |
| guide | Task-oriented how-to | living | developer-guide/common-tasks.md, getting-started/* |
| reference | Authoritative factual lookup | living | security.md, architecture/database-schema-diagram.md, api/ballr/* |
| design | Technical design / rationale | evolves, then settles | architecture/*-design.md, outcome-context/design.md |
| adr | Architecture Decision Record | immutable once accepted | adr/adr-00X-*.md |
| runbook | Operational procedure | living | runbooks/*, developer-guide/deployment-pipeline-setup.md |
| model | Simulation/model documentation | living | models/americanfootball/* |
| plan | Roadmap / implementation plan | ephemeral | plans/*, outcome-context/roadmap.md |
| evaluation | R&D findings, experiments | ephemeral | outcome-context/evaluations/* |
| note | Working note, review, draft | ephemeral | clickhouse-schema-review.md, *-pr-notes.md |
| sample | Sample data / diagnostic output | ephemeral | samples/* |
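Because the set is closed, tooling can encode it once and reject anything else. The sketch below mirrors the taxonomy in this section; the names DOC_LIFECYCLES, DocType, and isDocType are hypothetical and chosen only for illustration.

```typescript
// The twelve types from the taxonomy, mapped to their lifecycle.
const DOC_LIFECYCLES = {
  index: "living",
  concept: "living",
  guide: "living",
  reference: "living",
  design: "evolves, then settles",
  adr: "immutable once accepted",
  runbook: "living",
  model: "living",
  plan: "ephemeral",
  evaluation: "ephemeral",
  note: "ephemeral",
  sample: "ephemeral",
} as const;

// Union of the allowed type names, derived from the table above.
type DocType = keyof typeof DOC_LIFECYCLES;

// Type guard: narrows an arbitrary frontmatter value to a known DocType.
function isDocType(value: string): value is DocType {
  return Object.prototype.hasOwnProperty.call(DOC_LIFECYCLES, value);
}
```

Deriving DocType from the object keeps the list in one place, so adding a thirteenth type later is a single-line change.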

5.2 Frontmatter schema

OKF-conformant field names (so Google's reference visualiser/tooling works out of the box), plus a status extension (OKF permits extra fields):

---
type: guide                       # REQUIRED — one value from the taxonomy in §5.1
title: Common Tasks               # REQUIRED — human title (often matches H1)
description: How to add a query or command in Foundry.   # one line; powers search + index
status: current                   # current | draft | deprecated | archived
tags: [cqrs, marten, commands]    # for discovery / filtering
updated: 2026-06-19               # OKF calls this `timestamp`; `updated` is clearer for prose
resource: src/Domain/...          # OPTIONAL — link to the code/system this documents
audience: developer               # OPTIONAL extension — developer | end-user | ops
---

Minimum to be conformant and useful: type, title, description, status, updated. The rest are encouraged where they add value.

Decision needed: keep OKF's exact timestamp field, or use the friendlier updated? (See §8.) Whichever we pick, apply it uniformly.
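To make the "minimum" list concrete, here is a hedged sketch of a required-field check. It assumes the updated field name (pending the decision above) and uses a deliberately naive parser that only understands flat key: value lines; a real implementation would more likely use a YAML library. parseFrontmatter, missingFields, and REQUIRED_FIELDS are hypothetical names.

```typescript
type Frontmatter = Record<string, string>;

// Naive parser (assumption: flat `key: value` lines only). Values are kept as
// raw strings, and a trailing ` # comment` is stripped.
function parseFrontmatter(content: string): Frontmatter | null {
  const lines = content.replace(/^\uFEFF/, "").split(/\r?\n/);
  if (lines[0] !== "---") return null;
  const end = lines.indexOf("---", 1);
  if (end === -1) return null;
  const fields: Frontmatter = {};
  for (const line of lines.slice(1, end)) {
    const match = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line);
    if (!match) continue;
    const key = match[1] ?? "";
    const raw = match[2] ?? "";
    fields[key] = raw.replace(/\s+#.*$/, "").trim();
  }
  return fields;
}

// The minimum set proposed in this section, assuming `updated` is chosen.
const REQUIRED_FIELDS = ["type", "title", "description", "status", "updated"];

// Returns the required fields that are absent or empty in a document.
function missingFields(content: string): string[] {
  const fields = parseFrontmatter(content);
  if (fields === null) return REQUIRED_FIELDS.slice();
  return REQUIRED_FIELDS.filter((field) => !fields[field]);
}
```

A doc with no frontmatter at all reports every required field as missing, which lets one function serve both the backfill report and a later CI gate.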

5.3 The generated index (replaces hand-maintained indexes)

A small script (a natural fit for the existing TS toolchain: src/Tools/ or a pnpm docs:index task) walks docs/, parses frontmatter, and emits docs/README.md grouped by type/area, skipping anything with status: archived. Run it in CI; fail the build (or warn) if any doc lacks required frontmatter. This closes Gap 2 permanently and gives us an OKF-style index for free.

The CLAUDE.md table can either be generated by the same pass or kept as a curated subset that links to the generated index — but it should stop trying to be a second full catalogue.
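The rendering half of such a generator might look like the sketch below. It is illustrative only: it takes already-parsed entries rather than walking the file system, and the DocEntry shape, the renderIndex name, and the output headings are all assumptions rather than an agreed format.

```typescript
// Hypothetical shape: the frontmatter fields the index needs, plus the doc path.
interface DocEntry {
  path: string; // relative to docs/
  type: string;
  title: string;
  description: string;
  status: string;
}

// Renders a markdown index grouped by type, skipping archived docs.
function renderIndex(entries: DocEntry[]): string {
  const byType: Record<string, DocEntry[]> = {};
  for (const entry of entries) {
    if (entry.status === "archived") continue;
    const group = byType[entry.type] ?? [];
    group.push(entry);
    byType[entry.type] = group;
  }
  const lines: string[] = ["# Documentation index", ""];
  // Sort types and paths so repeated runs produce identical output.
  for (const type of Object.keys(byType).sort()) {
    lines.push(`## ${type}`, "");
    const group = (byType[type] ?? []).slice().sort((a, b) =>
      a.path < b.path ? -1 : a.path > b.path ? 1 : 0,
    );
    for (const doc of group) {
      lines.push(`- [${doc.title}](${doc.path}): ${doc.description}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

Deterministic ordering matters here: if the output is stable, CI can regenerate the index and fail on any diff, which is what makes the index unable to drift.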

5.4 Naming & placement rules

  • kebab-case for all filenames and directories. Conventional uppercase entry files (README.md, CONTEXT.md, UPDATING.md) excepted.
  • No ticket numbers in filenames — put LBS-XXXX in frontmatter (tags or a ticket field).
  • No durable docs at docs/ root except README.md and the two pillar docs (intro.md, security.md) that CLAUDE.md links by stable path.
  • One folder per type-or-domain — don't mix ADRs with designs (Gap 6).
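Two of these rules (root placement and ticket numbers in filenames) are mechanical enough to lint. The following is a minimal sketch, assuming paths are given relative to docs/; placementViolations, ALLOWED_AT_ROOT, and TICKET_PATTERN are hypothetical names, and the LBS-digits pattern is inferred from the one ticket-prefixed filename cited in this audit.

```typescript
// Files this section allows at docs/ root.
const ALLOWED_AT_ROOT = ["README.md", "intro.md", "security.md"];

// Assumption: ticket ids look like "LBS-1051" (case-insensitive).
const TICKET_PATTERN = /LBS-\d+/i;

// Returns human-readable problems for a path relative to docs/.
function placementViolations(relativePath: string): string[] {
  const problems: string[] = [];
  const segments = relativePath.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  if (segments.length === 1 && ALLOWED_AT_ROOT.indexOf(fileName) === -1) {
    problems.push("loose file at docs/ root");
  }
  if (TICKET_PATTERN.test(fileName)) {
    problems.push("ticket number in filename");
  }
  return problems;
}
```

Any deliberate exception to the root rule would need to be added to ALLOWED_AT_ROOT explicitly, which keeps such decisions visible in review.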

6. Target structure

The shape after alignment (only deltas from today shown):

docs/
├── README.md                # GENERATED index (type: index)
├── intro.md                 # pillar (type: concept) — kept at root
├── security.md              # pillar (type: reference) — kept at root
├── meta/                    # NEW — docs about docs
│   ├── documentation-audit.md     # this file
│   └── documentation-guide.md     # the conventions in §5, as living guidance
├── design/                  # NEW — technical designs split out of adr/
├── adr/                     # true ADRs only
├── archive/                 # OPTIONAL — status:archived material, or use the status field in place
├── architecture/  developer-guide/  api/  importers/  integrations/
├── models/  outcome-context/  sport/  testing/  samples/  runbooks/
├── getting-started/  user-guides/  end-user-guides/  usecases/
└── ...                      # (loose root files relocated into the above)

outcome-context/ is already well-structured and self-contained — leave its layout, just add frontmatter.


7. Phased roadmap

Phase 0 — Decide (this document)

Approve the type taxonomy (§5.1), the frontmatter schema (§5.2), and the open questions in §8. Output: a short docs/meta/documentation-guide.md capturing the agreed conventions. Effort: ~1 hr of review.

Phase 1 — Quick wins · ~half a day

Mechanical, high-visibility, low-risk:

  • Add frontmatter to the ~30 hot docs (everything linked from CLAUDE.md + docs/README.md).
  • Relocate the 12 loose root files (Appendix A.1) and fix the handful of cross-links.
  • Rename the naming outliers (Appendix A.2).
  • Tag the obvious ephemera with status: archived / type: note (Appendix A.4).

Unlocks: a clean root, consistent naming, and a critical mass of frontmatter to prototype the index generator.

Phase 2 — Structural · ~1–2 days

  • Backfill frontmatter across all 131 docs.
  • Split adr/ → adr/ + design/ (Appendix A.3).
  • Build the index generator + CI check (§5.3); switch docs/README.md to generated.
  • Decide and apply the archive strategy (folder vs status).

Unlocks: drift-proof index, machine-queryable corpus, enforced consistency.

Phase 3 — Optional / aspirational

  • Point Google's OKF static HTML visualiser at docs/ for a browsable knowledge graph (zero backend).
  • Add an enrichment step — an agent (we already run Claude + MCP) that proposes frontmatter and cross-links on new/changed docs in PRs.
  • Extend the OKF idea beyond prose to the data catalog (Marten projections / BigQuery-style table + metric docs), which is OKF's original use case.

Unlocks: the full "knowledge graph + agent reasoning" payoff OKF is designed for.

8. Open questions / decisions for you

  1. Field name: OKF-exact timestamp, or human-friendly updated? (Recommend updated.)
  2. Archive strategy: a docs/archive/ folder, or a status: archived field with docs left in place? (Recommend the status field — less churn, preserves links.)
  3. Index ownership: generate docs/README.md and the CLAUDE.md table, or generate the README and keep CLAUDE.md as a curated subset? (Recommend the latter.)
  4. Ephemeral planning docs (plans/, parts of outcome-context/): keep them in docs/ tagged type: plan, or move working-process docs out of the published tree entirely?
  5. Enforcement: should missing frontmatter fail CI, or only warn? (Recommend warn first, fail after Phase 2 backfill is complete.)

Appendix: file-by-file remediation

Only files needing action are listed. Every other doc simply receives frontmatter in Phase 1/2 with no move or rename. "Re-link" means update the one or two references in CLAUDE.md / docs/README.md / sibling docs.

A.1 Relocate loose root files

| Current path | Action | Proposed path | type |
|---|---|---|---|
| ballr-get-supercoach-team-rankings.md | move | api/ballr/get-supercoach-team-rankings.md | reference |
| ballr-link-unlink-supercoach-team.md | move | api/ballr/link-unlink-supercoach-team.md | reference |
| ballr-search-supercoach-teams.md | move | api/ballr/search-supercoach-teams.md | reference |
| event-history-viewer.md | move + re-link | developer-guide/event-history-viewer-backend.md | guide |
| generic-query-pattern.md | move + re-link | architecture/generic-query-pattern.md | design |
| development-workflow.md | move | developer-guide/development-workflow.md | guide |
| clickhouse-schema-review.md | reclassify | meta/reviews/clickhouse-schema-review.md (or in place) | note (status: archived) |
| canonical-entity-mapping-deck-outline.md | reclassify | meta/notes/canonical-entity-mapping-deck-outline.md | note (status: archived) |
| nfl-season-structure-pr-notes.md | move + reclassify | models/americanfootball/nfl-season-structure-pr-notes.md | note (status: archived) |
| intro.md | keep | at root | concept |
| security.md | keep | at root | reference |
| README.md | keep | at root | index (generated) |

A.2 Rename for kebab-case

| Current path | Proposed path | Note |
|---|---|---|
| architecture/IDENTITY_LINKING_PROBLEM_SUMMARY.md | architecture/identity-linking-problem-summary.md | type: design (or note if stale) |
| api/ballr/LBS-1051-TradeAssist-Implementation.md | api/ballr/tradeassist-implementation.md | ticket LBS-1051 → frontmatter tags/ticket |
| sport/SuperCoachParticipantStats/CricketStatsRules.md | sport/supercoach-participant-stats/cricket-stats-rules.md | rename dir + file |
| models/americanfootball/nfl-flow/rendered/nfl_sim_flowchart.md | (leave as-is) | generated artifact; change generator + CLAUDE.md ref together if ever renamed |

A.3 Re-file out of adr/ (type confusion)

| Current path | Proposed path | type |
|---|---|---|
| adr/consumer-api-design.md | design/consumer-api-design.md | design |
| adr/content-workflow-design.md | design/content-workflow-design.md | design |
| adr/event-sourcing-content-implementation.md | design/event-sourcing-content-implementation.md | design |
| adr/deployment-operations.md | runbooks/deployment-operations.md | runbook |
| adr/strapi-integration-guide.md | integrations/strapi-integration-guide.md | guide |
| adr/adr-001/007/008/009-*.md, adr/real-time-notification-system.md, adr/README.md | (keep) | adr / index |

A.4 Classify ephemera (assign type + status; no move required)

| Path(s) | type | Suggested status |
|---|---|---|
| plans/* (3 files) | plan | current / archived per state |
| outcome-context/roadmap.md, phase-breakdown.md, sequencing-rationale.md | plan | current |
| outcome-context/query-layer/build-plan.md, gap-analysis.md, rd-findings.md | plan / evaluation | current |
| outcome-context/evaluations/** (incl. storage-experiment/status*.md) | evaluation | archived where the experiment is closed |
| integrations/clerk-webhook-implementation-plan.md | plan | archived if shipped |
| integrations/discord-verification-frontend-implementation.md | design / note | current |
| samples/* (4 files) | sample | current |
| runbooks/archive-non-luckbox-aggregates.md | runbook | current (already correctly placed) |

A.5 Audience-split pairs (keep both; co-locate + tag)

These are not duplicates — keep both, ensure cross-links survive any move, and add audience frontmatter:

| Pair | Action |
|---|---|
| event-history-viewer.md (backend) ↔ user-guides/event-history-viewer.md (end-user) | move backend half into developer-guide/ (A.1); add audience: developer / audience: end-user; keep the existing cross-links |
| generic-query-pattern.md (design) ↔ developer-guide/generic-queries.md (hands-on) | move design half into architecture/ (A.1); tag type: design / type: guide; keep cross-links |

Execution status (2026-06-20)

The structural recommendations and the frontmatter backfill were applied on branch docs/okf-alignment. What shipped, and where it deviated from the plan above:

Applied

  • All relocations (A.1), renames (A.2), and the audience-split moves (A.5), with every inbound/outbound relative link updated. A link checker confirms the move introduced no broken links.
  • OKF frontmatter added to all 131 durable docs (this file and the conventions guide already had it). updated reflects each file's last git commit date; 8 ephemeral docs are marked status: archived.
  • docs/README.md regenerated as a complete index from frontmatter (closes Gap 2).
  • CLAUDE.md gains a row pointing to the conventions guide; UTF-8 BOMs were stripped from 4 files.
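For readers who want to repeat the link check after future moves, the core of a relative-link checker can be sketched as follows. This is not the checker that was actually run: resolveLink and brokenLinks are hypothetical helpers, the set of known paths is passed in rather than read from disk, and links that carry a title (for example a quoted string after the URL) are not matched by this simple pattern.

```typescript
// Resolves a relative link target against the file that contains it.
// Both arguments use "/"-separated paths relative to docs/.
function resolveLink(fromFile: string, target: string): string {
  const parts = fromFile.split("/").slice(0, -1); // directory of the linking file
  const pathOnly = target.split("#")[0] ?? "";    // drop any "#anchor" suffix
  for (const segment of pathOnly.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Returns the relative link targets in `content` that do not resolve to a known path.
function brokenLinks(fromFile: string, content: string, knownPaths: string[]): string[] {
  const broken: string[] = [];
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(content)) !== null) {
    const target = match[1] ?? "";
    const isExternal = /^[a-z][a-z0-9+.-]*:/i.test(target); // http:, mailto:, ...
    if (isExternal || target.charAt(0) === "#") continue;   // skip in-page anchors too
    if (knownPaths.indexOf(resolveLink(fromFile, target)) === -1) broken.push(target);
  }
  return broken;
}
```

Because knownPaths would contain files only, directory-style targets such as ./api/ are reported as unresolved under this sketch.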

Deviations from the plan (discovered while executing)

  • A.3 dropped. The "design docs" in adr/ (consumer-api-design, content-workflow-design, event-sourcing-content-implementation, deployment-operations, strapi-integration-guide) are catalogued as ADR-002 through ADR-006 in adr/README.md: they are genuine ADRs, not mis-filed designs. They were kept in adr/ and tagged type: adr. Optional follow-up: rename them to the adr-00X- filename convention (touches the dense ADR cross-link web; deferred for deliberate review).
  • clickhouse-schema-review.md kept at root. It is referenced by ~8 links from ADR-009 and the outcome-context evaluations as a permanent record; relocating it was high-churn for little gain. Tagged type: note, status: archived, left in place.

Deferred (Phase 2/3)

  • The index generator + CI enforcement: docs/README.md was regenerated once by a hand-run script, but no generator is committed to the build (see §5.3).
  • Hard CI gating on missing frontmatter; the OKF HTML visualiser / enrichment agent (Phase 3).

Pre-existing issues surfaced (not caused by this change), for your triage

  • clickhouse-schema-review.md links a deleted file (...StorageExperiment/.../ClickHouseSchemas.cs); that project no longer exists.
  • Four directory-style links don't resolve: discord-integration.md → ./api/, ./deployment/; storage-experiment/status.md → samples/, samples/experiment-runs/.


This document began as the plan; the section above records the executed result on branch docs/okf-alignment.