Allocation Reduction Concepts: Play Object

Background

The current simulation allocates ~35,000 Play class instances per season (288 games × ~130 plays per game). Each Play has 38 properties (situation, personnel, play call, players, yardage, outcomes, penalties, drive context). At 16.5 MB allocated per season, Play objects dominate the allocation profile.

After the flatten refactor (removing nested Drive.Plays in favour of GameResult.Plays), Gen2 collections jumped from 47 to 328. The larger Play objects (after the drive-context fields were added) survive longer and are promoted through the GC generations more often.

Further allocation reduction requires changing how Play instances are created, used, and discarded. Three approaches were considered. None are being implemented now; this document captures the options for future work.

Context

Current state: 16.5 MB/season, 12.04 ms/season (BenchmarkDotNet, single thread)
Target lifecycle (future): Stream game results to ClickHouse or Parquet/Arrow, then discard. Possibly read back from columnar storage into an object context for querying.
Constraint: If results are streamed, only the plays of one game (~200) need to exist at once.

Approach A: Object Pool

Keep Play as a class. Introduce a PlayPool that manages a fixed-size buffer of pre-allocated Play instances. Plays are rented from the pool during simulation and returned after the game is consumed (written to the output sink).

How it works

At startup, allocate ~200-300 Play instances into a pool.
PlayResolutionEngine requests a Play from the pool instead of new Play().
After the game is consumed (written to Parquet/ClickHouse), all plays in that game are returned to the pool.
On return, each play's fields are reset (or overwritten on next rent).
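The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: Play is trimmed to a few fields (the real class has 38), and because plays come back a whole game at a time, this hypothetical PlayPool uses a bulk ReturnAll() in place of a per-play Return(). Rent() resets the instance it hands out, which is the "overwritten on next rent" variant.

```csharp
using System;

// Hypothetical, trimmed-down Play: the real class has 38 properties.
public sealed class Play
{
    public int PlaySequence { get; set; }
    public int Quarter { get; set; }
    public int Down { get; set; }
    public int Distance { get; set; }
    public string PossessionTeam { get; set; } = "";

    // Clear every field so no state leaks from a previous game.
    public void Reset()
    {
        PlaySequence = 0;
        Quarter = 0;
        Down = 0;
        Distance = 0;
        PossessionTeam = "";
    }
}

// Hypothetical fixed-size pool sized for one game. Plays are handed out in
// order and all returned together once the game has been consumed.
public sealed class PlayPool
{
    private readonly Play[] _plays;
    private int _rented;

    public PlayPool(int capacity)
    {
        _plays = new Play[capacity];
        for (var i = 0; i < capacity; i++)
            _plays[i] = new Play(); // the only Play allocations for the whole run
    }

    public int Rented => _rented;

    public Play Rent()
    {
        if (_rented == _plays.Length)
            throw new InvalidOperationException("PlayPool exhausted; increase capacity.");
        var play = _plays[_rented++];
        play.Reset(); // clear stale data from the previous game
        return play;
    }

    // Returns every play rented since the last call. Callers must not keep
    // references past this point: the same instances will be handed out again.
    public void ReturnAll() => _rented = 0;
}
```

Because every Play is created in the constructor, steady-state simulation allocates nothing for plays, and a game longer than the pool's capacity throws rather than silently falling back to new Play().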

Changes required

Play class — add a Reset() method that zeros all 38 fields
New PlayPool class — array of reusable instances with Rent() and Return()
PlayResolutionEngine.Resolve() — rent a play instead of allocating
GameEngine/SeasonEngine — return plays to the pool after each game is processed
Program.cs — wire pool lifecycle

Expected impact

Near-zero per-play allocation (pool reused across all games)
Season allocation drops from 16.5 MB to an estimated ~2-3 MB (mostly lists and strings)
Gen2 collections drop toward zero
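To check these estimates in the real runner as well as under BenchmarkDotNet, allocation and Gen2 activity can be sampled around a season with two BCL calls, GC.GetAllocatedBytesForCurrentThread() and GC.CollectionCount(2). AllocationProbe below is a hypothetical helper, offered as a sketch. Note that the byte count covers the calling thread only (matching the single-threaded benchmark), while the collection count is process-wide.

```csharp
using System;

// Hypothetical helper for sampling allocation and Gen2 activity around a workload.
public static class AllocationProbe
{
    public readonly record struct Result(long AllocatedBytes, int Gen2Collections);

    public static Result Measure(Action workload)
    {
        int gen2Before = GC.CollectionCount(2);                    // process-wide
        long bytesBefore = GC.GetAllocatedBytesForCurrentThread(); // this thread only

        workload();

        long bytesAfter = GC.GetAllocatedBytesForCurrentThread();
        int gen2After = GC.CollectionCount(2);
        return new Result(bytesAfter - bytesBefore, gen2After - gen2Before);
    }
}
```

Wrapping one pooled and one unpooled season run (for example `AllocationProbe.Measure(() => season.Run())`, where `season.Run()` is a placeholder for whatever drives one simulated season) gives a direct before/after comparison.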

Trade-offs

Pro: Minimal code churn, Play stays a class, simulation logic unchanged
Pro: Aligns with streaming — naturally fits a "simulate → write → return to pool" lifecycle
Con: Requires discipline. If a Play reference escapes (is held beyond its game), the pool hands the same instance out again and the holder sees its data reset and overwritten
Con: In benchmark mode (no sink), an explicit "consume and discard" step is needed to return plays to the pool
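Both cons can be seen in a small sketch of the simulate, write, return-to-pool loop. Everything here is hypothetical: IGameSink stands in for the future Parquet/ClickHouse writer, DiscardSink is the explicit consume-and-discard step for benchmark mode, LeakySink deliberately keeps a reference past its game, and GameLoop.RunGame stands in for PlayResolutionEngine and GameEngine.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical minimal types for illustration only.
public sealed class Play
{
    public int GameId { get; set; }
    public int PlaySequence { get; set; }
    public void Reset() { GameId = 0; PlaySequence = 0; }
}

public sealed class PlayPool
{
    private readonly Play[] _plays;
    private int _rented;

    public PlayPool(int capacity)
    {
        _plays = new Play[capacity];
        for (var i = 0; i < capacity; i++) _plays[i] = new Play();
    }

    public Play Rent()
    {
        if (_rented == _plays.Length) throw new InvalidOperationException("PlayPool exhausted");
        var play = _plays[_rented++];
        play.Reset();
        return play;
    }

    public void ReturnAll() => _rented = 0;
}

// Hypothetical output sink; a real one would write to Parquet/Arrow/ClickHouse.
public interface IGameSink
{
    void Consume(IReadOnlyList<Play> plays);
}

// Benchmark mode: consumes by doing nothing, so plays can go straight back to the pool.
public sealed class DiscardSink : IGameSink
{
    public void Consume(IReadOnlyList<Play> plays) { }
}

// Anti-pattern: keeps a Play after its game, so a later game overwrites it.
public sealed class LeakySink : IGameSink
{
    public Play? Kept { get; private set; }
    public void Consume(IReadOnlyList<Play> plays) => Kept = plays[0];
}

public static class GameLoop
{
    // Simulate, write, then return every play to the pool, once per game.
    public static void RunGame(PlayPool pool, List<Play> gamePlays, IGameSink sink, int gameId, int playCount)
    {
        gamePlays.Clear();
        for (var i = 0; i < playCount; i++)
        {
            var play = pool.Rent(); // stand-in for PlayResolutionEngine.Resolve()
            play.GameId = gameId;
            play.PlaySequence = i + 1;
            gamePlays.Add(play);
        }
        sink.Consume(gamePlays);
        gamePlays.Clear();          // drop list references before the instances are reused
        pool.ReturnAll();
    }
}
```

Running one game through LeakySink and a second game through DiscardSink leaves LeakySink.Kept pointing at a play that now carries the second game's data, which is exactly the corruption the first con warns about.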

Approach B: Convert Play to Struct

Replace the Play class with a mutable struct, pass it by ref everywhere in the engine, and store plays in Play[] arrays instead of List<Play>.

Trade-offs

Pro: No per-play heap allocation (only the backing Play[] arrays are allocated)
Con: Play is ~300 bytes (38 fields), far above the ~16-byte threshold beyond which struct copies become expensive. Any pass-by-value copies would cost more than the avoided allocation saves.
Con: Every engine method signature needs ref Play, touching ~6 files extensively
Con: A List<Play> of structs has footguns: the indexer returns a copy, not a reference, so CollectionsMarshal.AsSpan is needed to mutate elements in place.
Con: Significant code churn for marginal gain over pooling
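The List<Play> footgun is easy to demonstrate. In this sketch Play is a hypothetical two-field mutable struct. Writing `plays[i].Quarter = 4` directly does not compile (CS1612), so the tempting workaround of copying into a local silently mutates only the copy. CollectionsMarshal.AsSpan (System.Runtime.InteropServices, .NET 5 and later) gives in-place access, and plain arrays support ref element access directly.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical struct-ified Play with two of its 38 fields.
public struct Play
{
    public int Quarter;
    public int Down;
}

public static class StructFootguns
{
    // plays[index].Quarter = quarter; would not compile (CS1612), so code
    // tends to copy into a local instead, which mutates only the copy.
    public static void SetQuarterViaCopy(List<Play> plays, int index, int quarter)
    {
        var play = plays[index]; // the indexer returns a copy
        play.Quarter = quarter;  // the list element is unchanged
    }

    // AsSpan exposes the list's backing array, so this writes the element in place.
    public static void SetQuarterInPlace(List<Play> plays, int index, int quarter)
    {
        var span = CollectionsMarshal.AsSpan(plays);
        span[index].Quarter = quarter;
    }

    // Array elements are variables, so Play[] storage can be passed by ref directly.
    public static void AdvanceDown(ref Play play) => play.Down++;
}
```

The span returned by CollectionsMarshal.AsSpan is only valid until the list is resized, which is one more rule every engine method would have to respect.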

This approach is not recommended: for a struct of this size it is strictly worse than pooling.

Approach C: Columnar Buffer (Structure of Arrays)

Replace the Play class entirely with a PlayBuffer holding parallel arrays — one per field. Engine methods operate on (PlayBuffer buffer, int rowIndex) instead of Play instances.

How it works

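```csharp
public class PlayBuffer
{
    public PlayBuffer(int capacity)
    {
        PlaySequence = new int[capacity];
        Quarter = new int[capacity];
        Down = new int[capacity];
        Distance = new int[capacity];
        PossessionTeam = new string[capacity];
        PlayType = new PlayType[capacity];
        // ... allocate the remaining columns the same way
    }

    public int Count { get; private set; }
    public int[] PlaySequence { get; }
    public int[] Quarter { get; }
    public int[] Down { get; }
    public int[] Distance { get; }
    public string[] PossessionTeam { get; }
    public PlayType[] PlayType { get; }
    // ... 38 parallel arrays
    // Claims the next pre-allocated row; throws instead of growing.
    public int AddRow() => Count < Quarter.Length ? Count++ : throw new System.InvalidOperationException("PlayBuffer is full");
    // Reuses the same arrays for the next game.
    public void Clear() => Count = 0;
}
```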

Engine code becomes:

```csharp
var i = buffer.AddRow();
buffer.Quarter[i] = state.Quarter;
buffer.Down[i] = state.Down;
buffer.PlayType[i] = PlayType.RunLeft;
```

Expected impact

Zero per-play allocation — just index into pre-allocated arrays
Cache-friendly sequential array access
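Data is already laid out column by column, as Parquet and Arrow expect, so primitive columns can be handed to a columnar writer as contiguous blocks (string columns still need conversion to Arrow's offset and data buffers, and Parquet still encodes and compresses each column)
Direct alignment with the future streaming architecture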
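A sketch of what indexing into pre-allocated arrays and column-at-a-time output look like, using a hypothetical two-column MiniPlayBuffer rather than the full PlayBuffer. The filled part of each column is exposed as a ReadOnlySpan<int> over the backing array; ColumnOps.WriteColumn is a hypothetical stand-in for a real columnar writer, whose API is not shown here.

```csharp
using System;

// Hypothetical two-column cut-down of PlayBuffer, for illustration.
public sealed class MiniPlayBuffer
{
    public MiniPlayBuffer(int capacity)
    {
        Quarter = new int[capacity];
        Distance = new int[capacity];
    }

    public int Count { get; private set; }
    public int[] Quarter { get; }
    public int[] Distance { get; }

    // Filled portion of each column: a view over the array, no copy, no per-row object.
    public ReadOnlySpan<int> QuarterColumn => Quarter.AsSpan(0, Count);
    public ReadOnlySpan<int> DistanceColumn => Distance.AsSpan(0, Count);

    public int AddRow() => Count < Quarter.Length
        ? Count++
        : throw new InvalidOperationException("MiniPlayBuffer is full");

    public void Clear() => Count = 0;
}

public static class ColumnOps
{
    // Sequential scan over one contiguous int[], the cache-friendly access pattern.
    public static long Sum(ReadOnlySpan<int> column)
    {
        long total = 0;
        foreach (var value in column) total += value;
        return total;
    }

    // Stand-in for a writer that accepts a contiguous block of values:
    // the whole column moves in one bulk copy instead of one object at a time.
    public static void WriteColumn(ReadOnlySpan<int> column, Span<int> destination) =>
        column.CopyTo(destination);
}
```

Clearing the buffer between games resets only the row count, so the same arrays serve every game in the season.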

Trade-offs

Pro: Best expected performance of the three approaches for both allocation and throughput
Pro: Largely removes the transformation step between simulation and columnar output, since the buffer already has the output's column-per-field layout
Pro: Cache locality (scanning one column reads a single contiguous array instead of visiting every Play object)
Con: Most invasive change — every engine method needs refactoring
Con: Loses the object model: play.Quarter becomes buffer.Quarter[i]
Con: Harder to debug and inspect in dev (the debugger shows parallel arrays rather than one object per play)
Con: ~38 parallel arrays to manage, so any schema change touches the buffer definition

Recommendation for Future Work

Use Approach A (Object Pool) as an incremental win if allocations need to come down before streaming is implemented. It's cheap, keeps the existing object model, and gets close to the allocation floor for plays.

Adopt Approach C (Columnar Buffer) as the end state once the streaming output layer to Parquet/Arrow/ClickHouse is implemented. At that point the columnar transformation is justified by the output format: rather than adding complexity speculatively, it aligns the internal representation with the external one.

Do not do Approach B. Mutable structs of this size are strictly worse than pooling.

Open Questions for Object Context Conversion

The user flagged a related future need: converting Parquet/Arrow data back into an object context for querying. This deserves its own design session. Key questions to answer:

What is "the object context"? A GameResult object with nested plays? A DTO for API responses? Entity Framework entities for ad-hoc queries?
What queries need to be supported? Full play-by-play replay, aggregate stats, per-matchup probability distributions across worlds?
Is the conversion lazy or eager? Do we load one game at a time, or the full dataset?
How does this interact with Approach C? If we move to columnar buffers internally, do we ever need to reconstitute a Play object, or do queries work directly against the columnar data?
What's the read-side allocation budget? A query returning 1M plays as Play objects re-creates the allocation problem we're trying to avoid on the write side.

These questions need answers before planning that conversion work.

Current State Summary

Play remains a class with 38 properties
GameResult.Plays is a flat List<Play> (initial capacity 200)
~35,000 Play allocations per season, 16.5 MB total
No pooling, no buffers — plain allocations
Simulation runs at ~22 worlds/sec in the real runner, ~80 worlds/sec in the BenchmarkDotNet micro-benchmark
Throughput is stable across 10K worlds with no GC degradation (tested)

No changes recommended until the streaming output layer is being built.