GAME INTRO — IntentGrid v0
This document is the canonical introduction to the game used by IntentGrid. It explains the setting, the rules at a high level, the agent interface, and why this game is a meaningful benchmark.
For the exact action schema and strict legality rules, see specs/system-prompt.md.
1. What is IntentGrid?
IntentGrid is a two-player, turn-based combat game played on a 13×13 grid.
Each turn:
- Both teams (Blue and Red) each spawn 1 new unit on a random empty cell along the board edge.
- Every living unit chooses one action (Move or Attack).
- The match lasts exactly 40 turns.
- The winner is decided primarily by surviving unit count, with tie-breakers: (1) total remaining HP, (2) central zone control.
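The win condition above can be expressed as a simple ranking key. This is a minimal sketch, not the engine's actual code; the field names (`units`, `total_hp`, `zone_control`) are illustrative assumptions:

```python
def score(team):
    """Ranking key for the win condition: surviving units first, then
    total remaining HP, then central zone control (hypothetical fields)."""
    return (team["units"], team["total_hp"], team["zone_control"])

def winner(blue, red):
    # Tuples compare element by element, which encodes the tie-breaker order;
    # identical tuples mean a draw.
    if score(blue) == score(red):
        return "draw"
    return "Blue" if score(blue) > score(red) else "Red"

# Blue wins here on the first tie-breaker (total remaining HP).
result = winner({"units": 5, "total_hp": 8, "zone_control": 2},
                {"units": 5, "total_hp": 7, "zone_control": 4})
```

Tuple comparison keeps the tie-breaker ordering explicit without nested conditionals.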
The key twist: the board edge is harmful terrain. New units spawn on the edge, but units that stay on the edge for multiple consecutive turns will lose HP and die. This forces units to move inward, leading to congestion and inevitable combat.
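The edge-decay mechanic can be sketched as follows. The exact damage threshold is not specified in this document, so the sketch assumes damage starts after 2 consecutive edge turns purely for illustration; field names are likewise assumptions:

```python
SIZE = 13

def on_edge(x, y):
    # A cell is on the board edge if either coordinate touches the border.
    return x in (0, SIZE - 1) or y in (0, SIZE - 1)

def apply_edge_decay(unit, threshold=2):
    """Hypothetical decay rule: once a unit has spent `threshold` consecutive
    turns on the edge, it loses 1 HP per turn until it moves inward.
    The real threshold lives in specs/system-prompt.md."""
    if on_edge(unit["x"], unit["y"]):
        unit["edge_turns"] += 1
        if unit["edge_turns"] >= threshold:
            unit["hp"] -= 1
    else:
        unit["edge_turns"] = 0  # moving inward resets the counter
    return unit
```

Whatever the exact numbers, the effect is the same: edge cells are spawn points, not safe ground.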
2. High-level rules (human summary)
IntentGrid is deliberately simple at the “micro” level:
- Movement: move 1 cell (N/S/E/W) into an empty cell.
- Attack: attack 1 adjacent cell (N/S/E/W); if an enemy is there, it loses 1 HP.
- HP: a unit has only 1 or 2 HP.
- Death: a unit is removed immediately when HP reaches 0.
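The four micro rules above fit in a few lines of code. This is a sketch under assumptions: it uses a `dict` mapping `(x, y)` to unit dicts as the board representation, and resolves actions one at a time (the spec's actual resolution order is defined in specs/system-prompt.md):

```python
SIZE = 13
DIRS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

def try_move(board, unit, d):
    """Move 1 cell N/S/E/W into an empty, in-bounds cell.
    Illegal moves are no-ops and return False."""
    dx, dy = DIRS[d]
    nx, ny = unit["x"] + dx, unit["y"] + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in board:
        del board[(unit["x"], unit["y"])]
        unit["x"], unit["y"] = nx, ny
        board[(nx, ny)] = unit
        return True
    return False

def try_attack(board, unit, d):
    """Attack 1 adjacent cell; an enemy there loses 1 HP.
    Death is immediate: the unit is removed the moment HP reaches 0."""
    dx, dy = DIRS[d]
    target = board.get((unit["x"] + dx, unit["y"] + dy))
    if target is not None and target["team"] != unit["team"]:
        target["hp"] -= 1
        if target["hp"] <= 0:
            del board[(target["x"], target["y"])]
        return True
    return False
```

Note how little state each action touches; the difficulty comes from the interactions, not the rules.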
What makes it non-trivial is the system-level pressure:
- Spawn pressure: 40 turns × 2 spawns/turn = up to 80 spawned units.
- Board capacity: 13×13 = 169 cells.
- Therefore: the board trends toward saturation; movement becomes constrained and fights become unavoidable.
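The saturation pressure is plain arithmetic, shown here only to make the trend concrete:

```python
TURNS, SPAWNS_PER_TURN = 40, 2
SIZE = 13

max_spawned = TURNS * SPAWNS_PER_TURN  # up to 80 units over a full match
cells = SIZE * SIZE                    # 169 cells on the board
# Even ignoring survivors from earlier turns, spawns alone can claim
# nearly half the board by the final turn.
peak_fraction = max_spawned / cells
```

With deaths, actual occupancy stays lower, but the trend toward congestion is built into the numbers.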
3. The LLM interface: language-driven command
IntentGrid evaluates models by having them play as AI commanders.
Every turn, the model receives a prompt containing:
- Which team it controls
- Turn number
- A state table for both teams (unit ids, coordinates, HP, zones)
- An ASCII board diagram
The model may reason freely in natural language, but must output a strict JSON action plan. This design is intentional:
- It tests whether a model can convert reasoning into valid grounded actions.
- It keeps the control interface minimal and stable.
- It allows models to express strategy in their own “native” style.
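To make the interface concrete, here is what parsing and sanity-checking a plan might look like. The JSON shape below is purely illustrative; the authoritative schema and legality rules are in specs/system-prompt.md, and every field name here is an assumption:

```python
import json

# Illustrative model output: free-form reasoning is allowed upstream,
# but the final answer must be strict JSON like this (hypothetical shape).
raw = """
{
  "actions": [
    {"unit": "B1", "type": "move",   "dir": "N"},
    {"unit": "B2", "type": "attack", "dir": "E"}
  ]
}
"""

plan = json.loads(raw)  # raises ValueError on malformed JSON
for action in plan["actions"]:
    # Minimal structural checks; real legality (empty target cell,
    # adjacency, unit ownership) is enforced by the simulator.
    assert action["type"] in ("move", "attack")
    assert action["dir"] in ("N", "S", "E", "W")
```

A plan that fails `json.loads` or these shape checks is exactly the kind of output that gets discarded or degraded by fallbacks.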
4. System architecture context (how IntentGrid runs matches)
IntentGrid’s Board Service is not a game engine.
In production, a match is executed by:
- an external scheduler (selects models, builds prompts, calls LLMs, applies fallbacks)
- a simulator (the only component that executes game rules)
The Board Service only:
- stores match events (append-only)
- renders turn-by-turn artifacts (boards, state tables, narratives, actions)
- aggregates results into a leaderboard
This separation is a core design constraint (see projects/001-match-process.md).
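The Board Service's storage role can be pictured as an append-only log that downstream renderers replay. This is a conceptual sketch, not the real service's API; class and method names are invented for illustration:

```python
class EventLog:
    """Minimal append-only store: events are only ever appended, never
    mutated or deleted. Boards, state tables, and narratives are derived
    artifacts produced by replaying the log."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Store a copy so callers can't mutate history after the fact.
        self._events.append(dict(event))

    def replay(self):
        # Readers iterate over history; they never edit it.
        return iter(self._events)

log = EventLog()
log.append({"turn": 1, "type": "spawn", "team": "Blue", "unit": "B1"})
log.append({"turn": 1, "type": "move", "unit": "B1", "dir": "N"})
```

Because rules execution lives solely in the simulator, the log only ever records what happened, never decides it.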
5. Why this benchmark is harder than it looks
At first glance, IntentGrid looks simple. In practice, it is a surprisingly challenging environment for general-purpose LLMs:
- Overcrowding is guaranteed: congestion creates non-linear dynamics.
- Local tactics compound quickly: small mistakes block allies or waste tempo.
- Short horizons are deceptive: the board evolves rapidly as spawns accumulate.
- Legality matters: invalid JSON or illegal actions can be discarded or degraded by fallbacks.
One of the most interesting observations is that a very small, fixed baseline policy can defeat many large models — because the game rewards consistency, constraint-following, and micro-action reliability.
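The document does not specify the baseline, but the kind of tiny fixed policy it describes might look like this: attack if an enemy is adjacent, otherwise step toward the center. All names and the action shape are hypothetical:

```python
DIRS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
CENTER = (6, 6)  # center of a 13x13 board

def baseline_action(unit, enemy_cells):
    """Tiny fixed policy: attack an adjacent enemy if there is one,
    otherwise move toward the center (away from the harmful edge).
    It never emits malformed output, which is much of its strength."""
    for d, (dx, dy) in DIRS.items():
        if (unit["x"] + dx, unit["y"] + dy) in enemy_cells:
            return {"unit": unit["id"], "type": "attack", "dir": d}
    if unit["x"] != CENTER[0]:
        d = "E" if unit["x"] < CENTER[0] else "W"
    elif unit["y"] != CENTER[1]:
        d = "S" if unit["y"] < CENTER[1] else "N"
    else:
        d = "N"  # already at center; any direction will do
    return {"unit": unit["id"], "type": "move", "dir": d}
```

A policy this small never hallucinates a unit id, never emits broken JSON, and never wastes a turn on an illegal action, which is precisely the consistency the game rewards.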
6. References
- Robot Rumble (inspiration): https://robotrumble.org/
- Action rules spec: specs/system-prompt.md