Skills
Skills are structured knowledge packages that give each agent its domain expertise. They are not just prompts — they contain execution protocols, tech stack references, code templates, error playbooks, quality checklists, and few-shot examples, organized in a two-layer architecture designed for token efficiency.
The Two-Layer Design
Layer 1: SKILL.md (~800 bytes, always loaded)
Every skill has a SKILL.md file at its root. This is always loaded into the context window when the skill is referenced. It contains:
- YAML frontmatter with name and description (used for routing and display)
- When to use / When NOT to use: explicit activation conditions
- Core rules: the 5-15 most critical constraints for the domain
- Architecture overview: how code should be structured
- Library list: approved dependencies and their purposes
- References: pointers to Layer 2 resources (never loaded automatically)
Example frontmatter:
---
name: oma-frontend
description: Frontend specialist for React, Next.js, TypeScript with FSD-lite architecture, shadcn/ui, and design system alignment. Use for UI, component, page, layout, CSS, Tailwind, and shadcn work.
---
The description field is critical — it contains the routing keywords that the skill routing system uses to match tasks to agents.
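As a rough illustration of how the frontmatter above could be read, here is a minimal sketch. It assumes flat `key: value` pairs only and is not a full YAML parser; the function name `parse_frontmatter` is hypothetical and not part of the project.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter between two `---` lines.

    Deliberately minimal: no nesting, lists, or quoting, unlike real YAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter ends the frontmatter
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SKILL_MD = """---
name: oma-frontend
description: Frontend specialist for React, Next.js, TypeScript with FSD-lite architecture, shadcn/ui, and design system alignment. Use for UI, component, page, layout, CSS, Tailwind, and shadcn work.
---
"""

meta = parse_frontmatter(SKILL_MD)
```

After this runs, `meta["name"]` holds the skill name and `meta["description"]` holds the keyword-bearing text that routing would inspect.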
Layer 2: resources/ (loaded on-demand)
The resources/ directory contains deep execution knowledge. These files are loaded only when:
- The agent is explicitly invoked (via /command or agent skills field)
- The specific resource is needed for the current task type and difficulty
This on-demand loading is governed by the context-loading guide (.agents/skills/_shared/core/context-loading.md), which maps task types to required resources per agent.
File Structure Example
.agents/skills/oma-frontend/
├── SKILL.md                     ← Layer 1: always loaded (~800 bytes)
└── resources/
    ├── execution-protocol.md    ← Layer 2: step-by-step workflow
    ├── tech-stack.md            ← Layer 2: detailed technology specs
    ├── tailwind-rules.md        ← Layer 2: Tailwind-specific conventions
    ├── component-template.tsx   ← Layer 2: React component template
    ├── snippets.md              ← Layer 2: copy-paste code patterns
    ├── error-playbook.md        ← Layer 2: error recovery procedures
    ├── checklist.md             ← Layer 2: quality verification checklist
    └── examples/                ← Layer 2: few-shot input/output examples
        └── examples.md
.agents/skills/oma-backend/
├── SKILL.md
├── resources/
│   ├── execution-protocol.md
│   ├── examples.md
│   ├── orm-reference.md         ← Domain-specific (ORM queries, N+1, transactions)
│   ├── checklist.md
│   └── error-playbook.md
└── stack/                       ← Generated by /stack-set (language-specific)
    ├── stack.yaml
    ├── tech-stack.md
    ├── snippets.md
    └── api-template.*
.agents/skills/oma-design/
├── SKILL.md
├── resources/
│   ├── execution-protocol.md
│   ├── anti-patterns.md
│   ├── checklist.md
│   ├── design-md-spec.md
│   ├── design-tokens.md
│   ├── prompt-enhancement.md
│   ├── stitch-integration.md
│   └── error-playbook.md
├── reference/                   ← Deep reference material
│   ├── typography.md
│   ├── color-and-contrast.md
│   ├── spatial-design.md
│   ├── motion-design.md
│   ├── responsive-design.md
│   ├── component-patterns.md
│   ├── accessibility.md
│   └── shader-and-3d.md
└── examples/
    ├── design-context-example.md
    └── landing-page-prompt.md
Per-Skill Resource Types
| Resource Type | Filename Pattern | Purpose | When Loaded |
|---|---|---|---|
| Execution Protocol | execution-protocol.md | Step-by-step workflow: Analyze -> Plan -> Implement -> Verify | Always (with SKILL.md) |
| Tech Stack | tech-stack.md | Detailed technology specs, versions, configuration | Complex tasks |
| Error Playbook | error-playbook.md | Recovery procedures with "3 strikes" escalation | On error only |
| Checklist | checklist.md | Domain-specific quality verification | At Verify step |
| Snippets | snippets.md | Copy-paste ready code patterns | Medium/Complex tasks |
| Examples | examples.md or examples/ | Few-shot input/output examples for the LLM | Medium/Complex tasks |
| Variants | stack/ directory | Language/framework-specific references (generated by /stack-set) | When stack exists |
| Templates | component-template.tsx, screen-template.dart | Boilerplate file templates | On component creation |
| Domain Reference | orm-reference.md, anti-patterns.md, etc. | Deep domain knowledge for specific subtasks | Task-type specific |
Shared Resources (_shared/)
All agents share common foundations from .agents/skills/_shared/. These are organized into three categories:
Core Resources (.agents/skills/_shared/core/)
| Resource | Purpose | When Loaded |
|---|---|---|
| skill-routing.md | Maps task keywords to the correct agent. Contains the Skill-Agent Mapping table, Complex Request Routing patterns, Inter-Agent Dependency Rules, Escalation Rules, and Turn Limit Guide. | Referenced by orchestrator and coordination skills |
| context-loading.md | Defines which resources to load for which task type and difficulty. Contains per-agent task-type-to-resource mapping tables and conditional protocol loading triggers. | At workflow start (Step 0 / Phase 0) |
| prompt-structure.md | Defines the four elements every task prompt must contain: Goal, Context, Constraints, Done When. Includes templates for PM, implementation, and QA agents. Lists anti-patterns (starting with only a Goal). | Referenced by PM agent and all workflows |
| clarification-protocol.md | Defines uncertainty levels (LOW/MEDIUM/HIGH) with actions for each. Contains uncertainty triggers, escalation templates, required verification items per agent type, and subagent-mode behavior. | When requirements are ambiguous |
| context-budget.md | Token budget management. Defines file reading strategy (use find_symbol not read_file), resource loading budgets per model tier (Flash: ~3,100 tokens / Pro: ~5,000 tokens), large file handling, and context overflow symptoms. | At workflow start |
| difficulty-guide.md | Criteria for classifying tasks as Simple/Medium/Complex. Defines expected turn counts, protocol branching (Fast Track / Standard / Extended), and misjudgment recovery. | At task start (Step 0) |
| reasoning-templates.md | Structured reasoning fill-in-the-blank templates for common decision patterns (e.g., Exploration Decision template #6 used by the Exploration Loop). | During complex decisions |
| quality-principles.md | 4 universal quality principles applied across all agents. | At workflow start for quality-focused workflows (ultrawork) |
| vendor-detection.md | Protocol for detecting the current runtime environment (Claude Code, Codex CLI, Gemini CLI, Antigravity, CLI Fallback). Uses marker checks: Agent tool = Claude Code, apply_patch = Codex, @-syntax = Gemini. | At workflow start |
| session-metrics.md | Clarification Debt (CD) scoring and session metrics tracking. Defines event types (clarify +10, correct +25, redo +40), thresholds (CD >= 50 = RCA, CD >= 80 = pause), and integration points. | During orchestration sessions |
| common-checklist.md | Universal quality checklist applied at final verification of Complex tasks (in addition to agent-specific checklists). | Verify step of Complex tasks |
| lessons-learned.md | Repository of past session learnings, auto-generated from Clarification Debt breaches and discarded experiments. Organized by domain section. Includes QA Evaluation Lessons for tracking evaluator blind spots. | Referenced after errors and at session end |
| evaluator-tuning.md | Semi-automated QA prompt tuning protocol. Tracks Evaluation Accuracy (EA) events, triggers tuning when EA >= 30, generates patch suggestions for QA checklists and execution protocols. Includes tuning log and positive reinforcement from good_catch events. | When oma retro detects EA threshold breach |
| api-contracts/ | Directory containing API contract template and generated contracts. template.md defines the per-endpoint format (method, path, request/response schemas, auth, errors). | When cross-boundary work is planned |
Runtime Resources (.agents/skills/_shared/runtime/)
| Resource | Purpose |
|---|---|
| memory-protocol.md | Memory file format and operations for CLI subagents. Defines On Start, During Execution, and On Completion protocols using configurable memory tools (read/write/edit). Includes experiment tracking extension. |
| execution-protocols/claude.md | Claude Code-specific execution patterns. Injected by oma agent:spawn when vendor is claude. |
| execution-protocols/gemini.md | Gemini CLI-specific execution patterns. |
| execution-protocols/codex.md | Codex CLI-specific execution patterns. |
| execution-protocols/qwen.md | Qwen CLI-specific execution patterns. |
Vendor-specific execution protocols are injected automatically by oma agent:spawn — agents do not need to manually load them.
Conditional Resources (.agents/skills/_shared/conditional/)
These are loaded only when specific conditions are met during execution:
| Resource | Trigger Condition | Loaded By | Approx. Tokens |
|---|---|---|---|
| quality-score.md | VERIFY or SHIP phase begins in a workflow that supports quality measurement | Orchestrator (passes to QA agent prompt) | ~250 |
| experiment-ledger.md | First experiment is recorded after establishing an IMPL baseline | Orchestrator (inline, after baseline measurement) | ~250 |
| exploration-loop.md | Same gate fails twice on the same issue | Orchestrator (inline, before spawning hypothesis agents) | ~250 |
Budget impact: approximately 750 tokens total if all 3 are loaded. Since loading is conditional, typical sessions load 1-2 of these. Flash-tier budget remains within the approximately 3,100 token allocation.
How Skills Route via skill-routing.md
The skill routing map defines how tasks are matched to agents:
Simple Routing (Single Domain)
A prompt containing "Build a login form with Tailwind CSS" matches the keywords UI, component, form, Tailwind, and routes to oma-frontend.
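The matching step can be pictured with a small keyword-overlap sketch. The keyword sets and the `route` function below are illustrative assumptions (the frontend set borrows words from the oma-frontend description and the example above; the backend set is invented for contrast), and the actual mapping lives in skill-routing.md.

```python
import re
from typing import Optional

# Hypothetical keyword table; the real one is defined in skill-routing.md.
ROUTING_KEYWORDS = {
    "oma-frontend": {"ui", "component", "form", "page", "layout", "css", "tailwind", "shadcn"},
    "oma-backend": {"api", "endpoint", "database", "migration"},
}


def route(prompt: str) -> Optional[str]:
    """Return the agent whose keyword set overlaps the prompt the most."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    scores = {agent: len(words & keywords) for agent, keywords in ROUTING_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword hit at all means no single-domain route was found.
    return best if scores[best] > 0 else None


agent = route("Build a login form with Tailwind CSS")
```

With this table, the login-form prompt hits "form", "tailwind", and "css" in the frontend set and nothing in the backend set, so it routes to oma-frontend.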
Complex Request Routing
Multi-domain requests follow established execution orders:
| Request Pattern | Execution Order |
|---|---|
| "Create a fullstack app" | oma-pm -> (oma-backend + oma-frontend) parallel -> oma-qa |
| "Create a mobile app" | oma-pm -> (oma-backend + oma-mobile) parallel -> oma-qa |
| "Fix bug and review" | oma-debug -> oma-qa |
| "Design and build a landing page" | oma-design -> oma-frontend |
| "I have an idea for a feature" | oma-brainstorm -> oma-pm -> relevant agents -> oma-qa |
| "Do everything automatically" | oma-orchestrator (internally: oma-pm -> agents -> oma-qa) |
Inter-Agent Dependency Rules
Can run in parallel (no dependencies):
- oma-backend + oma-frontend (when API contract is pre-defined)
- oma-backend + oma-mobile (when API contract is pre-defined)
- oma-frontend + oma-mobile (independent of each other)
Must run sequentially:
- oma-brainstorm -> oma-pm (ideation comes before planning)
- oma-pm -> all other agents (planning comes first)
- implementation agent -> oma-qa (review after implementation)
- oma-backend -> oma-frontend/oma-mobile (when no pre-defined API contract)
QA is always last, except when the user requests review of specific files only.
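One way to reason about these rules is as a dependency graph that is resolved into stages, where agents in the same stage may run in parallel. The sketch below uses Python's standard `graphlib`; the function `plan_stages` and its parameters are hypothetical, and it ignores the exception for review-only requests.

```python
from graphlib import TopologicalSorter

IMPLEMENTATION = {"oma-backend", "oma-frontend", "oma-mobile"}


def plan_stages(agents: set, api_contract_defined: bool) -> list:
    """Group agents into sequential stages; each stage can run in parallel."""
    deps = {agent: set() for agent in agents}  # agent -> agents it must wait for
    impl = agents & IMPLEMENTATION
    if "oma-pm" in agents:
        if "oma-brainstorm" in agents:
            deps["oma-pm"].add("oma-brainstorm")  # brainstorm before planning
        for agent in agents - {"oma-pm", "oma-brainstorm"}:
            deps[agent].add("oma-pm")  # planning comes first
    if "oma-qa" in agents:
        deps["oma-qa"] |= impl  # review after implementation
    if "oma-backend" in agents and not api_contract_defined:
        for client in impl - {"oma-backend"}:
            deps[client].add("oma-backend")  # clients wait for the API
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


fullstack = {"oma-pm", "oma-backend", "oma-frontend", "oma-qa"}
with_contract = plan_stages(fullstack, api_contract_defined=True)
without_contract = plan_stages(fullstack, api_contract_defined=False)
```

With a pre-defined API contract, backend and frontend land in the same stage; without one, the frontend stage follows the backend stage.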
Token Savings Math
Consider a 5-agent orchestration session (pm, backend, frontend, mobile, qa):
Without progressive disclosure:
- Each agent loads all resources: ~4,000 tokens per agent
- Total: 5 x 4,000 = 20,000 tokens consumed before any work
With progressive disclosure:
- Layer 1 only for all agents: 5 x 800 = 4,000 tokens
- Layer 2 loaded only for active agents (typically 1-2 at a time): +1,500 tokens
- Total: ~5,500 tokens
Savings: approximately 72-75%
On flash-tier models (128K context), this is the difference between having about 108K tokens available for work (128K minus 20K) and about 122.5K (128K minus 5.5K), a significant margin for complex tasks.
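The arithmetic above can be reproduced directly. This sketch uses only the figures quoted in this section, treats the Layer 1 figure of 800 as tokens (as the calculation above does), and assumes "128K" means 128,000 tokens; the variable names are illustrative.

```python
AGENTS = 5
FULL_LOAD_TOKENS = 4_000      # all resources, per agent
LAYER1_TOKENS = 800           # SKILL.md only, per agent (treated as tokens here)
LAYER2_ACTIVE_TOKENS = 1_500  # Layer 2 for the 1-2 active agents
CONTEXT_WINDOW = 128_000      # assumption: 128K read as 128,000 tokens

without_disclosure = AGENTS * FULL_LOAD_TOKENS                   # 20,000
with_disclosure = AGENTS * LAYER1_TOKENS + LAYER2_ACTIVE_TOKENS  # 5,500

savings_pct = round(100 * (1 - with_disclosure / without_disclosure), 1)

remaining_without = CONTEXT_WINDOW - without_disclosure  # 108,000
remaining_with = CONTEXT_WINDOW - with_disclosure        # 122,500
```

With these inputs the savings come to 72.5%, at the low end of the quoted range; a lighter Layer 2 load would push the figure higher.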
Resource Loading by Task Difficulty
The difficulty guide classifies tasks into three levels, which determine how much of Layer 2 is loaded:
Simple (3-5 turns expected)
Single file change, clear requirements, repeating existing patterns.
Loads: execution-protocol.md only. Skip analysis, proceed directly to implementation with minimal checklist.
Medium (8-15 turns expected)
2-3 file changes, some design decisions needed, applying patterns to new domains.
Loads: execution-protocol.md + examples.md. Standard protocol with brief analysis and full verification.
Complex (15-25 turns expected)
4+ file changes, architecture decisions required, introducing new patterns, dependencies on other agents.
Loads: execution-protocol.md + examples.md + tech-stack.md + snippets.md. Extended protocol with checkpoints, mid-execution progress recording, and full verification including common-checklist.md.
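The three levels can be summarized as a lookup from difficulty to Layer 2 resources. The `classify` function below is a simplified assumption that looks only at file count and whether an architecture decision is needed; the actual criteria in difficulty-guide.md also weigh requirement clarity, pattern novelty, and cross-agent dependencies.

```python
RESOURCES_BY_DIFFICULTY = {
    "simple": ["execution-protocol.md"],
    "medium": ["execution-protocol.md", "examples.md"],
    "complex": ["execution-protocol.md", "examples.md", "tech-stack.md", "snippets.md"],
}

EXPECTED_TURNS = {"simple": (3, 5), "medium": (8, 15), "complex": (15, 25)}


def classify(files_changed: int, needs_architecture_decision: bool = False) -> str:
    """Simplified difficulty classification (illustrative, not the full guide)."""
    if files_changed >= 4 or needs_architecture_decision:
        return "complex"
    if files_changed >= 2:
        return "medium"
    return "simple"


level = classify(files_changed=3)
to_load = RESOURCES_BY_DIFFICULTY[level]
```

A three-file change classifies as medium here, so only the execution protocol and examples would be loaded.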
Context-Loading Task Maps (Per Agent)
The context-loading guide provides detailed task-type-to-resource mappings. Here are the key mappings:
Backend Agent
| Task Type | Required Resources |
|---|---|
| CRUD API creation | stack/snippets.md (route, schema, model, test) |
| Authentication | stack/snippets.md (JWT, password) + stack/tech-stack.md |
| DB migration | stack/snippets.md (migration) |
| Performance optimization | examples.md (N+1 example) |
| Existing code modification | examples.md + Serena MCP |
Frontend Agent
| Task Type | Required Resources |
|---|---|
| Component creation | snippets.md + component-template.tsx |
| Form implementation | snippets.md (form + Zod) |
| API integration | snippets.md (TanStack Query) |
| Styling | tailwind-rules.md |
| Page layout | snippets.md (grid) + examples.md |
Design Agent
| Task Type | Required Resources |
|---|---|
| Design system creation | reference/typography.md + reference/color-and-contrast.md + reference/spatial-design.md + design-md-spec.md |
| Landing page design | reference/component-patterns.md + reference/motion-design.md + prompt-enhancement.md + examples/landing-page-prompt.md |
| Design audit | checklist.md + anti-patterns.md |
| Design token export | design-tokens.md |
| 3D / shader effects | reference/shader-and-3d.md + reference/motion-design.md |
| Accessibility review | reference/accessibility.md + checklist.md |
QA Agent
| Task Type | Required Resources |
|---|---|
| Security review | checklist.md (Security section) |
| Performance review | checklist.md (Performance section) |
| Accessibility review | checklist.md (Accessibility section) |
| Full audit | checklist.md (full) + self-check.md |
| Quality scoring | quality-score.md (conditional) |
Orchestrator Prompt Composition
When the orchestrator composes prompts for subagents, it includes only task-relevant resources:
- Agent SKILL.md's Core Rules section
- execution-protocol.md
- Resources matching the specific task type (from the maps above)
- error-playbook.md (always included, since recovery is essential)
- Serena Memory Protocol (CLI mode)
This targeted composition avoids loading unnecessary resources, maximizing the subagent's available context for actual work.
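The composition rule can be sketched as a function that assembles a resource list from the task maps. Everything named here is illustrative: `compose_resources` is hypothetical, only a few task-map rows are copied in, the "SKILL.md#core-rules" label is just a placeholder for the Core Rules section, and mapping the Serena Memory Protocol to memory-protocol.md is an assumption.

```python
# A few rows copied from the task maps above; the full maps live in context-loading.md.
TASK_RESOURCES = {
    ("oma-frontend", "Styling"): ["tailwind-rules.md"],
    ("oma-frontend", "Component creation"): ["snippets.md", "component-template.tsx"],
    ("oma-qa", "Full audit"): ["checklist.md", "self-check.md"],
}


def compose_resources(agent: str, task_type: str, cli_mode: bool = False) -> list:
    """List the resources to include in a subagent prompt, in order."""
    parts = ["SKILL.md#core-rules", "execution-protocol.md"]
    parts += TASK_RESOURCES.get((agent, task_type), [])  # task-specific only
    parts.append("error-playbook.md")  # always included
    if cli_mode:
        parts.append("memory-protocol.md")  # assumption: file backing the memory protocol
    return parts


styling_prompt = compose_resources("oma-frontend", "Styling")
```

For a styling task this yields the core rules, the execution protocol, tailwind-rules.md, and the error playbook, and nothing else from the frontend skill.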
Clarification Debt & Session Metrics (Deep Dive)
Clarification Debt (CD) measures the cost of unclear requirements during a session. The orchestrator tracks every user correction and scores it:
| Event Type | Points | Description |
|---|---|---|
| clarify | +10 | Simple clarification question (expected for MEDIUM uncertainty) |
| correct | +25 | Intent misunderstanding requiring direction change |
| redo | +40 | Scope/charter violation requiring rollback and restart |
| blocked | +0 | Agent correctly stopped and asked (good behavior, not penalized) |
Modifiers: Charter not read (+15), allowlist violation (+20), same error repeated (x1.5).
Thresholds and enforcement:
- CD >= 50 → Mandatory RCA entry added to lessons-learned.md
- CD >= 80 → Session halted, user must re-specify requirements
- redo >= 2 → Orchestrator pauses and requests explicit scope confirmation
- CD >= 30 across 3 consecutive sessions for the same agent → Agent prompt template review
The session log is maintained in .serena/memories/session-metrics.md with per-event rows (turn, agent, event type, points, detail) and a summary section.
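A minimal sketch of the scoring and single-session thresholds follows. The point values and thresholds come from the tables above; the `Event` class, the function names, and the action labels are hypothetical. How the x1.5 modifier combines with the flat modifiers is not specified here, so the sketch assumes it multiplies the base points before the flat additions. The cross-session rule (CD >= 30 across 3 sessions) is left out.

```python
from dataclasses import dataclass

BASE_POINTS = {"clarify": 10, "correct": 25, "redo": 40, "blocked": 0}


@dataclass
class Event:
    kind: str
    charter_not_read: bool = False
    allowlist_violation: bool = False
    repeated: bool = False  # same error repeated


def score(event: Event) -> float:
    points = BASE_POINTS[event.kind]
    if event.repeated:
        points *= 1.5  # assumption: multiplier applies to base points only
    if event.charter_not_read:
        points += 15
    if event.allowlist_violation:
        points += 20
    return points


def enforcement(events: list) -> list:
    """Return the enforcement actions triggered within one session."""
    total = sum(score(e) for e in events)
    redos = sum(e.kind == "redo" for e in events)
    actions = []
    if total >= 50:
        actions.append("rca_entry")
    if total >= 80:
        actions.append("halt_session")
    if redos >= 2:
        actions.append("confirm_scope")
    return actions


session = [Event("clarify"), Event("correct"), Event("correct", repeated=True)]
triggered = enforcement(session)
```

The sample session totals 10 + 25 + 37.5 = 72.5 points, which crosses the RCA threshold but not the halt threshold.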
Evaluator Accuracy & QA Tuning
QA agents improve through tracked judgment errors. Unlike CD (real-time), Evaluator Accuracy (EA) is retrospective — most errors are discovered after the session ends.
EA event types:
| Event | Points | When Discovered |
|---|---|---|
| false_negative | +30 | Next session or production: bug that QA missed |
| false_positive | +15 | During session: impl agent successfully disputes QA finding |
| severity_mismatch | +10 | During session or retro: wrong severity assigned |
| missed_stub | +20 | Runtime verification catches display-only feature |
| good_catch | -10 | QA caught a non-obvious bug (positive reward signal) |
EA is calculated on a rolling 3-session window. Thresholds:
- EA >= 30 → oma retro flags QA patterns for review (tuning suggested)
- EA >= 50 → Tuning required: update QA execution-protocol.md
- false_negative >= 3 across window → Add detection pattern to QA checklist.md
- good_catch >= 5 across window → Document and propagate successful pattern
The full tuning loop is defined in evaluator-tuning.md: sessions accumulate EA events → threshold triggers oma retro → report categorizes errors and suggests patches → user reviews and approves → patches applied to QA checklist/protocol → validation over next 3 sessions.
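The thresholds above can be illustrated with a small scoring sketch. It assumes EA is the plain sum of event points over the last three sessions, which this section does not state explicitly, and it reports only the stronger of the two EA actions; the function name and action labels are hypothetical.

```python
from collections import Counter

EA_POINTS = {
    "false_negative": 30,
    "false_positive": 15,
    "severity_mismatch": 10,
    "missed_stub": 20,
    "good_catch": -10,
}


def evaluate_window(sessions: list) -> tuple:
    """Score the last 3 sessions; each session is a list of EA event names."""
    counts = Counter(event for session in sessions[-3:] for event in session)
    ea = sum(EA_POINTS[name] * n for name, n in counts.items())  # assumed: plain sum
    actions = []
    if ea >= 50:
        actions.append("tuning_required")
    elif ea >= 30:
        actions.append("tuning_suggested")
    if counts["false_negative"] >= 3:
        actions.append("add_detection_pattern")
    if counts["good_catch"] >= 5:
        actions.append("propagate_pattern")
    return ea, actions


history = [["false_negative"], ["good_catch"], ["false_positive", "severity_mismatch"]]
ea_score, ea_actions = evaluate_window(history)
```

Here one missed bug, one good catch, one disputed finding, and one severity mismatch add up to 45 points, enough to suggest tuning but not to require it.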
Sprint Decomposition for Complex Tasks
Complex tasks (4+ files, architecture decisions) use sprint-based execution rather than a single long run:
- Decompose into 2-4 feature-focused sprints, each independently testable
- Target 5-8 turns per sprint
- Sprint Gate after each sprint:
  - Sprint deliverable complete?
  - Lint/test pass?
  - If sprint took 2x expected turns → write checkpoint, inform user
- Continue to next sprint on gate pass
Example: Task "JWT auth + CRUD API + tests" decomposes into:
- Sprint 1: User model + auth endpoints (register/login)
- Sprint 2: CRUD endpoints + validation
- Sprint 3: Tests + error handling
Difficulty misjudgment recovery: If a task started as Simple but proves more complex, the agent upgrades to Medium or Complex protocol mid-execution and records the change in progress.
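The Sprint Gate checks described above can be expressed as a small function. This is a sketch with hypothetical names; it reports the gate result and the checkpoint condition separately, since the text does not say what happens when a gate fails without a turn overrun.

```python
def sprint_gate(deliverable_complete: bool, checks_pass: bool,
                turns_used: int, expected_turns: int) -> dict:
    """Evaluate the gate after one sprint."""
    return {
        # Gate passes only if the deliverable is done and lint/tests pass.
        "passed": deliverable_complete and checks_pass,
        # Overrun of 2x expected turns: write a checkpoint and inform the user.
        "write_checkpoint": turns_used >= 2 * expected_turns,
    }


on_track = sprint_gate(True, True, turns_used=6, expected_turns=8)
overrun = sprint_gate(True, False, turns_used=16, expected_turns=8)
```

The first sprint passes and continues; the second fails its checks and has used twice its expected turns, so a checkpoint is due.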
Context Reset Protocol
Long-running agents degrade in quality as context fills up. The Orchestrator (not the agent itself) monitors for this and triggers resets.
Trigger conditions (Orchestrator checks during monitoring):
| Condition | Detection | Action |
|---|---|---|
| Turn budget exhaustion | Agent consumed >= 80% of expected turns AND acceptance criteria < 50% complete | Context Reset |
| Progress stall | No progress file update for 3+ consecutive monitoring cycles | Context Reset |
| Shallow output | Result file contains stub markers or TODO placeholders | Re-spawn with explicit instruction |
Reset procedure:
- Checkpoint — Save agent's current state (completed items, remaining items, key decisions)
- Terminate — Stop the current agent run
- Re-spawn — Start a fresh agent with the checkpoint as context
- Resume — New agent reads checkpoint, continues from remaining items only
For standalone agents (no Orchestrator), the Sprint Gate in difficulty-guide.md serves as the safety net — if a sprint takes 2x expected turns, the agent writes a checkpoint and informs the user.
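The trigger table above translates into a short decision function. The names below are hypothetical, and the order in which the conditions are checked is an assumption, since the table does not define precedence when several conditions hold at once.

```python
from typing import Optional


def reset_action(turns_used: int, expected_turns: int, criteria_done_ratio: float,
                 stalled_cycles: int, has_stub_markers: bool) -> Optional[str]:
    """Map monitoring observations to an Orchestrator action, if any."""
    # Turn budget exhaustion: >= 80% of turns used with < 50% of criteria met.
    if turns_used >= 0.8 * expected_turns and criteria_done_ratio < 0.5:
        return "context_reset"
    # Progress stall: no progress file update for 3+ monitoring cycles.
    if stalled_cycles >= 3:
        return "context_reset"
    # Shallow output: stub markers or TODO placeholders in the result file.
    if has_stub_markers:
        return "respawn_with_instruction"
    return None


action = reset_action(turns_used=20, expected_turns=25, criteria_done_ratio=0.4,
                      stalled_cycles=0, has_stub_markers=False)
```

An agent that has used 20 of 25 expected turns with only 40% of its acceptance criteria met triggers a context reset under the first rule.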