
Skills

Skills are structured knowledge packages that give each agent its domain expertise. They are not just prompts — they contain execution protocols, tech stack references, code templates, error playbooks, quality checklists, and few-shot examples, organized in a two-layer architecture designed for token efficiency.


The Two-Layer Design

Layer 1: SKILL.md (~800 bytes, always loaded)

Every skill has a SKILL.md file at its root. This is always loaded into the context window when the skill is referenced. It contains:

  • YAML frontmatter with name and description (used for routing and display)
  • When to use / When NOT to use — explicit activation conditions
  • Core rules — the 5-15 most critical constraints for the domain
  • Architecture overview — how code should be structured
  • Library list — approved dependencies and their purposes
  • References — pointers to Layer 2 resources (never loaded automatically)

Example frontmatter:

---
name: oma-frontend
description: Frontend specialist for React, Next.js, TypeScript with FSD-lite architecture, shadcn/ui, and design system alignment. Use for UI, component, page, layout, CSS, Tailwind, and shadcn work.
---

The description field is critical — it contains the routing keywords that the skill routing system uses to match tasks to agents.
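To make the role of the frontmatter concrete, here is a minimal sketch of how a router might read `name` and `description` from a SKILL.md file. The `parse_frontmatter` helper is hypothetical and handles only flat `key: value` pairs. The real routing system may well use a full YAML parser.

```python
def parse_frontmatter(text: str) -> dict:
    """Extract flat `key: value` pairs from a leading '---' delimited block.

    Hypothetical helper for illustration; not part of the documented tooling.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SKILL_MD = """---
name: oma-frontend
description: Frontend specialist for React, Next.js, TypeScript with FSD-lite architecture, shadcn/ui, and design system alignment. Use for UI, component, page, layout, CSS, Tailwind, and shadcn work.
---
"""

meta = parse_frontmatter(SKILL_MD)
print(meta["name"])
```

Only these two fields are needed for routing and display, so a router can scan every skill's frontmatter without reading the rest of the file.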

Layer 2: resources/ (loaded on-demand)

The resources/ directory contains deep execution knowledge. These files are loaded only when:

  1. The agent is explicitly invoked (via /command or agent skills field)
  2. The specific resource is needed for the current task type and difficulty

This on-demand loading is governed by the context-loading guide (.agents/skills/_shared/core/context-loading.md), which maps task types to required resources per agent.


File Structure Example

.agents/skills/oma-frontend/
├── SKILL.md                    ← Layer 1: always loaded (~800 bytes)
└── resources/
    ├── execution-protocol.md   ← Layer 2: step-by-step workflow
    ├── tech-stack.md           ← Layer 2: detailed technology specs
    ├── tailwind-rules.md       ← Layer 2: Tailwind-specific conventions
    ├── component-template.tsx  ← Layer 2: React component template
    ├── snippets.md             ← Layer 2: copy-paste code patterns
    ├── error-playbook.md       ← Layer 2: error recovery procedures
    ├── checklist.md            ← Layer 2: quality verification checklist
    └── examples/               ← Layer 2: few-shot input/output examples
        └── examples.md

.agents/skills/oma-backend/
├── SKILL.md
├── resources/
│   ├── execution-protocol.md
│   ├── examples.md
│   ├── orm-reference.md        ← Domain-specific (ORM queries, N+1, transactions)
│   ├── checklist.md
│   └── error-playbook.md
└── stack/                      ← Generated by /stack-set (language-specific)
    ├── stack.yaml
    ├── tech-stack.md
    ├── snippets.md
    └── api-template.*

.agents/skills/oma-design/
├── SKILL.md
├── resources/
│   ├── execution-protocol.md
│   ├── anti-patterns.md
│   ├── checklist.md
│   ├── design-md-spec.md
│   ├── design-tokens.md
│   ├── prompt-enhancement.md
│   ├── stitch-integration.md
│   └── error-playbook.md
├── reference/                  ← Deep reference material
│   ├── typography.md
│   ├── color-and-contrast.md
│   ├── spatial-design.md
│   ├── motion-design.md
│   ├── responsive-design.md
│   ├── component-patterns.md
│   ├── accessibility.md
│   └── shader-and-3d.md
└── examples/
    ├── design-context-example.md
    └── landing-page-prompt.md

Per-Skill Resource Types

| Resource Type | Filename Pattern | Purpose | When Loaded |
| --- | --- | --- | --- |
| Execution Protocol | execution-protocol.md | Step-by-step workflow: Analyze -> Plan -> Implement -> Verify | Always (with SKILL.md) |
| Tech Stack | tech-stack.md | Detailed technology specs, versions, configuration | Complex tasks |
| Error Playbook | error-playbook.md | Recovery procedures with "3 strikes" escalation | On error only |
| Checklist | checklist.md | Domain-specific quality verification | At Verify step |
| Snippets | snippets.md | Copy-paste ready code patterns | Medium/Complex tasks |
| Examples | examples.md or examples/ | Few-shot input/output examples for the LLM | Medium/Complex tasks |
| Variants | stack/ directory | Language/framework-specific references (generated by /stack-set) | When stack exists |
| Templates | component-template.tsx, screen-template.dart | Boilerplate file templates | On component creation |
| Domain Reference | orm-reference.md, anti-patterns.md, etc. | Deep domain knowledge for specific subtasks | Task-type specific |

Shared Resources (_shared/)

All agents share common foundations from .agents/skills/_shared/. These are organized into three categories:

Core Resources (.agents/skills/_shared/core/)

| Resource | Purpose | When Loaded |
| --- | --- | --- |
| skill-routing.md | Maps task keywords to the correct agent. Contains the Skill-Agent Mapping table, Complex Request Routing patterns, Inter-Agent Dependency Rules, Escalation Rules, and Turn Limit Guide. | Referenced by orchestrator and coordination skills |
| context-loading.md | Defines which resources to load for which task type and difficulty. Contains per-agent task-type-to-resource mapping tables and conditional protocol loading triggers. | At workflow start (Step 0 / Phase 0) |
| prompt-structure.md | Defines the four elements every task prompt must contain: Goal, Context, Constraints, Done When. Includes templates for PM, implementation, and QA agents. Lists anti-patterns (starting with only a Goal). | Referenced by PM agent and all workflows |
| clarification-protocol.md | Defines uncertainty levels (LOW/MEDIUM/HIGH) with actions for each. Contains uncertainty triggers, escalation templates, required verification items per agent type, and subagent-mode behavior. | When requirements are ambiguous |
| context-budget.md | Token budget management. Defines file reading strategy (use find_symbol not read_file), resource loading budgets per model tier (Flash: ~3,100 tokens / Pro: ~5,000 tokens), large file handling, and context overflow symptoms. | At workflow start |
| difficulty-guide.md | Criteria for classifying tasks as Simple/Medium/Complex. Defines expected turn counts, protocol branching (Fast Track / Standard / Extended), and misjudgment recovery. | At task start (Step 0) |
| reasoning-templates.md | Structured reasoning fill-in-the-blank templates for common decision patterns (e.g., Exploration Decision template #6 used by the Exploration Loop). | During complex decisions |
| quality-principles.md | 4 universal quality principles applied across all agents. | At workflow start for quality-focused workflows (ultrawork) |
| vendor-detection.md | Protocol for detecting the current runtime environment (Claude Code, Codex CLI, Gemini CLI, Antigravity, CLI Fallback). Uses marker checks: Agent tool = Claude Code, apply_patch = Codex, @-syntax = Gemini. | At workflow start |
| session-metrics.md | Clarification Debt (CD) scoring and session metrics tracking. Defines event types (clarify +10, correct +25, redo +40), thresholds (CD >= 50 = RCA, CD >= 80 = pause), and integration points. | During orchestration sessions |
| common-checklist.md | Universal quality checklist applied at final verification of Complex tasks (in addition to agent-specific checklists). | Verify step of Complex tasks |
| lessons-learned.md | Repository of past session learnings, auto-generated from Clarification Debt breaches and discarded experiments. Organized by domain section. Includes QA Evaluation Lessons for tracking evaluator blind spots. | Referenced after errors and at session end |
| evaluator-tuning.md | Semi-automated QA prompt tuning protocol. Tracks Evaluation Accuracy (EA) events, triggers tuning when EA >= 30, generates patch suggestions for QA checklists and execution protocols. Includes tuning log and positive reinforcement from good_catch events. | When oma retro detects EA threshold breach |
| api-contracts/ | Directory containing API contract template and generated contracts. template.md defines the per-endpoint format (method, path, request/response schemas, auth, errors). | When cross-boundary work is planned |
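The marker checks listed for vendor-detection.md could be expressed roughly as follows. This is a minimal sketch: the function name, the order of the checks, the return labels, and the way available tools are passed in are all assumptions. The table does not say how Antigravity is detected, so that case is left out.

```python
def detect_vendor(tool_names: set, supports_at_syntax: bool = False) -> str:
    """Guess the runtime from the markers named in the table above.

    Hypothetical helper; the check order is an assumption.
    """
    if "Agent" in tool_names:        # Agent tool = Claude Code
        return "claude-code"
    if "apply_patch" in tool_names:  # apply_patch = Codex
        return "codex-cli"
    if supports_at_syntax:           # @-syntax = Gemini
        return "gemini-cli"
    return "cli-fallback"            # no marker matched


print(detect_vendor({"Agent", "Read"}))
```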

Runtime Resources (.agents/skills/_shared/runtime/)

| Resource | Purpose |
| --- | --- |
| memory-protocol.md | Memory file format and operations for CLI subagents. Defines On Start, During Execution, and On Completion protocols using configurable memory tools (read/write/edit). Includes experiment tracking extension. |
| execution-protocols/claude.md | Claude Code-specific execution patterns. Injected by oma agent:spawn when vendor is claude. |
| execution-protocols/gemini.md | Gemini CLI-specific execution patterns. |
| execution-protocols/codex.md | Codex CLI-specific execution patterns. |
| execution-protocols/qwen.md | Qwen CLI-specific execution patterns. |

Vendor-specific execution protocols are injected automatically by oma agent:spawn — agents do not need to manually load them.

Conditional Resources (.agents/skills/_shared/conditional/)

These are loaded only when specific conditions are met during execution:

| Resource | Trigger Condition | Loaded By | Approx. Tokens |
| --- | --- | --- | --- |
| quality-score.md | VERIFY or SHIP phase begins in a workflow that supports quality measurement | Orchestrator (passes to QA agent prompt) | ~250 |
| experiment-ledger.md | First experiment is recorded after establishing an IMPL baseline | Orchestrator (inline, after baseline measurement) | ~250 |
| exploration-loop.md | Same gate fails twice on the same issue | Orchestrator (inline, before spawning hypothesis agents) | ~250 |

Budget impact: approximately 750 tokens total if all 3 are loaded. Since loading is conditional, typical sessions load 1-2 of these. Flash-tier budget remains within the approximately 3,100 token allocation.


How Skills Route via skill-routing.md

The skill routing map defines how tasks are matched to agents:

Simple Routing (Single Domain)

A prompt containing "Build a login form with Tailwind CSS" matches the keywords UI, component, form, Tailwind, and routes to oma-frontend.
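A keyword match of this kind can be sketched in a few lines. The keyword sets below are assumptions for illustration (the real table lives in skill-routing.md and its matching logic is not described here), and `route` is a hypothetical helper that simply picks the agent with the most keyword hits.

```python
import re
from typing import Optional

# Hypothetical keyword table; the authoritative mapping is in skill-routing.md.
ROUTING_KEYWORDS = {
    "oma-frontend": {"ui", "component", "form", "tailwind", "css", "layout", "page"},
    "oma-backend": {"api", "endpoint", "database", "migration"},
}


def route(prompt: str) -> Optional[str]:
    """Return the agent whose keywords overlap the prompt the most, or None."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    scores = {agent: len(words & kws) for agent, kws in ROUTING_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(route("Build a login form with Tailwind CSS"))
```

Here the prompt hits `form`, `tailwind`, and `css` for oma-frontend and nothing for oma-backend, so the frontend agent wins.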

Complex Request Routing

Multi-domain requests follow established execution orders:

| Request Pattern | Execution Order |
| --- | --- |
| "Create a fullstack app" | oma-pm -> (oma-backend + oma-frontend) parallel -> oma-qa |
| "Create a mobile app" | oma-pm -> (oma-backend + oma-mobile) parallel -> oma-qa |
| "Fix bug and review" | oma-debug -> oma-qa |
| "Design and build a landing page" | oma-design -> oma-frontend |
| "I have an idea for a feature" | oma-brainstorm -> oma-pm -> relevant agents -> oma-qa |
| "Do everything automatically" | oma-orchestrator (internally: oma-pm -> agents -> oma-qa) |

Inter-Agent Dependency Rules

Can run in parallel (no dependencies):

  • oma-backend + oma-frontend (when API contract is pre-defined)
  • oma-backend + oma-mobile (when API contract is pre-defined)
  • oma-frontend + oma-mobile (independent of each other)

Must run sequentially:

  • oma-brainstorm -> oma-pm (design comes before planning)
  • oma-pm -> all other agents (planning comes first)
  • implementation agent -> oma-qa (review after implementation)
  • oma-backend -> oma-frontend/oma-mobile (when no pre-defined API contract)

QA is always last, except when the user requests review of specific files only.
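The dependency rules above can be sketched as a small stage planner. This is an illustration under assumptions, not the orchestrator's actual scheduler: `plan_stages` is a hypothetical function, each inner list is a group of agents that may run in parallel, and the review-only exception for QA is not modeled.

```python
def plan_stages(impl_agents: list, api_contract_defined: bool) -> list:
    """Order agents into sequential stages; agents in one stage run in parallel."""
    stages = [["oma-pm"]]  # planning comes first
    clients = [a for a in impl_agents if a != "oma-backend"]
    if "oma-backend" in impl_agents and not api_contract_defined:
        # Without a pre-defined API contract, backend must finish before clients.
        stages.append(["oma-backend"])
        if clients:
            stages.append(clients)
    elif impl_agents:
        # With a contract (or no backend involved), implementation runs in parallel.
        stages.append(list(impl_agents))
    stages.append(["oma-qa"])  # review after implementation
    return stages


print(plan_stages(["oma-backend", "oma-frontend"], api_contract_defined=False))
```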


Token Savings Math

Consider a 5-agent orchestration session (pm, backend, frontend, mobile, qa):

Without progressive disclosure:

  • Each agent loads all resources: ~4,000 tokens per agent
  • Total: 5 x 4,000 = 20,000 tokens consumed before any work

With progressive disclosure:

  • Layer 1 only for all agents: 5 x 800 = 4,000 tokens
  • Layer 2 loaded only for active agents (typically 1-2 at a time): +1,500 tokens
  • Total: ~5,500 tokens

Savings: approximately 72-75%

On flash-tier models (128K context), this is the difference between having about 108K tokens available for work (128K - 20K) and having about 122.5K tokens (128K - 5.5K), a significant margin for complex tasks.
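The savings figure can be reproduced directly from the numbers in this section. The constant names below are illustrative only; the values are the estimates quoted above.

```python
AGENTS = 5
FULL_LOAD_PER_AGENT = 4_000  # tokens when every resource is loaded
LAYER1_PER_AGENT = 800       # Layer 1 cost per agent, as used in the estimate above
LAYER2_ACTIVE = 1_500        # Layer 2 for the 1-2 agents active at a time

without_disclosure = AGENTS * FULL_LOAD_PER_AGENT
with_disclosure = AGENTS * LAYER1_PER_AGENT + LAYER2_ACTIVE
savings = 1 - with_disclosure / without_disclosure

print(without_disclosure, with_disclosure, f"{savings:.1%}")
```

With these inputs the totals are 20,000 and 5,500 tokens, a saving of 72.5%.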


Resource Loading by Task Difficulty

The difficulty guide classifies tasks into three levels, which determine how much of Layer 2 is loaded:

Simple (3-5 turns expected)

Single file change, clear requirements, repeating existing patterns.

Loads: execution-protocol.md only. Skip analysis, proceed directly to implementation with minimal checklist.

Medium (8-15 turns expected)

2-3 file changes, some design decisions needed, applying patterns to new domains.

Loads: execution-protocol.md + examples.md. Standard protocol with brief analysis and full verification.

Complex (15-25 turns expected)

4+ file changes, architecture decisions required, introducing new patterns, dependencies on other agents.

Loads: execution-protocol.md + examples.md + tech-stack.md + snippets.md. Extended protocol with checkpoints, mid-execution progress recording, and full verification including common-checklist.md.
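As a rough sketch, the three levels map to Layer 2 resources like this. The `classify` function is a deliberate simplification that looks only at file count and whether an architecture decision is needed. The actual criteria in difficulty-guide.md are richer, and both names here are hypothetical.

```python
LAYER2_BY_DIFFICULTY = {
    "simple": ["execution-protocol.md"],
    "medium": ["execution-protocol.md", "examples.md"],
    "complex": ["execution-protocol.md", "examples.md", "tech-stack.md", "snippets.md"],
}


def classify(files_changed: int, architecture_decision: bool = False) -> str:
    """Simplified difficulty classifier (assumption: file count dominates)."""
    if files_changed >= 4 or architecture_decision:
        return "complex"
    if files_changed >= 2:
        return "medium"
    return "simple"


level = classify(files_changed=3)
print(level, LAYER2_BY_DIFFICULTY[level])
```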


Context-Loading Task Maps (Per Agent)

The context-loading guide provides detailed task-type-to-resource mappings. Here are the key mappings:

Backend Agent

| Task Type | Required Resources |
| --- | --- |
| CRUD API creation | stack/snippets.md (route, schema, model, test) |
| Authentication | stack/snippets.md (JWT, password) + stack/tech-stack.md |
| DB migration | stack/snippets.md (migration) |
| Performance optimization | examples.md (N+1 example) |
| Existing code modification | examples.md + Serena MCP |

Frontend Agent

| Task Type | Required Resources |
| --- | --- |
| Component creation | snippets.md + component-template.tsx |
| Form implementation | snippets.md (form + Zod) |
| API integration | snippets.md (TanStack Query) |
| Styling | tailwind-rules.md |
| Page layout | snippets.md (grid) + examples.md |

Design Agent

| Task Type | Required Resources |
| --- | --- |
| Design system creation | reference/typography.md + reference/color-and-contrast.md + reference/spatial-design.md + design-md-spec.md |
| Landing page design | reference/component-patterns.md + reference/motion-design.md + prompt-enhancement.md + examples/landing-page-prompt.md |
| Design audit | checklist.md + anti-patterns.md |
| Design token export | design-tokens.md |
| 3D / shader effects | reference/shader-and-3d.md + reference/motion-design.md |
| Accessibility review | reference/accessibility.md + checklist.md |

QA Agent

| Task Type | Required Resources |
| --- | --- |
| Security review | checklist.md (Security section) |
| Performance review | checklist.md (Performance section) |
| Accessibility review | checklist.md (Accessibility section) |
| Full audit | checklist.md (full) + self-check.md |
| Quality scoring | quality-score.md (conditional) |

Orchestrator Prompt Composition

When the orchestrator composes prompts for subagents, it includes only task-relevant resources:

  1. Agent SKILL.md's Core Rules section
  2. execution-protocol.md
  3. Resources matching the specific task type (from the maps above)
  4. error-playbook.md (always included — recovery is essential)
  5. Serena Memory Protocol (CLI mode)

This targeted composition avoids loading unnecessary resources, maximizing the subagent's available context for actual work.
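The composition order can be illustrated with a short sketch. `TASK_RESOURCES` holds two entries copied from the frontend map, while `compose_resources` and the label strings are hypothetical. How the orchestrator actually assembles the prompt text is not specified here.

```python
# Two entries taken from the frontend task map; the full maps are in context-loading.md.
TASK_RESOURCES = {
    ("oma-frontend", "Styling"): ["tailwind-rules.md"],
    ("oma-frontend", "Component creation"): ["snippets.md", "component-template.tsx"],
}


def compose_resources(agent: str, task_type: str, cli_mode: bool) -> list:
    """List what goes into a subagent prompt, in the order given above."""
    parts = [f"{agent}/SKILL.md (Core Rules)", "execution-protocol.md"]
    parts += TASK_RESOURCES.get((agent, task_type), [])  # task-specific only
    parts.append("error-playbook.md")                    # always included
    if cli_mode:
        parts.append("Serena Memory Protocol")
    return parts


print(compose_resources("oma-frontend", "Styling", cli_mode=True))
```

A styling task therefore never pulls in component-template.tsx or examples.md, which is the point of the targeted composition.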


Clarification Debt & Session Metrics (Deep Dive)

Clarification Debt (CD) measures the cost of unclear requirements during a session. The orchestrator tracks every user correction and scores it:

| Event Type | Points | Description |
| --- | --- | --- |
| clarify | +10 | Simple clarification question (expected for MEDIUM uncertainty) |
| correct | +25 | Intent misunderstanding requiring direction change |
| redo | +40 | Scope/charter violation requiring rollback and restart |
| blocked | +0 | Agent correctly stopped and asked (good behavior, not penalized) |

Modifiers: Charter not read (+15), allowlist violation (+20), same error repeated (x1.5).

Thresholds and enforcement:

  • CD >= 50 → Mandatory RCA entry added to lessons-learned.md
  • CD >= 80 → Session halted, user must re-specify requirements
  • redo >= 2 → Orchestrator pauses and requests explicit scope confirmation
  • CD >= 30 across 3 consecutive sessions for the same agent → Agent prompt template review

The session log is maintained in .serena/memories/session-metrics.md with per-event rows (turn, agent, event type, points, detail) and a summary section.
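The scoring and single-session thresholds above can be sketched as follows. Two points are assumptions: that the x1.5 multiplier applies to the event's base points before flat modifiers are added, and that CD is the plain sum of event scores. The cross-session rule (CD >= 30 across 3 sessions) is omitted, and all names are hypothetical.

```python
from dataclasses import dataclass

BASE_POINTS = {"clarify": 10, "correct": 25, "redo": 40, "blocked": 0}


@dataclass
class Event:
    kind: str
    charter_not_read: bool = False
    allowlist_violation: bool = False
    repeated: bool = False  # same error repeated


def score(event: Event) -> float:
    points = float(BASE_POINTS[event.kind])
    if event.repeated:
        points *= 1.5  # assumption: multiplier applies to base points only
    if event.charter_not_read:
        points += 15
    if event.allowlist_violation:
        points += 20
    return points


def enforce(events: list) -> tuple:
    """Return (total CD, triggered actions) for a single session."""
    total = sum(score(e) for e in events)
    redos = sum(1 for e in events if e.kind == "redo")
    actions = []
    if total >= 50:
        actions.append("rca_entry")
    if total >= 80:
        actions.append("halt_session")
    if redos >= 2:
        actions.append("confirm_scope")
    return total, actions


print(enforce([Event("clarify"), Event("correct"), Event("clarify", repeated=True)]))
```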


Evaluator Accuracy & QA Tuning

QA agents improve through tracked judgment errors. Unlike CD (real-time), Evaluator Accuracy (EA) is retrospective — most errors are discovered after the session ends.

EA event types:

| Event | Points | When Discovered |
| --- | --- | --- |
| false_negative | +30 | Next session or production: bug that QA missed |
| false_positive | +15 | During session: impl agent successfully disputes QA finding |
| severity_mismatch | +10 | During session or retro: wrong severity assigned |
| missed_stub | +20 | Runtime verification catches display-only feature |
| good_catch | -10 | QA caught a non-obvious bug (positive reward signal) |

EA is calculated on a rolling 3-session window. Thresholds:

  • EA >= 30 → oma retro flags QA patterns for review (tuning suggested)
  • EA >= 50 → Tuning required: update QA execution-protocol.md
  • false_negative >= 3 across window → Add detection pattern to QA checklist.md
  • good_catch >= 5 across window → Document and propagate successful pattern

The full tuning loop is defined in evaluator-tuning.md: sessions accumulate EA events → threshold triggers oma retro → report categorizes errors and suggests patches → user reviews and approves → patches applied to QA checklist/protocol → validation over next 3 sessions.
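A rolling-window check along these lines might look like the sketch below. It assumes EA is the sum of event points over the last three sessions, which the text implies but does not state outright. `evaluate_window` and the flag names are hypothetical.

```python
from collections import Counter

EA_POINTS = {
    "false_negative": 30,
    "false_positive": 15,
    "severity_mismatch": 10,
    "missed_stub": 20,
    "good_catch": -10,
}


def evaluate_window(sessions: list) -> tuple:
    """Score the last 3 sessions; each session is a list of EA event names."""
    counts = Counter(event for session in sessions[-3:] for event in session)
    ea = sum(EA_POINTS[name] * n for name, n in counts.items())
    flags = []
    if ea >= 50:
        flags.append("tuning_required")
    elif ea >= 30:
        flags.append("tuning_suggested")
    if counts["false_negative"] >= 3:
        flags.append("add_detection_pattern")
    if counts["good_catch"] >= 5:
        flags.append("propagate_pattern")
    return ea, flags


history = [["false_negative", "false_negative"], ["false_negative"], ["good_catch"], ["false_positive"]]
print(evaluate_window(history))
```

The oldest session in `history` falls outside the window, so only the last three contribute: 30 - 10 + 15 = 35, which lands in the "tuning suggested" band.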


Sprint Decomposition for Complex Tasks

Complex tasks (4+ files, architecture decisions) use sprint-based execution rather than a single long run:

  1. Decompose into 2-4 feature-focused sprints, each independently testable
  2. Target 5-8 turns per sprint
  3. Sprint Gate after each sprint:
    • Sprint deliverable complete?
    • Lint/test pass?
    • If sprint took 2x expected turns → write checkpoint, inform user
  4. Continue to next sprint on gate pass

Example: Task "JWT auth + CRUD API + tests" decomposes into:

  • Sprint 1: User model + auth endpoints (register/login)
  • Sprint 2: CRUD endpoints + validation
  • Sprint 3: Tests + error handling
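The Sprint Gate checks can be sketched as a small function. The names are hypothetical, and the expected turn count is passed in by the caller (for example 8, the upper end of the per-sprint target) rather than taken from any documented setting.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool   # deliverable complete and lint/test pass
    overrun: bool  # sprint took at least 2x the expected turns


def sprint_gate(deliverable_complete: bool, checks_pass: bool,
                turns_used: int, expected_turns: int) -> GateResult:
    """Evaluate the gate after a sprint; the caller decides what to do next."""
    return GateResult(
        passed=deliverable_complete and checks_pass,
        overrun=turns_used >= 2 * expected_turns,
    )


result = sprint_gate(True, True, turns_used=6, expected_turns=8)
print(result)
```

On `passed` the agent continues to the next sprint. On `overrun` it writes a checkpoint and informs the user.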

Difficulty misjudgment recovery: If a task started as Simple but proves more complex, the agent upgrades to Medium or Complex protocol mid-execution and records the change in progress.


Context Reset Protocol

Long-running agents degrade in quality as context fills up. The Orchestrator (not the agent itself) monitors for this and triggers resets.

Trigger conditions (Orchestrator checks during monitoring):

| Condition | Detection | Action |
| --- | --- | --- |
| Turn budget exhaustion | Agent consumed >= 80% of expected turns AND acceptance criteria < 50% complete | Context Reset |
| Progress stall | No progress file update for 3+ consecutive monitoring cycles | Context Reset |
| Shallow output | Result file contains stub markers or TODO placeholders | Re-spawn with explicit instruction |

Reset procedure:

  1. Checkpoint — Save agent's current state (completed items, remaining items, key decisions)
  2. Terminate — Stop the current agent run
  3. Re-spawn — Start a fresh agent with the checkpoint as context
  4. Resume — New agent reads checkpoint, continues from remaining items only
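The three trigger conditions could be checked by the Orchestrator roughly as follows. This is a sketch under assumptions: the function name, the return labels, the check order, and the literal marker strings used to spot shallow output ("TODO", "stub") are all illustrative.

```python
from typing import Optional


def reset_action(turns_used: int, expected_turns: int, criteria_done: float,
                 stalled_cycles: int, result_text: str) -> Optional[str]:
    """Return the action for the first matching trigger, or None."""
    # Turn budget exhaustion: >= 80% of turns used, < 50% of criteria complete.
    if turns_used >= 0.8 * expected_turns and criteria_done < 0.5:
        return "context_reset"
    # Progress stall: no progress file update for 3+ monitoring cycles.
    if stalled_cycles >= 3:
        return "context_reset"
    # Shallow output: stub markers or TODO placeholders (marker strings assumed).
    if "TODO" in result_text or "stub" in result_text.lower():
        return "respawn_with_instruction"
    return None


print(reset_action(16, 20, criteria_done=0.4, stalled_cycles=0, result_text=""))
```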

For standalone agents (no Orchestrator), the Sprint Gate in difficulty-guide.md serves as the safety net — if a sprint takes 2x expected turns, the agent writes a checkpoint and informs the user.