Skills

Skills are structured knowledge packages that give each agent its domain expertise. They are not just prompts — they contain execution protocols, tech stack references, code templates, error playbooks, quality checklists, and few-shot examples, organized in a two-layer architecture designed for token efficiency.

The Two-Layer Design

Layer 1: SKILL.md (~800 bytes, always loaded)

Every skill has a SKILL.md file at its root. This is always loaded into the context window when the skill is referenced. It contains:

YAML frontmatter with name and description (used for routing and display)
When to use / When NOT to use — explicit activation conditions
Core rules — the 5-15 most critical constraints for the domain
Architecture overview — how code should be structured
Library list — approved dependencies and their purposes
References — pointers to Layer 2 resources (never loaded automatically)

Example frontmatter:

---
name: oma-frontend
description: Frontend specialist for React, Next.js, TypeScript with FSD-lite architecture, shadcn/ui, and design system alignment. Use for UI, component, page, layout, CSS, Tailwind, and shadcn work.
---

The description field is critical — it contains the routing keywords that the skill routing system uses to match tasks to agents.
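To make the frontmatter shape concrete, here is a minimal sketch of reading the `name` and `description` fields from a SKILL.md file. The `parse_frontmatter` helper is hypothetical and handles only flat `key: value` pairs; a real loader would more likely use a full YAML parser.

```python
from typing import Dict


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by `---` lines (hypothetical helper)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: stop before the skill body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SKILL_MD = """---
name: oma-frontend
description: Frontend specialist for React, Next.js, TypeScript
---
Core rules follow here.
"""

frontmatter = parse_frontmatter(SKILL_MD)
print(frontmatter["name"])
```

Only the first colon on each line is treated as the separator, so a description that itself contains a colon is kept intact.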

Layer 2: resources/ (loaded on-demand)

The resources/ directory contains deep execution knowledge. These files are loaded only when:

The agent is explicitly invoked (via /command or agent skills field)
The specific resource is needed for the current task type and difficulty

This on-demand loading is governed by the context-loading guide (.agents/skills/_shared/core/context-loading.md), which maps task types to required resources per agent.

File Structure Example

.agents/skills/oma-frontend/
├── SKILL.md                          ← Layer 1: always loaded (~800 bytes)
└── resources/
    ├── execution-protocol.md         ← Layer 2: step-by-step workflow
    ├── tech-stack.md                 ← Layer 2: detailed technology specs
    ├── tailwind-rules.md             ← Layer 2: Tailwind-specific conventions
    ├── component-template.tsx        ← Layer 2: React component template
    ├── snippets.md                   ← Layer 2: copy-paste code patterns
    ├── error-playbook.md             ← Layer 2: error recovery procedures
    ├── checklist.md                  ← Layer 2: quality verification checklist
    └── examples/                     ← Layer 2: few-shot input/output examples
        └── examples.md

.agents/skills/oma-backend/
├── SKILL.md
├── resources/
│   ├── execution-protocol.md
│   ├── examples.md
│   ├── orm-reference.md              ← Domain-specific (ORM queries, N+1, transactions)
│   ├── checklist.md
│   └── error-playbook.md
└── stack/                             ← Generated by /stack-set (language-specific)
    ├── stack.yaml
    ├── tech-stack.md
    ├── snippets.md
    └── api-template.*

.agents/skills/oma-design/
├── SKILL.md
├── resources/
│   ├── execution-protocol.md
│   ├── anti-patterns.md
│   ├── checklist.md
│   ├── design-md-spec.md
│   ├── design-tokens.md
│   ├── prompt-enhancement.md
│   ├── stitch-integration.md
│   └── error-playbook.md
├── reference/                         ← Deep reference material
│   ├── typography.md
│   ├── color-and-contrast.md
│   ├── spatial-design.md
│   ├── motion-design.md
│   ├── responsive-design.md
│   ├── component-patterns.md
│   ├── accessibility.md
│   └── shader-and-3d.md
└── examples/
    ├── design-context-example.md
    └── landing-page-prompt.md

Per-Skill Resource Types

Resource Type	Filename Pattern	Purpose	When Loaded
Execution Protocol	`execution-protocol.md`	Step-by-step workflow: Analyze -> Plan -> Implement -> Verify	Always (with SKILL.md)
Tech Stack	`tech-stack.md`	Detailed technology specs, versions, configuration	Complex tasks
Error Playbook	`error-playbook.md`	Recovery procedures with "3 strikes" escalation	On error only
Checklist	`checklist.md`	Domain-specific quality verification	At Verify step
Snippets	`snippets.md`	Copy-paste ready code patterns	Medium/Complex tasks
Examples	`examples.md` or `examples/`	Few-shot input/output examples for the LLM	Medium/Complex tasks
Variants	`stack/` directory	Language/framework-specific references (generated by `/stack-set`)	When stack exists
Templates	`component-template.tsx`, `screen-template.dart`	Boilerplate file templates	On component creation
Domain Reference	`orm-reference.md`, `anti-patterns.md`, etc.	Deep domain knowledge for specific subtasks	Task-type specific

Shared Resources (_shared/)

All agents share common foundations from .agents/skills/_shared/. These are organized into three categories:

Core Resources (`.agents/skills/_shared/core/`)

Resource	Purpose	When Loaded
`skill-routing.md`	Maps task keywords to the correct agent. Contains the Skill-Agent Mapping table, Complex Request Routing patterns, Inter-Agent Dependency Rules, Escalation Rules, and Turn Limit Guide.	Referenced by orchestrator and coordination skills
`context-loading.md`	Defines which resources to load for which task type and difficulty. Contains per-agent task-type-to-resource mapping tables and conditional protocol loading triggers.	At workflow start (Step 0 / Phase 0)
`prompt-structure.md`	Defines the four elements every task prompt must contain: Goal, Context, Constraints, Done When. Includes templates for PM, implementation, and QA agents. Lists anti-patterns (starting with only a Goal).	Referenced by PM agent and all workflows
`clarification-protocol.md`	Defines uncertainty levels (LOW/MEDIUM/HIGH) with actions for each. Contains uncertainty triggers, escalation templates, required verification items per agent type, and subagent-mode behavior.	When requirements are ambiguous
`context-budget.md`	Token budget management. Defines file reading strategy (use `find_symbol` not `read_file`), resource loading budgets per model tier (Flash: ~3,100 tokens / Pro: ~5,000 tokens), large file handling, and context overflow symptoms.	At workflow start
`difficulty-guide.md`	Criteria for classifying tasks as Simple/Medium/Complex. Defines expected turn counts, protocol branching (Fast Track / Standard / Extended), and misjudgment recovery.	At task start (Step 0)
`reasoning-templates.md`	Structured reasoning fill-in-the-blank templates for common decision patterns (e.g., Exploration Decision template #6 used by the Exploration Loop).	During complex decisions
`quality-principles.md`	4 universal quality principles applied across all agents.	At workflow start for quality-focused workflows (ultrawork)
`vendor-detection.md`	Protocol for detecting the current runtime environment (Claude Code, Codex CLI, Gemini CLI, Antigravity, CLI Fallback). Uses marker checks: Agent tool = Claude Code, apply_patch = Codex, @-syntax = Gemini.	At workflow start
`session-metrics.md`	Clarification Debt (CD) scoring and session metrics tracking. Defines event types (clarify +10, correct +25, redo +40), thresholds (CD >= 50 = RCA, CD >= 80 = pause), and integration points.	During orchestration sessions
`common-checklist.md`	Universal quality checklist applied at final verification of Complex tasks (in addition to agent-specific checklists).	Verify step of Complex tasks
`lessons-learned.md`	Repository of past session learnings, auto-generated from Clarification Debt breaches and discarded experiments. Organized by domain section. Includes QA Evaluation Lessons for tracking evaluator blind spots.	Referenced after errors and at session end
`evaluator-tuning.md`	Semi-automated QA prompt tuning protocol. Tracks Evaluation Accuracy (EA) events, triggers tuning when EA >= 30, generates patch suggestions for QA checklists and execution protocols. Includes tuning log and positive reinforcement from `good_catch` events.	When `oma retro` detects EA threshold breach
`api-contracts/`	Directory containing API contract template and generated contracts. `template.md` defines the per-endpoint format (method, path, request/response schemas, auth, errors).	When cross-boundary work is planned

Runtime Resources (`.agents/skills/_shared/runtime/`)

Resource	Purpose
`memory-protocol.md`	Memory file format and operations for CLI subagents. Defines On Start, During Execution, and On Completion protocols using configurable memory tools (read/write/edit). Includes experiment tracking extension.
`execution-protocols/claude.md`	Claude Code-specific execution patterns. Injected by `oma agent:spawn` when vendor is claude.
`execution-protocols/gemini.md`	Gemini CLI-specific execution patterns.
`execution-protocols/codex.md`	Codex CLI-specific execution patterns.
`execution-protocols/qwen.md`	Qwen CLI-specific execution patterns.

Vendor-specific execution protocols are injected automatically by oma agent:spawn — agents do not need to manually load them.

Conditional Resources (`.agents/skills/_shared/conditional/`)

These are loaded only when specific conditions are met during execution:

Resource	Trigger Condition	Loaded By	Approx. Tokens
`quality-score.md`	VERIFY or SHIP phase begins in a workflow that supports quality measurement	Orchestrator (passes to QA agent prompt)	~250
`experiment-ledger.md`	First experiment is recorded after establishing an IMPL baseline	Orchestrator (inline, after baseline measurement)	~250
`exploration-loop.md`	Same gate fails twice on the same issue	Orchestrator (inline, before spawning hypothesis agents)	~250

Budget impact: approximately 750 tokens total if all 3 are loaded. Since loading is conditional, typical sessions load 1-2 of these. Flash-tier budget remains within the approximately 3,100 token allocation.

How Skills Route via skill-routing.md

The skill routing map defines how tasks are matched to agents:

Simple Routing (Single Domain)

For example, the prompt "Build a login form with Tailwind CSS" matches the keywords UI, component, form, and Tailwind, so it is routed to oma-frontend.
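Single-domain routing can be pictured as a keyword-overlap lookup. The sketch below is illustrative only: the `ROUTING_KEYWORDS` table and the `route` function are assumptions, the keyword sets are abbreviated, and the actual mapping is whatever skill-routing.md defines. It matches literal words, so it is stricter than a routing step that also infers related keywords such as "UI" from "login form".

```python
import re
from typing import Dict, Optional, Set

# Hypothetical, abbreviated keyword table; the real mapping lives in skill-routing.md.
ROUTING_KEYWORDS: Dict[str, Set[str]] = {
    "oma-frontend": {"ui", "component", "page", "layout", "css", "tailwind", "shadcn", "form"},
    "oma-backend": {"api", "endpoint", "database", "migration", "orm"},
}


def route(prompt: str) -> Optional[str]:
    """Return the skill whose keywords overlap most with the prompt, or None if nothing matches."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    scores = {skill: len(words & keywords) for skill, keywords in ROUTING_KEYWORDS.items()}
    best = max(scores, key=lambda skill: scores[skill])
    return best if scores[best] > 0 else None


print(route("Build a login form with Tailwind CSS"))
```

Returning `None` for a prompt with no matching keyword leaves room for a fallback such as asking the user or handing the request to a planning agent.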

Complex Request Routing

Multi-domain requests follow established execution orders:

Request Pattern	Execution Order
"Create a fullstack app"	oma-pm -> (oma-backend + oma-frontend) parallel -> oma-qa
"Create a mobile app"	oma-pm -> (oma-backend + oma-mobile) parallel -> oma-qa
"Fix bug and review"	oma-debug -> oma-qa
"Design and build a landing page"	oma-design -> oma-frontend
"I have an idea for a feature"	oma-brainstorm -> oma-pm -> relevant agents -> oma-qa
"Do everything automatically"	oma-orchestrator (internally: oma-pm -> agents -> oma-qa)

Inter-Agent Dependency Rules

Can run in parallel (no dependencies):

oma-backend + oma-frontend (when API contract is pre-defined)
oma-backend + oma-mobile (when API contract is pre-defined)
oma-frontend + oma-mobile (independent of each other)

Must run sequentially:

oma-brainstorm -> oma-pm (ideation comes before planning)
oma-pm -> all other agents (planning comes first)
implementation agent -> oma-qa (review after implementation)
oma-backend -> oma-frontend/oma-mobile (when no pre-defined API contract)

QA is always last, except when the user requests review of specific files only.
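The dependency rules above can be sketched as a small stage planner. This is a simplified illustration, not the orchestrator's actual scheduler: `plan_stages` is a hypothetical function, and it ignores the exception for file-specific QA reviews.

```python
from typing import List, Set

CLIENT_AGENTS = {"oma-frontend", "oma-mobile"}
PLANNING_ORDER = ["oma-brainstorm", "oma-pm"]


def plan_stages(agents: Set[str], api_contract_defined: bool) -> List[List[str]]:
    """Group agents into sequential stages; agents inside one stage may run in parallel."""
    remaining = set(agents)
    stages: List[List[str]] = []

    # Brainstorming and planning come first, in that order.
    for name in PLANNING_ORDER:
        if name in remaining:
            stages.append([name])
            remaining.discard(name)

    run_qa = "oma-qa" in remaining
    remaining.discard("oma-qa")

    # Without a pre-defined API contract, the backend must finish before its clients start.
    clients = remaining & CLIENT_AGENTS
    if "oma-backend" in remaining and clients and not api_contract_defined:
        stages.append(["oma-backend"])
        remaining.discard("oma-backend")

    if remaining:
        stages.append(sorted(remaining))

    # QA reviews after implementation, so it is scheduled last.
    if run_qa:
        stages.append(["oma-qa"])
    return stages


fullstack = {"oma-pm", "oma-backend", "oma-frontend", "oma-qa"}
print(plan_stages(fullstack, api_contract_defined=True))
print(plan_stages(fullstack, api_contract_defined=False))
```

With a pre-defined contract the backend and frontend share one stage; without it the backend gets its own stage ahead of the frontend.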

Token Savings Math

Consider a 5-agent orchestration session (pm, backend, frontend, mobile, qa):

Without progressive disclosure:

Each agent loads all resources: ~4,000 tokens per agent
Total: 5 x 4,000 = 20,000 tokens consumed before any work

With progressive disclosure:

Layer 1 only for all agents: 5 x 800 = 4,000 tokens
Layer 2 loaded only for active agents (typically 1-2 at a time): +1,500 tokens
Total: ~5,500 tokens

Savings: approximately 72-75%

On Flash-tier models (128K context), this is the difference between having about 108K tokens available for work and having about 122K, a significant margin for complex tasks.
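The arithmetic can be checked directly. The figures below are the estimates quoted in this section, and the 128K context is taken as 128,000 tokens for simplicity.

```python
AGENTS = 5
FULL_LOAD_PER_AGENT = 4_000   # tokens per agent when every resource is loaded up front
LAYER1_PER_AGENT = 800        # per-agent Layer 1 figure used in this section
LAYER2_ACTIVE = 1_500         # Layer 2 tokens for the 1-2 agents active at a time
CONTEXT_WINDOW = 128_000      # Flash-tier context, taken as 128,000 tokens

without_disclosure = AGENTS * FULL_LOAD_PER_AGENT
with_disclosure = AGENTS * LAYER1_PER_AGENT + LAYER2_ACTIVE
savings = 1 - with_disclosure / without_disclosure

print(f"without: {without_disclosure}, with: {with_disclosure}")
print(f"savings: {savings:.1%}")
print(f"left for work: {CONTEXT_WINDOW - without_disclosure} vs {CONTEXT_WINDOW - with_disclosure}")
```

With these inputs the saving is 72.5%, and the space left for work is 108,000 tokens without progressive disclosure versus 122,500 with it.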

Resource Loading by Task Difficulty

The difficulty guide classifies tasks into three levels, which determine how much of Layer 2 is loaded:

Simple (3-5 turns expected)

Single file change, clear requirements, repeating existing patterns.

Loads: execution-protocol.md only. Skip analysis, proceed directly to implementation with minimal checklist.

Medium (8-15 turns expected)

2-3 file changes, some design decisions needed, applying patterns to new domains.

Loads: execution-protocol.md + examples.md. Standard protocol with brief analysis and full verification.

Complex (15-25 turns expected)

4+ file changes, architecture decisions required, introducing new patterns, dependencies on other agents.

Loads: execution-protocol.md + examples.md + tech-stack.md + snippets.md. Extended protocol with checkpoints, mid-execution progress recording, and full verification including common-checklist.md.
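The three levels can be summarized as a lookup from difficulty to Layer 2 resources. In this sketch the resource lists come from the descriptions above, while the `classify` function is a deliberately crude assumption that looks only at file count and whether an architecture decision is needed; the actual criteria are defined in difficulty-guide.md.

```python
from typing import Dict, List

RESOURCES_BY_DIFFICULTY: Dict[str, List[str]] = {
    "simple": ["execution-protocol.md"],
    "medium": ["execution-protocol.md", "examples.md"],
    "complex": ["execution-protocol.md", "examples.md", "tech-stack.md", "snippets.md"],
}


def classify(files_changed: int, needs_architecture_decision: bool = False) -> str:
    """Rough difficulty estimate (hypothetical heuristic, not the real guide)."""
    if files_changed >= 4 or needs_architecture_decision:
        return "complex"
    if files_changed >= 2:
        return "medium"
    return "simple"


def resources_for(files_changed: int, needs_architecture_decision: bool = False) -> List[str]:
    """Return the Layer 2 files to load for a task of the estimated difficulty."""
    return RESOURCES_BY_DIFFICULTY[classify(files_changed, needs_architecture_decision)]


print(resources_for(1))
print(resources_for(3))
print(resources_for(5))
```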

Context-Loading Task Maps (Per Agent)

The context-loading guide provides detailed task-type-to-resource mappings. Here are the key mappings:

Backend Agent

Task Type	Required Resources
CRUD API creation	stack/snippets.md (route, schema, model, test)
Authentication	stack/snippets.md (JWT, password) + stack/tech-stack.md
DB migration	stack/snippets.md (migration)
Performance optimization	examples.md (N+1 example)
Existing code modification	examples.md + Serena MCP

Frontend Agent

Task Type	Required Resources
Component creation	snippets.md + component-template.tsx
Form implementation	snippets.md (form + Zod)
API integration	snippets.md (TanStack Query)
Styling	tailwind-rules.md
Page layout	snippets.md (grid) + examples.md

Design Agent

Task Type	Required Resources
Design system creation	reference/typography.md + reference/color-and-contrast.md + reference/spatial-design.md + design-md-spec.md
Landing page design	reference/component-patterns.md + reference/motion-design.md + prompt-enhancement.md + examples/landing-page-prompt.md
Design audit	checklist.md + anti-patterns.md
Design token export	design-tokens.md
3D / shader effects	reference/shader-and-3d.md + reference/motion-design.md
Accessibility review	reference/accessibility.md + checklist.md

QA Agent

Task Type	Required Resources
Security review	checklist.md (Security section)
Performance review	checklist.md (Performance section)
Accessibility review	checklist.md (Accessibility section)
Full audit	checklist.md (full) + self-check.md
Quality scoring	quality-score.md (conditional)

Orchestrator Prompt Composition

When the orchestrator composes prompts for subagents, it includes only task-relevant resources:

Agent SKILL.md's Core Rules section
execution-protocol.md
Resources matching the specific task type (from the maps above)
error-playbook.md (always included — recovery is essential)
Serena Memory Protocol (CLI mode)

This targeted composition avoids loading unnecessary resources, maximizing the subagent's available context for actual work.
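As a rough illustration of this composition, the sketch below assembles the resource list for one subagent prompt. The `compose_resources` function and the label strings are hypothetical, and the task map is a two-row excerpt of the Frontend Agent table above.

```python
from typing import Dict, List

# Excerpt of the Frontend Agent task map shown earlier in this section.
FRONTEND_TASK_MAP: Dict[str, List[str]] = {
    "Component creation": ["snippets.md", "component-template.tsx"],
    "Styling": ["tailwind-rules.md"],
}


def compose_resources(task_type: str, task_map: Dict[str, List[str]], cli_mode: bool = False) -> List[str]:
    """List the pieces included in a subagent prompt, in the order given above."""
    parts = ["SKILL.md (Core Rules)", "execution-protocol.md"]
    parts += task_map.get(task_type, [])   # only resources matching the task type
    parts.append("error-playbook.md")      # always included
    if cli_mode:
        parts.append("Serena Memory Protocol")
    return parts


print(compose_resources("Styling", FRONTEND_TASK_MAP))
print(compose_resources("Component creation", FRONTEND_TASK_MAP, cli_mode=True))
```

An unknown task type falls back to the baseline set, which keeps the prompt small rather than loading everything.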

Clarification Debt & Session Metrics (Deep Dive)

Clarification Debt (CD) measures the cost of unclear requirements during a session. The orchestrator tracks every user correction and scores it:

Event Type	Points	Description
`clarify`	+10	Simple clarification question (expected for MEDIUM uncertainty)
`correct`	+25	Intent misunderstanding requiring direction change
`redo`	+40	Scope/charter violation requiring rollback and restart
`blocked`	+0	Agent correctly stopped and asked (good behavior — not penalized)

Modifiers: Charter not read (+15), allowlist violation (+20), same error repeated (x1.5).

Thresholds and enforcement:

CD >= 50 → Mandatory RCA entry added to lessons-learned.md
CD >= 80 → Session halted, user must re-specify requirements
redo >= 2 → Orchestrator pauses and requests explicit scope confirmation
CD >= 30 across 3 consecutive sessions for the same agent → Agent prompt template review

The session log is maintained in .serena/memories/session-metrics.md with per-event rows (turn, agent, event type, points, detail) and a summary section.
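The scoring rules can be sketched as follows. The point values and thresholds are the ones listed above. How the x1.5 multiplier combines with the additive modifiers is not specified here, so this sketch assumes it applies to an event's total after the additive modifiers; the `Event` class and function names are hypothetical, and the cross-session threshold (CD >= 30 across 3 sessions) is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

POINTS = {"clarify": 10, "correct": 25, "redo": 40, "blocked": 0}


@dataclass
class Event:
    kind: str
    charter_not_read: bool = False
    allowlist_violation: bool = False
    repeated: bool = False  # same error repeated


def score(event: Event) -> float:
    """Score one event: base points, additive modifiers, then the repeat multiplier (assumed order)."""
    points = float(POINTS[event.kind])
    if event.charter_not_read:
        points += 15
    if event.allowlist_violation:
        points += 20
    if event.repeated:
        points *= 1.5
    return points


def evaluate(events: List[Event]) -> Tuple[float, List[str]]:
    """Return the session's Clarification Debt and the enforcement actions it triggers."""
    cd = sum(score(e) for e in events)
    redo_count = sum(1 for e in events if e.kind == "redo")
    actions = []
    if cd >= 50:
        actions.append("add RCA entry to lessons-learned.md")
    if cd >= 80:
        actions.append("halt session")
    if redo_count >= 2:
        actions.append("request scope confirmation")
    return cd, actions


session = [Event("clarify"), Event("correct"), Event("redo")]
print(evaluate(session))
```

A `blocked` event contributes zero points, so an agent that stops to ask is never pushed toward a threshold by doing so.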

Evaluator Accuracy & QA Tuning

QA agents improve through tracked judgment errors. Unlike CD (real-time), Evaluator Accuracy (EA) is retrospective — most errors are discovered after the session ends.

EA event types:

Event	Points	When Discovered
`false_negative`	+30	Next session or production — bug that QA missed
`false_positive`	+15	During session — impl agent successfully disputes QA finding
`severity_mismatch`	+10	During session or retro — wrong severity assigned
`missed_stub`	+20	Runtime verification catches display-only feature
`good_catch`	-10	QA caught a non-obvious bug (positive reward signal)

EA is calculated on a rolling 3-session window. Thresholds:

EA >= 30 → oma retro flags QA patterns for review (tuning suggested)
EA >= 50 → Tuning required: update QA execution-protocol.md
false_negative >= 3 across window → Add detection pattern to QA checklist.md
good_catch >= 5 across window → Document and propagate successful pattern

The full tuning loop is defined in evaluator-tuning.md: sessions accumulate EA events → threshold triggers oma retro → report categorizes errors and suggests patches → user reviews and approves → patches applied to QA checklist/protocol → validation over next 3 sessions.
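A rolling window of this kind can be sketched with a bounded deque. The point values and thresholds are taken from the tables above; the `EATracker` class is hypothetical, and it assumes EA is the plain sum of event points over the last three sessions, which the text does not spell out.

```python
from collections import Counter, deque
from typing import Deque, List

EA_POINTS = {
    "false_negative": 30,
    "false_positive": 15,
    "severity_mismatch": 10,
    "missed_stub": 20,
    "good_catch": -10,
}


class EATracker:
    """Track Evaluator Accuracy events over a rolling window of sessions (hypothetical sketch)."""

    def __init__(self, window: int = 3) -> None:
        # A deque with maxlen drops the oldest session automatically.
        self.sessions: Deque[List[str]] = deque(maxlen=window)

    def record_session(self, events: List[str]) -> None:
        self.sessions.append(list(events))

    def score(self) -> int:
        return sum(EA_POINTS[e] for session in self.sessions for e in session)

    def actions(self) -> List[str]:
        counts = Counter(e for session in self.sessions for e in session)
        ea = self.score()
        result = []
        if ea >= 50:
            result.append("tuning required")
        elif ea >= 30:
            result.append("tuning suggested")
        if counts["false_negative"] >= 3:
            result.append("add detection pattern to checklist.md")
        if counts["good_catch"] >= 5:
            result.append("document successful pattern")
        return result


tracker = EATracker()
tracker.record_session(["false_positive"])
tracker.record_session(["missed_stub"])
print(tracker.score(), tracker.actions())
```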

Sprint Decomposition for Complex Tasks

Complex tasks (4+ files, architecture decisions) use sprint-based execution rather than a single long run:

Decompose into 2-4 feature-focused sprints, each independently testable
Target 5-8 turns per sprint
Sprint Gate after each sprint:
- Sprint deliverable complete?
- Lint/test pass?
- If sprint took 2x expected turns → write checkpoint, inform user
Continue to next sprint on gate pass

Example: Task "JWT auth + CRUD API + tests" decomposes into:

Sprint 1: User model + auth endpoints (register/login)
Sprint 2: CRUD endpoints + validation
Sprint 3: Tests + error handling
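The Sprint Gate check itself is small enough to express as a single function. This is an illustrative sketch; the `sprint_gate` name and its return shape are assumptions.

```python
from typing import Dict


def sprint_gate(deliverable_complete: bool, checks_pass: bool,
                turns_used: int, expected_turns: int) -> Dict[str, bool]:
    """Evaluate the gate after a sprint (hypothetical helper)."""
    return {
        # Move on only when the deliverable is done and lint/tests pass.
        "proceed": deliverable_complete and checks_pass,
        # A sprint that took twice its expected turns triggers a checkpoint and a note to the user.
        "write_checkpoint": turns_used >= 2 * expected_turns,
    }


print(sprint_gate(True, True, turns_used=6, expected_turns=8))
print(sprint_gate(True, False, turns_used=16, expected_turns=8))
```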

Difficulty misjudgment recovery: If a task started as Simple but proves more complex, the agent upgrades to the Medium or Complex protocol mid-execution and records the change in its progress file.

Context Reset Protocol

Long-running agents degrade in quality as context fills up. The Orchestrator (not the agent itself) monitors for this and triggers resets.

Trigger conditions (Orchestrator checks during monitoring):

Condition	Detection	Action
Turn budget exhaustion	Agent consumed >= 80% of expected turns AND acceptance criteria < 50% complete	Context Reset
Progress stall	No progress file update for 3+ consecutive monitoring cycles	Context Reset
Shallow output	Result file contains stub markers or TODO placeholders	Re-spawn with explicit instruction
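The trigger table can be read as a sequence of checks over an agent's monitored state. The sketch below is an assumption-laden illustration: the `AgentStatus` fields, the `STUB_MARKERS` strings, and the returned action labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker strings; the real stub detection may look for others.
STUB_MARKERS = ("TODO", "stub")


@dataclass
class AgentStatus:
    turns_used: int
    expected_turns: int
    criteria_done: int
    criteria_total: int
    stalled_cycles: int      # consecutive monitoring cycles without a progress file update
    result_text: str = ""


def reset_action(status: AgentStatus) -> Optional[str]:
    """Return the action the monitoring loop should take, or None if the agent looks healthy."""
    budget_spent = status.turns_used >= 0.8 * status.expected_turns
    behind = status.criteria_total > 0 and status.criteria_done / status.criteria_total < 0.5
    if budget_spent and behind:
        return "context_reset"      # turn budget exhaustion
    if status.stalled_cycles >= 3:
        return "context_reset"      # progress stall
    if any(marker in status.result_text for marker in STUB_MARKERS):
        return "respawn_with_instruction"  # shallow output
    return None


print(reset_action(AgentStatus(16, 20, 1, 4, 0)))
print(reset_action(AgentStatus(5, 20, 3, 4, 0, "# TODO: implement handler")))
```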

Reset procedure:

Checkpoint — Save agent's current state (completed items, remaining items, key decisions)
Terminate — Stop the current agent run
Re-spawn — Start a fresh agent with the checkpoint as context
Resume — New agent reads checkpoint, continues from remaining items only

For standalone agents (no Orchestrator), the Sprint Gate in difficulty-guide.md serves as the safety net — if a sprint takes 2x expected turns, the agent writes a checkpoint and informs the user.
