Subagent-Driven Development

An execution framework that dispatches a fresh subagent for each task, with quality gates between iterations - enabling fast parallel development while maintaining code quality.

Focused on:

  • Fresh context per task - Each subagent starts clean without context pollution from previous tasks

  • Quality gates - Code review between tasks catches issues early before they compound

  • Parallel execution - Independent tasks run concurrently for faster completion

  • Sequential execution - Dependent tasks execute in order with review checkpoints

Plugin Target

  • Prevent context pollution - Fresh subagents avoid accumulated confusion from long sessions

  • Catch issues early - Code review between tasks prevents bugs from compounding

  • Faster iteration - Parallel execution of independent tasks saves time

  • Maintain quality at scale - Quality gates ensure standards are met on every task

Overview

The SADD plugin provides skills and commands for executing work through coordinated subagents. Instead of executing all tasks in a single long session where context accumulates and quality degrades, SADD dispatches fresh subagents with quality gates.

Core capabilities:

  • Sequential/Parallel Execution - Execute implementation plans task-by-task with code review gates

  • Competitive Execution - Generate multiple solutions, evaluate with judges, synthesize best elements

  • Work Evaluation - Assess completed work using LLM-as-Judge with structured rubrics

This approach solves the "context pollution" problem - when an agent accumulates confusion, outdated assumptions, or implementation drift over long sessions. Each fresh subagent starts clean, implements its specific scope, and reports back for quality validation.

The plugin supports multiple execution strategies based on task characteristics, all with built-in quality gates.

Quick Start

Usage Examples
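
Hypothetical invocations of the core commands (the argument syntax is illustrative; each command takes a natural-language task description):

```
/launch-sub-agent Fix the flaky timeout test in tests/api/test_retries.py
/do-in-parallel Add module-level docstrings to each package under src/
/do-competitively Design the caching strategy for the search endpoint
```

See the per-command sections below for details and options.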

Commands Overview

/launch-sub-agent

This command launches a focused sub-agent to execute the provided task. It analyzes the task to select the optimal model and agent configuration, then dispatches a sub-agent that applies Zero-shot Chain-of-Thought reasoning at the start and mandatory self-critique verification at the end. It implements the Supervisor/Orchestrator pattern from multi-agent architectures, where you (the orchestrator) dispatch focused sub-agents with isolated context. The primary benefit is context isolation - each sub-agent operates in a clean context window focused on its specific task, without accumulated context pollution.

Usage
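
A hypothetical invocation (the task description and path are illustrative):

```
/launch-sub-agent Refactor src/auth/token.py to use the new TokenProvider interface and update its tests
```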

Agent output: the sub-agent reports back with a summary of what it implemented, how the work was verified, and any issues surfaced during its self-critique pass.

Advanced Options

Explicit Model Override

When you know the appropriate model tier, override automatic selection:
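
For example (the `model=` option syntax is illustrative, not confirmed):

```
/launch-sub-agent model=opus Design the public API for the new billing service
```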

Explicit Agent Selection

Force use of a specific specialized agent:
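
For example (option syntax illustrative; `code-reviewer` is the reviewer agent referenced under Processes below):

```
/launch-sub-agent agent=code-reviewer Review the changes on the feature/billing branch
```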

Output Location

Specify where results should be written:
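
For example (option syntax and path illustrative):

```
/launch-sub-agent output=docs/reports/billing-api.md Design the public API for the new billing service
```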

Combined Options
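
Options can be combined in a single invocation, for example (all syntax illustrative):

```
/launch-sub-agent model=opus agent=code-reviewer output=docs/reports/billing-review.md Review the billing service implementation against the design doc
```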

Core design principles:

  • Context isolation: Sub-agents operate with fresh context, preventing confirmation bias and attention scarcity

  • Intelligent model selection: Match model capability to task complexity for optimal quality/cost tradeoff

  • Specialized agent routing: Domain experts handle domain-specific tasks

  • Zero-shot CoT: Systematic reasoning at task start improves quality by 20-60%

  • Self-critique: Verification loop catches 40-60% of issues before delivery

When to use this command:

  • Tasks that benefit from fresh, focused context

  • Tasks where model selection matters (quality vs. cost tradeoffs)

  • Delegating work while maintaining quality gates

  • Single, well-defined tasks with clear deliverables

When NOT to use:

  • Simple tasks you can complete directly (overhead not justified)

  • Tasks requiring conversation history or accumulated session context

  • Exploratory work where scope is undefined

Theoretical Foundation

Zero-shot Chain-of-Thought (Kojima et al., 2022)

Constitutional AI / Self-Critique (Bai et al., 2022)

Multi-Agent Context Isolation (Multi-agent architecture patterns)

  • Fresh context prevents accumulated confusion and attention scarcity

  • Focused tasks produce better results than context-polluted sessions

  • Supervisor pattern enables quality gates between delegated work

/do-in-parallel

Execute tasks in parallel across multiple targets with intelligent model selection, independence validation, and quality-focused prompting.

  • Purpose - Execute the same task across multiple independent targets in parallel

  • Pattern - Supervisor/Orchestrator with parallel dispatch and context isolation

  • Output - Multiple solutions, one per target, with aggregated summary

  • Quality - Enhanced with Zero-shot CoT, Constitutional AI self-critique, and intelligent model selection

  • Efficiency - Dramatic time savings through concurrent execution of independent work

Pattern: Parallel Orchestration with Independence Validation

This command implements a six-phase parallel orchestration pattern.

Usage
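
A hypothetical invocation (syntax and targets illustrative); the command fans the same task out across the independent targets:

```
/do-in-parallel Generate unit tests for each service module in src/services/
```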

Advanced Options

When to Use

Good use cases:

  • Same operation across multiple files (refactoring, formatting)

  • Independent transformations (each file stands alone)

  • Batch documentation generation (API docs per module)

  • Parallel analysis tasks (security audit per component)

  • Multi-file code generation (tests per service)

Do NOT use when:

  • Only one target → use /launch-sub-agent instead

  • Targets have dependencies → use /do-in-steps instead

  • Tasks require sequential ordering → use /do-in-steps instead

  • Shared state needed between executions → use /do-in-steps instead

  • Quality-critical tasks needing comparison → use /do-competitively instead

Context Isolation Best Practices

  • Minimal context: Each sub-agent receives only what it needs for its target

  • No cross-references: Don't tell Agent A about Agent B's target

  • Let them discover: Sub-agents read files to understand local patterns

  • File system as truth: Changes are coordinated through the filesystem

Theoretical Foundation

Zero-shot Chain-of-Thought (Kojima et al., 2022)

Constitutional AI / Self-Critique (Bai et al., 2022)

Multi-Agent Context Isolation (Multi-agent architecture patterns)

  • Fresh context prevents accumulated confusion

  • Focused tasks produce better results than context-polluted sessions

  • Reference: Multi-Agent Debate (Du et al., 2023)

/do-in-steps

Execute complex tasks through sequential sub-agent orchestration with automatic decomposition, intelligent model selection, context passing between steps, and mandatory self-critique verification.

  • Purpose - Execute dependent tasks sequentially where each step builds on previous outputs

  • Pattern - Supervisor/Orchestrator with sequential dispatch and context accumulation

  • Output - Comprehensive report with all step results and integration summary

  • Quality - Enhanced with Zero-shot CoT, Constitutional AI self-critique, and per-step model optimization

  • Key Benefit - Prevents context pollution while maintaining necessary continuity between dependent steps

Usage
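
A hypothetical invocation (syntax illustrative); the command decomposes the task into ordered steps and passes each step's output forward:

```
/do-in-steps Rename User.email to primary_email: update the model, then the API layer, then all consumers and their tests
```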

Pattern: Sequential Orchestration with Context Passing

This command implements a four-phase sequential orchestration pattern.

When to Use

Good use cases:

  • Changes that cascade through multiple files/layers

  • Interface modifications with consumers to update

  • Feature additions spanning multiple components

  • Bug fixes with rippling effects

  • Refactoring with dependency chains

  • Any task where "Step N depends on Step N-1"

Do NOT use when:

  • Independent tasks that could run in parallel → use /do-in-parallel

  • Single-step tasks → use /launch-sub-agent

  • Tasks needing exploration before commitment → use /tree-of-thoughts

  • High-stakes tasks needing multiple approaches → use /do-competitively

Theoretical Foundation

Chain-of-Thought Prompting (Wei et al., 2022)

Zero-shot Chain of Thought (Kojima et al., 2022)

Constitutional AI / Self-Critique (Bai et al., 2022)

  • Self-critique loops catch 40-60% of issues before delivery

  • Each sub-agent verifies integration with previous steps

Multi-Agent Context Isolation (Du et al., 2023)

  • Fresh context per sub-agent prevents accumulated confusion

  • Context passing maintains necessary continuity without pollution

/do-competitively - Competitive Multi-Agent Synthesis

Execute tasks through competitive generation, multi-judge evaluation, and evidence-based synthesis to produce superior results.

  • Purpose - Generate multiple solutions competitively, evaluate with independent judges, synthesize best elements

  • Pattern - Generate-Critique-Synthesize (GCS) with self-critique, verification loops, and adaptive strategy selection

  • Output - Superior solution combining best elements from all candidates

  • Quality - Enhanced with Constitutional AI self-critique, Chain of Verification, and intelligent strategy selection

  • Efficiency - 15-20% average cost savings through adaptive strategy (polish clear winners, redesign failures)

Pattern: Generate-Critique-Synthesize (GCS)

This command implements a four-phase adaptive competitive orchestration pattern with quality enhancement loops; the Quality Enhancement Techniques table below summarizes what each phase adds.

Usage
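
A hypothetical invocation (syntax illustrative):

```
/do-competitively Design the rate-limiting strategy for the public API gateway
```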

When to Use

Use this command when:

  • Quality is critical - Multiple perspectives catch flaws single agents miss

  • Novel/ambiguous tasks - No clear "right answer", exploration needed

  • High-stakes decisions - Architecture choices, API design, critical algorithms

  • Learning/evaluation - Compare approaches to understand trade-offs

  • Avoiding local optima - Competitive generation explores solution space better

Do NOT use when:

  • Simple, well-defined tasks with obvious solutions

  • Time-sensitive changes

  • Trivial bug fixes or typos

  • Tasks with only one viable approach

Quality Enhancement Techniques

Techniques used to enhance the quality of the competitive execution pattern:

| Phase | Technique | Benefit |
|---|---|---|
| Phase 1 | Constitutional AI Self-Critique | Generators review and fix their own solutions before submission, catching 40-60% of issues |
| Phase 2 | Chain of Verification | Judges verify their evaluations with structured questions, improving calibration and reducing bias |
| Phase 2.5 | Adaptive Strategy Selection | Orchestrator parses structured judge outputs (VOTE+SCORES) to select the optimal strategy, saving 15-20% cost on average |
| Phase 3 | Evidence-Based Synthesis | Combines proven best elements rather than creating new solutions (only when needed) |

Theoretical Foundation

The competitive execution pattern combines insights from:

Academic Research:

  • Multi-Agent Debate (Du et al., 2023)

  • Constitutional AI / Self-Critique (Bai et al., 2022)

Engineering Practices:

  • Design Studio Method - Parallel design, critique, synthesis

  • Spike Solutions (XP/Agile) - Explore approaches, combine best

  • A/B Testing - Compare alternatives with clear metrics

  • Ensemble Methods - Combining multiple models improves performance

/tree-of-thoughts - Tree of Thoughts with Adaptive Strategy

Execute complex reasoning tasks through systematic exploration of solution space, pruning unpromising branches, expanding viable approaches, and synthesizing the best solution.

  • Purpose - Explore multiple solution paths before committing to full implementation

  • Pattern - Tree of Thoughts (ToT) with adaptive strategy selection

  • Output - Superior solution combining systematic exploration with evidence-based synthesis

  • Quality - Enhanced with probability estimates, multi-stage evaluation, and adaptive strategy

  • Efficiency - 15-20% average cost savings through adaptive strategy (polish clear winners, redesign failures)

Pattern: Tree of Thoughts (ToT)

This command implements a six-phase systematic reasoning pattern with adaptive strategy selection; the Quality Enhancement Techniques table below summarizes each phase.

Usage
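
A hypothetical invocation (syntax illustrative):

```
/tree-of-thoughts Plan the migration from the monolithic session store to a distributed cache
```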

When to Use

Use ToT when:

  • Solution space is large and poorly understood

  • Wrong approach chosen early would waste significant effort

  • Task has multiple valid approaches with different trade-offs

  • Quality is more important than speed

  • You need to explore before committing

Don't use ToT when:

  • Solution approach is obvious

  • Task is simple or well-defined

  • Speed matters more than exploration

  • Only one reasonable approach exists

Quality Enhancement Techniques

| Phase | Technique | Benefit |
|---|---|---|
| Phase 1 | Probabilistic Sampling | Explorers generate approaches with probability estimates, encouraging diversity |
| Phase 2 | Multi-Judge Pruning | Independent judges vote on the top 3 proposals, reducing groupthink |
| Phase 3 | Feedback-Aware Expansion | Expanders address concerns raised during pruning |
| Phase 4 | Chain of Verification | Judges verify evaluations with structured questions, reducing bias |
| Phase 4.5 | Adaptive Strategy Selection | Orchestrator parses structured outputs to select the optimal strategy |
| Phase 5 | Evidence-Based Synthesis | Combines proven best elements rather than creating new solutions |

Theoretical Foundation

Based on:

  • Tree of Thoughts (Yao et al., 2023)

  • Chain of Verification for judge evaluations

  • Constitutional AI / Self-Critique (Bai et al., 2022)

/judge-with-debate - Multi-Agent Debate Evaluation

Evaluate solutions through iterative multi-judge debate where independent judges analyze, challenge each other's assessments, and refine evaluations until reaching consensus or maximum rounds.

  • Purpose - Rigorous evaluation through adversarial critique and evidence-based argumentation

  • Pattern - Independent Analysis → Iterative Debate → Consensus or Disagreement Report

  • Output - Consensus evaluation report with averaged scores and debate summary, or disagreement report flagging unresolved issues

  • Quality - Enhanced through multi-perspective analysis, evidence-based argumentation, and iterative refinement

  • Efficiency - Early termination when consensus reached or judges stop converging

Pattern: Debate-Based Evaluation

This command implements iterative multi-judge debate with filesystem-based communication; the Process Flow section below walks through each step.

Usage
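
A hypothetical invocation (syntax and path illustrative):

```
/judge-with-debate Evaluate the rate limiter implemented above against docs/plans/rate-limiter.md
```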

When to Use

Use debate when:

  • High-stakes decisions requiring rigorous evaluation

  • Subjective criteria where perspectives differ legitimately

  • Complex solutions with many evaluation dimensions

  • Quality is more important than speed/cost

  • Initial judge assessments show significant disagreement

  • You need defensible, evidence-based evaluation

Skip debate when:

  • Objective pass/fail criteria (use simple validation)

  • Trivial solutions (single judge sufficient)

  • Time/cost constraints prohibit multiple rounds

  • Clear rubrics leave little room for interpretation

  • Evaluation criteria are purely mechanical (linting, formatting)

Quality Enhancement Techniques

| Phase | Technique | Benefit |
|---|---|---|
| Phase 1 | Chain of Verification | Judges generate verification questions and self-critique before submitting their initial assessment |
| Phase 1 | Evidence Requirement | All scores must be supported by specific quotes from the solution |
| Phase 2 | Filesystem Communication | Judges read each other's reports directly; the orchestrator never mediates (prevents context overflow) |
| Phase 2 | Structured Argumentation | Judges must defend positions AND challenge others with counter-evidence |
| Phase 2 | Explicit Revision | Judges must document what changed their mind or why they maintained their position |
| Consensus | Adaptive Termination | Stops early if consensus is reached, max rounds are hit, or judges stop converging |

Process Flow

Step 1: Independent Analysis

  • 3 judges analyze solution in parallel

  • Each writes a comprehensive report to report.[1|2|3].md

  • Includes per-criterion scores, evidence, overall assessment

Step 2: Check Consensus

  • Extract all scores from reports

  • Consensus if: overall scores within 0.5 AND all criterion scores within 1.0 (sketched in code after Step 5 below)

  • If achieved → generate consensus report and complete

Step 3: Debate Round (if no consensus, max 3 rounds)

  • Each judge reads their own report plus the others' reports from the filesystem

  • Identifies disagreements (>1 point gap on any criterion)

  • Defends their ratings with evidence

  • Challenges others' ratings with counter-evidence

  • Revises scores if convinced by others' arguments

  • Appends "Debate Round N" section to their own report

Step 4: Repeat until consensus, max rounds, or lack of convergence

Step 5: Final Report

  • If consensus: averaged scores, strengths/weaknesses, debate summary

  • If no consensus: disagreement report with flag for human review
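
A minimal Python sketch of the Step 2 consensus check (extracting scores from the report.[1|2|3].md files is assumed to have happened already; the function name is illustrative):

```python
# Consensus check from Step 2: overall scores within 0.5 AND every
# per-criterion spread within 1.0.
def has_consensus(overall: list[float], criteria: dict[str, list[float]]) -> bool:
    """overall: one overall score per judge; criteria: per-criterion score lists."""
    def spread(xs: list[float]) -> float:
        return max(xs) - min(xs)
    # Overall scores must sit within 0.5 of each other...
    if spread(overall) > 0.5:
        return False
    # ...AND every criterion's scores must sit within 1.0.
    return all(spread(xs) <= 1.0 for xs in criteria.values())

# Judges agree overall but split on one criterion -> no consensus, so a
# debate round runs.
print(has_consensus([4.2, 4.0, 4.4], {"correctness": [4.0, 4.5, 3.0]}))  # False
```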

Theoretical Foundation

Based on:

  • Multi-Agent Debate (Du et al., 2023)

  • Chain of Verification for judge self-critique

Key Insight: Debate forces judges to explicitly defend positions with evidence and consider counter-arguments, reducing individual bias and improving calibration.

/judge - Single-Agent Work Evaluation

Evaluate completed work using LLM-as-Judge with structured rubrics, context isolation, and evidence-based scoring.

  • Purpose - Assess quality of work produced earlier in conversation with isolated context

  • Pattern - Context Extraction → Judge Sub-Agent → Validation → Report

  • Output - Evaluation report with weighted scores, evidence citations, and actionable improvements

  • Quality - Enhanced with Chain-of-Thought scoring, self-verification, and bias mitigation

  • Efficiency - Single focused judge for fast evaluation without multi-agent overhead

Pattern: LLM-as-Judge with Context Isolation

This command implements a three-phase evaluation pattern: context extraction, isolated judge evaluation, and validation with reporting.

Usage
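
A hypothetical invocation (syntax illustrative):

```
/judge Evaluate the refactoring completed earlier in this session
```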

When to Use

Use single judge when:

  • Quick quality check needed

  • Work is straightforward with clear criteria

  • Speed/cost matters more than multi-perspective analysis

  • Evaluation is formative (guiding improvements), not summative

  • Low-to-medium stakes decisions

Use judge-with-debate instead when:

  • High-stakes decisions requiring rigorous evaluation

  • Subjective criteria where perspectives differ legitimately

  • Complex solutions with many evaluation dimensions

  • You need defensible, consensus-based evaluation

Default Evaluation Criteria

| Criterion | Weight | What It Measures |
|---|---|---|
| Instruction Following | 0.30 | Does the output fulfill the original request? Are all requirements addressed? |
| Output Completeness | 0.25 | All components covered? Appropriate depth? No gaps? |
| Solution Quality | 0.25 | Sound approach? Best practices? No correctness issues? |
| Reasoning Quality | 0.10 | Clear decision-making? Appropriate methods used? |
| Response Coherence | 0.10 | Well-structured? Easy to understand? Professional? |

Scoring Interpretation

| Score Range | Verdict | Recommendation |
|---|---|---|
| 4.50 - 5.00 | EXCELLENT | Ready as-is |
| 4.00 - 4.49 | GOOD | Minor improvements optional |
| 3.50 - 3.99 | ACCEPTABLE | Improvements recommended |
| 3.00 - 3.49 | NEEDS IMPROVEMENT | Address issues before use |
| 1.00 - 2.99 | INSUFFICIENT | Significant rework needed |
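
A minimal Python sketch of how the weights and verdict bands above combine (function and constant names are illustrative, not the plugin's actual API):

```python
# Weights and verdict bands taken from the two tables above.
WEIGHTS = {
    "instruction_following": 0.30,
    "output_completeness": 0.25,
    "solution_quality": 0.25,
    "reasoning_quality": 0.10,
    "response_coherence": 0.10,
}

VERDICTS = [  # (band floor, verdict), checked from highest to lowest
    (4.50, "EXCELLENT"),
    (4.00, "GOOD"),
    (3.50, "ACCEPTABLE"),
    (3.00, "NEEDS IMPROVEMENT"),
    (1.00, "INSUFFICIENT"),
]

def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each on a 1-5 scale."""
    return sum(WEIGHTS[c] * s for c, s in scores.items())

def verdict(score: float) -> str:
    return next(v for floor, v in VERDICTS if score >= floor)

s = overall_score({"instruction_following": 5, "output_completeness": 4,
                   "solution_quality": 4, "reasoning_quality": 4,
                   "response_coherence": 5})
print(s, verdict(s))  # 4.4 GOOD
```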

Quality Enhancement Techniques

| Technique | Benefit |
|---|---|
| Context Isolation | Judge receives only extracted context, preventing confirmation bias from session state |
| Chain-of-Thought Scoring | Justification BEFORE score improves reliability by 15-25% |
| Evidence Requirement | Every score requires specific citations (file paths, line numbers, quotes) |
| Self-Verification | Judge generates verification questions and documents adjustments |
| Bias Mitigation | Explicit warnings against length bias, verbosity bias, and authority bias |

Theoretical Foundation

Based on:

  • LLM-as-Judge evaluation (Zheng et al., 2023)

  • Chain-of-Thought scoring (Wei et al., 2022) and self-verification (Bai et al., 2022)

Skills Overview

subagent-driven-development - Task Execution with Quality Gates

Use this skill when executing implementation plans with independent tasks, or when facing multiple independent issues that can be investigated without shared state. It dispatches a fresh subagent for each task, with code review between tasks.

  • Purpose - Execute plans through coordinated subagents with quality checkpoints

  • Output - Completed implementation with all tasks verified and reviewed

When to Use SADD

Use SADD when:

  • You have an implementation plan with 3+ distinct tasks

  • Tasks can be executed independently (or in clear sequence)

  • You need quality gates between implementation steps

  • Context would accumulate over a long implementation session

  • Multiple unrelated failures need parallel investigation

  • Different subsystems need changes that do not conflict

Use regular development when:

  • Single task or simple change

  • Tasks are tightly coupled and need shared understanding

  • Exploratory work where scope is undefined

  • You need human-in-the-loop feedback between every step

Usage
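
Skills are invoked through natural-language requests rather than slash commands; a hypothetical prompt (the plan path is illustrative):

```
Use the subagent-driven-development skill to execute the plan in docs/plans/billing-plan.md
```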

How It Works

SADD supports four execution strategies based on task characteristics:

Sequential Execution

For dependent tasks that must be executed in order; see the Sequential Execution Process below.

Parallel Execution

For independent tasks that can run concurrently; see the Parallel Execution Process below.

Parallel Investigation

A special case for fixing multiple unrelated failures; see the Parallel Investigation Process below.

Multi-Agent Analysis Orchestration

Commands often orchestrate multiple agents to provide comprehensive analysis:

The three common shapes are sequential analysis (each agent builds on the previous report), parallel analysis (independent agents, one per component), and the debate pattern (judges argue toward consensus).
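
A hypothetical mapping of each shape to the plugin's commands (task text and paths are illustrative):

```
# Sequential analysis - each step builds on the previous report
/do-in-steps Audit the payment module: first map its data flows, then check each flow for injection risks

# Parallel analysis - independent reviewers, one per component
/do-in-parallel Run a security review of each service under src/services/

# Debate pattern - independent judges argue toward consensus
/judge-with-debate Evaluate the proposed payment architecture in docs/design/payments.md
```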

Processes

Sequential Execution Process

  1. Load Plan: Read plan file and create TodoWrite with all tasks

  2. Execute Task with Subagent: For each task, dispatch a fresh subagent:

    • Subagent reads the specific task from the plan

    • Implements exactly what the task specifies

    • Writes tests following project conventions

    • Verifies implementation works

    • Commits the work

    • Reports back with summary

  3. Review Subagent's Work: Dispatch a code-reviewer subagent:

    • Reviews what was implemented against the plan

    • Returns: Strengths, Issues (Critical/Important/Minor), Assessment

    • Quality gate: Must pass before proceeding

  4. Apply Review Feedback:

    • Fix Critical issues immediately (dispatch fix subagent)

    • Fix Important issues before next task

    • Note Minor issues for later

  5. Mark Complete, Next Task: Update TodoWrite and proceed to next task

  6. Final Review: After all tasks, dispatch final reviewer for overall assessment

  7. Complete Development: Use finishing-a-development-branch skill to verify and close

Parallel Execution Process

  1. Load and Review Plan: Read plan, identify concerns, create TodoWrite

  2. Execute Batch: Execute first 3 tasks (default batch size):

    • Mark each as in_progress

    • Follow each step exactly

    • Run verifications as specified

    • Mark as completed

  3. Report: Show what was implemented and verification output

  4. Continue: Apply feedback if needed, execute next batch

  5. Complete Development: Final verification and close

Parallel Investigation Process

For multiple unrelated failures (different files, subsystems, bugs):

  1. Identify Independent Domains: Group failures by what is broken

  2. Create Focused Agent Tasks: Each agent gets specific scope, clear goal, constraints

  3. Dispatch in Parallel: All agents run concurrently

  4. Review and Integrate: Verify fixes do not conflict, run full suite

Quality Gates

Quality gates are enforced at key checkpoints:

| Checkpoint | Gate Type | Action on Failure |
|---|---|---|
| After each task (sequential) | Code review | Fix issues before next task |
| After batch (parallel) | Human review | Apply feedback, continue |
| Final review | Comprehensive review | Address all findings |
| Before merge | Full test suite | All tests must pass |

Issue Severity Handling:

  • Critical: Fix immediately, do not proceed until resolved

  • Important: Fix before next task or batch

  • Minor: Note for later, do not block progress

multi-agent-patterns

Use when single-agent context limits are exceeded, when tasks decompose naturally into subtasks, or when specializing agents improves quality.

Why Multi-Agent Architectures:

| Problem | Solution |
|---|---|
| Context Bottleneck | Partition work across multiple context windows |
| Sequential Bottleneck | Parallelize independent subtasks across agents |
| Generalist Overhead | Specialize agents with lean, focused context |

Architecture Patterns:

| Pattern | When to Use | Trade-offs |
|---|---|---|
| Supervisor/Orchestrator | Clear task decomposition, need human oversight | Central bottleneck, "telephone game" risk |
| Peer-to-Peer/Swarm | Flexible exploration, emergent requirements | Coordination complexity, divergence risk |
| Hierarchical | Large projects with layered abstraction | Overhead between layers, alignment challenges |

Example of Implementation:
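
A minimal Python sketch of the Supervisor/Orchestrator pattern, assuming a hypothetical dispatch() that runs one sub-agent in a fresh context window:

```python
# Supervisor/Orchestrator sketch: partition work, dispatch in parallel with
# isolated context, gate the results before integration.
from concurrent.futures import ThreadPoolExecutor

def dispatch(task: str, brief: str) -> str:
    # Hypothetical stand-in: a real system would launch a sub-agent whose
    # context contains only `task` and the minimal shared `brief`.
    return f"[report] {task}: done"

def supervise(tasks: list[str], brief: str) -> list[str]:
    # Independent subtasks run concurrently; reports return to the supervisor.
    with ThreadPoolExecutor() as pool:
        reports = list(pool.map(lambda t: dispatch(t, brief), tasks))
    # Quality gate placeholder: the supervisor reviews each report before
    # integrating the work.
    return [r for r in reports if r.startswith("[report]")]

print(supervise(["audit auth module", "audit billing module"], "repo: example/app"))
```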

Foundation

The SADD plugin is based on the following foundations:

Agent Skills for Context Engineering

Research Papers

Multi-Agent Patterns:

  • Multi-Agent Debate (Du et al., 2023) - improving factuality and reasoning through multiagent debate

Evaluation and Critique:

  • Constitutional AI / Self-Critique (Bai et al., 2022)

  • LLM-as-Judge (Zheng et al., 2023)

Engineering Methodologies

  • Design Studio Method - Parallel design exploration with critique and synthesis

  • Spike Solutions (Extreme Programming) - Time-boxed exploration of multiple approaches

  • Ensemble Methods (Machine Learning) - Combining multiple models for improved performance
