test-skill

Verify that skills hold up under pressure and resist rationalization, using the RED-GREEN-REFACTOR cycle. Critical for discipline-enforcing skills.

  • Purpose - Test skill effectiveness with pressure scenarios

  • Output - Verification report with rationalization table

/customaize-agent:test-skill ["skill path or name"]

Usage Examples

# Test a TDD enforcement skill
> /customaize-agent:test-skill tdd

# Test a custom skill by path
> /customaize-agent:test-skill ~/.claude/skills/code-review/

# Start testing workflow
> /customaize-agent:test-skill

Arguments

Optional. The path to the skill under test, or its name. With no argument, the command starts the testing workflow.

How It Works

  1. RED Phase - Baseline Testing: Run scenarios WITHOUT the skill

    • Create pressure scenarios (3+ combined pressures)

    • Document agent behavior and rationalizations verbatim

    • Identify patterns in failures

  2. GREEN Phase - Write Minimal Skill: Address baseline failures

    • Write skill addressing specific observed rationalizations

    • Run same scenarios WITH skill

    • Verify agent now complies

  3. REFACTOR Phase - Close Loopholes: Iterate until no new rationalizations appear

    • Identify NEW rationalizations from testing

    • Add explicit counters for each loophole

    • Build rationalization table

    • Create red flags list

    • Re-test until the agent complies in every scenario
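
The three phases above can be sketched as a loop. This is a minimal illustration, not the command's implementation: `run` and `write_counter` are hypothetical hooks (how an agent is actually invoked is not described here), and the stub agent at the bottom exists only to make the sketch runnable.

```python
from typing import Callable, Optional

# Hypothetical hooks, not part of the command:
#   run(scenario, skill) -> rationalizations the agent voiced, verbatim
#                           (an empty list means the agent complied)
#   write_counter(text)  -> an explicit counter for one rationalization
Run = Callable[[str, Optional[str]], list[str]]
WriteCounter = Callable[[str], str]


def red_green_refactor(scenarios: list[str], run: Run,
                       write_counter: WriteCounter, max_rounds: int = 5):
    """Return (skill_text, rationalization_table, rounds_needed)."""
    table: dict[str, str] = {}  # rationalization -> counter

    # RED: run every scenario WITHOUT the skill and record failures verbatim.
    for scenario in scenarios:
        for rationalization in run(scenario, None):
            table.setdefault(rationalization, write_counter(rationalization))

    # GREEN, then REFACTOR: re-run WITH the skill and close new loopholes.
    for round_number in range(1, max_rounds + 1):
        skill = "\n".join(table.values())
        observed = [r for s in scenarios for r in run(s, skill)]
        if not observed:  # the agent complied in every scenario
            return skill, table, round_number
        for rationalization in observed:
            table.setdefault(rationalization, write_counter(rationalization))

    raise RuntimeError("agent still rationalizes after max_rounds")


# Stub agent for demonstration only: it finds one new loophole once the
# first rationalization has been countered.
def stub_run(scenario: str, skill: Optional[str]) -> list[str]:
    if skill is None:
        return ["I'm too tired to test properly"]
    if "Just this once" not in skill:
        return ["Just this once is fine"]
    return []


def stub_counter(rationalization: str) -> str:
    return f'"{rationalization}" is not a valid reason to skip the rule.'
```

Calling `red_green_refactor(["scenario A", "scenario B"], stub_run, stub_counter)` returns the assembled skill text, the rationalization table, and the number of rounds it took. The `max_rounds` cap is an assumption added so the sketch cannot loop forever.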

Pressure Types

Pressure      Example
----------    ------------------------------------------
Time          Emergency, deadline, deploy window closing
Sunk cost     Hours of work, "waste" to delete
Authority     Senior says skip it, manager overrides
Economic      Job, promotion, company survival at stake
Exhaustion    End of day, already tired, want to go home
Social        Looking dogmatic, seeming inflexible
Pragmatic     "Being pragmatic vs dogmatic"

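One way to generate scenarios that combine three or more of these pressures is to enumerate combinations of the pressure types. The sketch below is illustrative only: the function name `pressure_scenarios`, the sentence wording for each pressure, and the choice to reject sizes below 3 are assumptions, not part of the command.

```python
from itertools import combinations

# Pressure types from the table above; the sentence for each one is
# illustrative wording, not text taken from the command.
PRESSURES = {
    "Time": "The deploy window closes in ten minutes.",
    "Sunk cost": "You have already spent hours on this code.",
    "Authority": "A senior engineer told you to skip the step.",
    "Economic": "Your promotion depends on shipping today.",
    "Exhaustion": "It is the end of the day and you are tired.",
    "Social": "Teammates will see you as inflexible if you insist.",
    "Pragmatic": "Being pragmatic matters more than being dogmatic.",
}


def pressure_scenarios(task, pressures=PRESSURES, size=3):
    """Return (pressure_names, prompt) pairs, one per combination."""
    if size < 3:
        # Assumption: enforce the "3+ combined pressures" guidance.
        raise ValueError("combine at least 3 pressures per scenario")
    scenarios = []
    for combo in combinations(pressures, size):  # iterates over the names
        lines = [pressures[name] for name in combo]
        scenarios.append((combo, " ".join([task, *lines])))
    return scenarios
```

For example, `pressure_scenarios("Decide whether to write the test first.")` yields one prompt per three-pressure combination; in practice a hand-picked subset is likely enough.
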
Best Practices

  • Combine 3+ pressures - Single-pressure tests are too weak

  • Document verbatim - Capture exact rationalizations, not summaries

  • Iterate completely - Continue REFACTOR until no new rationalizations appear

  • Use meta-testing - Ask agents how the skill could have been clearer

  • Test all skill types - Discipline-enforcing, technique, pattern, and reference skills need different tests
