Reflexion

Self-refinement framework that introduces feedback and refinement loops to improve output quality through iterative improvement, complexity triage, and verification.

Focused on:

  • Self-refinement - Agents review and improve their own outputs

  • Multi-agent review - Specialized agents critique from different perspectives

  • Iterative improvement - Systematic loops that converge on higher quality

  • Memory integration - Lessons learned persist across interactions

Plugin Target

  • Decrease hallucinations - reflection usually allows you to get rid of hallucinations by verifying the output

  • Make output quality more predictable - same model usually produces more similar output after reflection, rather than after one shot prompt

  • Improve output quality - reflection usually allows you to improve the output by identifying areas that were missed or misunderstood in one shot prompt

Overview

The Reflexion plugin implements multiple scientifically-proven techniques for improving LLM outputs through self-reflection, critique, and memory updates. It enables Claude to evaluate its own work, identify weaknesses, and generate improved versions.

Plugin is based on papers like Self-Refinearrow-up-right and Reflexionarrow-up-right. These techniques improve the output of large language models by introducing feedback and refinement loops.

They are proven to increase output quality by 8–21% based on both automatic metrics and human preferences across seven diverse tasks, including dialogue generation, coding, and mathematical reasoning, when compared to standard one-step model outputs.

On top of that, the plugin is based on the Agentic Context Engineeringarrow-up-right paper that uses memory updates after reflection, and consistently outperforms strong baselines by 10.6% on agents.

Quick Start

Alternatively, you can use the reflect word in initial prompt:

In order to use this hook, need to have bun installed. But for overall command it is not required.

Usage Examples

Automatic Reflection with Hooks

The plugin includes optional hooks that automatically trigger reflection when you include the word "reflect" in your prompt. This removes the need to manually run /reflexion:reflect after each task.

How It Works

  1. Include the word "reflect" anywhere in your prompt

  2. Claude completes your task

  3. The hook automatically triggers /reflexion:reflect

  4. Claude reviews and improves its work

Important: Only the exact word "reflect" triggers automatic reflection. Words like "reflection", "reflective", or "reflects" do not trigger it.

Commands Overview

/reflexion:reflect - Self-Refinement

Reflect on previous response and output, based on Self-refinement framework for iterative improvement with complexity triage and verification

  • Purpose - Review and improve previous response

  • Output - Refined output with improvements

Arguments

Optional areas to focus or confidence threshold to use, for example "security" or "deep reflect if less than 90% confidence"

How It Works

  1. Complexity Triage: Automatically determines appropriate reflection depth

    • Quick Path (5s): Simple tasks get fast verification

    • Standard Path: Multi-file changes get full reflection

    • Deep Path: Critical systems get comprehensive analysis

  2. Self-Assessment: Evaluates output against quality criteria

    • Completeness check

    • Quality assessment

    • Correctness verification

    • Fact-checking

  3. Refinement Planning: If improvements needed, generates specific plan

    • Identifies issues

    • Proposes solutions

    • Prioritizes fixes

  4. Implementation: Produces refined output addressing identified issues

Confidence Thresholds

The command uses confidence levels to determine if further iteration is needed:

  • Quick Path: No specific threshold (fast verification only)

  • Standard Path: Requires >70% confidence

  • Deep Reflection: Requires >90% confidence

If confidence threshold isn't met, the command will iterate automatically.

Usage Examples

Best practices

  • Reflect after significant work - Don't reflect on trivial tasks

  • Be specific - Provide context about what to focus on

  • Iterate when needed - Sometimes multiple reflection cycles are valuable

  • Capture learnings - Use /reflexion:memorize to preserve insights

/reflexion:critique - Multi-Perspective Critique

Memorize insights from reflections and updates CLAUDE.md file with this knowledge. Curates insights from reflections and critiques into CLAUDE.md using Agentic Context Engineering

  • Purpose - Multi-perspective comprehensive review

  • Output - Structured feedback from multiple judges

Arguments

Optional file paths, commits, or context to review (defaults to recent changes)

How It Works

  1. Context Gathering: Identifies scope of work to review

  2. Parallel Review: Spawns three specialized judge agents

    • Requirements Validator: Checks alignment with original requirements

    • Solution Architect: Evaluates technical approach and design

    • Code Quality Reviewer: Assesses implementation quality

  3. Cross-Review & Debate: Judges review each other's findings and debate disagreements

  4. Consensus Report: Generates comprehensive report with actionable recommendations

Judge Scoring

Each judge provides a score out of 10:

  • 9-10: Exceptional quality, minimal improvements needed

  • 7-8: Good quality, minor improvements suggested

  • 5-6: Acceptable quality, several improvements recommended

  • 3-4: Below standards, significant rework needed

  • 1-2: Major issues, substantial rework required

Usage Examples

Best practices

  • For important decisions - Use critique for architectural or design choices

  • Before major commits - Get multi-perspective review before committing

  • Learn from debates - Pay attention to different perspectives in the critique

  • Address all concerns - Don't cherry-pick feedback

/reflexion:memorize - Memory Updates

Comprehensive multi-perspective review using specialized judges with debate and consensus building

  • Purpose - Save insights to project memory

  • Output - Updated CLAUDE.md with learnings

Arguments

Optional source specification (last, selection, chat:) or --dry-run for preview

How It Works

  1. Context Harvesting: Gathers insights from recent work

    • Reflection outputs

    • Critique findings

    • Problem-solving patterns

    • Failed approaches and lessons

  2. Curation Process: Transforms raw insights into structured knowledge

    • Extracts key insights

    • Categorizes by impact

    • Applies curation rules (relevance, non-redundancy, actionability)

    • Prevents context collapse

  3. CLAUDE.md Updates: Adds curated insights to appropriate sections

    • Project Context

    • Code Quality Standards

    • Architecture Decisions

    • Testing Strategies

    • Development Guidelines

    • Strategies and Hard Rules

  4. Memory Validation: Ensures quality of updates

    • Coherence check

    • Actionability test

    • Consolidation review

    • Evidence verification

Usage Examples

Best practices

  • Regular memorization - Periodically save insights to CLAUDE.md

  • Review memory - Occasionally review CLAUDE.md to ensure it stays relevant

  • Curate carefully - Only memorize significant, reusable insights

  • Organize by topic - Keep CLAUDE.md well-structured

Theoretical Foundation

The Reflexion plugin is based on peer-reviewed research demonstrating 8-21% improvement in output quality across diverse tasks:

Core Papers

Additional Techniques

Last updated