Reflexion
A self-refinement framework that introduces feedback and refinement loops, improving output quality through iterative improvement, complexity triage, and verification.
Focused on:
Self-refinement - Agents review and improve their own outputs
Multi-agent review - Specialized agents critique from different perspectives
Iterative improvement - Systematic loops that converge on higher quality
Memory integration - Lessons learned persist across interactions
Plugin Target
Decrease hallucinations - verifying the output during reflection usually catches and removes hallucinated content
Make output quality more predictable - the same model produces more consistent output after reflection than after a one-shot prompt
Improve output quality - reflection surfaces areas that were missed or misunderstood in the one-shot prompt
Overview
The Reflexion plugin implements multiple research-backed techniques for improving LLM outputs through self-reflection, critique, and memory updates. It enables Claude to evaluate its own work, identify weaknesses, and generate improved versions.
The plugin is based on papers such as Self-Refine and Reflexion, which improve large language model outputs by introducing feedback and refinement loops.
These techniques have been shown to increase output quality by 8–21% on both automatic metrics and human preferences across seven diverse tasks, including dialogue generation, coding, and mathematical reasoning, compared to standard one-step model outputs.
In addition, the plugin builds on the Agentic Context Engineering paper, which applies memory updates after reflection and consistently outperforms strong baselines by 10.6% on agent tasks.
Quick Start
Alternatively, you can include the word "reflect" in your initial prompt:
To use this hook, you need to have bun installed. The commands themselves do not require it.
Automatic Reflection with Hooks
The plugin includes optional hooks that automatically trigger reflection when you include the word "reflect" in your prompt. This removes the need to manually run /reflexion:reflect after each task.
How It Works
Include the word "reflect" anywhere in your prompt
Claude completes your task
The hook automatically triggers /reflexion:reflect
Claude reviews and improves its work
Important: Only the exact word "reflect" triggers automatic reflection. Words like "reflection", "reflective", or "reflects" do not trigger it.
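The exact-word check above can be sketched with a word-boundary regex. This is a hypothetical illustration, not the plugin's actual hook code:

```typescript
// Hypothetical sketch of the exact-word trigger check.
// \b word boundaries match "reflect" only as a standalone word, so
// "reflection", "reflective", and "reflects" do not trigger it.
function shouldTriggerReflection(prompt: string): boolean {
  return /\breflect\b/.test(prompt);
}

console.log(shouldTriggerReflection("Fix the bug, then reflect")); // true
console.log(shouldTriggerReflection("Add a reflection section"));  // false
```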
Commands
/reflexion:reflect - Self-Refinement. Reflects on the previous response and output, based on the self-refinement framework for iterative improvement with complexity triage and verification
/reflexion:critique - Multi-Perspective Critique. Runs specialized agents that critique the previous output from different perspectives
/reflexion:memorize - Memory Integration. Curates insights from reflections and critiques into the CLAUDE.md file using Agentic Context Engineering, so lessons learned persist across interactions
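The memory-update idea behind /reflexion:memorize can be sketched as curating lessons into a persistent notes file while deduplicating against what is already recorded. The function name, file handling, and list format here are assumptions for illustration, not the plugin's actual implementation:

```typescript
// Illustrative sketch of curating reflection insights into a
// CLAUDE.md-style notes section. Not the plugin's real code.
function curateLessons(existingNotes: string, newLessons: string[]): string {
  // Index existing lines so already-recorded lessons are skipped.
  const seen = new Set(existingNotes.split("\n").map((l) => l.trim()));
  const additions = newLessons
    .map((l) => `- ${l.trim()}`)
    .filter((l) => !seen.has(l)); // deduplicate against existing notes
  return additions.length
    ? `${existingNotes}\n${additions.join("\n")}`
    : existingNotes;
}

console.log(curateLessons("- Always run tests", ["Always run tests", "Check edge cases"]));
// "- Always run tests\n- Check edge cases"
```

Deduplication matters here: without it, repeated reflections would bloat the memory file with the same lesson many times.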
Theoretical Foundation
Full list of included patterns and techniques:
Self-Refinement / Iterative Refinement - One model generates, then reviews and improves its own output
Constitutional AI (CAI) / RLAIF - One model generates responses, another critiques them based on principles
Critic-Generator or Verifier-Generator Architecture - Generator model creates outputs, Critic/verifier model evaluates and provides feedback
LLM-as-a-Judge - One LLM evaluates/scores outputs from another LLM
Debate / Multi-Agent Debate - Multiple models propose and critique solutions
Generate-Verify-Refine (GVR) - Three-stage process: generate → verify → refine based on verification
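The patterns above share a common generate → verify → refine loop, which can be sketched as follows. The model calls are stubbed with plain functions, and all names are illustrative rather than plugin API:

```typescript
// Minimal sketch of the generate -> verify -> refine loop shared by
// the patterns above. "generate", "verify", and "refine" stand in for
// LLM calls; here they are ordinary functions.
type Critique = { ok: boolean; feedback: string };

function refineLoop(
  generate: () => string,
  verify: (draft: string) => Critique,
  refine: (draft: string, feedback: string) => string,
  maxRounds = 3,
): string {
  let draft = generate();
  for (let i = 0; i < maxRounds; i++) {
    const critique = verify(draft);           // critic/verifier pass
    if (critique.ok) break;                   // converged: stop early
    draft = refine(draft, critique.feedback); // improve using feedback
  }
  return draft;
}

// Toy run: the stub verifier demands that empty input be handled.
const result = refineLoop(
  () => "Sort the list.",
  (d) => ({ ok: d.includes("empty"), feedback: "handle empty input" }),
  (d, f) => `${d} Also ${f}.`,
);
console.log(result); // "Sort the list. Also handle empty input."
```

The maxRounds cap is the safeguard that makes the loop practical: without it, a verifier that never approves would iterate forever.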
Also includes the following techniques:
Chain-of-Verification (CoVe) - Model generates answer, then verification questions, then revises
Tree of Thoughts (ToT) - Explores multiple reasoning paths with evaluation
Process Reward Models (PRM) - Evaluates reasoning steps rather than just final answers
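Chain-of-Verification, for instance, can be sketched as a four-step pipeline: draft an answer, plan verification questions, answer them independently, then revise. The LLM calls are stubbed out below, and the names and signatures are illustrative only:

```typescript
// Hedged sketch of Chain-of-Verification (CoVe) with stubbed LLM calls.
type Fact = { question: string; answer: string };

function chainOfVerification(
  draft: string,
  planQuestions: (draft: string) => string[],       // step 2: plan checks
  answerIndependently: (q: string) => string,       // step 3: answer each check
  revise: (draft: string, facts: Fact[]) => string, // step 4: final revision
): string {
  const facts = planQuestions(draft).map((q) => ({
    question: q,
    answer: answerIndependently(q),
  }));
  return revise(draft, facts);
}

// Toy run: the stub verifier corrects a wrong year in the draft.
const revised = chainOfVerification(
  "The Reflexion paper was published in 2021.",
  () => ["When was the Reflexion paper published?"],
  () => "2023",
  (draft, facts) => draft.replace("2021", facts[0].answer),
);
console.log(revised); // "The Reflexion paper was published in 2023."
```

The key design point is that verification questions are answered independently of the draft, which is what lets CoVe catch hallucinations the draft itself would otherwise reinforce.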