Chapter 8.2: Reflexion & Self-Critique in AI Agents
1. Mathematical Foundation of Self-Reflection
Reflexion is an iterative self-improvement loop: an agent attempts a task, critiques the result, and uses that critique to refine its next attempt. We can model this as an optimization process over agent performance:
Reflexion Function:
R(x, h) = (x', h')
Where:
• x = current attempt/solution
• h = historical experience
• x' = improved attempt
• h' = updated experience memory
Performance Improvement Metric:
Δp = p(x_{t+1}) - p(x_t)
Where p(x) measures performance of attempt x
Convergence Condition:
lim_{t→∞} ||p(x_{t+1}) - p(x_t)|| < ε
Where ε is the convergence threshold
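As a small worked example of the two formulas above, the sketch below computes Δp for a sequence of attempts and finds the first step at which the change drops below ε. In practice the limit can only be approximated by a check like this on a finite history. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def performance_deltas(scores: List[float]) -> List[float]:
    """Return p(x_{t+1}) - p(x_t) for each consecutive pair of attempts."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def first_converged_step(scores: List[float], eps: float) -> Optional[int]:
    """Return the first index t with |p(x_{t+1}) - p(x_t)| < eps, or None."""
    for t, delta in enumerate(performance_deltas(scores)):
        if abs(delta) < eps:
            return t
    return None


# Hypothetical scores for four successive attempts.
scores = [0.40, 0.60, 0.70, 0.71]
deltas = performance_deltas(scores)             # roughly 0.20, 0.10, 0.01
step = first_converged_step(scores, eps=0.05)   # index of the first small change
```

A single small Δp can be noise rather than true convergence, so a real stopping rule would usually require several consecutive small changes.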
[Figure: Reflexion cycle visualization]
2. The Reflexion Framework
Reflexion Components
- Actor: Generates solutions/actions
- Evaluator: Provides performance feedback
- Self-Reflection: Analyzes failures and generates insights
- Memory: Stores experiences for future reference
Reflexion Algorithm
Iteration 1
Attempt: Generate initial solution using current knowledge
Evaluate: Assess performance and identify failure modes
Reflect: Generate self-critique and improvement strategies
Store: Update memory with lessons learned
Mathematical Model of Self-Critique
The critique function maps attempts to improvement insights:
Critique Function:
C(x, y, f) = {c₁, c₂, ..., cₖ}
Where:
• x = attempt
• y = expected output
• f = feedback signal
• cᵢ = specific critique points
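To make the signature of C(x, y, f) concrete, here is a minimal rule-based sketch. It assumes the attempt and the expected output are both dictionaries mapping a check name to an output string, and that the feedback signal is a short text message; none of these representations come from the chapter. In a real agent the critique points would more likely be generated by a language model than by string comparison.

```python
from typing import Dict, List


def critique(attempt: Dict[str, str],
             expected: Dict[str, str],
             feedback: str) -> List[str]:
    """Rule-based stand-in for C(x, y, f): return a list of critique points."""
    points = []
    for key, want in expected.items():
        if key not in attempt:
            points.append(f"Missing output for '{key}'.")
        elif attempt[key] != want:
            points.append(f"'{key}': expected {want!r}, got {attempt[key]!r}.")
    if feedback:
        points.append(f"Evaluator feedback: {feedback}")
    return points


# Hypothetical attempt at an add() function, checked against three cases.
attempt = {"add(2, 2)": "4", "add(-1, 1)": "1"}
expected = {"add(2, 2)": "4", "add(-1, 1)": "0", "add(0, 0)": "0"}
points = critique(attempt, expected, "1 of 3 checks passed")
```

Each returned string corresponds to one cᵢ: a specific, checkable statement about what the attempt got wrong.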
class ReflexionAgent:
    """Actor/evaluator loop that stores a reflection after each failed attempt."""

    def __init__(self, actor, evaluator, max_iterations=5, threshold=0.9):
        self.actor = actor
        self.evaluator = evaluator
        self.memory = []  # reflections accumulated across attempts
        self.max_iterations = max_iterations
        # Score at which an attempt counts as satisfactory. The 0.9 default is
        # an illustrative assumption; set it to match the evaluator's scale.
        self.threshold = threshold

    def solve_with_reflexion(self, task):
        """Solve task using reflexion-based improvement."""
        attempt = None  # stays None only if max_iterations is 0
        performance_history = []
        for _ in range(self.max_iterations):
            # Generate an attempt, conditioned on past reflections
            attempt = self.actor.generate(task, self.memory)
            # Evaluate performance
            performance = self.evaluator.score(attempt, task)
            performance_history.append(performance)
            # Stop as soon as the attempt is satisfactory
            if performance >= self.threshold:
                return attempt, performance_history
            # Otherwise reflect on the failure and store the lesson
            reflection = self._generate_reflection(attempt, performance, task)
            self.memory.append(reflection)
        # Iteration budget exhausted: return the last attempt
        return attempt, performance_history

    def _generate_reflection(self, attempt, performance, task):
        """Generate insight from a failed attempt."""
        critique_prompt = (
            f"Task: {task}\n"
            f"Attempt: {attempt}\n"
            f"Performance: {performance}\n"
            "Analyze what went wrong and how to improve:"
        )
        return self.actor.reflect(critique_prompt)
[Figure: Performance improvement over iterations]
3. Self-Critique Mechanisms
Types of Self-Critique
- Logical Consistency: Check for contradictions and invalid inferences
- Factual Accuracy: Verify claims against knowledge base
- Completeness: Assess if all requirements are addressed
- Efficiency: Evaluate resource usage and optimization
Multi-Dimensional Critique Score:
S = w₁ × consistency + w₂ × accuracy + w₃ × completeness + w₄ × efficiency
Where wᵢ are importance weights summing to 1
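The weighted score above is plain arithmetic, so a short worked example may help. The sketch below assumes each dimension is scored in [0, 1]; the specific weights and scores are made up for illustration, and `critique_score` is a hypothetical helper name.

```python
import math
from typing import Dict


def critique_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum S = sum_i w_i * score_i, with weights required to sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative values only.
weights = {"consistency": 0.3, "accuracy": 0.4, "completeness": 0.2, "efficiency": 0.1}
scores = {"consistency": 0.9, "accuracy": 0.8, "completeness": 0.5, "efficiency": 1.0}

# 0.3*0.9 + 0.4*0.8 + 0.2*0.5 + 0.1*1.0 = 0.27 + 0.32 + 0.10 + 0.10 = 0.79
S = critique_score(scores, weights)
```

Because the weights sum to 1, S stays on the same scale as the individual dimension scores, which makes thresholds easier to interpret.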
Critique Quality Metrics
Specificity × Actionability × Accuracy = Overall Quality
[Figure: Self-critique analysis]
4. Memory-Augmented Reflexion
Experience Memory Function:
M(t) = {(task_i, attempt_i, reflection_i, outcome_i)}
For all previous experiences up to time t
Relevance-Weighted Retrieval:
retrieve(query) = argmax_i (similarity(query, task_i) × recency(i) × success(i))
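The retrieval rule above can be sketched as follows. The chapter does not define the three factors, so this example makes assumptions: similarity is cosine similarity between embedding vectors, recency is an exponential decay with a hypothetical `half_life` parameter, and success is a number in [0, 1]. Each memory is represented here as an `(embedding, age_in_steps, success)` tuple.

```python
import math
from typing import List, Sequence, Tuple

Memory = Tuple[Sequence[float], int, float]  # (embedding, age in steps, success)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency(age_steps: int, half_life: float = 10.0) -> float:
    """Exponential decay: 1.0 for a fresh memory, 0.5 after one half-life."""
    return 0.5 ** (age_steps / half_life)


def retrieve(query_vec: Sequence[float], memories: List[Memory],
             half_life: float = 10.0) -> int:
    """Return the index i that maximizes similarity x recency x success."""
    def score(i: int) -> float:
        vec, age, success = memories[i]
        return cosine_similarity(query_vec, vec) * recency(age, half_life) * success
    return max(range(len(memories)), key=score)


memories = [
    ([1.0, 0.0], 20, 1.0),  # identical task, but old: 1.0 * 0.25 * 1.0
    ([1.0, 0.0], 0, 0.0),   # identical and fresh, but failed: score 0
    ([0.6, 0.8], 0, 1.0),   # partly similar, fresh, successful: about 0.6
]
best = retrieve([1.0, 0.0], memories)
```

One consequence of the multiplicative form is visible in the second entry: an experience with success 0 can never be retrieved, even though reflections on failures may be the most useful ones. A variant that floors the success term above zero would avoid that, at the cost of departing from the formula as written.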
Memory Consolidation Process
Periodic compression of experiences into general principles and patterns
import time


class ExperienceMemory:
    """Bounded store of past experiences with relevance-based retrieval.

    The helpers _embed, _compute_relevance, and _consolidate_oldest are not
    defined in this chapter; they are left as hooks for a subclass to supply.
    """

    def __init__(self, capacity=1000):
        self.experiences = []
        self.capacity = capacity
        self.principles = []  # Abstracted insights

    def store_experience(self, task, attempt, reflection, outcome):
        """Store new experience with metadata."""
        experience = {
            'task': task,
            'attempt': attempt,
            'reflection': reflection,
            'outcome': outcome,
            'timestamp': time.time(),
            'embedding': self._embed(task),
        }
        # At capacity: consolidate the oldest entries before adding a new one.
        # _consolidate_oldest is expected to free at least one slot.
        if len(self.experiences) >= self.capacity:
            self._consolidate_oldest()
        self.experiences.append(experience)

    def retrieve_relevant(self, current_task, k=3):
        """Retrieve the k most relevant past experiences."""
        query_embedding = self._embed(current_task)
        similarities = [
            self._compute_relevance(exp, query_embedding)
            for exp in self.experiences
        ]
        # Indices of the k highest relevance scores, best first
        top_indices = sorted(range(len(similarities)),
                             key=lambda i: similarities[i],
                             reverse=True)[:k]
        return [self.experiences[i] for i in top_indices]
[Figure: Memory retrieval patterns]
5. Iterative Improvement Dynamics
Learning Rate Adaptation:
α_t = α_0 × decay^{failure_count}
Where α decreases with repeated failures on similar tasks
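A quick numeric illustration of the decay rule, assuming a decay factor in (0, 1] and a non-negative integer failure count. The function name and the sample values are hypothetical.

```python
def adapted_rate(alpha_0: float, decay: float, failure_count: int) -> float:
    """Return alpha_t = alpha_0 * decay ** failure_count."""
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    if failure_count < 0:
        raise ValueError("failure_count must be non-negative")
    return alpha_0 * decay ** failure_count


# With alpha_0 = 1.0 and decay = 0.5, each repeated failure halves the rate.
rates = [adapted_rate(1.0, 0.5, n) for n in range(4)]
```

With these values the rate goes 1.0, 0.5, 0.25, 0.125, so after three failures on similar tasks the agent changes its approach by only an eighth of the initial step.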
Convergence Analysis:
E[performance_t] ≥ E[performance_{t-1}] + η × critique_quality_t
Expected performance rises each iteration by at least an amount proportional to critique quality, where η > 0 is the proportionality constant
Adaptive Stopping Criteria
- Performance plateau detection
- Diminishing returns threshold
- Resource budget exhaustion
- Satisfactory performance achieved
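The four criteria above can be combined into a single check, sketched below. The definitions are assumptions made for this example: a plateau means every one of the last `window` changes is smaller than `eps` in absolute value, and diminishing returns means the net gain over the same window is below `min_gain`. All parameter names and default values are hypothetical.

```python
from typing import List, Optional


def stop_reason(history: List[float], target: float,
                budget_used: int, budget_limit: int,
                eps: float = 0.01, window: int = 3,
                min_gain: float = 0.05) -> Optional[str]:
    """Return the name of the first stopping criterion met, or None to continue."""
    if history and history[-1] >= target:
        return "satisfactory"
    if budget_used >= budget_limit:
        return "budget_exhausted"
    if len(history) > window:
        recent = history[-(window + 1):]
        deltas = [b - a for a, b in zip(recent, recent[1:])]
        # Plateau: every recent step changed the score by less than eps.
        if all(abs(d) < eps for d in deltas):
            return "plateau"
        # Diminishing returns: the window as a whole gained too little.
        if recent[-1] - recent[0] < min_gain:
            return "diminishing_returns"
    return None


reason = stop_reason([0.5, 0.7, 0.702, 0.705, 0.706],
                     target=0.9, budget_used=5, budget_limit=10)
```

Checking the success and budget conditions first keeps the cheap, unambiguous exits ahead of the trend-based ones, which need at least `window + 1` scores before they can fire.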
[Figure: Convergence analysis]
6. Meta-Cognition and Strategy Selection
Reflexion Strategies
Different reflection strategies for different failure modes:
Strategy Selection Function:
strategy = argmax_s P(success | failure_type, strategy_s, context)
Decomposition Strategy: Break complex problems into simpler sub-problems
Analogical Reasoning: Find similar solved problems and adapt solutions
Constraint Relaxation: Temporarily remove constraints to find feasible solutions
Alternative Perspective: Approach from different angles or viewpoints
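One simple way to approximate the strategy selection function is to estimate P(success | failure_type, strategy) from past outcomes and pick the strategy with the highest estimate. The sketch below does this with success counts and Laplace (add-one) smoothing, which is a technique chosen for this example and not something the chapter specifies. It also ignores the context term for brevity. The class name, the strategy identifiers, and the failure-type label are hypothetical.

```python
from typing import Dict, List, Tuple

STRATEGIES = (
    "decomposition",
    "analogical_reasoning",
    "constraint_relaxation",
    "alternative_perspective",
)


class StrategySelector:
    """Pick the strategy with the highest estimated success rate per failure type."""

    def __init__(self) -> None:
        # (failure_type, strategy) -> [successes, trials]
        self.stats: Dict[Tuple[str, str], List[int]] = {}

    def record(self, failure_type: str, strategy: str, success: bool) -> None:
        entry = self.stats.setdefault((failure_type, strategy), [0, 0])
        entry[0] += int(success)
        entry[1] += 1

    def success_probability(self, failure_type: str, strategy: str) -> float:
        successes, trials = self.stats.get((failure_type, strategy), (0, 0))
        # Laplace smoothing: an untried strategy gets 0.5, not 0 or undefined.
        return (successes + 1) / (trials + 2)

    def select(self, failure_type: str) -> str:
        return max(STRATEGIES,
                   key=lambda s: self.success_probability(failure_type, s))


selector = StrategySelector()
for _ in range(3):
    selector.record("logic_error", "decomposition", True)
selector.record("logic_error", "analogical_reasoning", False)
choice = selector.select("logic_error")
```

Here decomposition scores (3 + 1) / (3 + 2) = 0.8, analogical reasoning scores 1/3, and the two untried strategies score 0.5, so decomposition is selected. A purely greedy argmax like this never revisits a strategy that started badly; an exploration bonus would be a natural extension.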
[Figure: Strategy selection network]
7. Empirical Results and Benchmarks
| Task Domain | Baseline Accuracy | With Reflexion | Improvement (percentage points) |
|---|---|---|---|
| Code Generation | 68% | 91% | +23 |
| Decision Making | 45% | 67% | +22 |
| Reasoning Tasks | 52% | 74% | +22 |