Chapter 8.2: Reflexion & Self-Critique in AI Agents

1. Mathematical Foundation of Self-Reflection

Reflexion is iterative self-improvement through systematic critique and refinement: the agent attempts a task, critiques the result, and uses that critique to refine its next attempt. We can model this as an optimization process over agent performance:

Reflexion Function:
R(x, h) = (x', h')
Where:
• x = current attempt/solution
• h = historical experience
• x' = improved attempt
• h' = updated experience memory

Performance Improvement Metric:
Δp = p(x_{t+1}) - p(x_t)
Where p(x) measures the performance of attempt x

Convergence Condition:
|p(x_{t+1}) - p(x_t)| < ε for all t ≥ T
Where ε is the convergence threshold and T is the iteration after which improvement stalls
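
As a minimal sketch of how the update R(x, h) and the convergence threshold ε fit together, the loop below applies a caller-supplied step function until the change in performance drops below ε. The names `reflexion_loop`, `step`, and `score` are hypothetical, and the numeric toy task is an assumption chosen only to make the loop runnable.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    x0,
    step: Callable,   # hypothetical R(x, h) -> (x', h')
    score: Callable,  # hypothetical p(x) -> float
    epsilon: float = 1e-3,
    max_steps: int = 50,
) -> Tuple[object, List[float]]:
    """Apply R repeatedly until |p(x_{t+1}) - p(x_t)| < epsilon."""
    x, h = x0, []
    scores = [score(x)]
    for _ in range(max_steps):
        x, h = step(x, h)
        scores.append(score(x))
        delta_p = scores[-1] - scores[-2]  # performance improvement metric
        if abs(delta_p) < epsilon:
            break
    return x, scores


# Toy task (assumption): move x toward a target of 10, halving the gap each step.
def toy_step(x, h):
    return x + 0.5 * (10.0 - x), h + [x]


def toy_score(x):
    return -abs(10.0 - x)


best, history = reflexion_loop(0.0, toy_step, toy_score, epsilon=0.01)
```

The `max_steps` cap is a practical safeguard: a real agent's performance need not converge, so the loop should not rely on the threshold alone.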

[Figure: Reflexion Cycle Visualization]

2. The Reflexion Framework

Reflexion Components

  • Actor: Generates solutions/actions
  • Evaluator: Provides performance feedback
  • Self-Reflection: Analyzes failures and generates insights
  • Memory: Stores experiences for future reference

Reflexion Algorithm

Iteration 1 (later iterations repeat the same steps, drawing on the stored reflections):

  • Attempt: Generate an initial solution using current knowledge
  • Evaluate: Assess performance and identify failure modes
  • Reflect: Generate a self-critique and improvement strategies
  • Store: Update memory with the lessons learned

Mathematical Model of Self-Critique

The critique function maps attempts to improvement insights:

Critique Function:
C(x, y, f) = {c₁, c₂, ..., cₖ}
Where:
• x = attempt
• y = expected output
• f = feedback signal
• cᵢ = specific critique points

```python
class ReflexionAgent:
    def __init__(self, actor, evaluator, max_iterations=5, threshold=0.9):
        self.actor = actor
        self.evaluator = evaluator
        self.memory = []
        self.max_iterations = max_iterations
        # Minimum score that counts as satisfactory (0.9 is an assumed default)
        self.threshold = threshold

    def solve_with_reflexion(self, task):
        """Solve task using reflexion-based improvement"""
        performance_history = []
        attempt = None  # stays None only if max_iterations is 0

        for iteration in range(self.max_iterations):
            # Generate attempt, conditioned on reflections stored so far
            attempt = self.actor.generate(task, self.memory)

            # Evaluate performance
            performance = self.evaluator.score(attempt, task)
            performance_history.append(performance)

            # Check if satisfactory
            if performance >= self.threshold:
                return attempt, performance_history

            # Generate self-reflection and remember it for the next attempt
            reflection = self._generate_reflection(attempt, performance, task)
            self.memory.append(reflection)

        # Iteration budget exhausted: return the last attempt
        return attempt, performance_history

    def _generate_reflection(self, attempt, performance, task):
        """Generate insight from failed attempt"""
        critique_prompt = f"""
        Task: {task}
        Attempt: {attempt}
        Performance: {performance}

        Analyze what went wrong and how to improve:"""
        return self.actor.reflect(critique_prompt)
```
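
To make the signature of the critique function C(x, y, f) concrete, here is a toy rule-based sketch that returns a list of critique points. In an LLM-based agent the critique would normally be generated by a model; the `CritiquePoint` type, the `critique` function, and the item-by-item comparison are all assumptions used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CritiquePoint:
    kind: str    # "missing", "unexpected", or "feedback"
    detail: str  # human-readable description of the problem


def critique(attempt: List[str], expected: List[str], feedback: str) -> List[CritiquePoint]:
    """Toy C(x, y, f): compare the attempt's items with the expected items."""
    points = []
    for item in expected:
        if item not in attempt:
            points.append(CritiquePoint("missing", f"expected item {item!r} is absent"))
    for item in attempt:
        if item not in expected:
            points.append(CritiquePoint("unexpected", f"item {item!r} was not expected"))
    if feedback:
        # Pass the external feedback signal f through as its own critique point
        points.append(CritiquePoint("feedback", feedback))
    return points


points = critique(["parse", "sort"], ["parse", "validate", "sort"], "test 2 failed")
```

Here `points` holds one "missing" entry for the absent `validate` step and one "feedback" entry carrying the evaluator's message.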

[Figure: Performance Improvement Over Iterations]

3. Self-Critique Mechanisms

Types of Self-Critique

  • Logical Consistency: Check for contradictions and invalid inferences
  • Factual Accuracy: Verify claims against knowledge base
  • Completeness: Assess if all requirements are addressed
  • Efficiency: Evaluate resource usage and optimization

Multi-Dimensional Critique Score:
S = w₁ × consistency + w₂ × accuracy + w₃ × completeness + w₄ × efficiency
Where wᵢ are importance weights summing to 1

Critique Quality Metrics:
Overall Quality = Specificity × Actionability × Accuracy
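
The two formulas above are simple enough to compute directly. The sketch below assumes every dimension is scored in [0, 1]; the function names and the example weights are illustrative, not values from the text.

```python
import math
from typing import Dict


def critique_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum S over critique dimensions; weights must sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


def critique_quality(specificity: float, actionability: float, accuracy: float) -> float:
    """Overall quality as the product of the three factors."""
    return specificity * actionability * accuracy


# Example weights and scores (assumptions)
weights = {"consistency": 0.3, "accuracy": 0.4, "completeness": 0.2, "efficiency": 0.1}
scores = {"consistency": 0.9, "accuracy": 0.5, "completeness": 1.0, "efficiency": 0.6}

s = critique_score(scores, weights)       # 0.27 + 0.20 + 0.20 + 0.06
quality = critique_quality(0.8, 0.5, 1.0)
```

Because quality is a product, a critique that scores zero on any one factor has zero overall quality, however well it does on the others.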

[Figure: Self-Critique Analysis]

4. Memory-Augmented Reflexion

Experience Memory Function:
M(t) = {(task_i, attempt_i, reflection_i, outcome_i) : i ≤ t}
The memory holds every experience recorded up to time t

Relevance-Weighted Retrieval:
retrieve(query) = argmax_i (similarity(query, task_i) × recency(i) × success(i))
Taking the k highest-scoring experiences instead of the single maximum gives top-k retrieval.
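
One possible reading of the retrieval score is sketched below. The text does not define the three factors, so the choices here are assumptions: cosine similarity between embeddings, an exponential recency decay with a one-hour half-life, and a success weight of 1.0 for successes and 0.5 for failures.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance(exp: Dict, query_emb: Sequence[float], now: float,
              half_life: float = 3600.0) -> float:
    """similarity × recency × success for one stored experience."""
    similarity = cosine(exp["embedding"], query_emb)
    recency = 0.5 ** ((now - exp["timestamp"]) / half_life)  # halves every half_life seconds
    success = 1.0 if exp["outcome"] else 0.5                 # assumption: failures count half
    return similarity * recency * success


def retrieve(experiences: List[Dict], query_emb: Sequence[float],
             now: float, k: int = 1) -> List[Dict]:
    """Return the k experiences with the highest relevance score."""
    ranked = sorted(experiences, key=lambda e: relevance(e, query_emb, now), reverse=True)
    return ranked[:k]


experiences = [
    {"task": "old success", "embedding": [1.0, 0.0], "timestamp": 0.0, "outcome": True},
    {"task": "recent failure", "embedding": [1.0, 0.0], "timestamp": 3000.0, "outcome": False},
    {"task": "unrelated", "embedding": [0.0, 1.0], "timestamp": 3600.0, "outcome": True},
]
top = retrieve(experiences, [1.0, 0.0], now=3600.0, k=1)
```

In this example the older successful experience outranks the more recent failure, and the unrelated task scores zero because its embedding is orthogonal to the query. Whether failures should be down-weighted at all is a design choice: in Reflexion, reflections on failures are often the most useful memories.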

Memory Consolidation Process

Periodically, stored experiences are compressed into general principles and patterns, so the memory can stay within its capacity while keeping the lessons learned.

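```python
import time


class ExperienceMemory:
    def __init__(self, capacity=1000):
        self.experiences = []
        self.capacity = capacity
        self.principles = []  # Abstracted insights

    def store_experience(self, task, attempt, reflection, outcome):
        """Store new experience with metadata"""
        experience = {
            'task': task,
            'attempt': attempt,
            'reflection': reflection,
            'outcome': outcome,
            'timestamp': time.time(),
            'embedding': self._embed(task)
        }

        # Make room before appending once the memory is full
        if len(self.experiences) >= self.capacity:
            self._consolidate_oldest()

        self.experiences.append(experience)

    def retrieve_relevant(self, current_task, k=3):
        """Retrieve most relevant past experiences"""
        query_embedding = self._embed(current_task)

        similarities = [
            self._compute_relevance(exp, query_embedding)
            for exp in self.experiences
        ]

        # Indices of the k highest relevance scores, best first
        top_indices = sorted(range(len(similarities)),
                             key=lambda i: similarities[i],
                             reverse=True)[:k]

        return [self.experiences[i] for i in top_indices]

    # Hooks with assumed interfaces; a concrete memory must implement them.
    def _embed(self, text):
        """Return a vector representation of text."""
        raise NotImplementedError

    def _compute_relevance(self, experience, query_embedding):
        """Return a relevance score for one stored experience."""
        raise NotImplementedError

    def _consolidate_oldest(self):
        """Compress the oldest experiences into self.principles and remove them."""
        raise NotImplementedError
```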

[Figure: Memory Retrieval Patterns]

5. Iterative Improvement Dynamics

Learning Rate Adaptation:
α_t = α_0 × decay^{failure_count}
Where 0 < decay < 1, so α_t decreases with repeated failures on similar tasks

Convergence Analysis:
E[performance_t] ≥ E[performance_{t-1}] + η × critique_quality_t
The expected performance gain per iteration is at least proportional to critique quality, with η ≥ 0 as the proportionality constant
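
A short worked example of both formulas, assuming α_0 = 1.0, decay = 0.5, and η = 0.25 (illustrative values, not taken from the text). The function names are hypothetical, and how α_t is then consumed by the agent is not specified in the text.

```python
def adapted_rate(alpha_0: float, decay: float, failure_count: int) -> float:
    """alpha_t = alpha_0 * decay ** failure_count, with 0 < decay < 1."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie strictly between 0 and 1")
    return alpha_0 * decay ** failure_count


def performance_lower_bound(previous: float, eta: float, critique_quality: float) -> float:
    """Right-hand side of the bound: E[performance_{t-1}] + eta * critique_quality_t."""
    return previous + eta * critique_quality


# Rate after 0, 1, 2, and 3 failures on similar tasks
rates = [adapted_rate(1.0, 0.5, n) for n in range(4)]  # [1.0, 0.5, 0.25, 0.125]

# Previous expected performance 0.5 and critique quality 0.5
bound = performance_lower_bound(0.5, 0.25, 0.5)        # 0.625
```

Note that a critique of zero quality leaves the bound unchanged, which matches the intuition that uninformative reflections buy no guaranteed improvement.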

Adaptive Stopping Criteria

  • Performance plateau detection
  • Diminishing returns threshold
  • Resource budget exhaustion
  • Satisfactory performance achieved
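
The four criteria above can be combined into one check that runs after every iteration. This is a sketch under stated assumptions: the function name `stop_reason`, the window size, and both tolerances are hypothetical, and the resource budget is simplified to a cap on the number of iterations.

```python
from typing import List, Optional


def stop_reason(history: List[float], target: float, budget: int,
                window: int = 3, plateau_tol: float = 1e-3,
                min_gain: float = 0.01) -> Optional[str]:
    """Return the name of the first stopping criterion that fires, or None."""
    if not history:
        return None
    # Satisfactory performance achieved
    if history[-1] >= target:
        return "satisfactory"
    # Resource budget exhausted (here: number of iterations)
    if len(history) >= budget:
        return "budget"
    # Plateau: the last `window` scores are nearly identical
    recent = history[-window:]
    if len(history) >= window and max(recent) - min(recent) < plateau_tol:
        return "plateau"
    # Diminishing returns: the latest gain is non-negative but below min_gain
    if len(history) >= 2 and 0 <= history[-1] - history[-2] < min_gain:
        return "diminishing_returns"
    return None


reason = stop_reason([0.3, 0.5, 0.5, 0.5], target=0.9, budget=10)
```

The checks are ordered so that success takes priority over the budget, and the budget over the two heuristics; a different ordering is equally defensible.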

[Figure: Convergence Analysis]

6. Meta-Cognition and Strategy Selection

Reflexion Strategies

Different reflection strategies for different failure modes:

Strategy Selection Function:
strategy = argmax_s P(success | failure_type, strategy_s, context)

  • Decomposition Strategy: Break complex problems into simpler sub-problems
  • Analogical Reasoning: Find similar solved problems and adapt solutions
  • Constraint Relaxation: Temporarily remove constraints to find feasible solutions
  • Alternative Perspective: Approach from different angles or viewpoints
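
The argmax in the strategy selection function can be sketched with a lookup table of estimated success probabilities. The failure types and every probability below are invented for illustration; a real agent would estimate them from its experience memory. For brevity the sketch conditions only on the failure type and omits the context term.

```python
from typing import Dict, Tuple

STRATEGIES = [
    "decomposition",
    "analogical_reasoning",
    "constraint_relaxation",
    "alternative_perspective",
]

# Hypothetical estimates of P(success | failure_type, strategy)
SUCCESS_ESTIMATES: Dict[Tuple[str, str], float] = {
    ("too_complex", "decomposition"): 0.7,
    ("too_complex", "analogical_reasoning"): 0.4,
    ("infeasible", "constraint_relaxation"): 0.6,
    ("infeasible", "alternative_perspective"): 0.5,
}


def select_strategy(failure_type: str,
                    estimates: Dict[Tuple[str, str], float] = SUCCESS_ESTIMATES,
                    default: float = 0.1) -> str:
    """Pick the strategy with the highest estimated success probability."""
    return max(STRATEGIES, key=lambda s: estimates.get((failure_type, s), default))


choice = select_strategy("too_complex")
```

Pairs missing from the table fall back to a low default probability, so an untried strategy is selected only when nothing better is known for that failure type.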

[Figure: Strategy Selection Network]

7. Empirical Results and Benchmarks

Task Domain        Baseline Accuracy    With Reflexion    Improvement (percentage points)
Code Generation    68%                  91%               +23
Decision Making    45%                  67%               +22
Reasoning Tasks    52%                  74%               +22

[Figure: Performance Comparison]
