1.4 Core Capabilities & Limitations

๐ŸŽฏ Learning Objectives

  • Understand the current capabilities of modern language models
  • Recognize fundamental limitations and failure modes
  • Appreciate reliability constraints and edge cases
  • Learn strategies for mitigating limitations in practice

๐Ÿš€ Core Capabilities

๐Ÿ“ Text Generation & Completion

Reliability:
High (85%+)

What it does: Generate coherent, contextually appropriate text across various styles and formats.

Prompt: "Write a professional email declining a meeting" Response: "Subject: Unable to Attend - [Meeting Title] Dear [Name], Thank you for the invitation. Unfortunately, I have a conflicting commitment and won't be able to attend. Could we schedule a brief follow-up call to discuss key outcomes? Best regards, [Your name]"

๐Ÿง  Reasoning & Problem Solving

Reliability:
Medium (60%)

What it does: Break down complex problems, apply logical reasoning, and provide step-by-step solutions.

User: "Two trains problem: Train A leaves at 2PM at 60mph, Train B leaves at 2:30PM at 80mph toward A. Stations 200 miles apart. When do they meet?" Model: "Let me solve this step by step: 1. At 2:30PM, Train A has traveled: 0.5h ร— 60mph = 30 miles 2. Remaining distance: 200 - 30 = 170 miles 3. Combined speed: 60 + 80 = 140 mph 4. Time to meet: 170 รท 140 = 1.21 hours = 1h 13min 5. Meeting time: 2:30PM + 1h 13min = 3:43PM"

๐Ÿ’ป Code Generation & Analysis

Reliability:
High (80%+)

What it does: Write, debug, and explain code across multiple programming languages.

Prompt: "Write a Python function to find prime numbers" def is_prime(n): if n < 2: return False for i in range(2, int(n**0.5) + 1): if n % i == 0: return False return True def find_primes(limit): return [n for n in range(2, limit + 1) if is_prime(n)]

๐Ÿ”ง Tool Use & Function Calling

Reliability:
Emerging (70%)

What it does: Understand when and how to call external tools and APIs to accomplish tasks.

โš ๏ธ Fundamental Limitations

๐ŸŽญ Hallucination

Frequency:
Common (20-40%)

What it is: Generating plausible-sounding but factually incorrect information.

Problematic Response: "The Battle of Hastings in 1066 was fought between Napoleon and the Duke of Wellington, marking the end of the Norman Conquest." (Conflates different historical events and dates)
  • Common in: Historical facts, citations, technical specifications
  • Risk factors: Obscure topics, recent events, precise numbers

๐Ÿ“… Knowledge Cutoff & Temporal Limitations

Impact:
High for Recent Info

What it is: Limited knowledge of events after training data cutoff.

User: "What happened in the 2024 Olympics?" Model: "I don't have information about the 2024 Olympics as my knowledge was last updated in early 2024. The 2024 Summer Olympics were scheduled to be held in Paris..."

๐Ÿงฎ Mathematical & Computational Precision

Accuracy:
Low for Complex Math

What it is: Struggles with precise calculations, especially multi-step arithmetic.

Error Example: User: "What is 47 ร— 83 + 127 รท 4?" Model: "47 ร— 83 = 3,901, then 127 รท 4 = 31.75, so 3,901 + 31.75 = 3,932.75" (Correct answer, but unreliable for complex calculations)

๐ŸŽฏ Context Window Constraints

Limitation:
Model Dependent

What it is: Limited ability to maintain coherence over very long conversations or documents.

  • GPT-4: ~128K tokens (~300 pages)
  • Claude 3: ~200K tokens (~500 pages)
  • Gemini 1.5: ~1M tokens (~2,800 pages)

๐Ÿ“Š Reliability Constraints

๐ŸŽฒ Stochastic Nature

Language models are probabilistic systems that can produce different outputs for identical inputs.

Same Prompt, Different Responses: Run 1: "The capital of France is Paris, a beautiful city..." Run 2: "Paris is the capital and largest city of France..." Run 3: "France's capital city is Paris, located in northern France..."

โšก System Prompts & Instruction Following

Model behavior heavily depends on system prompts and instructions, which can be inconsistently followed.

๐Ÿ” Edge Cases & Adversarial Inputs

  • Prompt Injection: Malicious instructions embedded in user input
  • Jailbreaking: Attempts to bypass safety guidelines
  • Ambiguous Queries: Unclear or contradictory instructions
  • Out-of-Distribution: Inputs very different from training data

๐Ÿ›ก๏ธ Mitigation Strategies

๐Ÿ” Verification & Validation

  • Cross-reference factual claims
  • Use multiple model outputs
  • Implement fact-checking tools

๐Ÿงฐ Tool Integration

  • Calculator for math problems
  • Search engines for recent info
  • Code interpreters for execution

๐Ÿ“ Prompt Engineering

  • Clear, specific instructions
  • Chain-of-thought prompting
  • Error correction patterns

๐ŸŽฏ Human-in-the-Loop

  • Human review for critical tasks
  • Feedback loops for improvement
  • Confidence scoring systems

๐Ÿ”ฎ Future Improvements on the Horizon

  • Retrieval-Augmented Generation (RAG): Real-time access to current information
  • Tool-Augmented Models: Seamless integration with calculators, search, APIs
  • Improved Reasoning: Better logical consistency and multi-step problem solving
  • Uncertainty Quantification: Models that know when they don't know
  • Multimodal Capabilities: Enhanced vision, audio, and video understanding

Key Insight: Understanding these limitations is crucial for building reliable AI systems. Always design with failure modes in mind and implement appropriate safeguards.