
1.4 Core Capabilities & Limitations

🎯 Learning Objectives

Understand the current capabilities of modern language models
Recognize fundamental limitations and failure modes
Appreciate reliability constraints and edge cases
Learn strategies for mitigating limitations in practice

🚀 Core Capabilities

📝 Text Generation & Completion

Reliability: High (85%+)

What it does: Generate coherent, contextually appropriate text across various styles and formats.

Prompt: "Write a professional email declining a meeting"

Response:
"Subject: Unable to Attend - [Meeting Title]

Dear [Name],

Thank you for the invitation. Unfortunately, I have a conflicting commitment and won't be able to attend. Could we schedule a brief follow-up call to discuss key outcomes?

Best regards,
[Your name]"

🧠 Reasoning & Problem Solving

Reliability: Medium (60%)

What it does: Break down complex problems, apply logical reasoning, and provide step-by-step solutions.

User: "Two trains problem: Train A leaves at 2PM at 60mph, Train B leaves at 2:30PM at 80mph toward A. Stations 200 miles apart. When do they meet?"

Model: "Let me solve this step by step:
1. At 2:30PM, Train A has traveled: 0.5h × 60mph = 30 miles
2. Remaining distance: 200 - 30 = 170 miles
3. Combined speed: 60 + 80 = 140 mph
4. Time to meet: 170 ÷ 140 = 1.21 hours = 1h 13min
5. Meeting time: 2:30PM + 1h 13min = 3:43PM"
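
Step-by-step answers like this one are easy to check mechanically. The following is a minimal sketch of such a check; the function name and its parameters are made up for illustration and are not part of any library.

```python
from fractions import Fraction

def meeting_time_minutes(gap_miles, speed_a, speed_b, head_start_hours):
    """Minutes after Train B departs until the two trains meet."""
    # Distance Train A covers alone before Train B leaves.
    covered = speed_a * head_start_hours
    remaining = gap_miles - covered
    # The trains move toward each other, so their speeds add.
    hours = Fraction(remaining) / (speed_a + speed_b)
    return float(hours * 60)

minutes = meeting_time_minutes(200, 60, 80, Fraction(1, 2))
print(round(minutes))  # 73, i.e. 2:30PM + 1h 13min = 3:43PM
```

Using exact fractions avoids rounding the intermediate 1.21 hours before converting to minutes.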

💻 Code Generation & Analysis

Reliability: High (80%+)

What it does: Write, debug, and explain code across multiple programming languages.

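Prompt: "Write a Python function to find prime numbers"

Response:

    def is_prime(n):
        # Numbers below 2 are not prime.
        if n < 2:
            return False
        # A composite n always has a divisor no larger than sqrt(n).
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                return False
        return True

    def find_primes(limit):
        # Collect every prime from 2 up to and including limit.
        return [n for n in range(2, limit + 1) if is_prime(n)]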

🔧 Tool Use & Function Calling

Reliability: Emerging (70%)

What it does: Understand when and how to call external tools and APIs to accomplish tasks.
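
To make this concrete, here is a rough sketch of the application side of function calling: the model emits a structured request, and the host program decides whether and how to run it. The JSON shape, the tool names, and the `dispatch` helper are all assumptions for illustration; real provider APIs define their own formats.

```python
import json

# Hypothetical tools; names and behavior are illustrative only.
def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"get_weather": get_weather, "add": add}

def dispatch(tool_call_json: str) -> dict:
    """Parse a model-emitted tool call and run the matching function."""
    try:
        call = json.loads(tool_call_json)
        name = call["name"]
        args = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return {"error": f"malformed tool call: {exc}"}
    if name not in TOOLS:
        # The model may name a tool that does not exist.
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": TOOLS[name](**args)}
    except TypeError as exc:
        # Wrong or missing argument names land here.
        return {"error": f"bad arguments: {exc}"}

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

Returning errors as data, rather than raising, lets the caller pass the failure back to the model so it can retry with a corrected call.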

⚠️ Fundamental Limitations

🎭 Hallucination

Frequency: Common (20-40%)

What it is: Generating plausible-sounding but factually incorrect information.

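Problematic Response: "The Battle of Hastings in 1066 was fought between Napoleon and the Duke of Wellington, marking the end of the Norman Conquest."
(Conflates unrelated events and dates: Hastings was fought in 1066 between William, Duke of Normandy, and King Harold Godwinson and opened the Norman Conquest, while Napoleon and Wellington met at Waterloo in 1815.)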

Common in: Historical facts, citations, technical specifications
Risk factors: Obscure topics, recent events, precise numbers

📅 Knowledge Cutoff & Temporal Limitations

Impact: High for Recent Info

What it is: Limited knowledge of events after training data cutoff.

User: "What happened in the 2024 Olympics?"
Model: "I don't have information about the 2024 Olympics as my knowledge was last updated in early 2024. The 2024 Summer Olympics were scheduled to be held in Paris..."

🧮 Mathematical & Computational Precision

Accuracy: Low for Complex Math

What it is: Struggles with precise calculations, especially multi-step arithmetic.

Example:
User: "What is 47 × 83 + 127 ÷ 4?"
Model: "47 × 83 = 3,901, then 127 ÷ 4 = 31.75, so 3,901 + 31.75 = 3,932.75"
(This particular answer is correct, but models are unreliable on more complex calculations.)

🎯 Context Window Constraints

Limitation: Model Dependent

What it is: A fixed cap on how much text the model can process at once, along with a limited ability to maintain coherence over very long conversations or documents.

GPT-4 Turbo: ~128K tokens (~300 pages)
Claude 3: ~200K tokens (~500 pages)
Gemini 1.5: ~1M tokens (~2,800 pages)
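
When a document exceeds the window, one common workaround is to split it into pieces that each fit. The sketch below assumes a crude ratio of about 1.3 tokens per English word; that ratio and the function name are illustrative assumptions, and a real system would count tokens with the model's own tokenizer.

```python
def chunk_by_token_budget(text: str, max_tokens: int, tokens_per_word: float = 1.3):
    """Split text into word-aligned chunks that fit an approximate token budget."""
    # Assumption: tokens_per_word is a rough heuristic, not a real tokenizer.
    max_words = max(1, int(max_tokens / tokens_per_word))
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

print(chunk_by_token_budget("a b c d e f g", max_tokens=4))
# ['a b c', 'd e f', 'g']
```

Splitting on word boundaries keeps the example short; in practice chunks are usually cut at paragraph or section boundaries so each piece stays meaningful on its own.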

📊 Reliability Constraints

🎲 Stochastic Nature

Language models are probabilistic systems that can produce different outputs for identical inputs.

Same Prompt, Different Responses:
Run 1: "The capital of France is Paris, a beautiful city..."
Run 2: "Paris is the capital and largest city of France..."
Run 3: "France's capital city is Paris, located in northern France..."
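
The variation comes from sampling: at each step the model assigns scores to candidate tokens and one is drawn at random in proportion to its probability. The sketch below illustrates temperature-scaled sampling with made-up tokens and scores; it is a simplified picture, not any particular model's decoding code.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its softmax probability."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return tokens[logits.index(max(logits))]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["Paris", "The", "France's"]  # hypothetical candidate first tokens
logits = [2.0, 1.5, 1.0]               # hypothetical scores
print([sample_token(tokens, logits, 1.0) for _ in range(5)])  # varies per run
print(sample_token(tokens, logits, 0))                        # always "Paris"
```

Lowering the temperature concentrates probability on the top-scoring token, which reduces run-to-run variation at the cost of less diverse output.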

⚡ System Prompts & Instruction Following

Model behavior depends heavily on system prompts and instructions, but models do not always follow them consistently.

🔍 Edge Cases & Adversarial Inputs

Prompt Injection: Malicious instructions embedded in user input
Jailbreaking: Attempts to bypass safety guidelines
Ambiguous Queries: Unclear or contradictory instructions
Out-of-Distribution: Inputs very different from training data

🛡️ Mitigation Strategies

🔍 Verification & Validation

Cross-reference factual claims
Use multiple model outputs
Implement fact-checking tools
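
One simple way to use multiple model outputs is to ask the same question several times and keep the answer only when most runs agree. The sketch below assumes the answers are short strings that can be compared after light normalization; the function name and the agreement threshold are illustrative choices.

```python
from collections import Counter

def majority_answer(answers, min_agreement=0.5):
    """Return the most common answer, or None if agreement is too low."""
    if not answers:
        return None
    # Normalize so trivial differences in case or spacing do not split votes.
    normalized = [a.strip().lower() for a in answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count / len(normalized) > min_agreement else None

print(majority_answer(["3:43 PM", "3:43 pm ", "3:45 PM"]))  # 3:43 pm
print(majority_answer(["3:43 PM", "3:45 PM"]))              # None
```

A `None` result is a useful signal in itself: disagreement between runs suggests the question should go to a tool or a human reviewer.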

🧰 Tool Integration

Calculator for math problems
Search engines for recent info
Code interpreters for execution
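
As an illustration of the calculator idea, the sketch below evaluates basic arithmetic deterministically instead of trusting the model's own computation. It walks Python's `ast` rather than calling `eval()`, so only numbers and the four basic operators are accepted; the `calculate` name and the operator set are choices made for this example.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        # Anything else (names, calls, attributes) is rejected.
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

print(calculate("47 * 83 + 127 / 4"))  # 3932.75
```

The model's job then shrinks to translating the question into an expression, while the arithmetic itself is exact.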

📝 Prompt Engineering

Clear, specific instructions
Chain-of-thought prompting
Error correction patterns
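
One error correction pattern is to validate the model's output and, on failure, re-prompt with a description of what went wrong. The sketch below assumes a caller-supplied `generate` function that takes a prompt string and returns the model's text; that function, the retry wording, and the helper's name are all hypothetical.

```python
import json

def generate_json(generate, prompt: str, required_keys, max_attempts: int = 3):
    """Call a model until its output parses as JSON with the required keys."""
    message = prompt
    for _ in range(max_attempts):
        raw = generate(message)
        try:
            data = json.loads(raw)
            if isinstance(data, dict) and all(k in data for k in required_keys):
                return data
            problem = f"missing keys from {sorted(required_keys)}"
        except json.JSONDecodeError as exc:
            problem = f"invalid JSON ({exc.msg})"
        # Feed the error back so the next attempt can correct it.
        message = (
            f"{prompt}\n\nYour previous reply was rejected: {problem}. "
            "Reply with JSON only."
        )
    raise ValueError("no valid response after retries")

# Usage with a stand-in model that fails once, then succeeds.
replies = iter(["Sure! Here you go", '{"answer": "Paris"}'])
print(generate_json(lambda m: next(replies), "Capital of France?", ["answer"]))
```

Capping the number of attempts matters: a model that keeps producing invalid output should surface as an error rather than loop indefinitely.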

🎯 Human-in-the-Loop

Human review for critical tasks
Feedback loops for improvement
Confidence scoring systems
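
These ideas can be combined into a simple routing rule: outputs for critical tasks, or outputs with low confidence, go to a person before they are used. The sketch below assumes a confidence score between 0 and 1 is already available from some upstream source (for example, agreement across multiple runs); how that score is produced, the 0.8 threshold, and the `route` name are all assumptions.

```python
def route(answer: str, confidence: float, critical: bool, threshold: float = 0.8):
    """Decide whether an output ships automatically or goes to a reviewer."""
    if critical or confidence < threshold:
        return {"action": "human_review", "answer": answer}
    return {"action": "auto_approve", "answer": answer}

print(route("Refund approved", confidence=0.95, critical=True))
# {'action': 'human_review', 'answer': 'Refund approved'}
print(route("Paris", confidence=0.95, critical=False))
# {'action': 'auto_approve', 'answer': 'Paris'}
```

Reviewer decisions on the routed items can then be logged and used as the feedback loop for tuning the threshold.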

🔮 Future Improvements on the Horizon

Retrieval-Augmented Generation (RAG): Real-time access to current information
Tool-Augmented Models: Seamless integration with calculators, search, APIs
Improved Reasoning: Better logical consistency and multi-step problem solving
Uncertainty Quantification: Models that know when they don't know
Multimodal Capabilities: Enhanced vision, audio, and video understanding
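
To give a feel for the RAG item above, the sketch below retrieves the most relevant snippets for a question and places them in the prompt. The word-overlap scoring is a toy stand-in for the embedding-based search real systems typically use, and the function names, prompt wording, and sample documents are all illustrative.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share (toy scoring)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(d.lower().split())), d) for d in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all.
    return [d for score, d in ranked[:k] if score > 0]

def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved snippets to the user's question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [  # sample data for illustration
    "Paris hosted the 2024 Summer Olympics",
    "Bananas are yellow",
]
print(build_prompt("2024 olympics host", docs))
```

Because the answer is grounded in retrieved text rather than in training data alone, this pattern addresses both the knowledge cutoff and, to a degree, hallucination.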

Key Insight: Understanding these limitations is crucial for building reliable AI systems. Always design with failure modes in mind and implement appropriate safeguards.
