What is a Token?
AI models don't read words - they read tokens. A token is a chunk of text, roughly 3/4 of a word in English. This is the fundamental unit of AI billing.
Text to Tokens
"Hello world" = 2 tokens
"Artificial intelligence" = 2 tokens
"I love programming" = 3 tokens
1 page of text = ~500 tokens
Quick Math
1,000 tokens = ~750 words
1 page = ~500 tokens
1 book (300 pages) = ~150K tokens
1 codebase (10K lines) = ~40K tokens
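These rules of thumb are easy to wrap in a helper. A minimal sketch, using the ~0.75 words-per-token ratio and ~500 tokens-per-page figures from above (real tokenizers vary by language and content, so treat these as estimates):

```python
WORDS_PER_TOKEN = 0.75   # English rule of thumb; actual tokenizers vary

def words_to_tokens(words: int) -> int:
    """Rough token estimate from a word count."""
    return round(words / WORDS_PER_TOKEN)

def pages_to_tokens(pages: int, tokens_per_page: int = 500) -> int:
    """Rough token estimate from a page count."""
    return pages * tokens_per_page

print(words_to_tokens(750))    # → 1000
print(pages_to_tokens(300))    # → 150000 (~150K for a 300-page book)
```

For exact counts, the API's token-counting endpoint is the authoritative source; use estimates like these only for back-of-envelope budgeting.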
Why It Matters
You pay per token. Input tokens (what you send) AND output tokens (what Claude says back). Output tokens cost 3-5x more than input tokens.
A long system prompt that you send with every message? That's input tokens billed every single time. A 2,000-token system prompt across 1,000 API calls = 2 million input tokens = $6 on Sonnet. Prompt caching (Module 07) cuts this by 90%.
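The arithmetic above can be sketched in a few lines. The price is an assumption matching the $3 per million input tokens figure used for Sonnet here; check current pricing before relying on it:

```python
INPUT_PRICE_PER_M = 3.00  # USD per 1M input tokens (assumed Sonnet-class price)

def input_cost(tokens_per_call: int, calls: int) -> float:
    """Cost of sending the same input tokens on every one of `calls` requests."""
    return tokens_per_call * calls / 1_000_000 * INPUT_PRICE_PER_M

# A 2,000-token system prompt sent with 1,000 API calls:
print(input_cost(2_000, 1_000))  # → 6.0 (two million input tokens at $3/M)
```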
What is a Context Window?
The context window is the total amount of text (in tokens) that Claude can "see" at once. Think of it as Claude's working memory - everything in the conversation must fit inside this window.
| Model | Context Window | That's roughly... |
|---|---|---|
| Claude Haiku 4.5 | 200K tokens | 1.5 novels or 400 pages of code |
| Claude Sonnet 4.6 | 200K (1M beta) | 1.5 novels standard, 7.5 novels with beta header |
| Claude Opus 4.6 | 200K (1M beta) | Same as Sonnet, with 128K max output |
| GPT-4o | 128K tokens | About 1 novel |
| Gemini 2.0 | 2M tokens | ~10 novels (but quality degrades) |
All models struggle with information buried in the middle of very long contexts. Claude handles this better than most, but the rule stands: put important information at the beginning or end of your prompt, never buried in the middle.
How Context Windows Affect Your Design
The Conversation Grows Problem
Every message in a conversation gets sent back to Claude. A 50-message chat might be 20,000 tokens - all billed as input every time Claude responds.
```
// Message 1: user sends 100 tokens, Claude responds with 200 tokens
// Input billed: 100 tokens
// Message 2: entire history (300) + new message (100) = 400 input tokens
// Input billed so far: 100 + 400 = 500 tokens
// Message 10: history is now 5,000 tokens + new message = 5,100 input tokens
// You're paying for the FULL history every single message
// Solution: summarize older messages, keep only the recent ones
```

Strategies for Context Management
Sliding Window
Keep only the last N messages. Simple but loses old context. Good for casual chatbots.
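A sliding window is a one-liner. A minimal sketch using the Messages-API-style list of role/content dicts:

```python
def sliding_window(messages, max_messages=10):
    """Keep only the last `max_messages` messages; older context is dropped."""
    return messages[-max_messages:]

# A 50-message history shrinks to the 10 most recent messages:
history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
recent = sliding_window(history)
```

A variant worth considering is windowing by token count rather than message count, so one very long message can't blow the budget.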
Summarize + Recent
Summarize old messages into a paragraph, keep last 5-10 messages verbatim. Best balance of cost and context.
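A sketch of the summarize-plus-recent pattern. The `summarize` callable here is a stand-in: in practice it would be a cheap model call (e.g. to Haiku) that condenses the old messages into a paragraph:

```python
def compress_history(messages, keep_recent=5, summarize=None):
    """Replace everything older than the last `keep_recent` messages
    with a single summary message produced by `summarize`."""
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    if not old:
        return messages  # nothing old enough to summarize yet
    summary = summarize(old)
    return [{"role": "user",
             "content": f"Summary of earlier conversation: {summary}"}] + recent

# Stand-in summarizer for illustration; real code would call a cheap model:
history = [{"role": "user", "content": f"msg {i}"} for i in range(12)]
compressed = compress_history(
    history, keep_recent=5,
    summarize=lambda old: f"{len(old)} earlier messages")
# compressed = 1 summary message + the 5 most recent messages
```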
RAG (Retrieval)
Store everything in a database, retrieve only relevant parts per query. Most sophisticated, covered in Module 06.
Cost Optimization Cheat Sheet
| Technique | Savings | Difficulty |
|---|---|---|
| Choose the right model (Haiku vs Sonnet vs Opus) | Up to 60x | Easy |
| Prompt caching (reuse system prompts) | Up to 90% | Medium |
| Shorter prompts (cut fluff, be precise) | 20-50% | Easy |
| Context management (summarize old messages) | 30-70% | Medium |
| Batch API (non-real-time processing) | 50% | Easy |
| Brain+Muscles (Opus decides, Haiku executes) | 40-80% | Medium |
A naive implementation sends the full conversation history with a long system prompt to Opus for every message. Apply all six techniques above and the same app costs $49/month instead of $500. This course pays for itself in the first week.
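The Brain+Muscles row from the cheat sheet can be sketched as a simple router. The model names and the keyword heuristic below are illustrative assumptions, not a production-grade classifier (a real router might use a cheap model call to classify the request):

```python
# Route each request to the cheapest model that can plausibly handle it.
COMPLEX_HINTS = ("architecture", "debug", "analyze", "plan")

def pick_model(user_message: str) -> str:
    """Send hard requests to the expensive model, everything else to the cheap one."""
    if any(hint in user_message.lower() for hint in COMPLEX_HINTS):
        return "claude-opus"    # the "brain": slow, expensive, smart
    return "claude-haiku"       # the "muscles": fast and far cheaper

print(pick_model("Can you debug this stack trace?"))   # → claude-opus
print(pick_model("What are your opening hours?"))      # → claude-haiku
```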
Real-World Token Budget Example
Let's say you're building a customer support bot that handles 1,000 conversations per day:
| Component | Tokens | Cost (Sonnet 4.6) |
|---|---|---|
| System prompt (per message) | 2,000 input | $0.006 |
| Conversation history (avg) | 3,000 input | $0.009 |
| User message | 200 input | $0.0006 |
| Claude's response | 500 output | $0.0075 |
| Per message total | 5,700 | $0.023 |
| Per conversation (8 msgs) | ~45,000 | $0.18 |
| Daily (1,000 convos) | 45M | $180 |
| Monthly | 1.35B | $5,400 |
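The table above rounds per-conversation cost to $0.18; computed without rounding, the monthly figure comes to about $5,544. A sketch of the arithmetic, with Sonnet-class prices assumed ($3/M input, $15/M output):

```python
IN_P, OUT_P = 3.00, 15.00  # USD per million tokens (assumed)

# Per-message: 2,000 system + 3,000 history + 200 user input, 500 output
per_msg = (2_000 + 3_000 + 200) * IN_P / 1e6 + 500 * OUT_P / 1e6
per_convo = per_msg * 8            # 8 messages per conversation
monthly = per_convo * 1_000 * 30   # 1,000 conversations/day, 30 days

print(round(per_msg, 4), round(monthly))  # → 0.0231 5544
```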
Now apply optimizations (illustrative, rounded figures):
| Optimization | New Monthly Cost |
|---|---|
| + Prompt caching (90% off system prompt) | $4,860 |
| + Context summarization (50% off history) | $3,240 |
| + Use Haiku 4.5 for simple queries (60% of volume) | $1,620 |
| Final monthly cost | $1,620 (70% savings) |
Key Takeaways
- Tokens are the billing unit. ~750 words = 1,000 tokens. Output costs 3-5x more than input.
- Context window = Claude's working memory. 200K tokens standard across Claude models, with 1M available in beta for Sonnet and Opus.
- Conversations grow in cost because full history is re-sent every message.
- Put important info at the start or end of prompts, not the middle.
- Six optimization techniques can reduce costs by 60-90%.
- Model selection alone can cut costs by up to 60x - it's the biggest lever you have.