← Course | Module 01 - Foundations Lesson 3 of 3 (Free)
FREE LESSON - MODULE 01

Tokens, Context Windows, and Why They Matter for Your Wallet

🕑 10 min read 🎯 Beginner 💰 Cost optimization

What is a Token?

AI models don't read words - they read tokens. A token is a chunk of text, roughly 3/4 of a word in English. This is the fundamental unit of AI billing.

Text to Tokens

"Hello world" = 2 tokens
"Artificial intelligence" = 2 tokens
"I love programming" = 3 tokens
1 page of text = ~500 tokens

Quick Math

1,000 tokens = ~750 words
1 page = ~500 tokens
1 book (300 pages) = ~150K tokens
1 codebase (10K lines) = ~40K tokens

Why It Matters

You pay per token. Input tokens (what you send) AND output tokens (what Claude says back). Output tokens cost 3-5x more than input tokens.

Key insight

A long system prompt that you send with every message? That's input tokens billed every single time. A 2,000-token system prompt across 1,000 API calls = 2 million input tokens = $6 on Sonnet. Prompt caching (Module 07) cuts this by 90%.

What is a Context Window?

The context window is the total amount of text (in tokens) that Claude can "see" at once. Think of it as Claude's working memory - everything in the conversation must fit inside this window.

ModelContext WindowThat's roughly...
Claude Haiku 4.5200K tokens1.5 novels or 400 pages of code
Claude Sonnet 4.6200K (1M beta)1.5 novels standard, 7.5 novels with beta header
Claude Opus 4.6200K (1M beta)Same as Sonnet, with 128K max output
GPT-4o128K tokensAbout 1 novel
Gemini 2.02M tokens~10 novels (but quality degrades)
The "lost in the middle" problem

All models struggle with information buried in the middle of very long contexts. Claude handles this better than most, but the rule stands: put important information at the beginning or end of your prompt, never buried in the middle.

How Context Windows Affect Your Design

The Conversation Grows Problem

Every message in a conversation gets sent back to Claude. A 50-message chat might be 20,000 tokens - all billed as input every time Claude responds.

// Message 1: User sends 100 tokens, Claude responds 200 tokens // Total input billed: 100 tokens // Message 2: Entire history (300) + new message (100) = 400 input tokens // Total input billed: 100 + 400 = 500 tokens // Message 10: History is now 5,000 tokens + new message = 5,100 input tokens // You're paying for the FULL history every single message // Solution: Summarize older messages, keep only recent ones

Strategies for Context Management

Sliding Window

Keep only the last N messages. Simple but loses old context. Good for casual chatbots.

Summarize + Recent

Summarize old messages into a paragraph, keep last 5-10 messages verbatim. Best balance of cost and context.

RAG (Retrieval)

Store everything in a database, retrieve only relevant parts per query. Most sophisticated, covered in Module 06.

Cost Optimization Cheat Sheet

TechniqueSavingsDifficulty
Choose the right model (Haiku vs Sonnet vs Opus)Up to 60xEasy
Prompt caching (reuse system prompts)Up to 90%Medium
Shorter prompts (cut fluff, be precise)20-50%Easy
Context management (summarize old messages)30-70%Medium
Batch API (non-real-time processing)50%Easy
Brain+Muscles (Opus decides, Haiku executes)40-80%Medium
The $49 vs $500 difference

A naive implementation sends the full conversation history with a long system prompt to Opus for every message. Apply all six techniques above and the same app costs $49/month instead of $500. This course pays for itself in the first week.

Real-World Token Budget Example

Let's say you're building a customer support bot that handles 1,000 conversations per day:

ComponentTokensCost (Sonnet 4.6)
System prompt (per message)2,000 input$0.006
Conversation history (avg)3,000 input$0.009
User message200 input$0.0006
Claude's response500 output$0.0075
Per message total5,700$0.023
Per conversation (8 msgs)~45,000$0.18
Daily (1,000 convos)45M$180
Monthly1.35B$5,400

Now apply optimizations:

OptimizationNew Monthly Cost
+ Prompt caching (90% off system prompt)$4,860
+ Context summarization (50% off history)$3,240
+ Use Haiku 4.5 for simple queries (60% of volume)$1,620
Final monthly cost$1,620 (70% savings)

Key Takeaways

🖨 Download PDF 🐦 Share on X ✈ Share on Telegram
← Previous: Model Lineup