Not long ago, feeding a large language model more than a few paragraphs of text was a significant technical challenge. Today, models like Gemini 1.5 Pro can process up to 2 million tokens, roughly 1.5 million words, or on the order of a dozen long novels, in a single context window. This expansion is not just impressive engineering; it fundamentally changes what AI can do.
What Is a Context Window?
A context window is the total amount of text an LLM can process at once, input and output combined. Everything the model draws on in a session (the conversation so far, uploaded documents, system instructions) must fit within that window.
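The budgeting this implies can be sketched in a few lines. The helper below is illustrative only: it estimates tokens with a crude words-based ratio (real BPE tokenizers differ), and the function names and the 200K default window are my assumptions, not any vendor's API.

```python
# Sketch: will a request fit in a model's context window?
# Token counts use a rough heuristic (~0.75 words per token);
# real tokenizers (BPE, SentencePiece) will give different numbers.

def approx_tokens(text: str) -> int:
    """Very rough token estimate: about 4/3 tokens per word."""
    return int(len(text.split()) / 0.75)

def fits_in_context(system: str, history: list[str], user: str,
                    max_output: int, window: int = 200_000) -> bool:
    """The input AND the reserved output budget must both fit,
    because the window covers input and output combined."""
    used = sum(approx_tokens(t) for t in [system, user, *history])
    return used + max_output <= window
```

The key point the code encodes is that output tokens count against the same budget as input: reserving room for a long answer shrinks the space available for documents and history.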
Why Size Matters
Early GPT models were tightly constrained: GPT-2 topped out at 1,024 tokens, and GPT-3 launched with 2,048 (extended to 4,096 in GPT-3.5). Today, Claude 3 offers 200K tokens, Gemini 1.5 offers 1M, and some models support 2M or more. This isn't just a quantitative improvement; it's qualitative.
What Becomes Possible With Large Contexts
- Analyzing entire codebases in a single prompt
- Summarizing full books without chunking
- Long-running conversations with complete memory
- Document analysis without retrieval systems
- Multi-document synthesis and comparison
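The first item above, feeding an entire codebase into one prompt, amounts to concatenating files with enough structure for the model to attribute code to paths. A minimal sketch, assuming the target model's window is large enough to hold the result (the function name and the `### File:` header convention are mine):

```python
# Sketch: pack a codebase into a single prompt string.
# Each file is prefixed with its path so the model can cite
# which file a given snippet came from.

from pathlib import Path

def pack_codebase(root: str, suffixes: tuple = (".py", ".md")) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"### File: {path}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

In practice you would still check the packed string against the model's token limit before sending it; past a few million tokens, chunking or retrieval becomes unavoidable again.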
The Challenges
Larger context windows come with real costs. Naive self-attention scales quadratically with sequence length, so processing 1M tokens demands enormous memory and compute. There is also the "lost in the middle" problem: research shows LLMs retrieve information from the middle of very long contexts less reliably than from the beginning and end.
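A back-of-envelope calculation shows why the quadratic term bites. The assumptions here are mine for illustration: fp16 scores (2 bytes each), a single attention head, a single layer, and none of the optimizations (FlashAttention, sliding windows, sparse attention) that production systems use precisely to avoid this cost.

```python
# Back-of-envelope: memory for one naive n x n attention score matrix.
# Assumes 2-byte (fp16) scores, one head, one layer, no optimizations.

def attn_matrix_bytes(n_tokens: int, bytes_per_score: int = 2) -> int:
    return n_tokens * n_tokens * bytes_per_score

for n in (2_048, 200_000, 1_000_000):
    gib = attn_matrix_bytes(n) / 2**30
    print(f"{n:>9,} tokens -> {gib:,.1f} GiB per head per layer")
```

At 2,048 tokens the matrix is a few megabytes; at 1M tokens it is on the order of terabytes per head per layer, which is why long-context models cannot simply materialize it and must rely on memory-efficient attention algorithms.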
The Future
Context windows will continue to expand. The more interesting question is whether models will learn to use large contexts effectively — not just process them. The combination of large context windows with improved retrieval and reasoning capabilities may be one of the most important near-term developments in AI.