A context window is the maximum amount of text, measured in tokens, that an AI model can consider at once when generating a response. Everything the model “sees” (your prompt, the conversation so far, and any retrieved documents) must fit inside it, and the response being generated typically counts toward the same limit.
How it works
If the combined input exceeds the window, the oldest content must be dropped (truncated) or condensed into a summary before the model can process the rest. Early LLMs had windows of a few thousand tokens; frontier models in 2026 support hundreds of thousands to millions of tokens, enough to process whole books, codebases, or long conversations at once.
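As a rough illustration of truncation, the sketch below keeps only the most recent messages that fit a token budget. It is a minimal example under stated assumptions, not how any particular model or API handles overflow: `count_tokens` and `fit_to_window` are hypothetical helper names, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    """Assumption: approximate one token per whitespace-separated word.

    Real tokenizers split text differently, so real counts will not match.
    """
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Return the most recent messages whose combined size fits max_tokens."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the oldest content is what gets dropped.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Hello, can you help me plan a trip?",
    "Sure, where would you like to go?",
    "Somewhere warm in December, ideally by the sea.",
]

# With a 16-token budget, the oldest message no longer fits and is dropped.
print(fit_to_window(history, max_tokens=16))
```

Dropping whole messages from the oldest end is the simplest policy. A system that summarizes instead would replace the dropped messages with a shorter summary message, which then also has to fit in the same budget.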
Why it matters
Larger context windows expand what AI can do — analyzing long documents, sustaining longer agentic tasks, and supplying more retrieved context — though processing more tokens also raises cost. See AI Models & Benchmarks Statistics 2026.
Related terms: Tokens · LLM · RAG