Context Window

The context window is the maximum number of tokens an LLM can process in a single request: the input prompt plus the generated output. It sets the upper bound on how much information the model can "see" at once when answering a question. Context windows have grown dramatically: GPT-2 (2019) had 1,024 tokens; GPT-3 (2020) launched with 2,048, and later models in the series reached about 4K; GPT-3.5 reached 4K and then 16K; GPT-4 launched at 8K and 32K, and GPT-4 Turbo raised that to 128K; Claude 2 offered 100K; the Claude 3 family offers 200K; Gemini 1.5 introduced 1M (with a 2M variant available); and Llama 3.1 launched with 128K. By 2025, frontier-class context windows have settled in the range of 128K to 1M tokens, with Gemini and a handful of others pushing further.

Context windows are bounded mainly by the cost of self-attention: every token attends to every other token, so a naive implementation scales as O(n²) in both compute and memory. Techniques such as FlashAttention (which avoids materializing the full attention matrix), PagedAttention (which manages key-value cache memory), and sliding-window attention partially mitigate that cost, while RoPE extrapolation and scaling methods help a model trained on shorter sequences handle longer ones.

Long context comes with caveats. Effective context use degrades well before the nominal limit: the "lost in the middle" effect (Liu et al., 2023) shows that information in the middle of a long context is recalled less reliably than information at the beginning or end. Practical strategies for managing context include RAG (retrieve relevant chunks rather than dumping everything), prompt compression (LongLLMLingua, AutoCompressors), summarization-of-summaries for very long documents, and sliding-window processing for streaming inputs.

Pricing scales with context: most providers charge per input token, so at $2.50 per million input tokens, a 100K-token GPT-4o prompt costs about $0.25 for the input alone. AI governance teams set context-window policies per skill: chat applications might cap context at 32K to control cost, while one-shot document analysis might allow the full 200K.
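To make the quadratic attention cost concrete, the sketch below estimates the memory needed to hold one full n × n attention score matrix. It assumes a naive implementation, a single head in a single layer, and 2 bytes per value (16-bit floats); the function name is illustrative, and optimized kernels such as FlashAttention avoid storing this matrix at all.

```python
def attention_scores_bytes(n_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for one full n x n attention score matrix (one head, one layer).

    Assumes naive attention that materializes every pairwise score.
    """
    return n_tokens * n_tokens * bytes_per_value


if __name__ == "__main__":
    for n in (1_024, 8_192, 131_072):
        gib = attention_scores_bytes(n) / 2**30
        print(f"{n:>7} tokens -> {gib:.3f} GiB per head per layer")
```

Doubling the sequence length quadruples the figure, which is why long contexts rely on kernels and cache layouts that sidestep the full matrix.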
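Sliding-window processing of a long or streaming input can be sketched as splitting the token sequence into overlapping windows that each fit the model's context. The function name, window size, and overlap below are hypothetical choices for illustration; a real pipeline would use the model's own tokenizer to produce the tokens.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[int], window: int, overlap: int) -> List[List[int]]:
    """Split tokens into windows of at most `window` items.

    Consecutive windows share `overlap` tokens so that content near a
    boundary appears in both neighbours.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    windows: List[List[int]] = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the input
    return windows


if __name__ == "__main__":
    for chunk in sliding_windows(list(range(10)), window=4, overlap=1):
        print(chunk)
```

The overlap trades some repeated tokens (and cost) for continuity across window boundaries.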
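The pricing arithmetic above is a single multiplication, shown here as a small helper. The rate of $2.50 per million input tokens is the assumption behind the ~$0.25 figure; actual rates vary by provider and change over time, and the function name is illustrative.

```python
def input_cost_usd(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of the input side of one request, in US dollars."""
    return prompt_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    rate = 2.50  # assumed USD per million input tokens
    for tokens in (32_000, 100_000, 200_000):
        print(f"{tokens:>7} input tokens -> ${input_cost_usd(tokens, rate):.2f}")
```

Output tokens are typically billed separately, so this covers only the prompt side of a request.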
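A per-skill context policy like the 32K chat cap and 200K document-analysis allowance could be expressed as a small lookup table plus a truncation step. Everything here is a hypothetical sketch: the skill names, the reserved output sizes, and the keep-the-most-recent-tokens rule are assumptions, not any product's actual configuration or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContextPolicy:
    max_context_tokens: int  # cap on prompt + output for this skill
    max_output_tokens: int   # tokens reserved for the model's reply


# Hypothetical policies mirroring the caps mentioned in the text.
POLICIES: Dict[str, ContextPolicy] = {
    "chat": ContextPolicy(max_context_tokens=32_000, max_output_tokens=2_000),
    "document_analysis": ContextPolicy(max_context_tokens=200_000, max_output_tokens=4_000),
}


def prompt_budget(skill: str) -> int:
    """Tokens available for the prompt after reserving room for the reply."""
    policy = POLICIES[skill]
    return policy.max_context_tokens - policy.max_output_tokens


def fit_to_budget(tokens: List[int], skill: str) -> List[int]:
    """Keep the most recent tokens that fit within the skill's prompt budget."""
    budget = prompt_budget(skill)
    return tokens if len(tokens) <= budget else tokens[-budget:]


if __name__ == "__main__":
    history = list(range(50_000))
    print(len(fit_to_budget(history, "chat")))               # trimmed to the chat budget
    print(len(fit_to_budget(history, "document_analysis")))  # fits, so unchanged
```

Dropping the oldest tokens suits chat history; a document-analysis skill would more likely reject or chunk an oversized input than silently truncate it.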

Context budgets from 25 years of usage discipline: Centralpoint enforces context-window policies per skill, per audience, and per model, extending the same usage-budget discipline Oxcyon has applied to enterprise content for 25 years. Context budgets are enforced on-premise, tokens are metered per skill, and context-budgeted chatbots deploy through one line of JavaScript.


Related Keywords:
Context Window, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager