LLM Token Price Index 2026: How Fast AI Inference Costs Are Falling

The price of a given level of AI capability is collapsing faster than almost any technology in history — roughly an order of magnitude per year. The cost to run a GPT-3.5-class model fell more than 280× in about two years, even as frontier prices held steadier.

The big picture · inference economics

  • 📉~280× in ~2 years. Stanford HAI’s 2025 AI Index puts the cost of a GPT-3.5-level system at $20.00 per million tokens (Nov 2022) falling to $0.07 (Oct 2024).
  • “LLMflation.” a16z estimates the cost to reach a fixed capability bar drops roughly 10× per year; Epoch AI’s longer series finds task-specific declines of up to ~50× per year.
  • 🔮Frontier prices fall slower. Top-tier model pricing declines more gently — the cliff is in capability-per-dollar, not in the sticker price of the newest model.
  • 🏷️Why it matters. Falling token costs are the engine behind agentic and test-time-compute workloads — use cases uneconomic in 2023 are routine by 2026.
280×
cost drop, GPT-3.5-class
Nov 2022 → Oct 2024
~10×/yr
cost decline at fixed
capability (a16z)
$0.07
per 1M tokens for a
GPT-3.5-level system (Oct 2024)
Confidence & freshness key. High primary/company data · Medium analyst/index estimate · Vendor-listed price not independently verified here. Freshness: 🟢 Active (current pricing) · 🟡 Stale (recheck) · ⚫ Historical (point-in-time).

The capability-cost curve (Stanford HAI)

The clearest single series comes from Stanford HAI’s AI Index 2025, which tracks the inference price of reaching GPT-3.5 performance (MMLU 64.8) over time. The decline is steep and roughly log-linear.

Endpoints (Nov 2022 $20.00 → Oct 2024 $0.07) from Stanford HAI AI Index 2025; intermediate points interpolated to illustrate the log-linear trend. Medium ⚫ Historical

Current frontier API pricing (per 1M tokens)

Below is current published API pricing. Anthropic’s figures are confirmed against the provider’s own pricing; other vendors’ list prices are shown as Vendor-listed because they could not be independently re-verified for this edition — always confirm against the provider’s pricing page before relying on them.

ModelInput ($/1M)Output ($/1M)Confidence
Anthropic Claude Opus 4.8$5.00$25.00High 🟢
Anthropic Claude Sonnet 4.6$3.00$15.00High 🟢
Anthropic Claude Haiku 4.5$1.00$5.00High 🟢
Other frontier vendors (OpenAI, Google, DeepSeek, xAI)List prices vary; confirm on each provider’s pricing pageVendor-listed 🟡
Anthropic pricing verified against provider data, June 2026. Other vendors flagged for reader verification.

Why “price per token” understates the drop

  • Capability is rising at the same time. A dollar buys both cheaper tokens and a more capable model, so cost-per-useful-task falls faster than cost-per-token.
  • Open-weight models reset the floor. Competitive open models (e.g. DeepSeek, Llama-class) pull hosted prices down across the board.
  • Frontier is the exception. The newest top model is priced for willingness-to-pay; the deflation shows up once a capability tier becomes commoditized.

Data lineage

Sources & method. Capability-cost series: Stanford HAI AI Index 2025. “LLMflation” ~10×/yr: Andreessen Horowitz (a16z). Longer-run task-cost declines: Epoch AI. Anthropic pricing: provider pricing data (June 2026). Sample: GPT-3.5-level (MMLU 64.8) inference price. Window: Nov 2022 – Oct 2024 (curve); June 2026 (current API prices). Confidence: Medium–High; intermediate chart points interpolated.

FAQ

How fast are AI inference costs falling?

For a fixed capability level, roughly 10× per year per a16z; Stanford HAI documents a ~280× fall for GPT-3.5-class inference between November 2022 and October 2024.

Does this mean frontier models are getting cheaper?

Not as quickly. The steep decline applies to reaching a given capability bar. The newest frontier models hold their price longer; deflation hits a tier once cheaper or open-weight models match it.