The price of a given level of AI capability is collapsing faster than almost any technology in history — roughly an order of magnitude per year. The cost to run a GPT-3.5-class model fell more than 280× in about two years, even as frontier prices held steadier.
The big picture · inference economics
- 📉~280× in ~2 years. Stanford HAI’s 2025 AI Index puts the cost of a GPT-3.5-level system at $20.00 per million tokens (Nov 2022) falling to $0.07 (Oct 2024).
- ⚡“LLMflation.” a16z estimates the cost to reach a fixed capability bar drops roughly 10× per year; Epoch AI’s longer series finds task-specific declines of up to ~50× per year.
- 🔮Frontier prices fall slower. Top-tier model pricing declines more gently — the cliff is in capability-per-dollar, not in the sticker price of the newest model.
- 🏷️Why it matters. Falling token costs are the engine behind agentic and test-time-compute workloads — use cases uneconomic in 2023 are routine by 2026.
Nov 2022 → Oct 2024
capability (a16z)
GPT-3.5-level system (Oct 2024)
The capability-cost curve (Stanford HAI)
The clearest single series comes from Stanford HAI’s AI Index 2025, which tracks the inference price of reaching GPT-3.5 performance (MMLU 64.8) over time. The decline is steep and roughly log-linear.
Endpoints (Nov 2022 $20.00 → Oct 2024 $0.07) from Stanford HAI AI Index 2025; intermediate points interpolated to illustrate the log-linear trend. Medium ⚫ Historical
Current frontier API pricing (per 1M tokens)
Below is current published API pricing. Anthropic’s figures are confirmed against the provider’s own pricing; other vendors’ list prices are shown as Vendor-listed because they could not be independently re-verified for this edition — always confirm against the provider’s pricing page before relying on them.
| Model | Input ($/1M) | Output ($/1M) | Confidence |
|---|---|---|---|
| Anthropic Claude Opus 4.8 | $5.00 | $25.00 | High 🟢 |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15.00 | High 🟢 |
| Anthropic Claude Haiku 4.5 | $1.00 | $5.00 | High 🟢 |
| Other frontier vendors (OpenAI, Google, DeepSeek, xAI) | List prices vary; confirm on each provider’s pricing page | Vendor-listed 🟡 | |
Why “price per token” understates the drop
- Capability is rising at the same time. A dollar buys both cheaper tokens and a more capable model, so cost-per-useful-task falls faster than cost-per-token.
- Open-weight models reset the floor. Competitive open models (e.g. DeepSeek, Llama-class) pull hosted prices down across the board.
- Frontier is the exception. The newest top model is priced for willingness-to-pay; the deflation shows up once a capability tier becomes commoditized.
Data lineage
FAQ
How fast are AI inference costs falling?
For a fixed capability level, roughly 10× per year per a16z; Stanford HAI documents a ~280× fall for GPT-3.5-class inference between November 2022 and October 2024.
Does this mean frontier models are getting cheaper?
Not as quickly. The steep decline applies to reaching a given capability bar. The newest frontier models hold their price longer; deflation hits a tier once cheaper or open-weight models match it.