This post is that one place. Every lever that moves your Anthropic invoice, with concrete numbers and worked examples. Current as of April 2026.


The Base Rates

Claude's pricing is per million tokens, split between input and output. Output tokens are always more expensive than input tokens; across the current lineup the ratio is exactly 5×.

As of April 2026:

Model                             Input / M tokens    Output / M tokens
Claude Opus 4.x                   $15.00              $75.00
Claude Sonnet 4.x                 $3.00               $15.00
Claude Sonnet 4.x (1M context)    $6.00               $30.00
Claude Haiku 4.x                  $0.80               $4.00

These are the "direct" list rates. Everything else on this page is either a discount off these, a multiplier on top, or a conversion rule that turns non-text content (images, PDFs, tool calls) into billable tokens.

Why output costs 5× input. Generating a token requires a full forward pass through the model for each token produced, one after another. Input tokens are cheaper because they can all be processed in a single parallel pass before generation starts. The economics of transformer inference favor readers over writers.

Why 1M-context Sonnet costs 2×. Longer context windows require more memory and more expensive infrastructure per call. Anthropic prices the 1M variant at exactly 2× the base Sonnet rate.
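
The table above can be encoded directly. A minimal sketch; the model keys here are shorthand labels, not real API model ids:

```python
# List rates as of April 2026, USD per million tokens (from the table above).
# Keys are shorthand, not API model ids.
RATES = {
    "opus-4":      {"input": 15.00, "output": 75.00},
    "sonnet-4":    {"input": 3.00,  "output": 15.00},
    "sonnet-4-1m": {"input": 6.00,  "output": 30.00},
    "haiku-4":     {"input": 0.80,  "output": 4.00},
}

def direct_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached, non-batch cost of one call, in dollars."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A typical chat turn on Sonnet: 2,000 tokens in, 500 out.
# 2,000 × $3/M + 500 × $15/M = $0.006 + $0.0075 = $0.0135
```

Everything that follows is a discount, multiplier, or conversion applied to this base calculation.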


Prompt Caching: The Biggest Discount

Prompt caching can save you up to 90% on cached input tokens. It's the single largest cost lever Anthropic offers, and most teams leave it partly on the table.

How it's billed (April 2026):

  1. Cache write: ~1.25× the model's base input rate, charged when a prefix is first cached.
  2. Cache read: ~0.10× the base input rate on every subsequent hit.
  3. Uncached input and all output bill at the normal rates.

Concrete example. You have a 10,000-token system prompt you send with every call.

Without caching, after 1,000 calls: 1,000 × 10,000 = 10M input tokens. At Sonnet rates ($3/M), that's $30.00.

With caching, after 1,000 calls: one cache write (10,000 × 1.25 × $3/M ≈ $0.04) plus 999 cache reads (999 × 10,000 × 0.10 × $3/M ≈ $3.00), about $3.03 total. Roughly a 90% saving on that system prompt.
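
A sketch of the cache arithmetic, assuming Sonnet rates and the ~1.25×/~0.10× cache multipliers from the cost formula later in this post:

```python
SONNET_INPUT = 3.00                    # $/M tokens, base input rate (assumed model)
CACHE_WRITE = 1.25 * SONNET_INPUT      # cache-write premium
CACHE_READ = 0.10 * SONNET_INPUT       # cache-read discount

PROMPT_TOKENS = 10_000
CALLS = 1_000

# Every call re-bills the full system prompt at the base rate:
uncached = CALLS * PROMPT_TOKENS * SONNET_INPUT / 1e6          # $30.00

# One cache write on the first call, cheap reads on the other 999:
cached = (PROMPT_TOKENS * CACHE_WRITE
          + (CALLS - 1) * PROMPT_TOKENS * CACHE_READ) / 1e6    # ≈ $3.03
```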

Gotchas:

  1. Caching only works for exact prefix matches. Change the first character of your system prompt and you invalidate the cache.
  2. Dynamic content (dates, user IDs, "Today is…") must live at the end of the prompt, not the beginning, or you defeat the whole system.
  3. The cache expires on a short TTL. High-frequency call patterns benefit far more than low-frequency ones: if your calls arrive slower than the TTL, every call pays the cache-write premium again.
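
Gotchas 1 and 2 translate directly into request structure. A minimal sketch of a request body that keeps the cacheable prefix stable (the model id is a placeholder, and the cache_control field shape should be verified against current Anthropic docs):

```python
# Sketch of a Messages API request body: the stable prefix carries the
# cache marker; dynamic content comes after it.
STABLE_SYSTEM_PROMPT = "You are a support agent. Follow the playbook below. ..."

def build_request(user_message: str, today: str) -> dict:
    return {
        "model": "claude-sonnet-4-x",      # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STABLE_SYSTEM_PROMPT,
                # Everything up to and including this block is cacheable.
                "cache_control": {"type": "ephemeral"},
            },
            # Dynamic content lives AFTER the marker, so changing it
            # never invalidates the cached prefix.
            {"type": "text", "text": f"Today is {today}."},
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```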

Batch API: 50% Off For Patient Workloads

If a workload can wait up to 24 hours, the Batch API gives you 50% off both input and output tokens. You submit a batch of requests, Anthropic processes them asynchronously, and results come back to a bucket you pull from. In practice, batches usually complete in minutes, not hours.

When batch wins:

  1. Offline pipelines: nightly summarization, backfills, evals, bulk classification.
  2. Scheduled jobs where hours of latency are invisible to users.

When it doesn't:

  1. Interactive products where someone is waiting on the response.
  2. Agent loops where the next step depends on the previous result arriving immediately.

A reasonable mental model: if you could do the work in a cron job, use batch. If someone is staring at a loading spinner, use the standard API.
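
Because the discount is a flat multiplier, the break-even is easy to model. A sketch with illustrative volumes at Sonnet rates:

```python
def batch_cost(standard_cost: float) -> float:
    """Batch API bills both input and output at 50% of list rates."""
    return standard_cost * 0.5

# Nightly job: 50,000 documents, ~4,000 tokens in / 300 out each,
# on Sonnet ($3/M in, $15/M out). Volumes are illustrative.
standard = 50_000 * (4_000 * 3.00 + 300 * 15.00) / 1e6   # $825.00
batched = batch_cost(standard)                           # $412.50
```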


Tool Use: The Round-Trip Tax

When Claude uses a tool, you pay for two completions minimum, often more:

  1. First call: Claude reads the prompt + tool definitions, decides to call a tool, emits the tool call. You pay for input + output.
  2. Your code runs the tool and returns the result.
  3. Second call: Claude reads the original prompt + tool definitions + tool call + tool result, and either emits the final answer or calls another tool. You pay for input + output again.

Each round trip re-reads the full context. If your tool definitions are 3,000 tokens and you have a 10-tool agent loop, you're reading those 3,000 tokens 10+ times unless you cache them.

Practical rule. Tool definitions are the single best candidate for caching. They're stable (you're not redefining get_weather per-call) and they're re-read on every round trip. Wrapping tool schemas in a cache marker typically cuts agent-loop costs 40–70%.
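
For the tool-definition slice of the bill alone, assuming Sonnet rates and the cache multipliers from the cost formula below:

```python
TOOL_DEF_TOKENS = 3_000       # stable tool schemas, re-read every round trip
ROUND_TRIPS = 10
INPUT_RATE = 3.00             # Sonnet base input, $/M (assumed model)
CACHE_WRITE = 1.25 * INPUT_RATE
CACHE_READ = 0.10 * INPUT_RATE

# Never cached: full price on all ten reads.
uncached = ROUND_TRIPS * TOOL_DEF_TOKENS * INPUT_RATE / 1e6           # $0.09

# Cached: one write on the first trip, discounted reads after.
cached = (TOOL_DEF_TOKENS * CACHE_WRITE
          + (ROUND_TRIPS - 1) * TOOL_DEF_TOKENS * CACHE_READ) / 1e6   # ≈ $0.019
```

The same ratio holds for any stable prefix an agent loop re-reads, which is why the full-context savings compound as loops get longer.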


Vision: Images Are Tokens

Anthropic converts images to tokens at a rate that depends on image dimensions. The documented approximation is tokens ≈ (width px × height px) / 750, with oversized images scaled down before counting.

Real-world framing: a typical screenshot pasted into a conversation is ~1,200–2,000 tokens. A full-resolution photo from a phone is ~3,000+ tokens. At Sonnet input rates ($3/M), that's roughly a penny per image. At Opus rates ($15/M), it's five cents per image.

If you're running vision at volume — say, a product that analyzes 10,000 screenshots a day on Sonnet — that's:

10,000 × 1,500 tokens × $3/M = $45/day = ~$1,350/month in vision input alone.

Worth modeling before shipping.
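
The screenshot math packages into a quick estimator. The (width × height) / 750 rule is Anthropic's documented approximation for image token counts; the dimensions below are illustrative:

```python
def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token count: tokens ≈ (w × h) / 750."""
    return int(width_px * height_px / 750)

def daily_vision_cost(images_per_day: int, w: int, h: int, input_rate: float) -> float:
    """Dollars per day of vision input at the given $/M input rate."""
    return images_per_day * image_tokens(w, h) * input_rate / 1e6

# 10,000 screenshots/day at an illustrative 1280×880, on Sonnet ($3/M):
# ≈ 1,501 tokens per image, ≈ $45/day.
```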


PDF Input: Pages Are Tokens (With A Multiplier)

PDFs are processed as a mix of text extraction + rasterized images for each page. Anthropic's effective rate lands around ~2,000–3,000 tokens per page for typical documents, sometimes higher for scanned PDFs with lots of imagery.

A 40-page research paper is roughly 100,000 tokens. At Sonnet rates that's 30 cents to have Claude read it once. At Opus rates it's $1.50. Summarize it 1,000 times across your user base and you're at $300–$1,500 just for document ingestion.

Takeaway: cache parsed PDFs. If 100 users are going to ask questions about the same document, the document itself should be cached after the first parse, not re-billed each time.
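
Putting numbers on that takeaway, assuming Sonnet rates, a 40-page document, and that every question lands within the cache TTL:

```python
PAGE_TOKENS = 2_500          # midpoint of the ~2,000–3,000 tokens/page estimate
PAGES = 40
QUESTIONS = 100              # users asking about the same document
SONNET_INPUT = 3.00          # $/M (assumed model)
CACHE_WRITE = 1.25 * SONNET_INPUT
CACHE_READ = 0.10 * SONNET_INPUT

doc_tokens = PAGES * PAGE_TOKENS                                # 100,000 tokens

# Re-billing the document on every question:
rebilled = QUESTIONS * doc_tokens * SONNET_INPUT / 1e6          # $30.00

# Cache once, read cheaply 99 more times:
cached = (doc_tokens * CACHE_WRITE
          + (QUESTIONS - 1) * doc_tokens * CACHE_READ) / 1e6    # ≈ $3.35
```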


The 1M Context Window: Priced Accordingly

Claude Sonnet's 1M-context variant is double the normal Sonnet rate: $6 input / $30 output per million tokens.

It's genuinely useful — whole codebases, entire knowledge bases, giant audit trails — but it's easy to burn through money if you default to it. Most prompts don't need 1M tokens of context. Use the standard context window unless you have a specific reason not to.

A real example: feeding a 400K-token codebase into Claude Sonnet 1M once is 400,000 × $6/M = $2.40. Do that on every call and an active user costs you $100+/day.
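
The per-call and per-day burn, plus what caching the codebase prefix does to it (calls per day is an illustrative guess):

```python
CODEBASE_TOKENS = 400_000
SONNET_1M_INPUT = 6.00        # $/M input on the 1M-context variant

per_call = CODEBASE_TOKENS * SONNET_1M_INPUT / 1e6             # $2.40 of input/call
calls_per_day = 50                                             # illustrative
daily = per_call * calls_per_day                               # $120.00/day

# With the codebase prefix cached, re-reads drop to the 0.10× rate:
cached_read = CODEBASE_TOKENS * 0.10 * SONNET_1M_INPUT / 1e6   # $0.24 per call
```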


The Complete Cost Formula

If you want the full math for a single call:

call_cost =
    (cached_input_tokens       × rate_cache_read)        // ~0.10× input rate
  + (new_cached_input_tokens   × rate_cache_write)       // ~1.25× input rate
  + (uncached_input_tokens     × rate_input)             // full input rate
  + (image_token_equivalent    × rate_input)             // vision
  + (pdf_token_equivalent      × rate_input)             // pdf
  + (output_tokens             × rate_output)            // full output rate
  + (tool_roundtrips - 1)      × per_roundtrip_cost      // each extra call re-bills context

Apply × 0.5 if the whole thing went through the Batch API.
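
The formula translates line by line into a function (rates in dollars per million tokens; the 0.10×/1.25× multipliers are the ones above):

```python
def call_cost(
    *,
    rate_input: float,                  # $/M tokens, model's base input rate
    rate_output: float,                 # $/M tokens, model's output rate
    cached_input_tokens: int = 0,
    new_cached_input_tokens: int = 0,
    uncached_input_tokens: int = 0,
    image_token_equivalent: int = 0,
    pdf_token_equivalent: int = 0,
    output_tokens: int = 0,
    extra_roundtrip_cost: float = 0.0,  # (tool_roundtrips - 1) × per-roundtrip cost
    batch: bool = False,
) -> float:
    """Dollar cost of one call, following the formula above."""
    cost = (
        cached_input_tokens * rate_input * 0.10          # cache read
        + new_cached_input_tokens * rate_input * 1.25    # cache write
        + (uncached_input_tokens
           + image_token_equivalent
           + pdf_token_equivalent) * rate_input          # fresh input, vision, pdf
        + output_tokens * rate_output
    ) / 1e6 + extra_roundtrip_cost
    return cost * 0.5 if batch else cost
```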

It's a lot. That's why most teams don't actually model it and then get surprised by the invoice.


What Actually Shows Up On Your Invoice

Anthropic itemizes by model and by input/output/cache category, so a mid-size team's monthly bill breaks down into per-model line items for uncached input, cache writes, cache reads, and output.

If your invoice is imbalanced — say, output dwarfing everything else — that's a signal to shorten responses or use structured output. If uncached input dwarfs cache-read, you have caching left on the table. If Opus dominates, you're over-spec'd and Sonnet probably handles 80% of it.


The Alternative: A BYOK Savings Layer

If you've read this far, you probably want one of two things:

  1. Full control. Model your cost end-to-end, tune every knob, squeeze the invoice yourself. Work the levers above — caching, batch, model selection, output discipline.
  2. Less invoice. Don't want to become a token-optimization specialist. Just want the same Claude output for a fraction of the price.

If it's #2, that's what aiusage does. Same Claude, same Claude Code, same SDK, routed through proprietary infrastructure that delivers the same output while your Anthropic account is billed roughly 20× less per call. You pay us a flat credit-pack fee ($10/15 runs, $25/50, $50/120), your Anthropic key stays in your account, there's no subscription, and credits never expire. Most of the levers in this post still apply (caching is still worth it, model selection still matters); we just make the underlying per-call Anthropic cost dramatically smaller.

If you want #1, skip us and work the list. Either way, you're going to get a smaller bill than the default.


Quick Reference (Save This)

  1. Base rates (input/output per M tokens): Opus $15/$75, Sonnet $3/$15, Sonnet 1M $6/$30, Haiku $0.80/$4.
  2. Prompt caching: writes ~1.25× input rate, reads ~0.10×; exact prefix match only; dynamic content at the end.
  3. Batch API: 50% off input and output for anything that can wait.
  4. Tool use: every round trip re-bills the full context; cache your tool definitions.
  5. Images: roughly (width × height) / 750 tokens each.
  6. PDFs: ~2,000–3,000 tokens per page; cache the document, don't re-bill it.
  7. 1M-context Sonnet: 2× the base rate; don't default to it.

Print this. Tape it to your monitor. The next time your bill spikes, the answer is probably on the list.

Drop your Claude bill 20×.

Paste your key at aiusage.ai — takes 60 seconds. BYOK, credit packs from $10, credits never expire.

Get started →

Written by the team at aiusage.ai — the BYOK Claude proxy that makes your existing Anthropic account ~20× cheaper. See the math or grab a $10 pack to try it.