This is the practical, tested guide to making that number smaller without giving up the tool. We use Claude Code every day for our own engineering work, and everything below is a technique we've actually applied to our own bill.
We'll go top to bottom:
- Understand where the money is going — because you can't optimize what you can't see
- Configuration tweaks — the cheapest wins, done in minutes
- Model routing — where the 5× leverage lives
- Agent iteration discipline — the advanced stuff that keeps saving you money forever
- Infrastructure layer — when you're done with everything above and the bill is still too big
If you're reading this because your Anthropic invoice scared you this month, start at the top and work down. Each section layers.
1. Know Where The Money Is Going
The single biggest mistake we see: someone is surprised by the bill but has no idea which sessions caused it. Before you optimize anything, get visibility.
What to look at:
- Anthropic console → Usage. Shows token volume by day and by model. Opus vs Sonnet vs Haiku breakdown matters a lot.
- Claude Code's built-in
/costcommand. Shows you the running cost of the current session. Use it mid-task to understand where money goes. - Your OS-level shell history. Which prompts kicked off the expensive sessions? Usually it's a pattern — "refactor everything," "run all tests until they pass," that kind of thing.
Once you see the shape of the spend, you can target the problem. If 80% of it is agent loops, that's where to optimize. If 80% is long single prompts, that's a different fix.
2. Configuration Tweaks (Do These First)
Drop to Sonnet by default
Claude Code ships with sensible defaults, but many users run it on Opus for everything "to be safe." Opus is 5× the cost of Sonnet. For 80%+ of coding tasks, Sonnet is indistinguishable from Opus on the outcome.
Set your default to Sonnet. Reach for Opus explicitly when you're doing deep planning, architecture discussions, or genuinely ambiguous problems. Most days, you won't need it.
Real numbers: a user switching from Opus-default to Sonnet-default typically sees their bill drop ~70% overnight, with no meaningful quality change on day-to-day work.
Cap the session context
Claude Code sessions carry context forward. A long session accumulates everything — all file reads, all tool calls, all prior reasoning — and every new turn pays to read that growing transcript.
Use /clear between unrelated tasks. Don't let a single session grow past 100k tokens unless the task actually requires it. Fresh session = fresh context = fresh, cheap turns.
Turn on extended prompt caching
Claude Code automatically caches the stable portion of your session (system prompt, tool schemas, early conversation) at the 5-minute cache tier. That's free and on by default.
If your workflow has gaps longer than 5 minutes between turns, flipping to the 1-hour cache tier keeps the cache hot between bursts of activity. Costs slightly more per cache write, saves a lot more on cache reads. Net win if you're bursty.
Disable what you don't use
Vision input, PDF handling, and some tool integrations have their own token costs. If you never send images or parse PDFs through Claude Code, nothing changes. If you do it occasionally, know that a single big PDF can cost more than an hour of plain coding.
3. Model Routing Inside Claude Code
This is where the biggest leverage lives, and where most users never go.
The three-tier mental model
For any task Claude Code is about to handle, ask: does this need Opus, Sonnet, or Haiku?
- Haiku is enough for: file searches, read-and-summarize, JSON extraction, routing decisions, formatting, quick edits, boilerplate, "which file contains X," anything where the reasoning step is mechanical.
- Sonnet is enough for: most code generation, refactors, debugging, test writing, PR reviews, walking through a codebase.
- Opus is for: architecture debates, multi-hour planning, genuinely novel algorithms, ambiguous requirements you have to unwind.
Most Claude Code sessions are 80% Haiku-capable tasks. If you're running everything on Sonnet (let alone Opus), you're leaving 3–10× savings on the floor.
Practical routing inside an agent loop
Claude Code's default will use the model you specified for every turn. You can do better:
- Run the planning step on Sonnet or Opus (smart, expensive, once).
- Run the execution steps on Haiku wherever possible (fast, cheap, many).
- Keep Opus in your back pocket for genuinely hard debates.
In our own usage, splitting like this cut our Claude Code spend 4× without any user-visible quality loss.
Reflect on what the model is doing
Before clicking "run" on a new Claude Code session, spend 30 seconds thinking: what does this task actually need from the model? If the answer is "find and fix a typo," don't send it to Opus. If it's "design a new auth system," Haiku isn't going to cut it.
That 30-second pause is worth $20 on a busy day.
4. Agent Iteration Discipline
If you're running Claude Code more than casually, this is probably your biggest leak.
The iteration compounding problem
Claude Code iterates. Every tool call and every response extends the conversation. Each new turn re-reads the whole transcript. By iteration 30, that transcript might be 80k tokens, and every new turn is paying $0.24 just to read what came before.
If the session goes for 100 turns, you could easily spend $20 on context regrowth alone, before any actual reasoning.
The fixes:
- Cap task scope. Don't say "refactor the whole repo." Say "refactor this file, then stop." Long open-ended missions are where the bill blooms.
- Close and restart. When a task finishes,
/clearand start fresh. Don't use one mega-session for unrelated work. - Summarize and reset. If you're deep in a long task and the context is getting expensive, ask Claude Code to summarize progress, then start a new session with the summary as seed context.
- Interrupt hallucinated loops. Sometimes the agent decides to "run tests 17 more times." Cancel, reset scope, give clearer instructions.
Watch for the "let me just try that again" pattern
Claude Code occasionally gets stuck and retries the same thing. Every retry costs money. Watch your terminal. If you see the same command running three times, kill it and rewrite the instructions. Don't let it thrash.
Cache the heavy stuff
If you routinely point Claude Code at the same large codebase and ask similar questions, you're re-loading that codebase into context every session. Consider:
- A
CLAUDE.mdat the repo root with the architecture summary. Claude Code reads it automatically. Fewer tool calls to discover structure. - Smaller, focused sub-tasks instead of "understand the entire system."
- Restart with fresh context when the old context has stopped being useful.
A small CLAUDE.md can cut per-session costs by 20–40% on repeated sessions against the same repo.
5. Infrastructure Layer
You've dropped to Sonnet default. You're routing Haiku for the easy stuff. Your sessions are disciplined. Prompt caching is on. And the bill is still bigger than you want it to be.
At this point, the remaining savings live in the infrastructure layer — the proxy between Claude Code and Anthropic's API.
How it works
Claude Code uses the ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL environment variables. Change the base URL to a proxy, and traffic flows through it instead of hitting Anthropic direct. Your prompts don't change, your workflow doesn't change, your Claude Code commands don't change.
The good proxies in this category use proprietary infrastructure to deliver the same outputs at a fraction of the per-call cost. The best ones preserve BYOK — you keep your Anthropic account and key; the proxy just routes on top.
What aiusage specifically does
We're a BYOK Claude proxy. You paste your Anthropic key on aiusage.ai, we give you a replacement ANTHROPIC_BASE_URL that points at our infrastructure. Claude Code continues to work exactly as before, and your effective per-call cost drops ~20×. Credit packs from $10, credits never expire, no subscription.
Nick, our first paying Claude Code customer, was running Claude Code 6–8 hours a day. Before aiusage, his month's Anthropic budget ran out by day 6. After: it lasted the full month and then some. He didn't change how he uses the tool.
When a proxy won't help
- If your bill is <$50/mo, skip this layer. You won't notice the savings and you'll add cognitive overhead.
- If you're multi-model shopping (Claude + GPT + Gemini), use OpenRouter for that. Savings proxies are Claude-specific.
- If you haven't done steps 1–4, do those first. A 20× infrastructure saving on top of a still-bloated prompt is a 20× bigger savings on a smaller base when you clean up prompts first.
A Recommended Order
Here's the order we'd recommend walking through if you're serious about lowering the Claude Code bill:
- Today: Get visibility. Anthropic console +
/costcommand. Know your split. - Today: Default to Sonnet, not Opus.
- Tomorrow: Start using
/clearaggressively. Cap session context. - This week: Add a
CLAUDE.mdto repos you use often. - This week: Route Haiku for the easy tasks, Sonnet for reasoning.
- Next week: Audit your agent loops. Cap scope. Kill thrashing sessions early.
- Then: Evaluate an infrastructure layer. Try aiusage for $10 if you're still bigger than you want to be.
A committed user walking this list typically gets 80–95% of the way to a sane bill by the end of week two. It's very rare to need something more exotic than the above.
Final Thought: Bill Size Is A Habit
The reason most Claude Code bills are too big isn't a specific misconfiguration — it's that nobody was watching. Once you start paying attention to the /cost counter and your session hygiene, the number naturally drifts down over time.
Treat your Claude Code bill like you'd treat your AWS bill: something to monitor, budget for, and optimize monthly. Teams that do this stay cheap forever. Teams that don't get a surprise.
If you want the fast path — same Claude Code, 20× cheaper, no config changes beyond one env var — start with a $10 pack on aiusage. Otherwise, work the list above. Either way, your bill next month will be smaller.
Drop your Claude Code bill 20×.
Paste your key at aiusage.ai — takes 60 seconds. BYOK, credit packs from $10, credits never expire.
Get started →Written by the team at aiusage.ai. We run a BYOK Claude proxy that makes your existing Claude Code setup ~20× cheaper. If that's interesting, read more or grab a $10 pack to try it.