This is the practical, tested guide to making that number smaller without giving up the tool. We use Claude Code every day for our own engineering work, and everything below is a technique we've actually applied to our own bill.

We'll go top to bottom:

  1. Understand where the money is going — because you can't optimize what you can't see
  2. Configuration tweaks — the cheapest wins, done in minutes
  3. Model routing — where the 5× leverage lives
  4. Agent iteration discipline — the advanced stuff that keeps saving you money forever
  5. Infrastructure layer — when you're done with everything above and the bill is still too big

If you're reading this because your Anthropic invoice scared you this month, start at the top and work down. Each section layers.


1. Know Where The Money Is Going

The single biggest mistake we see: someone is surprised by the bill but has no idea which sessions caused it. Before you optimize anything, get visibility.

What to look at:

Once you see the shape of the spend, you can target the problem. If 80% of it is agent loops, that's where to optimize. If 80% is long single prompts, that's a different fix.


2. Configuration Tweaks (Do These First)

Drop to Sonnet by default

Claude Code ships with sensible defaults, but many users run it on Opus for everything "to be safe." Opus is 5× the cost of Sonnet. For 80%+ of coding tasks, Sonnet is indistinguishable from Opus on the outcome.

Set your default to Sonnet. Reach for Opus explicitly when you're doing deep planning, architecture discussions, or genuinely ambiguous problems. Most days, you won't need it.

Real numbers: a user switching from Opus-default to Sonnet-default typically sees their bill drop ~70% overnight, with no meaningful quality change on day-to-day work.

Cap the session context

Claude Code sessions carry context forward. A long session accumulates everything — all file reads, all tool calls, all prior reasoning — and every new turn pays to read that growing transcript.

Use /clear between unrelated tasks. Don't let a single session grow past 100k tokens unless the task actually requires it. Fresh session = fresh context = fresh, cheap turns.

Turn on extended prompt caching

Claude Code automatically caches the stable portion of your session (system prompt, tool schemas, early conversation) at the 5-minute cache tier. That's free and on by default.

If your workflow has gaps longer than 5 minutes between turns, flipping to the 1-hour cache tier keeps the cache hot between bursts of activity. Costs slightly more per cache write, saves a lot more on cache reads. Net win if you're bursty.

Disable what you don't use

Vision input, PDF handling, and some tool integrations have their own token costs. If you never send images or parse PDFs through Claude Code, nothing changes. If you do it occasionally, know that a single big PDF can cost more than an hour of plain coding.


3. Model Routing Inside Claude Code

This is where the biggest leverage lives, and where most users never go.

The three-tier mental model

For any task Claude Code is about to handle, ask: does this need Opus, Sonnet, or Haiku?

Most Claude Code sessions are 80% Haiku-capable tasks. If you're running everything on Sonnet (let alone Opus), you're leaving 3–10× savings on the floor.

Practical routing inside an agent loop

Claude Code's default will use the model you specified for every turn. You can do better:

In our own usage, splitting like this cut our Claude Code spend 4× without any user-visible quality loss.

Reflect on what the model is doing

Before clicking "run" on a new Claude Code session, spend 30 seconds thinking: what does this task actually need from the model? If the answer is "find and fix a typo," don't send it to Opus. If it's "design a new auth system," Haiku isn't going to cut it.

That 30-second pause is worth $20 on a busy day.


4. Agent Iteration Discipline

If you're running Claude Code more than casually, this is probably your biggest leak.

The iteration compounding problem

Claude Code iterates. Every tool call and every response extends the conversation. Each new turn re-reads the whole transcript. By iteration 30, that transcript might be 80k tokens, and every new turn is paying $0.24 just to read what came before.

If the session goes for 100 turns, you could easily spend $20 on context regrowth alone, before any actual reasoning.

The fixes:

Watch for the "let me just try that again" pattern

Claude Code occasionally gets stuck and retries the same thing. Every retry costs money. Watch your terminal. If you see the same command running three times, kill it and rewrite the instructions. Don't let it thrash.

Cache the heavy stuff

If you routinely point Claude Code at the same large codebase and ask similar questions, you're re-loading that codebase into context every session. Consider:

A small CLAUDE.md can cut per-session costs by 20–40% on repeated sessions against the same repo.


5. Infrastructure Layer

You've dropped to Sonnet default. You're routing Haiku for the easy stuff. Your sessions are disciplined. Prompt caching is on. And the bill is still bigger than you want it to be.

At this point, the remaining savings live in the infrastructure layer — the proxy between Claude Code and Anthropic's API.

How it works

Claude Code uses the ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL environment variables. Change the base URL to a proxy, and traffic flows through it instead of hitting Anthropic direct. Your prompts don't change, your workflow doesn't change, your Claude Code commands don't change.

The good proxies in this category use proprietary infrastructure to deliver the same outputs at a fraction of the per-call cost. The best ones preserve BYOK — you keep your Anthropic account and key; the proxy just routes on top.

What aiusage specifically does

We're a BYOK Claude proxy. You paste your Anthropic key on aiusage.ai, we give you a replacement ANTHROPIC_BASE_URL that points at our infrastructure. Claude Code continues to work exactly as before, and your effective per-call cost drops ~20×. Credit packs from $10, credits never expire, no subscription.

Nick, our first paying Claude Code customer, was running Claude Code 6–8 hours a day. Before aiusage, his month's Anthropic budget ran out by day 6. After: it lasted the full month and then some. He didn't change how he uses the tool.

When a proxy won't help


A Recommended Order

Here's the order we'd recommend walking through if you're serious about lowering the Claude Code bill:

  1. Today: Get visibility. Anthropic console + /cost command. Know your split.
  2. Today: Default to Sonnet, not Opus.
  3. Tomorrow: Start using /clear aggressively. Cap session context.
  4. This week: Add a CLAUDE.md to repos you use often.
  5. This week: Route Haiku for the easy tasks, Sonnet for reasoning.
  6. Next week: Audit your agent loops. Cap scope. Kill thrashing sessions early.
  7. Then: Evaluate an infrastructure layer. Try aiusage for $10 if you're still bigger than you want to be.

A committed user walking this list typically gets 80–95% of the way to a sane bill by the end of week two. It's very rare to need something more exotic than the above.


Final Thought: Bill Size Is A Habit

The reason most Claude Code bills are too big isn't a specific misconfiguration — it's that nobody was watching. Once you start paying attention to the /cost counter and your session hygiene, the number naturally drifts down over time.

Treat your Claude Code bill like you'd treat your AWS bill: something to monitor, budget for, and optimize monthly. Teams that do this stay cheap forever. Teams that don't get a surprise.

If you want the fast path — same Claude Code, 20× cheaper, no config changes beyond one env var — start with a $10 pack on aiusage. Otherwise, work the list above. Either way, your bill next month will be smaller.

Drop your Claude Code bill 20×.

Paste your key at aiusage.ai — takes 60 seconds. BYOK, credit packs from $10, credits never expire.

Get started →

Written by the team at aiusage.ai. We run a BYOK Claude proxy that makes your existing Claude Code setup ~20× cheaper. If that's interesting, read more or grab a $10 pack to try it.