Mischa Sigtermans

Thoughts · AI Open Source

How I saved 4.7 billion tokens with a Claude Code hook

A Claude Code hook routes my bash commands through rtk, a CLI proxy that strips token waste before it hits the model. 4.7 billion tokens saved across 9,343 commands.

I ran one command on my machine this morning and the number it spat out is the most embarrassing stat I've collected in a year of using Claude Code. 4.7 billion tokens saved. Across 9,343 rewritten commands. At a 99.4% efficiency rate. That's the delta between what Claude Code would have fed into its context window and what it ended up actually seeing.

rtk gain showing 4.7 billion tokens saved at 99.4% efficiency

The savings aren't Claude Code getting smarter. They come from a Claude Code hook I run called rtk, which does something embarrassingly obvious in hindsight. It looks at every bash command Claude Code is about to run, checks whether there's a token-efficient version of that command, and quietly rewrites it before it executes. The model still thinks it ran 'git diff'. What it got back was a version of 'git diff' that had already been stripped of noise. Same signal, a tiny fraction of the tokens.

I want to walk through what it does, how it's wired up, and why I think this pattern matters more than a neat trick.

The problem nobody talks about

The first time you hand Claude Code a real repository, you notice a specific failure mode. You ask it to look at something simple, and it runs 'ls' on a directory with six hundred files. It runs 'cat' on a 4,000-line log. It runs 'git diff' on a branch that's three weeks behind main. Each of those commands is cheap to run. Each of them returns an output that Claude Code has to read, tokenize, and carry forward as context for the rest of the conversation.

This is the invisible cost of the agent loop. It isn't the model's thinking, and it isn't your prompt. It's the ambient noise in the world the model has to look at. And because you're charged and context-budgeted by tokens, every line of whitespace, every duplicate error, every ASCII-art progress bar in a test runner is a tiny tax on the rest of your session. A bad 'ls' doesn't just waste tokens now. It also shortens how many turns Claude Code has left before the context window starts aging out.
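To make the tax concrete, here's a toy illustration (not rtk itself) of how little signal survives once you drop the duplicate lines from a noisy test log:

```shell
#!/usr/bin/env bash
# Toy illustration, not rtk: a test log that is 200 copies of the same
# passing line plus one real failure.
gen_log() {
  for _ in $(seq 1 200); do echo "test ok"; done
  echo "FAIL: test_auth"
}

gen_log | wc -l                      # 201 lines the model would have to read
gen_log | awk '!seen[$0]++' | wc -l  # 2 lines after dropping duplicates
```

rtk's real rewrites are semantic, not a blind dedup, but the ratio is the point: the one failure line is the entire signal, and everything else is tax.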

I'd been grumbling about this for months. I would pin Opus to do something non-trivial, watch it burn 30,000 tokens reading a grep result I didn't care about, and curse my own shell setup for not being smarter about it. That's the problem rtk solves.

What rtk actually is

rtk is a high-performance CLI proxy. Technically, it's a Rust binary that wraps dozens of common commands and replaces their output with a token-optimized version. In practice, it means that 'rtk ls' gives you a scannable directory listing without the noise. 'rtk grep' groups matches by file and truncates long lines. 'rtk git diff' returns only the parts of the diff that actually changed. 'rtk read' loads a file but strips the parts an LLM doesn't care about.

It's not a wrapper that calls the underlying command and pipes it through a regex. It's a real binary that knows the semantics of each tool and reformats the output for machine consumption instead of human consumption. There are rtk variants for ls, tree, find, grep, read, git, gh, docker, kubectl, cargo, pytest, npm, pnpm, tsc, vitest, prisma, eslint, prettier, and about forty other things I personally use.

The project is open source and you can install it as a standalone tool and use it manually. That's fine, but it misses the point. The value of rtk in a Claude Code workflow is that it fires automatically, without the model having to know it exists.

That's where the hook comes in.

The hook itself

Claude Code has a hook event called PreToolUse that fires before any tool call runs. You can match it to specific tools, including Bash. When it fires, it hands your hook the command the model is about to execute, and the hook can rewrite it, allow it, deny it, or ask the user to confirm. Most people use PreToolUse for safety checks. I use it as a silent optimizer.
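Mechanically, a PreToolUse hook receives the pending tool call as JSON on stdin and answers with a JSON decision. The sketch below shows the shape of that exchange; the field names ('tool_input.command', 'hookSpecificOutput', 'updatedInput') are my reading of the hooks docs, so verify them against your Claude Code version, and the rewrite rule here is a naive stand-in, not rtk:

```shell
#!/usr/bin/env bash
# Minimal PreToolUse sketch. JSON field names are assumptions -- check the
# Claude Code hooks documentation for your version.
rewrite_hook() {
  local input cmd rewritten
  input=$(cat)                                    # JSON payload from Claude Code
  cmd=$(jq -r '.tool_input.command' <<<"$input")  # the bash command about to run

  # Stand-in for `rtk rewrite`: naively prefix a few commands. The real
  # rule set lives in the rtk binary.
  case "$cmd" in
    ls|ls\ *|grep\ *|cat\ *) rewritten="rtk $cmd" ;;
    *)                       rewritten="$cmd" ;;
  esac

  # Answer with an allow decision carrying the (possibly) rewritten command.
  jq -n --arg c "$rewritten" '{
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "allow",
      updatedInput: { command: $c }
    }
  }'
}

echo '{"tool_name":"Bash","tool_input":{"command":"ls -la src"}}' | rewrite_hook
```

The model never sees this exchange; from its side, it asked for 'ls -la src' and got output back.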

My setup is one small block in ~/.claude/settings.json:

"hooks": {
  "PreToolUse": [
    {
      "matcher": "Bash",
      "hooks": [
        { "type": "command", "command": "/Users/Mischa/.claude/hooks/rtk-rewrite.sh" }
      ]
    }
  ]
}

The script itself is about a hundred lines of bash and does one thing. It reads the incoming command, calls rtk rewrite <cmd>, and interprets the exit code. If rtk knows a better version, the hook swaps it in and auto-allows. If rtk doesn't have a rewrite, the command passes through untouched. If a deny rule matches, Claude Code's native deny handling takes over. If an ask rule matches, the rewritten command is returned but the user gets prompted before it runs.
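That dispatch reads roughly like the sketch below. The specific exit codes (0 = rewritten, 1 = no rewrite, 2 = deny, 3 = ask) are illustrative assumptions rather than rtk's documented contract, and fake_rtk_rewrite is a hypothetical stand-in so the sketch runs without rtk installed:

```shell
#!/usr/bin/env bash
# Sketch of the exit-code dispatch. Codes are assumptions, not rtk's
# documented contract.
handle_rewrite() {
  local cmd="$1" rewritten status
  rewritten=$(fake_rtk_rewrite "$cmd")  # stand-in for: rtk rewrite "$cmd"
  status=$?
  case "$status" in
    0) echo "allow:$rewritten" ;;  # rtk found a leaner equivalent: swap it in
    1) echo "allow:$cmd" ;;        # no rewrite rule: pass through untouched
    2) echo "deny" ;;              # a deny rule matched: block the command
    3) echo "ask:$rewritten" ;;    # ask rule: rewrite, but prompt the user first
  esac
}

# Hypothetical stand-in so the sketch runs without rtk installed.
fake_rtk_rewrite() {
  case "$1" in
    "git diff") echo "rtk git diff"; return 0 ;;
    "rm -rf /") return 2 ;;
    *)          echo "$1"; return 1 ;;
  esac
}

handle_rewrite "git diff"   # allow:rtk git diff
handle_rewrite "make test"  # allow:make test
```

The shape is the point: the shell layer only routes on an exit code, so every decision about *what* to rewrite stays in the binary.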

None of that logic lives in my shell script. The shell script is a thin delegator. All the real rewrite knowledge lives inside the rtk Rust binary. That split matters because it means the rewrite rules can be improved and versioned and tested without me touching the hook. I wrote the hook once, six months ago, and haven't edited it since. The savings keep compounding because rtk keeps getting smarter.

What the numbers look like on my machine

When I run rtk gain on my machine, I get a summary of everything the hook has done since I installed it. The headline number is the one I opened this post with: 4.7 billion tokens saved, across 9,343 rewritten commands, at 99.4% efficiency. The breakdown is instructive.

rtk gain breakdown by command, showing rtk read at 4.6 billion tokens saved

The biggest single contributor is rtk read. 819 reads across six months. 4.6 billion tokens saved on that command alone, at an average of 25.9% savings per invocation. That's because 'read' is where Claude Code bleeds the most context in agent mode. The next largest contributors are the obvious ones. 1,907 rewrites of ls. 735 rewrites of grep. 459 rewrites of cargo test, which averages 86.9% token savings because test output is mostly repeated passing lines. A handful of curl rewrites that each individually saved 30 to 40 million tokens because someone pulled a large JSON file that didn't need to be materialized fully.
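For what those percentages mean: I read the per-command savings as tokens removed over tokens intercepted. A quick back-of-the-envelope helper, assuming that definition (it's my reading of the stats, not rtk's documented formula):

```shell
#!/usr/bin/env bash
# Savings as a share of the raw output intercepted. This formula is my
# reading of the stats, not rtk's documented definition.
savings_pct() {  # usage: savings_pct <raw_tokens> <emitted_tokens>
  awk -v raw="$1" -v out="$2" 'BEGIN { printf "%.1f", (raw - out) / raw * 100 }'
}

savings_pct 10000 1310   # 86.9 -- roughly the cargo-test figure above
```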

The thing I'd like to stress is that none of this was me being clever during the session. The hook fires whether I'm paying attention or not. Every command Claude Code runs is already going through this filter. The savings are the background hum of a decision I made once in a settings file.

Why this pattern matters more than it looks

I wrote two months ago that the economics around AI models are what matter more than the model weights, and that the smart move is to watch the knobs, not the weights. rtk is the flip side of that same argument. If you can't control what the lab is doing to your quota, you can control what your quota is being spent on.

Every token you save on an 'ls' is a token you don't have to pay for with an 'overloaded' retry two hours later. Every 4,000-line log that gets trimmed to 200 lines is more headroom for the actual reasoning the model needs to do. And the better your tools are at stripping ambient noise before it hits the context window, the longer your sessions last before they need to compact.

This is the real reason I care about the hook. Not the 4.7 billion tokens, which is a fun number to quote at dinner. The real reason is that it changes what Claude Code feels like to use in long Ralph loops. A loop that used to die from context poisoning at story six now runs through story fifteen because the noise isn't there anymore. That's not a tooling improvement. That's a product improvement I got for free because I put a hook in front of the bash tool.

How to set it up

If you want to try it, the steps are:

  1. Install rtk from github.com/rtk-ai/rtk. You'll need Rust or a prebuilt binary.
  2. Make sure jq is installed. The hook script uses it to parse Claude Code's tool input JSON.
  3. Drop the rewrite hook script in ~/.claude/hooks/rtk-rewrite.sh and mark it executable.
  4. Add the PreToolUse block to ~/.claude/settings.json pointing at the script.
  5. Run rtk gain after a day or two and watch the savings pile up.
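After step 4, it's worth confirming the block parses and points where you think it does. A small jq check, using the settings structure from earlier in the post (the /tmp file here is just an example to exercise the function):

```shell
#!/usr/bin/env bash
# Pull the hook command out of a settings file to confirm the wiring.
hook_cmd() {
  jq -r '.hooks.PreToolUse[0].hooks[0].command' "$1"
}

# Example settings file matching the block shown earlier in the post.
cat > /tmp/settings-example.json <<'EOF'
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "/Users/Mischa/.claude/hooks/rtk-rewrite.sh" }
        ]
      }
    ]
  }
}
EOF

hook_cmd /tmp/settings-example.json   # prints the hook script path
```

Point it at your real ~/.claude/settings.json; if it prints the script path and the file is executable, the hook will fire on the next Bash call.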

The hook is a thin delegator, so there's nothing to maintain. The rtk binary does the work, and upgrading the binary upgrades the rewrite rules. I've been running this setup since October. I will not be turning it off. The 4.7 billion token counter goes up every time Claude Code opens its mouth, and every one of those tokens is one I didn't have to think about.

That's the goal. Every hook you add to Claude Code is a decision to stop thinking about something. The good hooks pay you back in quota and context for the rest of the session. This is one of the best ones I've installed.

thanks for reading

Hi, I'm Mischa. I've been shipping products and building ventures for over a decade. First exit at 25, second at 30. Now Partner & CPO at Ryde Ventures, an AI venture studio in Amsterdam. Currently shipping Stagent and Onoma. Based in Hong Kong. I write about what I learn along the way.

