Tips & Tricks (Updated: 6/3/2026)

Claude Code Token Optimization: Use /usage to Cut Cost Without Losing Quality

Cut Claude Code token use with /usage, CLAUDE.md, hooks, subagents, scoped inputs, and OpenTelemetry.

Token optimization in Claude Code is not just a billing topic. When the conversation gets too heavy, responses slow down, old assumptions leak into the current task, and you pay for Claude to reread logs, diffs, and decisions that no longer matter. The real goal is not the smallest possible token count; it is the same engineering quality with less irrelevant context.

As of June 2026, the practical entry point is /usage. The official command reference describes /usage as the command for session cost, plan usage limits, and activity stats. /cost and /stats exist as aliases, but /usage is clearer for documentation and team habits. For Pro, Max, Team, and Enterprise users, the screen can also attribute usage to skills, subagents, plugins, and MCP servers.

This guide uses four layers: observe usage, reduce base context, split noisy work into isolated contexts, and monitor repeatable team workflows. A few terms first: a hook is a shell command or endpoint that runs at a fixed Claude Code lifecycle event; a subagent is a separate agent context for a focused task; a harness is the scaffolding that makes agent work repeatable and safe.

flowchart LR
  A["Observe: /usage and /context"] --> B["Reduce: CLAUDE.md and scoped inputs"]
  B --> C["Split: hooks / skills / subagents"]
  C --> D["Measure: OpenTelemetry and team rules"]

Start with /usage

Do not run /usage after every prompt. Check it at useful boundaries: when a session starts feeling slow, after a large tool run, before a handoff, and after a workflow you expect to repeat. Pair it with /context so you can see whether the cost is coming from the current task, stale conversation history, memory files, or tool context.

The billing nuance matters. The Session block in /usage is a local estimate based on token counts. API users should confirm authoritative billing in the Claude Console. Subscription users should not treat the session dollar figure as their bill; plan usage bars and activity attribution are more relevant for day-to-day limits.
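To see why a figure derived from token counts is only an estimate, here is a minimal sketch of the arithmetic. The per-million-token prices are made-up placeholders, not real rates, and `estimateSessionCost` is a hypothetical name; this is not a description of how /usage is implemented.

```javascript
// Hypothetical prices in USD per million tokens. Placeholders, not real rates.
const ASSUMED_PRICES = { input: 2, output: 10 };

// Estimate a session's cost from raw token counts alone.
function estimateSessionCost(tokens, prices = ASSUMED_PRICES) {
  const inputCost = (tokens.input / 1_000_000) * prices.input;
  const outputCost = (tokens.output / 1_000_000) * prices.output;
  // Round to four decimals so floating-point noise does not leak into the result.
  return Number((inputCost + outputCost).toFixed(4));
}

console.log(estimateSessionCost({ input: 200_000, output: 10_000 })); // 0.5
```

A calculation like this ignores anything the local session cannot see, which is one reason to treat the Console, not the session figure, as the authoritative number for API billing.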

# Run these inside Claude Code
/usage
/context

# When the conversation is long but the work should continue
/compact Preserve changed files, failing tests, decisions, and unresolved questions.

# When switching to unrelated work
/clear

/compact and /clear solve different problems. Compact keeps the session moving by summarizing important state. Clear starts a new context. If you clear too early, Claude loses decisions. If you never clear, every unrelated task pays for stale context.

Keep Always-Loaded Memory Short

Tokens are not only the text you just typed. They also come from conversation history, CLAUDE.md, auto memory, logs, tool outputs, MCP servers, and previous investigation notes. Separate information into three buckets:

| Bucket | Best home | Example |
| --- | --- | --- |
| Needed every session | Short CLAUDE.md | Build commands, test commands, hard constraints |
| Needed for this task | Conversation and /compact | Changed files, failing tests, unresolved decisions |
| Disposable after inspection | Filtered command output | Long logs, generated diffs, broad search results |

The official memory docs say CLAUDE.md files are loaded into context, and long files create recurring overhead. Imports can organize instructions, but they do not help if the imported content is still loaded at startup. Keep CLAUDE.md for the small set of rules that truly apply every day.
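A rough way to reason about that recurring overhead is to multiply the file's approximate token count by the number of sessions. The sketch below assumes about four characters per token, which is a common rule of thumb for English text and not Claude's actual tokenizer. The function names and the file sizes are illustrative only.

```javascript
// Rough heuristic: about 4 characters per token. An assumption, not a real tokenizer.
const ASSUMED_CHARS_PER_TOKEN = 4;

function approxTokens(text) {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Tokens spent just loading the memory file across many sessions.
function recurringOverhead(memoryText, sessions) {
  return approxTokens(memoryText) * sessions;
}

const shortMemory = "x".repeat(1_200); // stand-in for a short CLAUDE.md
const longMemory = "x".repeat(24_000); // stand-in for an operations notebook

console.log(recurringOverhead(shortMemory, 100)); // 30000
console.log(recurringOverhead(longMemory, 100)); // 600000
```

The absolute numbers matter less than the ratio: a file twenty times longer costs twenty times more before any work has started.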

# CLAUDE.md

## Project commands
- Build: npm run build
- Test: npm run test
- Type check: npm run typecheck

## Fast navigation
- API code: src/api/
- UI components: src/components/
- Tests: tests/

## Avoid by default
- Do not scan node_modules/, dist/, coverage/, or generated clients.
- Do not paste full logs. Ask for the failing command and relevant lines.

## Compact instructions
When compacting, preserve changed files, failing tests, decisions, credentials policy, and next actions.

Detailed PR review checklists, translation rules, migration runbooks, or release playbooks should usually become skills or separate docs. A skill loads when you invoke it; CLAUDE.md is paid for from the start of the session.

Filter Inputs Before Claude Reads Them

The fastest way to waste tokens is to paste an entire log or diff. Claude needs evidence, not a dump. Keep the full artifact on disk, then pass only the lines that can change the next decision.

# Give Claude the request ID and nearby errors, not the whole production log
tail -n 800 logs/app.log | grep -E -n -C 4 "request_id=abc123|ERROR|WARN"

# Inspect PR size before reading the whole diff
git diff --stat
git diff -- src/auth.ts tests/auth.test.ts

# Keep the full test output locally, but show Claude the failure neighborhood
npm test 2>&1 | tee test.log
grep -E -n -C 6 "FAIL|ERROR|Error|failed|Assertion" test.log | head -160

This pattern works because it does not hide data. It changes the first slice Claude sees. If the slice is insufficient, ask for the next file or the next section of the log.
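If you prefer to do the slicing in a script rather than with grep, the same idea can be sketched in Node.js. `sliceAround` is a hypothetical helper, and the default context and line cap simply mirror the numbers used in the shell examples.

```javascript
// Return matching lines plus `context` lines on each side, with 1-based line numbers.
// Pass a regex without the "g" flag, because test() is stateful on global regexes.
function sliceAround(logText, pattern, context = 4, maxLines = 160) {
  const lines = logText.split(/\r?\n/);
  const keep = new Set();
  lines.forEach((line, i) => {
    if (pattern.test(line)) {
      const start = Math.max(0, i - context);
      const end = Math.min(lines.length - 1, i + context);
      for (let j = start; j <= end; j++) keep.add(j);
    }
  });
  return [...keep]
    .sort((a, b) => a - b)
    .slice(0, maxLines) // cap the slice so one noisy log cannot flood the context
    .map((i) => `${i + 1}: ${lines[i]}`);
}

const sample = ["boot ok", "cache warm", "ERROR db timeout", "retrying", "done"].join("\n");
console.log(sliceAround(sample, /ERROR|WARN/, 1).join("\n"));
```

The full log stays untouched on disk; only the returned slice is what you would paste or pipe to Claude first.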

Use Hooks Carefully

A hook can automate that filtering, but it should not hide failures or silently approve risky commands. Start with the "ask" permission decision so every rewritten command is shown for confirmation, inspect what the hook produced, and tighten the workflow only after you have tested it locally.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/filter-test-output.sh"
          }
        ]
      }
    ]
  }
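}

The settings entry runs this script. Save it as ~/.claude/hooks/filter-test-output.sh and make it executable with chmod +x:

#!/usr/bin/env bash
set -euo pipefail

# Claude Code passes the pending tool call as JSON on stdin.
input=$(cat)
cmd=$(printf '%s' "$input" | jq -r '.tool_input.command // ""')

case "$cmd" in
  # Rewrite only commands that start with a known test runner.
  npm\ test*|pnpm\ test*|pytest*|go\ test*)
    # Append a filter so only the failure neighborhood reaches Claude.
    filtered="$cmd 2>&1 | grep -E -n -C 6 '(FAIL|ERROR|Error|failed|Assertion)' | head -160"
    # "ask" shows the rewritten command for confirmation before it runs.
    jq -n --arg command "$filtered" '{
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision: "ask",
        permissionDecisionReason: "Run test command with filtered output",
        updatedInput: { command: $command }
      }
    }'
    ;;
  *)
    # Leave every other command untouched.
    echo '{}'
    ;;
esac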

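This example requires jq. As written, the rewritten pipeline discards the full output and, unless pipefail is set in the shell that runs it, reports the exit status of its last stage (head) rather than the test runner's. For a production team hook, store the complete test log in a file and preserve the original exit code. The hook earns its keep only when it reduces noise without reducing evidence.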
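The advice about keeping the full log and the original exit code can be sketched as a Node.js variant of the same hook logic. This is an illustration under assumptions, not a drop-in replacement: `wrapTestCommand` and `buildHookResponse` are hypothetical names, `test.log` is an assumed log path without spaces, and the response shape is copied from the bash hook in this section.

```javascript
// Build a shell command that saves the full output, prints only the failure
// neighborhood, and still ends with the test runner's exit status.
// `logPath` is an assumed location and must not contain spaces in this sketch.
function wrapTestCommand(cmd, logPath = "test.log") {
  const pattern = "(FAIL|ERROR|Error|failed|Assertion)";
  return [
    `( ${cmd} ) > ${logPath} 2>&1`, // keep the complete log on disk
    "status=$?", // remember the original exit code
    `grep -E -n -C 6 '${pattern}' ${logPath} | head -160`, // show the filtered slice
    "( exit $status )", // make the original exit code the final status
  ].join("; ");
}

// Build a PreToolUse response in the same shape as the bash hook.
function buildHookResponse(toolInput) {
  const cmd = toolInput.command ?? "";
  // Only rewrite commands that start with a known test runner.
  if (!/^(npm test|pnpm test|pytest|go test)/.test(cmd)) return {};
  return {
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "ask",
      permissionDecisionReason: "Run test command with filtered output and preserved exit code",
      updatedInput: { command: wrapTestCommand(cmd) },
    },
  };
}

console.log(JSON.stringify(buildHookResponse({ command: "npm test" }), null, 2));
```

Running the tests in a subshell and restoring the status with `( exit $status )` keeps the failure signal intact even when grep finds nothing to print.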

Split Verbose Work

Subagents help when the work is noisy but the main conversation only needs the conclusion. Good tasks are narrow: “read these official docs and return only changed facts,” “check these 10 localized files and return only blockers,” or “run the failing test and summarize the first actionable stack trace.”

Do not spawn subagents just because you can. Each one has its own context, memory, tools, and cost. Use them to protect the main decision context, not to parallelize vague work. The same rule applies to skills: move detailed, occasional playbooks out of CLAUDE.md and load them only when the workflow needs them.
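A simplified model shows when isolation pays off. It assumes that whatever sits in the main context is sent again on every later turn, and it ignores prompt caching and compaction, so the numbers are directional at best. All token figures and field names are hypothetical.

```javascript
// Tokens the main conversation carries for this piece of work:
// once when it arrives, then again on each later turn.
function mainContextTokens({ noisyTokens, summaryTokens, laterTurns, useSubagent }) {
  const carried = useSubagent ? summaryTokens : noisyTokens;
  return carried * (1 + laterTurns);
}

function totalTokens(opts) {
  // The subagent still reads the noisy material once, plus its own fixed overhead.
  const subagentCost = opts.useSubagent ? opts.noisyTokens + opts.subagentOverhead : 0;
  return mainContextTokens(opts) + subagentCost;
}

const base = { noisyTokens: 20_000, summaryTokens: 500, subagentOverhead: 8_000 };

// Many follow-up turns: isolating the noise is cheaper overall.
console.log(totalTokens({ ...base, laterTurns: 10, useSubagent: false })); // 220000
console.log(totalTokens({ ...base, laterTurns: 10, useSubagent: true })); // 33500

// No follow-up turns: the subagent costs more than reading the output directly.
console.log(totalTokens({ ...base, laterTurns: 0, useSubagent: false })); // 20000
console.log(totalTokens({ ...base, laterTurns: 0, useSubagent: true })); // 28500
```

In this model, the benefit comes entirely from the main conversation continuing after the noisy step. A one-off check gains nothing from a separate context.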

Monitor Team Usage

For a solo developer, /usage and /context are often enough. For repeatable team workflows, OpenTelemetry turns cost, token counts, model choice, duration, and tool activity into shared data. Start with the console exporter before wiring a collector or dashboard.

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=console
export OTEL_LOGS_EXPORTER=console
export OTEL_METRIC_EXPORT_INTERVAL=10000
export OTEL_LOGS_EXPORT_INTERVAL=5000

claude

Track quality next to token usage. If tokens drop by 30% but review defects and regenerations go up, the process got worse. I usually compare usage, number of correction rounds, verification status, and post-merge fixes.
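That comparison can be made explicit with a small check. The sketch below uses invented field names (`tokens`, `correctionRounds`, `postMergeFixes`, `verified`); map them to whatever your team actually records.

```javascript
// Compare a baseline workflow run with a candidate run.
function compareRuns(baseline, candidate) {
  const tokenChange = (candidate.tokens - baseline.tokens) / baseline.tokens;
  const qualityRegressed =
    candidate.correctionRounds > baseline.correctionRounds ||
    candidate.postMergeFixes > baseline.postMergeFixes ||
    (baseline.verified && !candidate.verified);

  let verdict = "no clear change";
  if (qualityRegressed) verdict = "worse: quality regressed";
  else if (tokenChange < 0) verdict = "better: fewer tokens, no quality regression";
  else if (tokenChange > 0) verdict = "costlier: more tokens, no quality regression";

  return { tokenChangePct: Math.round(tokenChange * 100), verdict };
}

const before = { tokens: 100_000, correctionRounds: 2, postMergeFixes: 0, verified: true };
const after = { tokens: 70_000, correctionRounds: 4, postMergeFixes: 1, verified: true };

// Tokens dropped by 30%, but the extra correction rounds make this a regression.
console.log(compareRuns(before, after));
```

Checking quality first is deliberate: a token saving is only reported as an improvement when none of the quality signals moved in the wrong direction.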

Practical Use Cases

Log Investigation

Pass the request ID, the latest errors, timestamps, expected behavior, and nearby lines. Start with a small slice, let Claude form a hypothesis, then widen only if the evidence is missing.

Code Review

Start with git diff --stat, changed files, test output, and review concerns. Split a large PR into security, performance, and compatibility passes so Claude does not repeatedly reread the whole patch.
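To decide where a large PR should be split, it can help to rank files by changed lines. The sketch below parses text in the format printed by `git diff --numstat`, the machine-readable sibling of `--stat`. The helper names and the sample data are illustrative.

```javascript
// Parse `git diff --numstat` output: added<TAB>deleted<TAB>path, one file per line.
function parseNumstat(text) {
  return text
    .split(/\r?\n/)
    .filter(Boolean)
    .map((line) => {
      const [added, deleted, ...pathParts] = line.split("\t");
      // Binary files show "-" for both counts; treat them as zero changed lines.
      const changed = (Number(added) || 0) + (Number(deleted) || 0);
      return { path: pathParts.join("\t"), changed };
    });
}

// Largest files first, so each review pass can start with a bounded subset.
function largestFiles(text, limit = 5) {
  return parseNumstat(text)
    .sort((a, b) => b.changed - a.changed)
    .slice(0, limit);
}

const sampleNumstat = [
  "120\t30\tsrc/auth.ts",
  "4\t1\tREADME.md",
  "-\t-\tassets/logo.png",
  "60\t10\ttests/auth.test.ts",
].join("\n");

console.log(largestFiles(sampleNumstat, 2));
```

With the sample data, the two files returned are src/auth.ts and tests/auth.test.ts, which would be natural candidates for the first focused pass.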

Multilingual Publishing

Keep the canonical article decisions in the main session. Send translation, link checks, description length, and CTA checks into separate contexts. Return changed files and verification evidence, not every intermediate note.

Training Days

During a workshop, concurrent Claude Code usage is higher than normal. Give participants rules such as “read five files first,” “paste the last 160 failure lines,” and “explain why scope needs to widen.” That keeps both cost and teaching flow predictable.

Pitfalls to Avoid

  • Writing a /cost-only guide. Use /usage as the main command and mention /cost and /stats as aliases.
  • Cutting context that is actually evidence. Reproduction steps, expected output, failing commands, and decisions must survive.
  • Turning CLAUDE.md into an operations notebook. Rare workflows belong in skills or separate docs.
  • Enabling every MCP server. Use /mcp to review which servers are connected, and prefer CLI tools when they give the same answer with less context.
  • Hiding failures with hooks. Keep the full log somewhere and show Claude the filtered slice.
  • Assuming subagents are automatically cheaper. They isolate noise, but they still consume their own context.

A Small Handoff Script

This Node.js script needs no npm packages, only Node.js and the git CLI. It builds a compact brief from the files with unstaged changes, the diff size, and a test log.

#!/usr/bin/env node
import { execFileSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";

function git(args) {
  return execFileSync("git", args, { encoding: "utf8" }).trim();
}

const testLogPath = process.argv[2];
const changedFiles = git(["diff", "--name-only"])
  .split(/\r?\n/)
  .filter(Boolean);
const diffStat = git(["diff", "--stat"]);
const testLog = testLogPath && existsSync(testLogPath)
  ? readFileSync(testLogPath, "utf8")
  : "";
const failures = testLog
  .split(/\r?\n/)
  .filter((line) => /(FAIL|ERROR|Error|failed|Assertion)/.test(line))
  .slice(0, 80);

console.log("# Claude handoff brief\n");
console.log("## Changed files");
console.log(changedFiles.length ? changedFiles.map((file) => `- ${file}`).join("\n") : "- None");
console.log("\n## Diff stat");
console.log(diffStat || "No diff");
console.log("\n## Test failures");
console.log(failures.length ? failures.map((line) => `- ${line}`).join("\n") : "- No matching failure lines");

Save the script as scripts/claude-brief.mjs, then generate the brief from a saved test log:

node scripts/claude-brief.mjs test.log > claude-brief.md

Official Docs Checked

For adjacent workflow improvements, read Claude Code speed optimization, Claude Code permissions guide, and harness engineering guide.

What I Actually Tested

For this rewrite, I treated the Japanese article as the canonical source and checked the official docs before updating the localized versions. The biggest practical win was separating permanent memory, task state, and disposable logs. A compact brief with changed files, failing lines, and verification status produced better answers than pasting a full test run.

If you want reusable prompts and setup material, start with ClaudeCodeLab products. For team rollout, permissions, review policy, telemetry, and training, use Claude Code training and consultation.

About the Author

Masa

Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.