Claude Code Token Optimization: Cut Cost with /usage and Protect Quality

Claude Code token optimization is not just a technique for spending less money. When context grows too large, responses slow down, old assumptions leak into the current task, and Claude rereads logs, diffs, and decisions that are no longer needed. The real goal is the same engineering quality with less irrelevant context.

As of June 2026, the practical starting point is /usage. The official command reference describes /usage as the command for viewing session cost, plan usage limits, and activity stats. /cost and /stats are aliases, so they are not wrong, but writing /usage in documentation and team rules is clearer. On Pro, Max, Team, and Enterprise plans, usage can be attributed to skills, subagents, plugins, and MCP servers.

This guide has four layers: observe usage, reduce base context, move noisy work into a separate context, and monitor repeatable team workflows. A hook is a script or endpoint that runs on a Claude Code lifecycle event. A subagent is a small separate context that handles a narrow task. A harness is the scaffolding that makes agent work repeatable and safe.

flowchart LR
  A["Observe: /usage and /context"] --> B["Reduce: CLAUDE.md and scoped inputs"]
  B --> C["Split: hooks / skills / subagents"]
  C --> D["Measure: OpenTelemetry and team rules"]

Start with /usage

You do not need to run /usage after every prompt. Check it when a session feels slow, after a large tool output, before a handoff, or when you finish a workflow you plan to repeat. Reading it alongside /context shows whether the cost comes from the current task, stale history, a memory file, or tool context.

Billing and workflow diagnosis are separate things. The session cost shown by /usage is a local estimate derived from token counts. API users should treat Claude Console as the authoritative source for billing. Subscription users should not read the session dollar figure as a direct bill; the plan usage bars and activity attribution are more useful.
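To make "a local estimate derived from token counts" concrete, here is a minimal sketch of that kind of arithmetic. The prices, the object shape, and the function name are hypothetical placeholders; they are not Anthropic pricing and not a description of how /usage computes its figure internally.

```javascript
// Hypothetical per-million-token prices in USD. Replace with your real rates.
const EXAMPLE_PRICES = { input: 3, output: 15 };

// Estimate cost as tokens / 1,000,000 * price-per-million, summed per direction.
function estimateSessionCost(tokens, prices = EXAMPLE_PRICES) {
  const inputCost = (tokens.input / 1_000_000) * prices.input;
  const outputCost = (tokens.output / 1_000_000) * prices.output;
  // Round to 4 decimals so floating-point noise does not leak into reports.
  return Number((inputCost + outputCost).toFixed(4));
}

console.log(estimateSessionCost({ input: 200_000, output: 10_000 }));
```

An estimate like this is useful for comparing workflows against each other, which is why it should not be reconciled against an invoice.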

# Run inside a Claude Code conversation
/usage
/context

# The conversation is long but the work continues
/compact Preserve changed files, failing tests, decisions, and unresolved questions.

# When switching to an unrelated task
/clear

/compact summarizes the important state. /clear starts a fresh context. Clear too early and you lose decisions; never clear and every new task keeps paying for old context.

Keep always-loaded memory small

Tokens do not come only from the last prompt. Conversation history, CLAUDE.md, auto memory, logs, tool output, MCP servers, and investigation notes all count. Sort information into three buckets.

Type	Best place	Example
Needed in every session	Short CLAUDE.md	Build, tests, hard constraints
Needed only for this task	Conversation and `/compact`	Changed files, failing tests, open decisions
Read once and discard	Filtered command output	Long logs, generated diffs, broad search

The official memory docs state that CLAUDE.md is loaded as context. A longer file raises the fixed cost of every session. Imports can help with organization, but if the imported content loads at startup there is no token saving.

# CLAUDE.md

## Project commands
- Build: npm run build
- Test: npm run test
- Type check: npm run typecheck

## Fast navigation
- API code: src/api/
- UI components: src/components/
- Tests: tests/

## Avoid by default
- Do not scan node_modules/, dist/, coverage/, or generated clients.
- Do not paste full logs. Ask for the failing command and relevant lines.

## Compact instructions
When compacting, preserve changed files, failing tests, decisions, credentials policy, and next actions.

A long PR review checklist, translation rules, a migration runbook, or a release playbook is better kept in a skill or a separate document. A skill loads when it is needed; CLAUDE.md sits in context from session start.
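To get a feel for why an always-loaded file is a fixed cost, the sketch below multiplies a rough token estimate by a number of sessions. The four-characters-per-token figure is only a common rule of thumb for English text, not Claude's tokenizer, and both function names are hypothetical.

```javascript
// Rough heuristic only: about 4 characters per token for English text.
// This is an assumption for illustration, not the real tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Tokens paid across many sessions by a file that loads at every session start.
function fixedCostAcrossSessions(text, sessions) {
  return estimateTokens(text) * sessions;
}

const shortMemory = "x".repeat(2_000);  // stand-in for a short CLAUDE.md
const longMemory = "x".repeat(20_000);  // stand-in for an "operations notebook"
console.log(fixedCostAcrossSessions(shortMemory, 50));
console.log(fixedCostAcrossSessions(longMemory, 50));
```

The absolute numbers are unreliable, but the ratio makes the point: a file ten times longer costs roughly ten times more in every session, whether or not the task needs it.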

Filter input before Claude reads it

The most common waste is pasting an entire log or an entire diff. Claude needs evidence, not a dump. Keep the full artifact on disk and first give Claude the lines that could change the next decision.

# Give the request ID and nearby errors instead of the whole production log
tail -n 800 logs/app.log | grep -E -n -C 4 "request_id=abc123|ERROR|WARN"

# Check the PR size before handing over the whole diff
git diff --stat
git diff -- src/auth.ts tests/auth.test.ts

# Save the full test output, show Claude the failure area
npm test 2>&1 | tee test.log
grep -E -n -C 6 "FAIL|ERROR|Error|failed|Assertion" test.log | head -160

This is not hiding information. It is choosing the first slice. If the slice is not enough, provide the next file or additional log lines.
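The same "first slice" idea can be sketched in Node.js for cases where grep is not available. The function name, the default context of 4 lines, and the 160-line cap are illustrative assumptions that mirror the shell commands; this is one possible approach, not a required tool.

```javascript
// Return matching lines plus `context` lines on each side, capped at maxLines.
// Each returned line is prefixed with its 1-based line number.
function sliceWithContext(text, pattern, context = 4, maxLines = 160) {
  const lines = text.split(/\r?\n/);
  const keep = new Set();
  lines.forEach((line, i) => {
    pattern.lastIndex = 0; // avoid stateful matching if the regex has the g flag
    if (pattern.test(line)) {
      const start = Math.max(0, i - context);
      const end = Math.min(lines.length - 1, i + context);
      for (let j = start; j <= end; j++) keep.add(j);
    }
  });
  return [...keep]
    .sort((a, b) => a - b)
    .slice(0, maxLines)
    .map((i) => `${i + 1}:${lines[i]}`);
}

const sampleLog = ["boot ok", "request start", "ERROR db timeout", "retrying", "done", "idle"].join("\n");
console.log(sliceWithContext(sampleLog, /ERROR|WARN/, 1).join("\n"));
```

Widening the slice is then a matter of raising `context` or `maxLines`, which keeps each step toward a larger scope deliberate.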

Use hooks carefully

If you keep repeating the same filtering, you can turn it into a hook. A hook is not for hiding failures, and it should not silently approve a risky command. Start with ask, inspect the rewritten command, and share it with the team only after local testing.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/filter-test-output.sh"
          }
        ]
      }
    ]
  }
}

#!/usr/bin/env bash
set -euo pipefail

input=$(cat)
cmd=$(echo "$input" | jq -r '.tool_input.command // ""')

case "$cmd" in
  npm\ test*|pnpm\ test*|pytest*|go\ test*)
    filtered="$cmd 2>&1 | grep -E -n -C 6 '(FAIL|ERROR|Error|failed|Assertion)' | head -160"
    jq -n --arg command "$filtered" '{
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision: "ask",
        permissionDecisionReason: "Run test command with filtered output",
        updatedInput: { command: $command }
      }
    }'
    ;;
  *)
    echo '{}'
    ;;
esac

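This example requires jq. In a default shell, the rewritten pipeline returns the exit status of its last command rather than of the test run, so a team version should save the full log to a file and preserve the original exit code. A hook is useful only when it reduces noise without reducing evidence.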
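One possible shape for that team version is sketched below as a small command builder. It assumes the rewritten command runs in a bash-compatible shell; `buildFilteredCommand`, `shellQuote`, and the log path are hypothetical names for illustration, not part of Claude Code.

```javascript
// Quote a value so the shell treats it as a single literal word.
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`;
}

// Build a command that (1) saves the full output to a log file,
// (2) prints only the filtered failure area, and (3) exits with the
// original command's status instead of the status of grep or head.
function buildFilteredCommand(cmd, logPath) {
  const log = shellQuote(logPath);
  const pattern = shellQuote("(FAIL|ERROR|Error|failed|Assertion)");
  return [
    `{ ${cmd}; } > ${log} 2>&1`,
    "status=$?",
    `grep -E -n -C 6 ${pattern} ${log} | head -160`,
    "exit $status",
  ].join("; ");
}

console.log(buildFilteredCommand("npm test", "test.log"));
```

Capturing `$?` immediately after the test command is the key design choice: the filter can then print nothing or fail without changing what the caller learns about the test run.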

Isolate verbose work

Subagents are useful when the process is long but the main session only needs a short conclusion. Examples: reading official docs and returning only the changed facts, checking 10 localized files and returning only blockers, or running a failing test and summarizing the first actionable stack trace.

A subagent is not automatically cheap. Each subagent has its own context, memory, tools, and cost. The use case is keeping the main decision context clean, not blindly parallelizing vague work. The same rule applies to skills: keep rare, long playbooks out of CLAUDE.md and load them when needed.

Monitor team usage

For a single developer, /usage and /context are often enough. For team workflows, OpenTelemetry turns cost, token counts, model, duration, and tool activity into shared data. Start with the console exporter, then add a collector or dashboard later.

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=console
export OTEL_LOGS_EXPORTER=console
export OTEL_METRIC_EXPORT_INTERVAL=10000
export OTEL_LOGS_EXPORT_INTERVAL=5000

claude

Do not look at tokens alone. Track usage together with correction rounds, verification passes, and post-merge fixes. If tokens went down but review defects went up, the optimization has failed.
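The rule "fewer tokens only counts if quality holds" can be written down as a small check. The metric names below are hypothetical; map them to whatever your team actually records, and treat the thresholds as a starting point rather than a standard.

```javascript
// Compare two periods of the same workflow.
// All field names are illustrative assumptions, not telemetry attribute names.
function evaluateOptimization(before, after) {
  const tokensDown = after.tokens < before.tokens;
  const qualityHeld =
    after.correctionRounds <= before.correctionRounds &&
    after.reviewDefects <= before.reviewDefects &&
    after.postMergeFixes <= before.postMergeFixes;

  if (tokensDown && qualityHeld) return "success";
  if (tokensDown) return "fail: quality regressed";
  return "no saving";
}

const baseline = { tokens: 900_000, correctionRounds: 4, reviewDefects: 2, postMergeFixes: 1 };
const trimmed = { tokens: 600_000, correctionRounds: 4, reviewDefects: 5, postMergeFixes: 1 };
console.log(evaluateOptimization(baseline, trimmed));
```

Here the token count dropped by a third, but review defects rose, so the change is reported as a failure rather than a saving.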

Practical use cases

Log investigation

Provide the request ID, recent errors, timestamp, expected behavior, and nearby lines. Have Claude form a hypothesis from a small slice, and widen the scope if the evidence is thin.

Code review

Start with git diff --stat, the changed files, test output, and the review focus. Split a large PR into security, performance, and compatibility passes.
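Deciding whether a PR is "large" can be made mechanical. The sketch below parses the tab-separated output of `git diff --numstat` (added, deleted, path) and flags the diff when total changed lines exceed a threshold. The 400-line threshold and the function name are assumptions for illustration only.

```javascript
// Summarize `git diff --numstat` output: one "added<TAB>deleted<TAB>path" per line.
function summarizeNumstat(numstat, largeThreshold = 400) {
  const files = numstat
    .split(/\r?\n/)
    .filter(Boolean)
    .map((line) => {
      const [added, deleted, ...rest] = line.split("\t");
      return {
        path: rest.join("\t"),
        // Binary files are reported as "-", so they count as 0 changed lines.
        changed: (Number(added) || 0) + (Number(deleted) || 0),
      };
    });
  const total = files.reduce((sum, file) => sum + file.changed, 0);
  // Largest files first: these are the candidates for a dedicated review pass.
  const largest = [...files].sort((a, b) => b.changed - a.changed).slice(0, 5);
  return { fileCount: files.length, total, isLarge: total > largeThreshold, largest };
}

const sample = "10\t2\tsrc/auth.ts\n-\t-\tassets/logo.png\n300\t100\tsrc/api.ts";
console.log(summarizeNumstat(sample));
```

When `isLarge` is true, the `largest` list is a reasonable place to start splitting the review into separate passes.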

Multilingual publishing

Keep canonical article decisions in the main session. Send checks for translation naturalness, links, description length, and CTAs to separate contexts. Changed files and verification evidence are enough in the return.

Training day

Concurrent usage is high in a workshop. Set rules: start with five files, cap the failure log at 160 lines, and give a reason before widening scope. This keeps both cost and class flow stable.

Pitfalls to avoid

Documenting only /cost. Keep /usage as the main command and describe /cost and /stats as aliases.
Cutting away evidence. Preserve the reproduction, expected output, failing command, and decisions.
Turning CLAUDE.md into an operations notebook. Keep rare workflows in skills.
Enabling too many MCP servers. Review them with /mcp and use a CLI for simple work.
Hiding failures with a hook. Keep the full log saved and give Claude the filtered slice.
Assuming subagents are always cheaper. They isolate noise, but they consume their own context.

A small handoff script

This dependency-free Node.js script builds a compact brief from changed files, diff size, and test failure lines.

#!/usr/bin/env node
import { execFileSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";

function git(args) {
  return execFileSync("git", args, { encoding: "utf8" }).trim();
}

const testLogPath = process.argv[2];
const changedFiles = git(["diff", "--name-only"])
  .split(/\r?\n/)
  .filter(Boolean);
const diffStat = git(["diff", "--stat"]);
const testLog = testLogPath && existsSync(testLogPath)
  ? readFileSync(testLogPath, "utf8")
  : "";
const failures = testLog
  .split(/\r?\n/)
  .filter((line) => /(FAIL|ERROR|Error|failed|Assertion)/.test(line))
  .slice(0, 80);

console.log("# Claude handoff brief\n");
console.log("## Changed files");
console.log(changedFiles.length ? changedFiles.map((file) => `- ${file}`).join("\n") : "- None");
console.log("\n## Diff stat");
console.log(diffStat || "No diff");
console.log("\n## Test failures");
console.log(failures.length ? failures.map((line) => `- ${line}`).join("\n") : "- No matching failure lines");

node scripts/claude-brief.mjs test.log > claude-brief.md

Official docs checked

Commands: /usage, /context, /compact, /clear.
Manage costs effectively: costs, MCP, hooks, skills, subagents, model effort.
How Claude remembers your project: CLAUDE.md and auto memory.
Hooks reference and Monitoring: PreToolUse and OpenTelemetry.

Related reading: speed optimization, permissions guide, and harness engineering guide.

What was actually tested

For this rewrite, the Japanese article was kept as canonical, the official docs were verified, and then all ten locales were updated. The biggest practical improvement was keeping permanent memory, task state, and disposable logs separate. A short brief with changed files, failure lines, and verification status produces better answers than pasting full test output.

For reusable prompts and setup material, see the ClaudeCodeLab products. For team rollout, permissions, review policy, telemetry, and training, Claude Code training and consultation is the practical next step.