Reading a Large Codebase Safely with Claude Code and Codex

When you first land in a large codebase, the goal is not to read every file. The first goal is to build a map: where the system starts, which folders are generated, which files carry business risk, and which ownership boundaries should not be crossed without review.

Claude Code and Codex can help with this, but only when the investigation is controlled. If you dump whole files, long logs, and hundreds of search results into the main conversation, the text grows while the signal shrinks. Call it context bloat: the conversation keeps getting bigger, but the information you can actually base a decision on becomes less dense.

This guide describes the workflow I use for project handoffs, legacy audits, and multilingual article maintenance at ClaudeCodeLab. For tool behavior, check the official docs: Claude Code common workflows, CLI reference, memory, Codex non-interactive mode, and Codex AGENTS.md.

Build a repo map first

Before asking an agent to "explain the whole repo", narrow down the outer shape of the repository yourself. rg is the ripgrep command. It is fast, and it makes excluding generated folders easy.

git status --short
git rev-parse --show-toplevel
git branch --show-current

# Count files per top-level folder (keep only the first path component).
rg --files \
  -g '!*node_modules*' \
  -g '!dist' \
  -g '!build' \
  -g '!coverage' \
  -g '!*.lock' \
  | sed 's#/.*##' \
  | sort \
  | uniq -c \
  | sort -nr \
  | head -40

find . -maxdepth 3 \( \
  -name package.json -o \
  -name pnpm-workspace.yaml -o \
  -name pyproject.toml -o \
  -name go.mod -o \
  -name Cargo.toml -o \
  -name Dockerfile -o \
  -name docker-compose.yml -o \
  -name AGENTS.md -o \
  -name CLAUDE.md \
\) -print
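
If you would rather post-process the file list in a script than in a shell pipeline, the same per-folder count can be sketched in a few lines of JavaScript. This is a minimal sketch, not part of the original workflow: countByTopLevel and GENERATED_HINTS are hypothetical names, and the list of "generated" folder names is an assumption you should adapt to your repository.

```javascript
// Hypothetical helper: summarize repo-relative paths (for example the output
// of `rg --files`) by top-level folder and flag folders that look generated.
const GENERATED_HINTS = new Set(["node_modules", "dist", "build", "coverage"]);

function countByTopLevel(paths) {
  const counts = new Map();
  for (const raw of paths) {
    // Normalize Windows separators and a leading "./".
    const file = raw.replace(/\\/g, "/").replace(/^\.\//, "");
    if (!file) continue;
    const slash = file.indexOf("/");
    const top = slash === -1 ? "(root)" : file.slice(0, slash);
    counts.set(top, (counts.get(top) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([folder, files]) => ({ folder, files, generated: GENERATED_HINTS.has(folder) }))
    .sort((a, b) => b.files - a.files || a.folder.localeCompare(b.folder));
}

// Example input; these paths are illustrative only.
const samplePaths = ["src/index.ts", "src/auth/AuthService.ts", "dist/index.js", "package.json"];
for (const row of countByTopLevel(samplePaths)) {
  console.log(`${String(row.files).padStart(4)} ${row.folder}${row.generated ? " (generated?)" : ""}`);
}
```

The "(generated?)" flag is only a hint based on folder names; confirm against .gitignore and the build configuration before telling an agent to skip a folder.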

Now ask for a read-only map. The official Claude Code workflow recommends starting with a broad question and narrowing down to specific components. According to the Codex docs, codex exec runs in a read-only sandbox by default, which makes it a good fit for a first-pass summary.

codex exec "Read only. Summarize the repository map: apps, packages, entrypoints, test commands, generated folders, and files that define project rules. Do not edit files."

In Claude Code, use -p for a non-interactive answer. The CLI reference calls this print mode.

claude -p "
Read only. Build a repository map of this codebase.
Return the output in this order:
1. Apps, packages, and services
2. Runtime entrypoints, test entrypoints, and build entrypoints
3. Generated folders and folders that can be ignored
4. Key points from AGENTS.md, CLAUDE.md, README, and design notes
5. The next 10 files to read, with one reason per file
Only suggest edits or commands; do not run them.
"

Search in layers

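A text match is not understanding. Search only produces candidates. Work in five layers: structure, domain vocabulary, references, configuration, and history. The directory arguments below (src, apps, packages, and so on) are examples; drop the ones your repository does not have.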

# 1. Entrypoint candidates
rg -n "createServer|listen\(|app\.use|router\.|main\(|bootstrap|hydrateRoot|createRoot" \
  src apps packages server web

# 2. Domain vocabulary. Adapt the terms to your product.
rg -n "Auth|Billing|Invoice|Notification|Search|FeatureFlag" \
  src apps packages test tests

# 3. Callers of the area you may change
rg -n "AuthService|useAuth|requireAuth|authMiddleware" \
  src apps packages test tests

# 4. Configuration and environment variables
rg -n "process\.env|import\.meta\.env|PUBLIC_|DATABASE_URL|JWT|STRIPE|OPENAI|ANTHROPIC" \
  . -g '!node_modules' -g '!dist' -g '!build'

# 5. History: why the design looks the way it does
git log --oneline --decorate --date=short --max-count=30 -- src/auth packages/auth

Do not hand every result to the agent. Look at the overall shape first, then pass a small candidate set and ask for a classification.

You are a codebase navigation reviewer.
Classify these rg hits as implementation entrypoint, caller, configuration, test, or noise.
Keep at most 5 files per class for the next read.
Give one concrete reason for each file you select.
If you are uncertain, do not guess; write "needs another search".
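
To shrink the candidate set before it reaches the agent, you can group hits by file and keep only the busiest files. The sketch below assumes the input is plain `rg -n` output in the usual path:line:text shape; rankHitFiles is a hypothetical helper name, and hit count is only a rough proxy for relevance.

```javascript
// Parse `rg -n` lines of the form "path:line:text" and rank files by hit count.
function rankHitFiles(rgOutput, maxFiles = 5) {
  const perFile = new Map();
  for (const line of rgOutput.split(/\r?\n/)) {
    const match = /^(.+?):(\d+):(.*)$/.exec(line);
    if (!match) continue; // skip blank lines and anything that is not a hit
    const [, file, lineNo, text] = match;
    const entry =
      perFile.get(file) ?? { file, hits: 0, firstLine: Number(lineNo), sample: text.trim() };
    entry.hits += 1;
    perFile.set(file, entry);
  }
  return [...perFile.values()]
    .sort((a, b) => b.hits - a.hits || a.file.localeCompare(b.file))
    .slice(0, maxFiles);
}

// Illustrative hits; the file names are made up for this example.
const sampleHits = [
  "src/auth/AuthService.ts:12:export class AuthService {",
  "src/auth/AuthService.ts:40:  requireAuth(req) {",
  'src/routes/login.ts:7:import { AuthService } from "../auth/AuthService";',
].join("\n");

for (const entry of rankHitFiles(sampleHits)) {
  console.log(`${entry.hits} hit(s)  ${entry.file}:${entry.firstLine}  ${entry.sample}`);
}
```

Pasting this ranked summary, rather than the raw hits, keeps the classification prompt short while still giving the agent one line of evidence per file.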

Build a small dependency graph

Before reaching for a full dependency analyzer, a small Node script can give you a lot of signal. It is not the TypeScript compiler; it only follows local relative imports. Even so, it is useful for seeing the direct blast radius.

#!/usr/bin/env node
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";
import path from "node:path";

const target = process.argv[2]?.replace(/\\/g, "/");
if (!target) {
  console.error("Usage: node scripts/dependency-map.mjs src/path/to/file.ts");
  process.exit(1);
}

const tracked = execFileSync("git", ["ls-files"], { encoding: "utf8" })
  .split(/\r?\n/)
  .filter(Boolean)
  .map((file) => file.replace(/\\/g, "/"));

const trackedSet = new Set(tracked);
const sourceFiles = tracked.filter((file) => /\.(mjs|cjs|js|jsx|ts|tsx)$/.test(file));
const importPattern =
  /(?:from\s+["']([^"']+)["']|import\s*\(\s*["']([^"']+)["']\s*\)|require\s*\(\s*["']([^"']+)["']\s*\))/g;

function resolveLocalImport(specifier, fromFile) {
  if (!specifier.startsWith(".")) return null;
  const base = path.normalize(path.join(path.dirname(fromFile), specifier)).replace(/\\/g, "/");
  const candidates = [
    base,
    `${base}.ts`,
    `${base}.tsx`,
    `${base}.js`,
    `${base}.jsx`,
    `${base}/index.ts`,
    `${base}/index.tsx`,
    `${base}/index.js`,
  ];
  return candidates.find((candidate) => trackedSet.has(candidate)) ?? base;
}

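// Normalize the target so "./src/a.ts" and "src/a.ts" compare equal.
const normalizedTarget = path.posix.normalize(target);

// Collect each importer once, even if it imports the target several times.
const incoming = new Set();
for (const file of sourceFiles) {
  let source;
  try {
    source = readFileSync(file, "utf8");
  } catch {
    continue; // tracked in git but missing from the working tree
  }
  for (const match of source.matchAll(importPattern)) {
    const resolved = resolveLocalImport(match[1] || match[2] || match[3], file);
    // Exact path match only: comparing basenames would also report unrelated
    // files that merely share the target's file name.
    if (resolved === normalizedTarget) incoming.add(file);
  }
}

console.log(`Target: ${normalizedTarget}`);
console.log("Direct importers:");
for (const file of [...incoming].sort()) console.log(`- ${file}`);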

Save it and run it.

mkdir -p scripts
$EDITOR scripts/dependency-map.mjs
node scripts/dependency-map.mjs src/services/AuthService.ts

Turn the output into a risk map.

Based on these direct importers, assess the blast radius of changing AuthService.ts.
Classify each importer as high / medium / low.
High means it touches authentication, billing, authorization, persistence, or public API behavior.
For each class, also list the tests and config files to read before editing.
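
A crude pre-sort by path can give the agent a starting point for that classification. The sketch below is an assumption-heavy heuristic, not a substitute for reading the code: the keyword lists and the names classifyImporter and groupByRisk are hypothetical, and a path keyword only suggests risk, it does not prove it.

```javascript
// Path-based heuristic. Test and story files are treated as low business risk;
// paths that mention auth, billing, persistence, or an api/ folder as high.
const LOW_HINTS = /(\.test\.|\.spec\.|__tests__|\.stories\.)/i;
const HIGH_HINTS = /(auth|billing|payment|session|migration|schema|(^|\/)api\/)/i;

function classifyImporter(file) {
  const normalized = file.replace(/\\/g, "/");
  if (LOW_HINTS.test(normalized)) return "low";
  if (HIGH_HINTS.test(normalized)) return "high";
  return "medium"; // unknown areas default to medium, never to low
}

function groupByRisk(importers) {
  const groups = { high: [], medium: [], low: [] };
  for (const file of importers) groups[classifyImporter(file)].push(file);
  return groups;
}

// Illustrative importer list; replace it with the dependency script's output.
console.log(
  groupByRisk([
    "services/api/src/routes/login.ts",
    "apps/web/src/components/Header.tsx",
    "services/api/src/auth/AuthService.test.ts",
  ]),
);
```

Defaulting unknown paths to medium is deliberate: a file should only drop to low when there is a positive reason, so the heuristic errs toward more review rather than less.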

Trace execution flow from the entrypoints

A file list does not tell you how the system behaves at runtime. On the backend, trace request, middleware, route, controller, service, repository, and database. On the frontend, trace route, loader, state, API client, component, and analytics.

rg -n "middleware|loader|action|controller|handler|route|repository|service" \
  src apps packages \
  -g '*.ts' -g '*.tsx' -g '*.js' -g '*.jsx'

rg -n "fetch\(|axios|graphql|trpc|prisma|drizzle|sequelize|typeorm" \
  src apps packages \
  -g '*.ts' -g '*.tsx' -g '*.js' -g '*.jsx'

Ask the agent for a flow table, not a paragraph.

Lens	Good output	Weak output
Entry	`POST /login -> auth route -> AuthService.login`	"Auth is important"
State	Cookie, session, DB, cache, or queue mutations	No state changes listed
Failure	Error response, logging, audit event, retry	Happy path only
Tests	Existing tests and missing cases	Generic "add tests"

Trace the login flow from the entrypoint to persistence.
Table columns: order, file, function/class, state change, failure behavior, tests to read.
Do not fill gaps with guesses. For any unknown step, name the next file to inspect.
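
If you collect the agent's answer as structured steps, the table and the list of gaps can be produced mechanically. This is a minimal sketch under assumptions: the step shape ({ file, symbol, stateChange, failure, tests, inspectNext }) and the helper names are hypothetical, and the file names in the sample trace are invented.

```javascript
// Column order mirrors the prompt: order, file, function/class, state change,
// failure behavior, tests to read.
const COLUMNS = ["order", "file", "function/class", "state change", "failure behavior", "tests to read"];

function toFlowTable(steps) {
  const rows = steps.map((step, index) => [
    String(index + 1),
    step.file ?? "unknown",
    step.symbol ?? "unknown",
    step.stateChange ?? "unknown",
    step.failure ?? "unknown",
    step.tests ?? "unknown",
  ]);
  return [COLUMNS, ...rows].map((row) => row.join("\t")).join("\n");
}

// A step without a file or symbol is a gap; report what to inspect next.
function findGaps(steps) {
  return steps
    .map((step, index) => ({ order: index + 1, step }))
    .filter(({ step }) => !step.file || !step.symbol)
    .map(({ order, step }) => ({ order, inspectNext: step.inspectNext ?? "needs another search" }));
}

const loginTrace = [
  { file: "src/routes/login.ts", symbol: "POST /login", stateChange: "none", failure: "400 on bad body", tests: "login.test.ts" },
  { file: "src/services/AuthService.ts", symbol: "AuthService.login", stateChange: "session row", failure: "401", tests: "AuthService.test.ts" },
  { inspectNext: "src/db/sessionRepository.ts" }, // persistence step not yet confirmed
];

console.log(toFlowTable(loginTrace));
console.log(findGaps(loginTrace));
```

Keeping unknown cells explicit makes it obvious where the trace stops being evidence and starts being assumption.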

Map ownership and risk

In a large repository, the first question is not "does this work?" but "whose boundary is this?". A boundary can belong to a team, to data, to a release, to compliance, or to revenue.

find . -maxdepth 4 \( \
  -name CODEOWNERS -o \
  -name OWNERS -o \
  -name README.md -o \
  -name AGENTS.md -o \
  -name CLAUDE.md \
\) -print

rg -n "owner|maintainer|deprecated|legacy|do not edit|generated|migration|rollback|release" \
  . -g '!node_modules' -g '!dist' -g '!build'
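
When the find command turns up a CODEOWNERS file, a quick lookup tells you who owns a path before anyone edits it. The sketch below is deliberately simplified and should be read as an assumption: it honors only the `*` catch-all and root-anchored paths (starting with `/`), ignores every other glob form the real format supports, and relies on the rule that the last matching pattern wins. The team handles in the sample are invented.

```javascript
// Parse CODEOWNERS text into { pattern, owners } rules, skipping comments.
function parseCodeowners(text) {
  const rules = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const [pattern, ...owners] = line.split(/\s+/);
    rules.push({ pattern, owners });
  }
  return rules;
}

// Simplified matcher: "*" matches everything; "/dir/" and "/path/file" are
// matched from the repository root. Other patterns are ignored in this sketch.
function matchesPattern(pattern, file) {
  if (pattern === "*") return true;
  if (!pattern.startsWith("/") || /[*?[]/.test(pattern)) return false;
  const anchored = pattern.slice(1);
  if (anchored.endsWith("/")) return file.startsWith(anchored);
  return file === anchored || file.startsWith(`${anchored}/`);
}

// Later rules override earlier ones, so keep the last match.
function ownersFor(file, rules) {
  let owners = [];
  for (const rule of rules) {
    if (matchesPattern(rule.pattern, file)) owners = rule.owners;
  }
  return owners;
}

const sampleRules = parseCodeowners(`
# hypothetical owners
* @org/platform
/services/payments/ @org/payments
/packages/db/schema.ts @org/data
`);
console.log(ownersFor("services/payments/src/charge.ts", sampleRules));
console.log(ownersFor("apps/web/src/main.tsx", sampleRules));
```

For anything beyond a first look, defer to the hosting platform's own ownership resolution, since it implements the full pattern syntax.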

According to the official docs, Codex reads AGENTS.md, so it is a useful place to keep project rules. Claude Code can use CLAUDE.md and memory to keep future sessions shorter. For a template, see CLAUDE.md Best Practices.

## Codebase map

### Entry points
- Web: apps/web/src/main.tsx
- API: services/api/src/server.ts
- Jobs: services/jobs/src/index.ts

### Ownership boundaries
- services/payments: owned by payments team; never change schema without migration review.
- packages/ui: shared design system; visual regression test required.
- legacy/: read-only unless the issue is production severity.

### High-risk files
- services/api/src/auth/AuthService.ts: login, session rotation, audit log.
- packages/db/schema.ts: migrations affect API and jobs.
- apps/web/src/routes/checkout.tsx: revenue path and analytics.

### Handoff notes
- Always start with rg search, then read top files.
- Prefer small diffs with tests.
- Do not paste large logs into the main conversation; summarize first.

Use read-only explorer prompts

Keep the agent in a consultative role during the first investigation. Before any file changes, you need a map, hypotheses, and a list of unknowns.

Investigate in read-only mode. Do not edit files, add dependencies, run formatters, or run tests.

Goal:
Understand the implementation scope and change risk of [feature name].

Return:
1. Entrypoint files
2. Core data models
3. Direct and indirect dependencies
4. Ownership boundaries
5. Risk map: high / medium / low
6. Files to inspect next
7. Hypotheses that are not yet confirmed

Rules:
Label every guess as a hypothesis.
Include file names and evidence.
Do not quote large file bodies.

Claude Code subagents are useful when exploration would otherwise fill the main conversation with file reads. The official docs explain that subagents work in their own context and return a summary. Codex subagents are also documented for parallel codebase exploration. Keep the scope small, because every subagent spends tokens. For patterns, see Subagent Patterns.

Run three read-only subagents:
1. API entrypoints and auth boundaries
2. DB schema and migration boundaries
3. UI routes and revenue path

Each subagent reads at most 8 files and returns only its final findings.
At the end, the parent agent merges duplicates, conflicts, and unknowns.
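
The merge step can also be done outside the conversation if each subagent returns structured findings. This is a sketch under assumptions: the report shape ({ agent, findings: [{ file, risk }], unknowns }) and the name mergeFindings are hypothetical, and resolving a conflict by keeping the highest risk is one possible policy, not a documented behavior of either tool.

```javascript
const RISK_RANK = { low: 0, medium: 1, high: 2 };

// Merge reports from several explorers: one entry per file, the highest risk
// wins, and disagreements are flagged so a human can look at them.
function mergeFindings(reports) {
  const byFile = new Map();
  const unknowns = new Set();
  for (const report of reports) {
    for (const unknown of report.unknowns ?? []) unknowns.add(unknown);
    for (const finding of report.findings ?? []) {
      const entry = byFile.get(finding.file) ?? { file: finding.file, risks: new Set(), sources: [] };
      entry.risks.add(finding.risk);
      entry.sources.push(report.agent);
      byFile.set(finding.file, entry);
    }
  }
  const merged = [...byFile.values()].map((entry) => ({
    file: entry.file,
    risk: [...entry.risks].sort((a, b) => RISK_RANK[b] - RISK_RANK[a])[0],
    conflict: entry.risks.size > 1,
    sources: entry.sources,
  }));
  return { merged, unknowns: [...unknowns] };
}

// Illustrative reports from two of the three explorers.
const sampleReports = [
  { agent: "api", findings: [{ file: "packages/db/schema.ts", risk: "medium" }], unknowns: ["session rotation owner"] },
  { agent: "db", findings: [{ file: "packages/db/schema.ts", risk: "high" }], unknowns: ["session rotation owner"] },
];
console.log(JSON.stringify(mergeFindings(sampleReports), null, 2));
```

Flagging conflicts instead of silently resolving them keeps the disagreement visible in the handoff, which is usually where the interesting risk is.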

Avoid context bloat

The simple rule: keep evidence and decisions separate. Full files, long logs, and 100 search hits are very heavy evidence. Filter first, then ask the agent for a decision.

OpenAI's compaction is an API feature, but the same principle applies here: keep the summary that matters and drop stale detail.

Compress the investigation to under 300 words so the next worker can continue.
Keep:
- Confirmed entrypoints
- High-risk files
- Boundaries that must not be crossed
- Unconfirmed hypotheses
- Next command to run

Drop:
- Already disproven hypotheses
- Long logs
- Generated files that do not need to be read
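
If handoff notes are generated from structured data, the Keep list and the word budget can be enforced in code. The sketch below is a minimal illustration: buildHandoff and its field names are hypothetical, and counting whitespace-separated tokens is only an approximation of a word count.

```javascript
// Build a handoff note from the "Keep" sections and check the word budget.
function buildHandoff(note, maxWords = 300) {
  const sections = [
    ["Confirmed entrypoints", note.entrypoints],
    ["High-risk files", note.highRisk],
    ["Boundaries not to cross", note.boundaries],
    ["Unconfirmed hypotheses", note.hypotheses],
    ["Next command to run", note.nextCommand ? [note.nextCommand] : []],
  ];
  const lines = [];
  for (const [title, items] of sections) {
    if (!items || items.length === 0) continue; // omit empty sections
    lines.push(`${title}:`);
    for (const item of items) lines.push(`- ${item}`);
  }
  const text = lines.join("\n");
  const words = text.split(/\s+/).filter(Boolean).length;
  return { text, words, withinLimit: words < maxWords };
}

// Illustrative note; the paths echo the codebase-map template above.
const handoff = buildHandoff({
  entrypoints: ["services/api/src/server.ts"],
  highRisk: ["services/api/src/auth/AuthService.ts"],
  boundaries: ["services/payments: no schema change without migration review"],
  hypotheses: ["session rotation happens in AuthService.login"],
  nextCommand: "node scripts/dependency-map.mjs services/api/src/auth/AuthService.ts",
});
console.log(handoff.text);
console.log(`words: ${handoff.words}, within limit: ${handoff.withinLimit}`);
```

There is no field for disproven hypotheses or logs, so the Drop list is enforced simply by having nowhere to put that material.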

Handoff notes matter in both development and content operations. For multilingual articles, write down which locale files were your responsibility. For app work, write down which entrypoint you traced. For a bug, write down which hypotheses failed. For testing, see Claude Code Testing Strategies; for permissions, see the approval and sandbox guide.

Three concrete use cases

The first use case is a SaaS handoff. On day one, map login, billing, admin, and notifications. Find the importers of AuthService and BillingService, then trace checkout from route to database. Do not edit yet. The output should be a risk map and a reading order.

The second is a legacy migration. Before asking for replacement code, search for legacy, deprecated, and migration. Ask the agent about compatibility risks. DB migrations, public APIs, batch jobs, and cron carry particular risk because rollback is expensive.

The third is a content and monetization site like ClaudeCodeLab. Articles, CTA components, internal links, product pages, and translations live in separate folders. Assign ownership by slug. Tell the agent to own exactly one slug and to read other articles only for link checks.

Common failure modes

Trusting only the README. Real entrypoints can differ from outdated docs.
Treating search hits as execution flow. A hit is a candidate, not proof.
Letting the agent edit before ownership is clear.
Pasting generated folders, lockfiles, coverage, or dist into the conversation.
Giving subagents too broad a scope.
Calling something a "safe fix" without defining high / medium / low risk.

CTA for teams

Large codebase navigation also affects revenue work. Checkout, lead forms, paid content, analytics, and ads all sit behind risk boundaries. Building a repo map before editing reduces rework.

If your team is adopting Claude Code or Codex, start with a repo-map template, AGENTS.md/CLAUDE.md rules, review criteria, and read-only explorer prompts. Individual developers can start with the free cheat sheet. If you need training on real repositories, see Claude Code training and consulting.

Hands-on verification note

With this workflow, later implementation prompts stayed more stable, especially compared with pasting large files directly. The most useful artifacts are the repo map, the classified rg results, the small dependency table, and the high / medium / low risk map. When these steps are skipped, generated folders, old tests, and other teams' modules end up in the conversation. Before publishing, I reviewed the commands, prompt packets, and Node script for copy-paste shape and basic JavaScript structure.