How to Navigate Large Codebases with Claude Code and Codex
A practical repo-mapping workflow with rg, dependency tracing, risk maps, prompts, and handoff notes.
When you open a large codebase, the goal is not to understand every file. The first goal is to build a map: where the system starts, which folders are generated, which files carry business risk, and which boundaries should not be crossed without review.
Claude Code and Codex are useful for this work, but only if you keep the investigation disciplined. If you paste whole files, long logs, and hundreds of search hits into the main conversation, the model has more text but less signal. That is context bloat: the conversation grows while the decision quality gets worse.
This guide turns the workflow I use for handoffs, audits, and ClaudeCodeLab article maintenance into copy-pasteable commands and prompt packets. For tool behavior, I stick to official docs: Claude Code common workflows, CLI reference, memory, Codex non-interactive mode, and Codex AGENTS.md.
Start With A Repo Map
Before asking an agent to explain the repository, sketch its outer shape yourself. rg is ripgrep. It is fast, respects .gitignore and other ignore files by default, and lets you exclude generated folders before they pollute the investigation.
git status --short
git rev-parse --show-toplevel
git branch --show-current
# Count files per directory, largest first
rg --files \
  -g '!node_modules' \
  -g '!dist' \
  -g '!build' \
  -g '!coverage' \
  -g '!*.lock' \
  | sed 's#/[^/]*$##' \
  | sort \
  | uniq -c \
  | sort -nr \
  | head -40
# Project markers; skip node_modules so installed packages do not flood the list
find . -maxdepth 3 ! -path '*/node_modules/*' \( \
  -name package.json -o \
  -name pnpm-workspace.yaml -o \
  -name pyproject.toml -o \
  -name go.mod -o \
  -name Cargo.toml -o \
  -name Dockerfile -o \
  -name docker-compose.yml -o \
  -name AGENTS.md -o \
  -name CLAUDE.md \
\) -print
After that, ask for a read-only map. Claude Code’s official workflow recommends broad questions first and then narrowing into specific areas. Codex documents that codex exec runs in a read-only sandbox by default, which makes it a good fit for first-pass repository summaries.
codex exec "Read only. Summarize the repository map: apps, packages, entrypoints, test commands, generated folders, and files that define project rules. Do not edit files."
For Claude Code, use -p when you want a non-interactive answer. The CLI reference describes -p as print mode.
claude -p "
Read only. Build a repository map for this codebase.
Return the answer in this order:
1. Apps, packages, and services
2. Runtime entrypoints, test entrypoints, and build entrypoints
3. Generated folders and folders to ignore
4. Key points from AGENTS.md, CLAUDE.md, README, and design notes
5. The next 10 files to read, with one reason each
Suggest edits or commands only; do not perform them.
"
Use Search In Layers
The common mistake is treating a text match as understanding. Search is only a candidate generator. Use five layers: structure, domain vocabulary, references, configuration, and history.
# 1. Entrypoint candidates
rg -n "createServer|listen\(|app\.use|router\.|main\(|bootstrap|hydrateRoot|createRoot" \
src apps packages server web
# 2. Domain vocabulary. Change these terms for your product.
rg -n "Auth|Billing|Invoice|Notification|Search|FeatureFlag" \
src apps packages test tests
# 3. Callers and references for the area you may modify
rg -n "AuthService|useAuth|requireAuth|authMiddleware" \
src apps packages test tests
# 4. Configuration and environment variables
rg -n "process\.env|import\.meta\.env|PUBLIC_|DATABASE_URL|JWT|STRIPE|OPENAI|ANTHROPIC" \
. -g '!node_modules' -g '!dist' -g '!build'
# 5. History: why this design exists
git log --date=short --format='%h %ad%d %s' --max-count=30 -- src/auth packages/auth
Do not paste 100 matches into the agent. First check the shape, then pass a small candidate set and ask for classification.
You are reviewing search results for codebase navigation.
Classify the following rg hits into implementation entrypoint, caller, configuration, test, and noise.
Limit each class to the top 5 files to read next.
For every chosen file, give one concrete reason.
If the result is uncertain, write "needs another search" instead of guessing.
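One way to check the shape before handing hits to the agent is to collapse raw rg -n output into per-file counts and keep only the busiest files. The sketch below assumes the default path:line:text output format. The helper name summarizeHits, the limit of 5, and the sample hits are illustrative assumptions, not part of ripgrep.

```javascript
// Collapse `rg -n` output (path:line:text) into per-file hit counts.
// summarizeHits is a hypothetical helper, not part of ripgrep.
function summarizeHits(rgOutput, limit = 5) {
  const counts = new Map();
  for (const line of rgOutput.split(/\r?\n/)) {
    // The non-greedy path match stops at the first ":<digits>:" separator.
    const match = /^(.+?):(\d+):/.exec(line);
    if (!match) continue; // skip blank or unparseable lines
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([file, hits]) => ({ file, hits }))
    .sort((a, b) => b.hits - a.hits || a.file.localeCompare(b.file))
    .slice(0, limit);
}

// Sample hits are made up for illustration.
const sample = [
  "src/auth/AuthService.ts:12:export class AuthService {",
  "src/auth/AuthService.ts:40:  requireAuth(req) {",
  'src/routes/login.ts:3:import { AuthService } from "../auth/AuthService";',
  "test/auth.test.ts:8:const auth = new AuthService();",
].join("\n");

for (const { file, hits } of summarizeHits(sample)) {
  console.log(`${hits}\t${file}`);
}
```

Passing the short summary plus a handful of representative lines keeps the classification prompt small. The full hit list stays on disk if the agent asks for another search.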
Sketch A Dependency Graph
Before adding a full dependency analyzer, a tiny Node script can expose enough import information to guide the conversation. It is not a full module resolver: it ignores package imports and tsconfig path aliases and only follows local relative imports. That is often enough to spot the direct blast radius.
#!/usr/bin/env node
// Lists files that directly import the target through a relative path.
// Run from the repository root so paths match `git ls-files` output.
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";
import path from "node:path";
const rawTarget = process.argv[2];
if (!rawTarget) {
  console.error("Usage: node scripts/dependency-map.mjs src/path/to/file.ts");
  process.exit(1);
}
// Normalize separators and strip a leading "./" so the target compares equal to tracked paths.
const target = path.posix.normalize(rawTarget.replace(/\\/g, "/"));
// Raise maxBuffer: the 1 MiB default can overflow on a large repository.
const tracked = execFileSync("git", ["ls-files"], { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 })
  .split(/\r?\n/)
  .filter(Boolean)
  .map((file) => file.replace(/\\/g, "/"));
const trackedSet = new Set(tracked);
const sourceFiles = tracked.filter((file) => /\.(mjs|cjs|js|jsx|ts|tsx)$/.test(file));
// Matches `from "x"`, dynamic `import("x")`, and `require("x")`.
const importPattern =
  /(?:from\s+["']([^"']+)["']|import\s*\(\s*["']([^"']+)["']\s*\)|require\s*\(\s*["']([^"']+)["']\s*\))/g;
function resolveLocalImport(specifier, fromFile) {
  if (!specifier.startsWith(".")) return null; // skip packages and path aliases
  const base = path.posix.join(path.posix.dirname(fromFile), specifier);
  const candidates = [
    base,
    `${base}.ts`,
    `${base}.tsx`,
    `${base}.js`,
    `${base}.jsx`,
    `${base}/index.ts`,
    `${base}/index.tsx`,
    `${base}/index.js`,
  ];
  return candidates.find((candidate) => trackedSet.has(candidate)) ?? base;
}
// A Set keeps each importer once even if it imports the target several times.
const incoming = new Set();
for (const file of sourceFiles) {
  let source;
  try {
    source = readFileSync(file, "utf8");
  } catch {
    continue; // tracked but missing from the working tree
  }
  for (const match of source.matchAll(importPattern)) {
    const resolved = resolveLocalImport(match[1] || match[2] || match[3], file);
    // Exact match only: comparing basenames would report every same-named file.
    if (resolved === target) incoming.add(file);
  }
}
console.log(`Target: ${target}`);
console.log("Direct importers:");
for (const file of [...incoming].sort()) console.log(`- ${file}`);
Save and run it like this.
mkdir -p scripts
$EDITOR scripts/dependency-map.mjs
node scripts/dependency-map.mjs src/services/AuthService.ts
Then turn the output into a risk map instead of a vague summary.
Given these direct importers, assess the blast radius of changing AuthService.ts.
Rate each importer high / medium / low.
High means authentication, billing, authorization, persistence, or public API behavior.
For each risk level, list tests and config files that should be read before editing.
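A rough first pass at the risk map can be computed before the agent sees it, so the prompt starts from a draft rather than a bare list. The sketch below buckets importer paths by keyword. The keyword sets, the rule that test files count as low, and the sample paths are all assumptions for illustration. Only the definition of high comes from the prompt above.

```javascript
// Keyword sets are assumptions; tune them per repository.
const HIGH_TOKENS = new Set([
  "auth", "billing", "payment", "payments", "permission", "permissions",
  "db", "schema", "migration", "migrations", "api",
]);
const LOW_TOKENS = new Set(["test", "tests", "spec", "stories", "mocks"]);

// Split a path into lowercase words, including camelCase parts:
// "src/AuthService.ts" -> ["src", "auth", "service", "ts"]
function pathTokens(filePath) {
  return filePath
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

function riskLevel(filePath) {
  const tokens = pathTokens(filePath);
  if (tokens.some((token) => LOW_TOKENS.has(token))) return "low";
  if (tokens.some((token) => HIGH_TOKENS.has(token))) return "high";
  return "medium"; // default when no keyword decides
}

function riskMap(importers) {
  const map = { high: [], medium: [], low: [] };
  for (const file of importers) map[riskLevel(file)].push(file);
  return map;
}

// Hypothetical importer list, as printed by the dependency script.
console.log(
  riskMap([
    "services/api/src/routes/login.ts",
    "apps/web/src/components/Header.tsx",
    "services/api/test/auth.test.ts",
  ])
);
```

Treat the result as a draft for the agent to challenge. A keyword match says nothing about what the importer actually does with the target.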
Trace From Entrypoints
A file list does not explain runtime behavior. For a backend, trace request, middleware, route, controller, service, repository, database. For a frontend, trace route, loader, state, API client, component, analytics.
rg -n "middleware|loader|action|controller|handler|route|repository|service" \
src apps packages \
-g '*.ts' -g '*.tsx' -g '*.js' -g '*.jsx'
rg -n "fetch\(|axios|graphql|trpc|prisma|drizzle|sequelize|typeorm" \
src apps packages \
-g '*.ts' -g '*.tsx' -g '*.js' -g '*.jsx'
Ask the agent for a flow table, not a paragraph.
| Lens | Strong Output | Weak Output |
|---|---|---|
| Entry | POST /login -> auth route -> AuthService.login | “Auth is important” |
| State | Cookie, session, DB, cache, or queue mutation | No state changes listed |
| Failure | Error response, logging, audit event, retry behavior | Happy path only |
| Tests | Existing test file and missing cases | Generic “add tests” |
Trace the login flow from entrypoint to persistence.
Return a table with columns: order, file, function/class, state change, failure behavior, tests to read.
Do not fill gaps with guesses. If a step is unknown, name the next file to inspect.
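If the agent returns the trace as structured data, a small helper can render the table and surface the gaps rather than hide them. This is a sketch under assumptions: the step object fields (file, symbol, stateChange, failure, tests) and the sample flow are hypothetical. The column names follow the prompt above.

```javascript
const COLUMNS = ["order", "file", "function/class", "state change", "failure behavior", "tests to read"];
// Field names on each step object are assumptions for this sketch.
const FIELDS = ["file", "symbol", "stateChange", "failure", "tests"];

// Missing values become "unknown"; pipes are escaped so a cell cannot break the table.
const cell = (value) => String(value ?? "unknown").replace(/\|/g, "\\|");

function renderFlowTable(steps) {
  const header = `| ${COLUMNS.join(" | ")} |`;
  const divider = `|${COLUMNS.map(() => "---").join("|")}|`;
  const rows = steps.map(
    (step, index) =>
      `| ${[index + 1, ...FIELDS.map((field) => step[field])].map(cell).join(" | ")} |`
  );
  return [header, divider, ...rows].join("\n");
}

// Returns the 1-based order of every step that still has a missing field.
function incompleteSteps(steps) {
  return steps
    .map((step, index) => (FIELDS.some((field) => step[field] == null) ? index + 1 : null))
    .filter((order) => order !== null);
}

// Hypothetical trace: the second step has not been inspected yet.
const loginFlow = [
  {
    file: "services/api/src/routes/login.ts",
    symbol: "POST /login",
    stateChange: "none",
    failure: "400 on invalid body",
    tests: "login.route.test.ts",
  },
  { file: "services/api/src/auth/AuthService.ts", symbol: "AuthService.login" },
];

console.log(renderFlowTable(loginFlow));
console.log("Inspect next:", incompleteSteps(loginFlow));
```

The list of incomplete steps doubles as the reading order for the next round of the investigation.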
Map Ownership And Risk
In a large repository, “will it run?” is not the first question. The first question is “whose boundary is this?” Ownership boundaries can be team boundaries, data boundaries, release boundaries, or compliance boundaries.
find . -maxdepth 4 ! -path '*/node_modules/*' \( \
  -name CODEOWNERS -o \
  -name OWNERS -o \
  -name README.md -o \
  -name AGENTS.md -o \
  -name CLAUDE.md \
\) -print
rg -n -i "owner|maintainer|deprecated|legacy|do not edit|generated|migration|rollback|release" \
  . -g '!node_modules' -g '!dist' -g '!build'
Codex officially reads AGENTS.md, so it is worth putting repository rules there. Claude Code can use CLAUDE.md and memory to shorten later sessions. For a deeper template, see CLAUDE.md Best Practices.
## Codebase map
### Entry points
- Web: apps/web/src/main.tsx
- API: services/api/src/server.ts
- Jobs: services/jobs/src/index.ts
### Ownership boundaries
- services/payments: owned by payments team; never change schema without migration review.
- packages/ui: shared design system; visual regression test required.
- legacy/: read-only unless the issue is production severity.
### High-risk files
- services/api/src/auth/AuthService.ts: login, session rotation, audit log.
- packages/db/schema.ts: migrations affect API and jobs.
- apps/web/src/routes/checkout.tsx: revenue path and analytics.
### Handoff notes
- Always start with rg search, then read top files.
- Prefer small diffs with tests.
- Do not paste large logs into the main conversation; summarize first.
Use Read-Only Explorer Prompts
For first-pass exploration, keep the agent consultative. You want a map, hypotheses, and unknowns before the tool starts changing files.
Investigate in read-only mode. Do not edit files, add dependencies, format code, or run tests.
Goal:
Understand the implementation scope and change risk for [feature name].
Return:
1. Entrypoint files
2. Core data models
3. Direct and indirect dependencies
4. Ownership boundaries
5. Risk map: high / medium / low
6. Files to inspect next
7. Hypotheses that are not confirmed yet
Rules:
Label guesses as hypotheses.
Include file names and evidence.
Do not quote large file bodies.
Claude Code subagents are useful when exploration would fill the main conversation with file reads. The official docs describe subagents as working in their own context and returning only a summary. Codex subagents are also documented for parallel codebase exploration. Use them for bounded tasks because they still spend tokens. See Subagent Patterns for practical patterns.
Run three read-only subagents:
1. API entrypoints and auth boundaries
2. DB schema and migration boundaries
3. UI routes and revenue path
Each subagent may read up to 8 files and must return only final findings.
After they finish, the parent agent should merge duplicate findings, flag conflicts, and list remaining unknowns.
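The merge step can also be done mechanically when each subagent returns a structured report. The sketch below assumes a report shape of agent, findings, and unknowns. That shape, the field names, and the sample data are all hypothetical, not a format that Claude Code or Codex emits.

```javascript
// Report shape is an assumption: { agent, findings: [{ file, risk }], unknowns: [string] }.
function mergeFindings(reports) {
  const byFile = new Map();
  const unknowns = new Set();
  for (const report of reports) {
    for (const { file, risk } of report.findings ?? []) {
      const entry = byFile.get(file) ?? { file, risks: new Set(), agents: [] };
      entry.risks.add(risk);
      entry.agents.push(report.agent);
      byFile.set(file, entry);
    }
    for (const item of report.unknowns ?? []) unknowns.add(item);
  }
  const agreed = [];
  const conflicts = [];
  for (const { file, risks, agents } of byFile.values()) {
    const row = { file, risks: [...risks], agents };
    // More than one distinct risk level means the subagents disagree.
    if (row.risks.length > 1) conflicts.push(row);
    else agreed.push(row);
  }
  return { agreed, conflicts, unknowns: [...unknowns] };
}

// Hypothetical reports from two of the three subagents.
const merged = mergeFindings([
  {
    agent: "api",
    findings: [{ file: "services/api/src/auth/AuthService.ts", risk: "high" }],
    unknowns: ["session store location"],
  },
  {
    agent: "db",
    findings: [
      { file: "packages/db/schema.ts", risk: "high" },
      { file: "services/api/src/auth/AuthService.ts", risk: "medium" },
    ],
    unknowns: ["session store location"],
  },
]);

console.log(JSON.stringify(merged, null, 2));
```

Conflicting rows are the ones worth a human look. Two subagents that read the same file and rated it differently usually traced different call paths.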
Avoid Context Bloat
The simplest rule is to separate evidence from decisions. Full files, long logs, and 100 search hits are too heavy as evidence. Filter first, then ask the agent to decide.
OpenAI’s compaction is an API feature, but the operating principle applies to human-agent work: keep the necessary summary and drop stale detail.
Compress the investigation so the next worker can continue in under 300 words.
Keep:
- Confirmed entrypoints
- High-risk files
- Boundaries that must not be crossed
- Unconfirmed hypotheses
- Next command to run
Drop:
- Hypotheses already disproven
- Long logs
- Generated files that do not need reading
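The same compression can be enforced in code when the investigation state is already structured. This is a minimal sketch: the state field names and the sample values are assumptions. The section titles and the 300-word ceiling come from the prompt above.

```javascript
function wordCount(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Builds a handoff note from the "Keep" items and rejects notes of maxWords or more.
// The state field names are assumptions for this sketch.
function buildHandoff(state, maxWords = 300) {
  const sections = [
    ["Confirmed entrypoints", state.entrypoints],
    ["High-risk files", state.highRiskFiles],
    ["Boundaries that must not be crossed", state.boundaries],
    ["Unconfirmed hypotheses", state.hypotheses],
    ["Next command to run", state.nextCommand ? [state.nextCommand] : []],
  ];
  const lines = [];
  for (const [title, items] of sections) {
    if (!items || items.length === 0) continue; // omit empty sections
    lines.push(`${title}:`);
    for (const item of items) lines.push(`- ${item}`);
  }
  const note = lines.join("\n");
  const words = wordCount(note);
  if (words >= maxWords) {
    throw new Error(`Handoff note is ${words} words; it must stay under ${maxWords}.`);
  }
  return note;
}

// Hypothetical state after a login-flow investigation.
console.log(
  buildHandoff({
    entrypoints: ["services/api/src/server.ts"],
    highRiskFiles: ["services/api/src/auth/AuthService.ts"],
    boundaries: ["services/payments: no schema change without migration review"],
    hypotheses: ["Session rotation happens inside AuthService.login"],
    nextCommand: 'rg -n "rotateSession" services/api/src',
  })
);
```

Disproven hypotheses and long logs have no field in the state object, so they cannot leak into the note.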
Handoff notes matter for development and content operations. For multilingual articles, note which locale files were owned. For app work, note which entrypoint was traced. For a bug, note which hypotheses failed. Pair this with Claude Code Testing Strategies and the approval and sandbox guide.
Three Concrete Use Cases
First, a SaaS onboarding handoff. On day one, map login, billing, admin, and notifications. Find importers for AuthService and BillingService, then trace checkout from route to database. Do not edit yet. The output should be a risk map and a reading order.
Second, a legacy migration. Search for legacy, deprecated, and migration before asking for replacement code. Ask the agent to list compatibility risks. Database migrations, public APIs, batch jobs, and cron tasks deserve special caution because rollback is expensive.
Third, a content and monetization site like ClaudeCodeLab. Articles, CTA components, internal links, product pages, and translations live in different folders. Work by slug. Tell the agent to own only one slug and use other articles only for link checks. That makes parallel editing safer.
Failure Modes To Watch
- Trusting the README alone. Real entrypoints often drift away from old docs.
- Treating search hits as execution flow. A hit is a candidate, not proof.
- Letting the agent edit before ownership is clear. Review can fail even when code runs.
- Feeding generated folders, lockfiles, coverage, or dist into the conversation.
- Asking subagents to investigate broad areas without file limits.
- Saying “fix safely” without defining high / medium / low risk.
CTA For Teams
Large-codebase navigation affects revenue work too. Checkout pages, lead forms, paid content, analytics, and ads all sit behind ownership and risk boundaries. A repo map before editing reduces rework.
If you are adopting Claude Code or Codex with a team, start with a repo-map template, AGENTS.md/CLAUDE.md guidance, review rules, and read-only explorer prompts. Individual developers can start with the free cheat sheet. Teams that want training around real repositories can use Claude Code training and consulting.
Hands-On Verification Note
When I use this workflow, later implementation prompts become more stable than when I start by pasting large files into the agent. The most useful artifacts are the repo map, classified rg results, a small dependency table, and the high / medium / low risk map. When I skip those, generated folders, old tests, and other teams’ modules enter the conversation and the context gets noisy. Before publishing this guide, I reviewed the commands, prompt packets, and Node script for copy-paste shape and basic JavaScript syntax.
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.