Reduce Technical Debt Safely with Claude Code: A Team Playbook
Inventory tech debt, score it with ICE/RICE, and repay it through small reviewed PRs using Claude Code.
Every team says it wants to reduce technical debt. Then the sprint fills with product work, the same flaky tests keep failing, `any` spreads through the TypeScript code, dependencies age, and TODO comments become permanent architecture.
Claude Code does not make automated refactoring risk-free. Its real value is more practical: it helps teams inventory debt, attach evidence, prioritize the work, and split repayment into small reviewable pull requests. That is the difference between a risky cleanup project and a repeatable operating habit.
This guide shows a beginner-friendly workflow for code smell inventory, dependency debt, flaky tests, duplicated logic, unsafe TODOs, ICE/RICE prioritization, small PR strategy, and governance. For the official baseline, pair it with Claude Code Common workflows, Memory, and Settings. For related ClaudeCodeLab material, see the testing strategies guide, CLAUDE.md best practices, and approval and sandbox guide.
Make Debt a Repayment Loop
The common failure mode is treating debt reduction as a big cleanup. The branch gets huge, reviewers cannot tell whether behavior changed, and the work is abandoned right before release.
Use a short loop instead.
```mermaid
flowchart LR
  A["Inventory"] --> B["Collect evidence"]
  B --> C["Score with ICE/RICE"]
  C --> D["Split into small PRs"]
  D --> E["Test and review"]
  E --> F["Update the debt register"]
  F --> A
```
The inventory should not be based on vibes. Collect file paths, line numbers, CI logs, outdated packages, `any` usage, large files, duplicated logic, and TODO/FIXME comments. Concrete evidence keeps the discussion calm.
| Approach | Best for | Risk | How Claude Code helps |
|---|---|---|---|
| Large refactor project | Framework migrations with clear boundaries | Review diff becomes too large | Research and migration planning only |
| Small debt PRs | Ongoing maintenance | Impact can look invisible | Register, scoring, PR checklist |
| Dependency sprint | Security fixes and end-of-life packages | Breaking changes hide in minor APIs | Changelog summary and test plan |
| Flaky test cleanup | CI is no longer trusted | Retrying hides real bugs | Failure classification and reproduction steps |
Use Case 1: Inventory Code Smells
A code smell is not always a bug. It is a sign that future changes will be harder: giant functions, overloaded classes, duplicated validation, unexplained magic numbers, or type escapes.
Start by asking Claude Code to inspect, not edit.
```bash
claude -p "
Inspect src/ and tests/ for technical debt. Do not edit files yet.

Look for:
- Functions over 80 lines
- Files over 300 lines
- Nesting deeper than 4 levels
- Duplicated validation, date handling, or permission checks
- TypeScript any, as any, and @ts-ignore
- TODO / FIXME / HACK comments
- Branches with no tests or tests that only check rendering

Return a Markdown table I can paste into docs/tech-debt/register.md.
Columns: ID, File, Line, Debt type, Evidence, Risk, Suggested first PR, Confidence.
"
```
The phrase “Do not edit files yet” matters. If you ask for fixes immediately, speculative edits creep in. Let Claude Code investigate, classify, and propose. Let the team choose the first repayment item.
Use Case 2: Find Dependency Debt
Outdated packages, abandoned libraries, security vulnerabilities, and duplicated libraries are debt too. The `npm audit` count alone is a poor prioritization signal: a noisy advisory can distract from a core package approaching end of life.
```bash
claude -p "
Using package.json, the lockfile, npm outdated, and npm audit output, classify dependency debt.

Categories:
1. Security fix required
2. Major update required with likely breaking changes
3. Unmaintained or replacement recommended
4. Duplicate libraries for the same job
5. Safe to defer

For each item, list impact area, changelog to read first, tests required, and the smallest safe PR.
Separate changes that can be automated from changes that require human review.
"
```
Dependency updates are not safe just because tests pass once. Date handling, auth, crypto, routing, build tools, and test runners deserve isolated branches and explicit rollback notes.
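A mechanical first pass can feed that classification. The sketch below assumes the parsed output of `npm outdated --json`, an object keyed by package name with `current`, `wanted`, and `latest` version strings. The bucket names and the "two or more majors behind" threshold are hypothetical choices for illustration, and the sketch looks only at version gaps, not at security advisories or maintenance status.

```javascript
// Sketch: bucket `npm outdated --json` entries by major-version gap.
// Bucket names and thresholds are illustrative assumptions, not an npm feature.

// Return the first run of digits in a version string as a number, or null.
function majorOf(version) {
  const match = /(\d+)/.exec(String(version ?? ""));
  return match ? Number(match[1]) : null;
}

function classifyOutdated(outdated) {
  const rows = [];
  for (const [name, info] of Object.entries(outdated)) {
    const current = majorOf(info.current);
    const latest = majorOf(info.latest);
    // Fallback when a version cannot be parsed (for example, a package that is not installed).
    let bucket = "review-manually";
    if (current !== null && latest !== null) {
      const gap = latest - current;
      if (gap >= 2) bucket = "major-isolated-branch";
      else if (gap === 1) bucket = "major-read-changelog";
      else bucket = "minor-or-patch";
    }
    rows.push({ name, current: info.current, latest: info.latest, bucket });
  }
  return rows;
}

// Example input shaped like `npm outdated --json` output (values are made up).
const sample = {
  vite: { current: "3.2.0", wanted: "3.2.8", latest: "5.0.0" },
  lodash: { current: "4.17.20", wanted: "4.17.21", latest: "4.17.21" },
};
console.table(classifyOutdated(sample));
```

The output is a starting list for the debt register, not a decision. Anything in the isolated-branch bucket still needs the changelog reading and rollback notes described above.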
Use Case 3: Repay Flaky Tests and Duplicated Logic
Flaky tests destroy trust. Once engineers believe “rerun CI and it will pass,” the test suite stops acting as a safety net.
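Time dependence is a common source of flakiness: a test that reads the real clock can pass or fail depending on when CI runs. One usual fix, sketched below, is to pass the current time in as a parameter so the test can pin it. The function `isTrialExpired` and its parameters are hypothetical and exist only to illustrate the pattern.

```javascript
// Hypothetical example: the clock is a parameter with a real-time default,
// so production code calls it normally and tests can fix the time.
const DAY_MS = 24 * 60 * 60 * 1000;

function isTrialExpired(startedAtMs, trialDays, now = Date.now()) {
  return now - startedAtMs >= trialDays * DAY_MS;
}

// Deterministic checks: "now" is fixed, so the result does not depend on when CI runs.
const startedAt = Date.UTC(2024, 0, 1); // 2024-01-01T00:00:00Z
const oneMsBeforeExpiry = Date.UTC(2024, 0, 15) - 1;
const atExpiry = Date.UTC(2024, 0, 15);

console.log(isTrialExpired(startedAt, 14, oneMsBeforeExpiry)); // false
console.log(isTrialExpired(startedAt, 14, atExpiry)); // true
```

The same idea applies to random data: accept a seed or a generator as a parameter instead of calling it inside the function under test.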
```bash
claude -p "
Read the last 20 CI failure logs and classify likely flaky tests.

Classify by:
- Time, timezone, or random data dependency
- Network or external API dependency
- Shared state leaking between tests
- Unstable async waiting
- Likely product bug that must not be labeled flaky

For each candidate, provide a reproduction command, minimal fix, and assertion to add.
"
```
For duplicated logic, keep the first PR boring. If permission checks appear in four files, first extract a pure helper and lock behavior with tests. Replace one call site per follow-up PR. Reviewers can reason about that.
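A first "boring" PR along those lines might look like the sketch below. The role names, action names, and permission table are hypothetical; in a real codebase they would be copied from the existing checks so the helper reproduces current behavior rather than redefining it.

```javascript
// Hypothetical permission table; real values come from the duplicated checks being replaced.
const PERMISSIONS = {
  admin: ["read", "write", "delete"],
  editor: ["read", "write"],
  viewer: ["read"],
};

// Pure helper: no I/O and no shared state, so the same input always gives the same answer.
function canAccess(role, action) {
  const allowed = PERMISSIONS[role];
  return Array.isArray(allowed) && allowed.includes(action);
}

// Characterization cases: record what the code does today before any call site changes.
const cases = [
  ["admin", "delete", true],
  ["editor", "delete", false],
  ["viewer", "read", true],
  ["unknown-role", "read", false],
];

for (const [role, action, expected] of cases) {
  if (canAccess(role, action) !== expected) {
    throw new Error(`canAccess(${role}, ${action}) should be ${expected}`);
  }
}
console.log("characterization cases passed");
```

Keeping the helper pure is what makes the follow-up PRs small: each one swaps a single call site and the same table of cases keeps guarding the behavior.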
Copy-Paste Scanner
Claude Code is useful for analysis, but mechanical markers should also be scriptable. Save this as `scripts/debt-scan.mjs` and run `node scripts/debt-scan.mjs src`.
```javascript
// Scan a source tree for mechanical debt markers and print a Markdown table.
import fs from "node:fs";
import path from "node:path";

const root = process.argv[2] || "src";
const maxLines = Number(process.env.MAX_LINES || 300);
const extensions = new Set([".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs"]);
const ignoredDirs = new Set([".git", "node_modules", "dist", "build", ".next", "coverage"]);
const findings = [];

// Recursively visit source files, skipping vendored and generated directories.
function walk(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!ignoredDirs.has(entry.name)) walk(fullPath);
      continue;
    }
    if (entry.isFile() && extensions.has(path.extname(entry.name))) {
      scanFile(fullPath);
    }
  }
}

function add(file, line, type, severity, detail) {
  findings.push({ file, line, type, severity, detail });
}

function scanFile(file) {
  const text = fs.readFileSync(file, "utf8");
  const lines = text.split(/\r?\n/);
  // A trailing newline produces one extra empty element; do not count it as a line.
  if (lines[lines.length - 1] === "") lines.pop();
  if (lines.length > maxLines) {
    add(file, 1, "large-file", 3, `${lines.length} lines`);
  }
  lines.forEach((line, index) => {
    const lineNumber = index + 1;
    if (/\b(FIXME|TODO|HACK)\b/i.test(line)) {
      // FIXME and HACK rank above a plain TODO.
      add(file, lineNumber, "unsafe-comment", /\b(FIXME|HACK)\b/i.test(line) ? 4 : 3, line.trim());
    }
    if (/\.(ts|tsx)$/.test(file) && /(:\s*any\b|as\s+any\b|<any>)/.test(line)) {
      add(file, lineNumber, "typescript-any", 4, line.trim());
    }
  });
}

// Fail loudly on a missing root instead of reporting a clean scan.
if (!fs.existsSync(root)) {
  console.error(`Directory not found: ${root}`);
  process.exit(1);
}
walk(root);

// Highest severity first, then alphabetical by file.
console.log("| file | line | type | severity | detail |");
console.log("| --- | ---: | --- | ---: | --- |");
for (const item of findings.sort((a, b) => b.severity - a.severity || a.file.localeCompare(b.file))) {
  const detail = item.detail.replaceAll("|", "\\|");
  console.log(`| ${item.file} | ${item.line} | ${item.type} | ${item.severity} | ${detail} |`);
}
if (findings.length === 0) {
  console.error("No obvious debt markers found.");
}
```
This scanner will not understand architecture. It will not detect all duplication. It is still valuable because it gives the team a stable weekly baseline for TODOs, FIXMEs, `any`, and large files.
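To make the weekly baseline comparable, findings can be reduced to counts per type and diffed against the previous week. The sketch below assumes an array of objects with a `type` field, the same shape the scanner collects internally. The scanner as written prints only a Markdown table, so exporting that array (for example as JSON) is an extra step this sketch takes for granted. The function names are hypothetical.

```javascript
// Count findings per debt type, e.g. { "typescript-any": 2, "large-file": 1 }.
function countByType(findings) {
  const counts = {};
  for (const { type } of findings) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Positive numbers mean the count grew since the previous scan; negative means it shrank.
function diffCounts(previous, current) {
  const delta = {};
  const types = new Set([...Object.keys(previous), ...Object.keys(current)]);
  for (const type of types) {
    delta[type] = (current[type] || 0) - (previous[type] || 0);
  }
  return delta;
}

// Made-up data for two consecutive weeks.
const lastWeek = { "typescript-any": 5, "unsafe-comment": 3 };
const thisWeek = countByType([
  { type: "typescript-any" },
  { type: "typescript-any" },
  { type: "large-file" },
]);
console.log(diffCounts(lastWeek, thisWeek));
```

A trend per type is usually more useful in the weekly review than the raw table, because it shows whether repayment is outpacing new debt.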
Debt Register Template
Put the findings into one register before creating issues. It makes tradeoffs visible.
```markdown
# Technical Debt Register

| ID | Area | Evidence | User or team impact | ICE | RICE | Owner | Next PR | Status |
| --- | --- | --- | --- | ---: | ---: | --- | --- | --- |
| TD-001 | Auth permissions | src/auth/guard.ts duplicates role checks in 4 places | New role changes take 2 days and often miss one path | 420 | 1680 | Backend | Extract pure canAccess() with tests | Ready |
| TD-002 | Dependencies | vite is 2 major versions behind | Security patches and plugin updates are blocked | 280 | 900 | Platform | Upgrade in isolated branch and run build/test | Investigating |

## Scoring note

- ICE = Impact x Confidence x Ease
- RICE = Reach x Impact x Confidence / Effort
- Keep evidence links concrete: file path, line, CI run, or user-facing incident.
```
ICE is fast. RICE is better when reach and effort need to be explicit. Neither formula is truth. They are tools for making the conversation consistent.
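As a rough sketch, the two formulas can be written as small functions. The 1 to 10 scale for each input is an assumption made for this example; the article does not prescribe a scale, and the inputs shown are illustrative rather than the values behind the register rows.

```javascript
// Sketch of the two scoring formulas from the scoring note.
// Assumption: every input is scored on a 1-10 scale agreed by the team.

// ICE = Impact x Confidence x Ease
function iceScore({ impact, confidence, ease }) {
  return impact * confidence * ease;
}

// RICE = Reach x Impact x Confidence / Effort
function riceScore({ reach, impact, confidence, effort }) {
  if (!(effort > 0)) throw new RangeError("effort must be greater than zero");
  return (reach * impact * confidence) / effort;
}

// Illustrative inputs only.
console.log(iceScore({ impact: 7, confidence: 6, ease: 10 })); // 420
console.log(riceScore({ reach: 8, impact: 7, confidence: 6, effort: 2 })); // 168
```

Because effort is a divisor, halving the effort estimate doubles the RICE score. That sensitivity is one reason to record effort assumptions next to the number.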
Prompt for a Safe Refactor Plan
Once the team chooses an item, ask for a plan before edits.
```bash
claude -p "
Create a safe repayment plan for TD-001. Do not edit files yet.

Scope:
- src/auth/guard.ts
- src/auth/roles.ts
- tests/auth/guard.test.ts

Constraints:
- Do not change external API behavior
- Inspect existing tests first
- If behavior is under-tested, add characterization tests before refactoring
- Target a PR under 300 changed lines
- Include risks, rollback plan, and reviewer focus areas

Output:
1. Current behavior summary
2. Explicit non-goals
3. First PR diff plan
4. Test commands to run
5. PR review request
"
```
The “explicit non-goals” section prevents helpful overreach. Claude Code can improve too much at once unless you define the boundary.
Refactor PR Checklist
```markdown
## Refactor PR checklist

- [ ] This PR changes structure, not product behavior.
- [ ] Existing behavior is covered by tests before the refactor.
- [ ] New helper names describe domain behavior, not implementation detail.
- [ ] Public API, response shape, permissions, and logging are unchanged or explicitly documented.
- [ ] The diff is small enough to review in one sitting.
- [ ] Rollback is simple: revert this PR without reverting unrelated work.
- [ ] The debt register is updated with status and follow-up PRs.
```
Use the checklist in the PR body Claude Code drafts. It turns “trust me, this is a refactor” into reviewable evidence.
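The size item in the checklist can also be checked mechanically. The sketch below assumes its input is the text printed by `git diff --numstat`, which lists added and deleted line counts per file separated by tabs and shows `-` for binary files. The 300-line default mirrors the target in the refactor plan prompt; the function names are hypothetical.

```javascript
// Sum added and deleted lines from `git diff --numstat` output.
// Each line looks like "<added>\t<deleted>\t<path>"; binary files use "-" for both counts.
function countChangedLines(numstatText) {
  let total = 0;
  for (const line of numstatText.split(/\r?\n/)) {
    if (!line.trim()) continue;
    const [added, deleted] = line.split("\t");
    const a = Number.parseInt(added, 10);
    const d = Number.parseInt(deleted, 10);
    // Non-numeric counts (binary files) contribute zero lines.
    total += (Number.isNaN(a) ? 0 : a) + (Number.isNaN(d) ? 0 : d);
  }
  return total;
}

// "Under 300 changed lines" is read here as strictly fewer than the limit.
function isReviewableSize(numstatText, limit = 300) {
  return countChangedLines(numstatText) < limit;
}

// Made-up numstat output for illustration.
const numstat = "10\t5\tsrc/auth/guard.ts\n-\t-\tdocs/diagram.png\n120\t30\ttests/auth/guard.test.ts\n";
console.log(countChangedLines(numstat)); // 165
console.log(isReviewableSize(numstat)); // true
```

Line count is only a proxy. A 50-line change to permission logic can still deserve more review time than a 250-line rename.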
Concrete Pitfalls
Trusting automation too much
Types passing is not enough. Authorization, billing, date logic, async behavior, and migrations need characterization tests and human review.
Deleting every TODO
Some TODOs are release blockers. Some are harmless notes. Prioritize phrases like remove before release, bypass auth, temporary token, and FIXME.
Bundling dependency updates
Ten major updates in one PR means one failure can take hours to isolate. Split build tools, UI libraries, auth libraries, and test runners.
Using scores as politics
ICE/RICE should carry evidence, not opinions. Record file paths, CI runs, incidents, and effort assumptions.
Forgetting team memory
Rules such as “permission code needs approval” and “refactors stay under 300 lines” belong in CLAUDE.md or project settings. Claude Code Memory and Settings reduce repeated prompting.
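The TODO pitfall above lends itself to a small triage helper. The sketch below uses the phrases named in that pitfall as its list; the priority labels and function name are hypothetical, and a real list would grow from the team's own incident history.

```javascript
// Phrases from the "Deleting every TODO" pitfall that suggest a release blocker.
const RISKY_PHRASES = ["remove before release", "bypass auth", "temporary token", "fixme"];

// Return a priority label plus the phrases that triggered it.
function triageComment(text) {
  const lower = text.toLowerCase();
  const hits = RISKY_PHRASES.filter((phrase) => lower.includes(phrase));
  return { priority: hits.length > 0 ? "review-now" : "record-only", hits };
}

console.log(triageComment("// TODO: bypass auth for the demo, remove before release"));
console.log(triageComment("// TODO: rename this variable"));
```

Substring matching is deliberately crude. It is meant to sort comments into "look at this first" and "record it", not to decide which ones are safe to delete.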
Team Governance
A practical cadence is a 30-minute weekly debt review:
- Read the scanner and Claude Code inventory.
- Refresh ICE/RICE for only the top 10 items.
- Pick one repayment PR for the next sprint.
- Treat flaky tests and security dependencies as higher priority.
- Update the register with what got easier after the PR.
ClaudeCodeLab can help teams turn this into a working system: debt register templates, PR checklists, settings, and CLAUDE.md starter rules. See Claude Code training, templates, and consultation when you want the workflow adapted to your repository and review culture.
Summary
The safe way to reduce technical debt with Claude Code is not “ask it to refactor everything.” It is “collect evidence, score the work, and repay one small PR at a time.” Code smells, dependency debt, flaky tests, duplicated logic, and unsafe TODOs all fit the same loop.
After trying the workflow behind this article on Masa’s small projects, the biggest benefit was separation: urgent debt became distinct from debt worth merely recording. Splitting `any` cleanup and old TODO removal into small PRs reduced risk without increasing review load much. Major dependency upgrades were heavier than expected, so the safer move was to add tests and release notes before letting Claude Code draft the actual changes.
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.