Claude Code Refactoring Automation: A Safe Practical Workflow

Start by defining the safety boundary

The safest way to automate refactoring with Claude Code is not to ask it to “clean up the codebase.” That sounds productive, but it usually creates a diff that is too large to review. Refactoring means changing the internal structure of code without changing the behavior that users, APIs, tests, and downstream systems depend on. If that boundary is vague, automation can accidentally become feature work.

This guide treats Claude Code as a practical refactoring partner: it investigates, proposes a small plan, edits one narrow slice, runs verification commands, and explains the git diff. The common workflows page in the official Claude Code documentation is a useful reference for this style of work. For command permissions and project-level configuration, also review the Claude Code settings documentation.

If you are still setting up safe project rules, read the related guides on Claude Code permissions and Claude Code context management. This article focuses on the day-to-day workflow: what to ask first, which changes are safe for beginners, how to test them, and how to review the result.

Masa’s practical note: in small trials, Claude Code performed well on renaming, extracting pure functions, clarifying TypeScript types, and adding regression tests. It became much harder to review when the prompt was broad, such as “modernize this service” or “make this cleaner.” The boring workflow is the productive one: narrow scope, visible tests, small diff.

The safe workflow: inspect, plan, edit one diff, verify

Use this order until the team has built trust in the workflow.

Step	What Claude Code does	What the human checks
1. Inspect	Reads target files, dependencies, and test coverage	Scope is not too broad
2. Plan	Proposes a small plan in three steps or fewer	No behavior change is hidden in the plan
3. Edit	Changes one theme only	The diff is small enough to review
4. Verify	Runs tests, typecheck, and lint	Failures are explained clearly
5. Review	Summarizes git diff and risk	Before/after behavior is equivalent

Start with an inspection-only prompt.

Inspect this repository for safe refactoring candidates.
Do not edit files yet.

Constraints:
- Do not change external behavior
- Keep one diff to three files or fewer
- Prefer areas already covered by tests
- Return a table with candidate, reason, verification command, and risk

The phrase “Do not edit files yet” matters. Claude Code can move from reading to editing quickly when the request sounds actionable. Splitting investigation from implementation makes an accidental wide rewrite much less likely.

Create a branch and capture the baseline before editing.

git status --short
git checkout -b refactor/safe-extract-order-total
npm test
npm run typecheck
npm run lint

Your project may use different command names. Check package.json and map these to the local equivalents. If tests fail before any refactoring, record that fact first. Otherwise you will not know whether Claude Code caused the failure or only exposed an existing one.
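
One way to make that baseline record concrete is to compare the failing test names before and after the change. The sketch below is hypothetical: compareFailures and the test names are illustrative, and it assumes you have already collected the failing test names from your own runner's output.

```typescript
// Hypothetical helper: classify failing tests relative to a recorded baseline.
type FailureReport = {
  preExisting: string[]; // failed before and after the refactor
  introduced: string[]; // failed only after the refactor
  fixed: string[]; // failed before, pass now
};

function compareFailures(
  baseline: string[],
  afterRefactor: string[]
): FailureReport {
  const before = new Set(baseline);
  const after = new Set(afterRefactor);

  return {
    preExisting: afterRefactor.filter((name) => before.has(name)),
    introduced: afterRefactor.filter((name) => !before.has(name)),
    fixed: baseline.filter((name) => !after.has(name)),
  };
}

// Illustrative test names only.
const report = compareFailures(
  ["order > rounds totals"],
  ["order > rounds totals", "order > applies discount"]
);
// report.introduced is ["order > applies discount"]
// report.preExisting is ["order > rounds totals"]
```

Only the introduced list needs to be explained by the refactoring diff; the preExisting list was already broken on the branch point.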

Use case 1: Rename and extract a small pure function

The safest first exercise is naming plus pure-function extraction. A pure function returns the same output for the same input and does not update a database, send email, call an API, or mutate global state. Claude Code is strong here because the success condition is easy to test.
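
To make the distinction concrete, here is a minimal sketch contrasting a pure function with an impure one. Both functions are hypothetical illustrations and do not come from the example codebase.

```typescript
// Pure: the result depends only on the arguments, and nothing outside is touched.
function applyDiscount(subtotal: number, discount: number): number {
  return Math.max(subtotal - discount, 0);
}

// Impure (counter-example): reads and mutates module-level state,
// so the same argument produces a different result on each call.
let runningTotal = 0;
function addToRunningTotal(amount: number): number {
  runningTotal += amount;
  return runningTotal;
}

const pureFirst = applyDiscount(500, 800); // 0
const pureSecond = applyDiscount(500, 800); // 0 again
const impureFirst = addToRunningTotal(100); // 100
const impureSecond = addToRunningTotal(100); // 200
```

The pure version can be locked with a two-line test. The impure version needs setup and teardown, which is why it makes a poor first refactoring target.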

// before: src/domain/order.ts
export function calc(o: { items: { p: number; q: number }[]; d?: number }) {
  let t = 0;
  for (const i of o.items) {
    t += i.p * i.q;
  }
  if (o.d) {
    t = t - o.d;
  }
  return Math.max(t, 0);
}

This function is short, but the names hide the business meaning. Ask Claude Code to preserve behavior, add tests first, and improve the names.

Refactor the calc function in src/domain/order.ts safely.

Requirements:
- Add unit tests that lock the current behavior before changing the implementation
- Keep the exported name calc for this diff
- Improve variable and type names
- Preserve the rule that the total never becomes negative
- Run npm test -- order after the change

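A good after-state looks like this. Note that renaming the input fields p, q, and d to price, quantity, and discount changes the object shape callers pass, so every call site has to be updated in the same diff; npm run typecheck should flag any that were missed. If that is too wide for one diff, keep the old field names on the input type and rename only the local variables.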

// after: src/domain/order.ts
type OrderLine = {
  price: number;
  quantity: number;
};

type OrderInput = {
  items: OrderLine[];
  discount?: number;
};

export function calc(order: OrderInput): number {
  const subtotal = order.items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );

  return Math.max(subtotal - (order.discount ?? 0), 0);
}

The regression tests should be simple and copy-pasteable.

// src/domain/order.test.ts
import { describe, expect, it } from "vitest";
import { calc } from "./order";

describe("calc", () => {
  it("multiplies price and quantity", () => {
    expect(calc({ items: [{ price: 1200, quantity: 2 }] })).toBe(2400);
  });

  it("applies discount without returning a negative total", () => {
    expect(calc({ items: [{ price: 500, quantity: 1 }], discount: 800 })).toBe(0);
  });
});

Then review only the files that changed.

git diff -- src/domain/order.ts src/domain/order.test.ts
npm test -- order
npm run typecheck

The review question is not “does the code look clever?” It is “did the same inputs keep the same business meaning?” For this example, check the calculation, the exported function name, and the test descriptions.
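
One way to answer that question mechanically is to run the old and new implementations side by side over a small grid of inputs. The sketch below assumes both versions are kept as local copies, under the hypothetical names calcBefore and calcAfter, with toLegacy mapping the renamed fields back to the original ones. In a real repository you might compare against the previous commit instead.

```typescript
type LegacyOrder = { items: { p: number; q: number }[]; d?: number };
type OrderLine = { price: number; quantity: number };
type OrderInput = { items: OrderLine[]; discount?: number };

// Copy of the original implementation.
function calcBefore(o: LegacyOrder): number {
  let t = 0;
  for (const i of o.items) {
    t += i.p * i.q;
  }
  if (o.d) {
    t = t - o.d;
  }
  return Math.max(t, 0);
}

// Copy of the refactored implementation.
function calcAfter(order: OrderInput): number {
  const subtotal = order.items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );
  return Math.max(subtotal - (order.discount ?? 0), 0);
}

// Map the new input shape back to the old field names.
function toLegacy(order: OrderInput): LegacyOrder {
  const items = order.items.map((item) => ({ p: item.price, q: item.quantity }));
  return order.discount === undefined ? { items } : { items, d: order.discount };
}

// Return every input for which the two versions disagree.
function findMismatches(): OrderInput[] {
  const prices = [0, 1, 500, 1200];
  const quantities = [0, 1, 3];
  const discounts: (number | undefined)[] = [undefined, 0, 100, 5000];
  const mismatches: OrderInput[] = [];

  for (const price of prices) {
    for (const quantity of quantities) {
      for (const discount of discounts) {
        const items = [{ price, quantity }];
        const order: OrderInput =
          discount === undefined ? { items } : { items, discount };
        if (calcBefore(toLegacy(order)) !== calcAfter(order)) {
          mismatches.push(order);
        }
      }
    }
  }
  return mismatches;
}
```

An empty result on this grid is evidence of equivalence, not proof. Unusual inputs such as NaN discounts are not covered here and would need their own cases if the codebase can produce them.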

Use case 2: Remove `any` by typing the boundary first

Removing `any` is valuable, but doing it across the whole project at once is a common mistake. Start at boundaries: API responses, form payloads, configuration files, webhooks, and imported CSV rows. These are the places where unknown data enters the system.

// before: src/lib/user-api.ts
export async function fetchUser(id: string): Promise<any> {
  const response = await fetch(`/api/users/${id}`);
  return response.json();
}

export function getDisplayName(user: any): string {
  return user.profile.displayName || user.name;
}

Give Claude Code a narrow target and include the missing-data behavior.

Reduce any usage in src/lib/user-api.ts.

Requirements:
- Add a type for the API response
- Keep the fetch URL and return meaning unchanged
- Make getDisplayName safe when profile is missing
- Add tests for current display-name behavior
- Run npm test -- user-api and npm run typecheck

One acceptable first diff is:

// after: src/lib/user-api.ts
export type UserResponse = {
  id: string;
  name: string;
  profile?: {
    displayName?: string;
  };
};

export async function fetchUser(id: string): Promise<UserResponse> {
  const response = await fetch(`/api/users/${id}`);
  return response.json() as Promise<UserResponse>;
}

export function getDisplayName(user: UserResponse): string {
  // Keep || (not ??) so an empty displayName still falls back to name, as before.
  return user.profile?.displayName || user.name;
}

This is not the end of the story. A cast does not validate runtime data. If the project needs runtime safety, add a second diff with a validator such as zod or an existing local parser. Do not combine “remove any” and “introduce a validation library” in the same beginner diff unless the team is ready to review both.
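
For that second diff, a hand-written type guard is one dependency-free option. The sketch below is an assumption about what a local parser could look like, not the project's actual validator. The names isRecord, isUserResponse, and parseUserResponse are hypothetical, and the guard treats a null profile as invalid, which may not match what your API actually returns.

```typescript
type UserResponse = {
  id: string;
  name: string;
  profile?: {
    displayName?: string;
  };
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Runtime check that mirrors the UserResponse type field by field.
function isUserResponse(value: unknown): value is UserResponse {
  if (!isRecord(value)) return false;
  if (typeof value["id"] !== "string") return false;
  if (typeof value["name"] !== "string") return false;

  const profile = value["profile"];
  if (profile === undefined) return true;
  if (!isRecord(profile)) return false;

  const displayName = profile["displayName"];
  return displayName === undefined || typeof displayName === "string";
}

// Throws instead of letting malformed data flow into the rest of the system.
function parseUserResponse(value: unknown): UserResponse {
  if (!isUserResponse(value)) {
    throw new Error("Unexpected user response shape");
  }
  return value;
}
```

With something like this in place, fetchUser could return parseUserResponse(await response.json()) instead of casting. That introduces a new error path, which is the reason to review it as its own diff.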

Tests should cover both the preferred and fallback paths.

// src/lib/user-api.test.ts
import { describe, expect, it } from "vitest";
import { getDisplayName, type UserResponse } from "./user-api";

describe("getDisplayName", () => {
  it("uses profile displayName when present", () => {
    const user: UserResponse = {
      id: "u1",
      name: "Masa",
      profile: { displayName: "Masa I." },
    };

    expect(getDisplayName(user)).toBe("Masa I.");
  });

  it("falls back to name when profile is missing", () => {
    expect(getDisplayName({ id: "u2", name: "Guest" })).toBe("Guest");
  });
});

When reviewing, look for dangerous shortcuts: as any, swallowed errors, empty-string fallbacks, or changed optional-field behavior. A type-safe diff can still be a behavior-breaking diff.
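
The empty-string case is easy to miss in review. As a small illustration, swapping || for ?? type-checks cleanly but changes what an empty displayName returns. The two function names below are hypothetical and exist only to show the difference.

```typescript
type UserResponse = {
  id: string;
  name: string;
  profile?: {
    displayName?: string;
  };
};

// Falls back to name for any falsy displayName, including "".
function displayNameWithOr(user: UserResponse): string {
  return user.profile?.displayName || user.name;
}

// Falls back to name only when displayName is null or undefined.
function displayNameWithNullish(user: UserResponse): string {
  return user.profile?.displayName ?? user.name;
}

const blankProfile: UserResponse = {
  id: "u3",
  name: "Guest",
  profile: { displayName: "" },
};

const withOr = displayNameWithOr(blankProfile); // "Guest"
const withNullish = displayNameWithNullish(blankProfile); // ""
```

A test with an empty displayName, written before the refactor, would catch this difference.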

Use case 3: Split a large function only after adding a test harness

Large service functions are tempting targets, but they are also where behavior tends to hide. Order creation, billing, permissions, notifications, and import jobs often mix validation, calculation, persistence, and side effects. Ask Claude Code to extract only one pure piece first.

// before: src/services/order-service.ts
export async function createOrder(input: CreateOrderInput) {
  if (input.items.length === 0) {
    throw new Error("items required");
  }

  const subtotal = input.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  const shippingFee = subtotal >= 10000 ? 0 : 800;
  const total = subtotal + shippingFee;

  const order = await db.order.create({
    data: {
      userId: input.userId,
      subtotal,
      shippingFee,
      total,
    },
  });

  await mailer.sendOrderCreated(order.id);
  return order;
}

The prompt should explicitly say what is out of scope.

Make createOrder in src/services/order-service.ts smaller.

Do in this diff:
- Extract only shipping and total calculation into a pure function
- Name it calculateOrderTotals
- Add unit tests for calculateOrderTotals
- Keep database write and email order unchanged

Do not do in this diff:
- Change database schema
- Change error messages
- Change API response shape
- Move unrelated functions
- Reformat the whole file

The after-state is intentionally modest.

// after: src/services/order-service.ts
export function calculateOrderTotals(items: OrderItem[]) {
  const subtotal = items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );
  const shippingFee = subtotal >= 10000 ? 0 : 800;

  return {
    subtotal,
    shippingFee,
    total: subtotal + shippingFee,
  };
}

export async function createOrder(input: CreateOrderInput) {
  if (input.items.length === 0) {
    throw new Error("items required");
  }

  const { subtotal, shippingFee, total } = calculateOrderTotals(input.items);

  const order = await db.order.create({
    data: { userId: input.userId, subtotal, shippingFee, total },
  });

  await mailer.sendOrderCreated(order.id);
  return order;
}
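
The prompt asks for unit tests around calculateOrderTotals, and the free-shipping threshold is the edge most worth locking. Here is a dependency-free sketch of those checks, with the function copied in so the block runs on its own. It assumes OrderItem has price and quantity fields, since its definition is not shown above.

```typescript
// Assumed shape; the real OrderItem type may carry more fields.
type OrderItem = { price: number; quantity: number };

function calculateOrderTotals(items: OrderItem[]) {
  const subtotal = items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );
  const shippingFee = subtotal >= 10000 ? 0 : 800;

  return {
    subtotal,
    shippingFee,
    total: subtotal + shippingFee,
  };
}

// One yen below the threshold: shipping is charged.
const justBelow = calculateOrderTotals([{ price: 9999, quantity: 1 }]);
// justBelow is { subtotal: 9999, shippingFee: 800, total: 10799 }

// Exactly at the threshold: shipping is free.
const atThreshold = calculateOrderTotals([{ price: 5000, quantity: 2 }]);
// atThreshold is { subtotal: 10000, shippingFee: 0, total: 10000 }
```

In the project itself these would be ordinary vitest cases in src/services/order-service.test.ts, following the same pattern as the earlier test files.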

Review with focused commands.

git diff --stat
git diff -- src/services/order-service.ts
git diff -- src/services/order-service.test.ts
npm test -- order-service

If Claude Code also reformats unrelated code, ask it to reduce the diff.

This diff is too large.
Revert formatting-only changes and keep only calculateOrderTotals extraction plus tests.
Do not change external behavior, error text, database writes, or email ordering.

That follow-up prompt saves real review time. Smaller diffs make automation feel trustworthy instead of chaotic.

Review the result with git diff, not with vibes

Claude Code’s explanation is useful, but the diff is the source of truth.

git diff --check
git diff --stat
git diff --name-only
git diff --word-diff -- src/domain/order.ts
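
The file-count limit from the inspection prompt can also be checked mechanically against the output of git diff --name-only. The sketch below is a hypothetical helper, not part of Claude Code or git: checkDiffBudget and its parameters are illustrative, and it assumes the command output is passed in as a plain string.

```typescript
type DiffBudgetResult = {
  ok: boolean;
  tooManyFiles: boolean;
  outOfScope: string[]; // changed files that were not in the agreed target list
};

function checkDiffBudget(
  nameOnlyOutput: string,
  allowedFiles: string[],
  maxFiles = 3
): DiffBudgetResult {
  const changed = nameOnlyOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const allowed = new Set(allowedFiles);
  const outOfScope = changed.filter((file) => !allowed.has(file));
  const tooManyFiles = changed.length > maxFiles;

  return {
    ok: !tooManyFiles && outOfScope.length === 0,
    tooManyFiles,
    outOfScope,
  };
}

// Example: three files changed, one of them outside the agreed scope.
const budget = checkDiffBudget(
  "src/domain/order.ts\nsrc/domain/order.test.ts\nREADME.md\n",
  ["src/domain/order.ts", "src/domain/order.test.ts"]
);
// budget.ok is false, budget.outOfScope is ["README.md"]
```

A check like this does not replace reading the diff. It only flags early that the diff has drifted outside the agreed scope.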

Use this review table.

Area	What to check
Behavior	Inputs, outputs, exceptions, HTTP status, and persistence order are unchanged
Diff size	The changed files fit in one human review pass
Tests	Existing behavior is covered before or during the change
Types	No new `as any`, unsafe casts, or ignored errors
Side effects	API calls, email, billing, deletion, and permissions keep their order
Summary	Claude Code’s summary matches the actual diff
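
The side-effects row can be backed by a test that records call order. The sketch below assumes createOrder receives its dependencies as an argument, which the createOrder shown earlier does not do, since it calls db and mailer directly. OrderDeps, createOrderRow, and recordSideEffectOrder are hypothetical names introduced only for this illustration.

```typescript
type OrderItem = { price: number; quantity: number };
type CreateOrderInput = { userId: string; items: OrderItem[] };
type OrderRow = {
  userId: string;
  subtotal: number;
  shippingFee: number;
  total: number;
};

// Hypothetical seam so fakes can be substituted in a test.
type OrderDeps = {
  createOrderRow: (data: OrderRow) => Promise<{ id: string }>;
  sendOrderCreated: (orderId: string) => Promise<void>;
};

async function createOrder(input: CreateOrderInput, deps: OrderDeps) {
  if (input.items.length === 0) {
    throw new Error("items required");
  }
  const subtotal = input.items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );
  const shippingFee = subtotal >= 10000 ? 0 : 800;

  const order = await deps.createOrderRow({
    userId: input.userId,
    subtotal,
    shippingFee,
    total: subtotal + shippingFee,
  });
  await deps.sendOrderCreated(order.id);
  return order;
}

// Runs createOrder against recording fakes and returns the observed call order.
async function recordSideEffectOrder(): Promise<string[]> {
  const calls: string[] = [];
  const fakes: OrderDeps = {
    createOrderRow: async () => {
      calls.push("db.order.create");
      return { id: "o1" };
    },
    sendOrderCreated: async (orderId) => {
      calls.push("mailer.sendOrderCreated:" + orderId);
    },
  };

  await createOrder(
    { userId: "u1", items: [{ price: 1000, quantity: 1 }] },
    fakes
  );
  return calls;
}
// Resolves to ["db.order.create", "mailer.sendOrderCreated:o1"]
```

If a refactor moves the email before the database write, the recorded array changes and the test fails, even though types and return values stay the same.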

You can also ask Claude Code to review its own diff.

Review this git diff.

Check:
- Did the change exceed refactoring scope?
- Which behavior is not protected by tests?
- Are there unsafe casts or swallowed errors?
- Which files should a human inspect carefully?

Return:
- Looks safe
- Needs human confirmation
- Must fix
with file names and reasons.

Even then, a human should inspect high-impact areas: deletion, billing, permissions, authentication, email delivery, and data migrations. Tests are necessary, but business impact decides the final review standard.

Pitfalls and failure patterns to avoid

The first failure pattern is an overly broad prompt.

Make this service layer cleaner.

That can mix function extraction, naming, error design, file movement, and formatting in one diff. Prefer one precise target.

Only extract the shipping-fee calculation from createOrder into a pure function.
Do not change processing order, error messages, or return values.

The second failure pattern is accepting a clean-looking diff without tests. Readability can improve while edge cases change. Discount floors, free-shipping thresholds, permission denials, retry behavior, and null handling should be locked by tests before the implementation changes.

The third failure pattern is mixing formatter output with structural refactoring. If Prettier or ESLint changes hundreds of lines, the meaningful refactor becomes hard to see. Run formatting in a separate diff or ask Claude Code to avoid unrelated formatting.

The fourth failure pattern is granting broad command permissions too early. Start with read commands, test commands, typecheck, and lint. Expand permissions only once the workflow is stable. For team setups, keep command allowlists in project settings and document them in CLAUDE.md.

A reusable refactoring checklist

Paste this checklist into the session before a real refactor.

## Refactoring checklist

- [ ] The change has one purpose
- [ ] Baseline tests were run before editing
- [ ] Before/after behavior is equivalent
- [ ] Existing or new tests protect the behavior
- [ ] git diff --stat is small enough to review
- [ ] git diff --check passes
- [ ] No new `any`, unsafe casts, or swallowed errors were introduced
- [ ] Database, email, billing, deletion, and permission order did not change

Here is the final prompt template I use.

Execute one safe refactoring diff.

Target:
- src/services/order-service.ts
- src/services/order-service.test.ts

Success criteria:
- External behavior does not change
- calculateOrderTotals is extracted
- Existing and added tests pass
- Report git diff --stat and commands you ran

Forbidden:
- Database schema changes
- API response changes
- Error-message changes
- Unrelated file edits

This turns Claude Code from “make it nice” into “produce a small, reviewable, verified diff.”

Verification note and next step

In my own trial, the biggest quality improvement came from two habits: asking for an inspection-only plan before editing, and forcing every implementation to end with a git diff summary. Claude Code was much more reliable when the prompt included what not to change. It was less reliable when the prompt described a vague design goal.

Start with renaming, pure-function extraction, or reducing `any` usage. Once that works, combine this workflow with the Claude Code review checklist and CLAUDE.md best practices so the team can repeat the process consistently.

If your team wants a safe operating model for Claude Code, the Claude Code training covers permission settings, review habits, prompts, and workflow design. Refactoring automation becomes valuable when it is boring, testable, and easy to review.
