Reading Error Messages with Claude Code: A Practical Workflow for Turning Logs into Reproducible Fixes

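When people first start using Claude Code, many paste in a long error message and say "fix this." That sometimes works, but not reliably. Claude Code does not automatically know which command you just ran, your environment variables, your Node version, your Kubernetes namespace, or the differences hidden in CI. A more reliable approach is to turn the error into a reproducible bug report first, then ask Claude Code for a root-cause hypothesis, the next command to run, and a verification plan.

This article covers five common scenarios: TypeScript type errors, Node.js stack traces, runtime logs, Docker/Kubernetes logs, and GitHub Actions failures. For the official basics, see Claude Code common workflows; for troubleshooting, see Claude Code troubleshooting; and before rolling this out to a team, review Claude Code settings.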

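Core idea: don't guess, reproduce

An error message is not a riddle; it is evidence. A good question is not "what is this error?" but "what is the smallest reproduction, how do I confirm it, and how do I fix it safely?"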

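flowchart TD
  A["Save the full output of the failing command"] --> B["Trim noise but keep the original log"]
  B --> C["Ask Claude Code for hypotheses and reproduction steps"]
  C --> D["Build a minimal failing case"]
  D --> E["Rerun the same command after the fix"]
  E --> F["Record the prevention step in a test or CLAUDE.md"]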

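Error source	What to give Claude Code	What to ask for	What a human confirms
TypeScript	Full `tsc --noEmit --pretty false` output and file paths	Which type contract is broken, the safe fix, the risky fix	The fix does not dodge the error with `any` or `ts-ignore`
Node.js stack trace	The first Error line, application code frames, the triggering input	The first useful frame, the input that reproduces it	The same input reproduces the failure locally
Docker/Kubernetes	`describe` output, previous logs, events	Classification as OOM, env, probe, image, or application error	Evidence lines and a confirmation command are present
GitHub Actions	The failing job log and the changed files	The failing step, a local reproduction command, CI differences	It passes both locally and in CI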

Use case 1: Save the error output from npm and tsc

Don't copy only the last few lines from the terminal. Save the full output first.

mkdir -p tmp/error-cases
npm test 2>&1 | tee tmp/error-cases/test.log
npx tsc --noEmit --pretty false 2>&1 | tee tmp/error-cases/tsc.log

Then ask Claude Code for a debugging plan first.

claude -p "
I need a reproducible fix, not a guess.

Read these files if they exist:
- tmp/error-cases/test.log
- tmp/error-cases/tsc.log

Return:
1. One-line failure summary
2. Likely root cause with confidence level
3. Minimal reproduction steps
4. Next 3 commands to run
5. Smallest safe code change to try
6. Verification command after the fix

Do not hide TypeScript errors with any or ts-ignore.
"

The confidence level matters here. If Claude Code is only about 60% sure, what you need is the next verification command, not a patch that merely looks confident.

Use case 2: Shorten a Node.js stack trace

Node.js stack traces are often buried under node_modules frames. Keep the original log, and generate a short version for Claude Code to read first.

// scripts/minimize-stacktrace.mjs
// Reads a log from stdin and prints a shortened version for a first pass.
import { readFileSync } from "node:fs";

const input = readFileSync(0, "utf8"); // File descriptor 0 is stdin.
const lines = input.split(/\r?\n/);
const kept = [];
let dependencyFrames = 0;

// Keep every non-dependency line, but only the first 3 node_modules frames.
for (const line of lines) {
  const isStackFrame = /^\s+at /.test(line);
  const isDependencyFrame = line.includes("node_modules");

  if (!isStackFrame || !isDependencyFrame || dependencyFrames < 3) {
    kept.push(line);
  }

  if (isStackFrame && isDependencyFrame) {
    dependencyFrames += 1;
  }
}

// Keep only error headers, stack frames, and lines that mention app paths.
const important = kept.filter((line) =>
  /Error:|TypeError:|ReferenceError:|SyntaxError:|Caused by:|^\s+at |src\/|app\/|packages\//.test(line)
);

// Cap the output at 80 lines so the short log stays short.
console.log(important.slice(0, 80).join("\n"));

node scripts/minimize-stacktrace.mjs < tmp/error-cases/test.log > tmp/error-cases/test.min.log

Then ask like this.

claude -p "
Analyze tmp/error-cases/test.min.log first.
If the minimized log is not enough, ask for the full log instead of guessing.

Explain:
- Which application frame is the first useful frame
- What input or state is needed to reproduce it
- Whether this looks like async timing, null data, missing env, or dependency mismatch
- The smallest test that would fail before the fix
"

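This script is not a diagnostic tool; it only reduces noise. The real judgment still has to go back to the full log and the reproduction command.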
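The "first useful frame" that the prompt asks for can also be located mechanically before handing the log over. The following is a minimal sketch, not part of the original script: the function name `firstApplicationFrame` is hypothetical, and treating `node_modules` and `node:` frames as non-application code is an assumption that fits a typical Node.js project.

```javascript
// Sketch: find the first stack frame that points at application code.
// Assumption: frames under node_modules or node: internals are not app code.
function firstApplicationFrame(stackText) {
  const frames = stackText
    .split(/\r?\n/)
    .filter((line) => /^\s+at /.test(line));

  for (const frame of frames) {
    const isDependency = frame.includes("node_modules");
    const isNodeInternal = /\(node:|at node:/.test(frame);
    if (!isDependency && !isNodeInternal) {
      return frame.trim();
    }
  }
  return null; // No application frame found: fall back to the full log.
}

// Hypothetical stack trace, for illustration only.
const sampleStack = [
  "TypeError: Cannot read properties of undefined (reading 'id')",
  "    at Layer.handle (/repo/node_modules/express/lib/router/layer.js:95:5)",
  "    at getUser (/repo/src/users/service.js:42:17)",
  "    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)",
].join("\n");

console.log(firstApplicationFrame(sampleStack));
```

If the function returns `null`, that is a signal that the minimized log lost the relevant frames and the full log should be used instead.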

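Use case 3: Read TypeScript errors as "broken contracts"

Type X is not assignable to type Y usually means that the shape of the data the caller passes does not match the shape the callee promised to accept. In other words, the contract between two pieces of code has been broken.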

claude -p "
Explain this TypeScript error as a broken contract between caller and callee.

Use this output:
$(npx tsc --noEmit --pretty false 2>&1)

Return a table with:
- Error location
- Plain-language explanation
- Data shape expected
- Data shape actually provided
- Safe fix
- Risky fix to avoid

Do not suggest any, ts-ignore, or disabling strict mode unless there is no other option.
"

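This separates "actually fixing it" from "silencing the compiler." For a User | null error, for example, the safe direction is usually to handle the logged-out state, validate the API response, or complete the test data, rather than forcing a cast.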
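One reason to save `--pretty false` output is that each error header is a single line that is easy to parse, which makes the "Error location" column simple to check by hand. The sketch below assumes the common `path(line,col): error TSxxxx: message` header form; the function name `parseTscErrors` and the sample output are hypothetical, and indented continuation lines are deliberately skipped.

```javascript
// Sketch: turn `tsc --noEmit --pretty false` output into structured records.
// Assumption: error headers look like  path(line,col): error TSxxxx: message
function parseTscErrors(output) {
  const pattern = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.*)$/;
  const errors = [];
  for (const line of output.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (match) {
      errors.push({
        file: match[1],
        line: Number(match[2]),
        column: Number(match[3]),
        code: match[4],
        message: match[5],
      });
    }
  }
  return errors;
}

// Hypothetical compiler output, for illustration only.
const sampleTscOutput = [
  "src/profile.ts(12,5): error TS2322: Type 'User | null' is not assignable to type 'User'.",
  "  Type 'null' is not assignable to type 'User'.",
].join("\n");

console.log(parseTscErrors(sampleTscOutput));
```

Grouping the parsed records by file or by error code is a quick way to see whether many errors trace back to one broken contract.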

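Use case 4: Turn Kubernetes logs into the next confirmation command

CrashLoopBackOff is a result, not a cause. Start by collecting the pod description, the previous container's logs, and events.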

kubectl get pod -n app
kubectl describe pod web-abc123 -n app > tmp/error-cases/pod.describe.txt
kubectl logs web-abc123 -n app --previous > tmp/error-cases/pod.previous.log
kubectl get events -n app --sort-by=.lastTimestamp > tmp/error-cases/events.log

Then require Claude Code to cite evidence.

claude -p "
Triage this Kubernetes crash.

Files:
- tmp/error-cases/pod.describe.txt
- tmp/error-cases/pod.previous.log
- tmp/error-cases/events.log

Return:
1. Most likely category: OOMKilled, missing env, image pull, app exception, probe failure, or dependency outage
2. Evidence lines from the logs
3. One kubectl command to confirm each remaining hypothesis
4. Temporary mitigation
5. Permanent fix
6. Rollback check

If evidence is insufficient, say what command is missing.
"

If the answer cannot point to evidence lines, treat it as a hypothesis, not a conclusion.
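To cross-check a claimed category against the saved files, a small scan for well-known status strings can help. This is a rough sketch under stated assumptions: the name `findEvidence` is hypothetical, the pattern list is illustrative rather than exhaustive, and mapping `CreateContainerConfigError` to "missing env" is a simplification (it typically points at a missing ConfigMap or Secret reference).

```javascript
// Sketch: collect evidence lines per crash category from saved kubectl output.
// Assumption: these patterns are illustrative and far from a complete list.
const EVIDENCE_PATTERNS = [
  { category: "OOMKilled", pattern: /OOMKilled/ },
  { category: "image pull", pattern: /ErrImagePull|ImagePullBackOff/ },
  { category: "probe failure", pattern: /(Liveness|Readiness|Startup) probe failed/ },
  { category: "missing env", pattern: /CreateContainerConfigError/ },
];

function findEvidence(text) {
  const evidence = [];
  text.split(/\r?\n/).forEach((line, index) => {
    for (const { category, pattern } of EVIDENCE_PATTERNS) {
      if (pattern.test(line)) {
        evidence.push({ category, lineNumber: index + 1, line: line.trim() });
      }
    }
  });
  return evidence;
}

// Hypothetical excerpt of `kubectl describe pod` output, for illustration only.
const describeSample = [
  "    Last State:     Terminated",
  "      Reason:       OOMKilled",
  "      Exit Code:    137",
].join("\n");

console.log(findEvidence(describeSample));
```

An empty result does not mean the pod is healthy; it only means none of these specific strings appeared, so application exceptions and dependency outages still need the previous logs.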

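Use case 5: Analyze a GitHub Actions failure

The last few hundred lines of a CI log are often just a chain reaction. Capture the failing log first, then separate "fails locally too" from "fails only in CI."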

gh run list --limit 5
gh run view RUN_ID --log > tmp/error-cases/github-actions.log

claude -p "
You are triaging a GitHub Actions failure.

Analyze tmp/error-cases/github-actions.log and return:
1. Failed job and failed step
2. Exact command that failed
3. Whether this should reproduce locally
4. Local reproduction command
5. CI-only differences to inspect: Node version, env vars, cache, timezone, OS, permissions
6. Smallest patch to try
7. Verification plan for local and CI

Do not assume the root cause if the log only shows a downstream symptom.
"

This prompt is especially useful for flaky tests, timezone differences, missing secrets, and corrupted caches.
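Before trusting the "failed job and failed step" answer, it can be checked against the saved log directly. The sketch below rests on two assumptions that should be verified against a real log: that each line of `gh run view --log` output is tab-separated as job name, step name, then the timestamped message, and that failures are marked with `##[error]`. The function name `findFailedSteps` and the sample log are hypothetical.

```javascript
// Sketch: list job/step pairs that emitted error annotations.
// Assumption: lines look like  job<TAB>step<TAB>timestamp message
// and failures carry the "##[error]" marker. Verify on your own log.
const ERROR_MARKER = "##[error]";

function findFailedSteps(log) {
  const failures = [];
  for (const line of log.split(/\r?\n/)) {
    const markerIndex = line.indexOf(ERROR_MARKER);
    if (markerIndex === -1) continue;

    const fields = line.split("\t");
    const hasPrefix = fields.length >= 3;
    failures.push({
      job: hasPrefix ? fields[0] : null,
      step: hasPrefix ? fields[1] : null,
      message: line.slice(markerIndex + ERROR_MARKER.length).trim(),
    });
  }
  return failures;
}

// Hypothetical log excerpt, for illustration only.
const actionsLogSample = [
  "test\tRun npm test\t2024-01-01T00:00:00.0000000Z FAIL src/date.test.ts",
  "test\tRun npm test\t2024-01-01T00:00:01.0000000Z ##[error]Process completed with exit code 1.",
].join("\n");

console.log(findFailedSteps(actionsLogSample));
```

The error annotation usually names only the exit code, so the lines just above it in the same step are where the actual cause tends to sit.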

A copyable bug report template

# Bug report: short title

## Goal
What I was trying to do:

## Environment
- OS:
- Node version:
- Package manager:
- Branch:
- Commit:

## Exact command
```bash
paste the exact command here
```

## Expected result
What should have happened:

## Actual result
What happened instead:

## Logs
Paste the full error or attach the saved log file path.

## Minimal reproduction
Smallest steps that still fail:

## What I already tried
- Attempt 1:
- Attempt 2:

## Verification plan
Command that must pass after the fix:

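Common pitfalls

First, pasting only the last three lines. The last line is often just the failure result; the real cause may be in the middle.

Second, leaving out the command you ran. npm test, vitest --run, and npm run build have completely different contexts.

Third, "fixing" TypeScript with any, ts-ignore, or by deleting tests. Unless it is an explicit temporary stopgap, this is not a safe default.

Fourth, putting secrets into logs. API keys, cookies, JWTs, and database connection strings must be redacted first. For team use, check permissions and configuration in the official settings documentation first.

Fifth, running a different command first after the fix. Rerun the command that originally failed, then widen to lint, build, and CI.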
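For the fourth pitfall, a redaction pass over the saved log before sharing it is one way to reduce risk. The following is a minimal sketch, not a complete scrubber: the name `redactSecrets` and every pattern in it are assumptions, and secrets in other formats will slip through, so a manual review is still needed.

```javascript
// Sketch: mask common secret shapes before a log leaves your machine.
// Assumption: these patterns cover only a few typical formats.
const REDACTIONS = [
  // JWT-like tokens: three base64url segments separated by dots.
  { pattern: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, replacement: "[REDACTED_JWT]" },
  // Credentials embedded in URLs such as scheme://user:password@host/db.
  { pattern: /([a-z][a-z0-9+.-]*:\/\/[^\s:/@]+):[^\s@/]+@/gi, replacement: "$1:[REDACTED]@" },
  // Header-style values for Authorization and Cookie.
  { pattern: /\b(Authorization|Cookie):\s*.+$/gim, replacement: "$1: [REDACTED]" },
  // KEY=value pairs whose name suggests a secret.
  { pattern: /\b([A-Z0-9_]*(?:API_KEY|SECRET|TOKEN|PASSWORD)[A-Z0-9_]*)=\S+/g, replacement: "$1=[REDACTED]" },
];

function redactSecrets(text) {
  return REDACTIONS.reduce(
    (result, { pattern, replacement }) => result.replace(pattern, replacement),
    text
  );
}

// Hypothetical log lines, for illustration only.
const secretSample = [
  "DATABASE_URL=postgres://app:hunter2@db.internal:5432/app",
  "API_KEY=sk-123abc",
  "Authorization: Bearer abc.def",
].join("\n");

console.log(redactSecrets(secretSample));
```

Running the redaction on a copy and keeping the original log locally preserves the evidence while limiting what gets pasted into a prompt or a ticket.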

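Templates, training, and consulting from ClaudeCodeLab

For individual use, you can start by copying the prompts in this article. For teams, the hard parts are standardizing log conventions, banning risky fixes, the CI triage process, and the evidence reviewers need.

ClaudeCodeLab offers Claude Code materials and templates as well as Claude Code training and consulting. If you want to turn error analysis into a team process, we can work together on bug report templates, CI triage prompts, CLAUDE.md rules, and a retrospective checklist.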

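Summary

The point of reading error messages with Claude Code is not to make it better at guessing, but to give it enough evidence to output a next step you can execute. Save the command, save the full log, trim noise carefully, ask for a confidence level, and verify with the same command, and quality improves noticeably.

After trying this workflow in practice, Masa found while maintaining ClaudeCodeLab that the first 10 minutes of debugging are where the most time is saved. Once saving tsc --pretty false output became routine, stack traces were trimmed while the original log was kept, and CI failures were broken down into job, step, command, and reproduction, Claude Code's suggestions no longer had to be taken on faith; they could be verified one command at a time.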
