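A Practical Guide to Safely Processing Markdown/MDX with Claude Code

Why Markdown Can't Be Treated as Just a String

A published Markdown/MDX article is not ordinary text. It carries frontmatter, an SEO description, a heading hierarchy, auto-generated anchors, code fences, internal links, official external links, multilingual paths, and sometimes raw HTML, all at once. Asking Claude Code to "just polish the article" looks convenient, but without boundaries it may change a slug, delete a CTA, reduce one locale to a summary, or leave a code block without its closing fence.

A more reliable approach is to let Claude Code write the content while scripts check the structure. Read Markdown structure through an AST (abstract syntax tree), validate frontmatter as data, account for XSS in any HTML output, and check the multilingual files together. Article quality then rests on a repeatable check process rather than on impressions.

The core references cited in this article were confirmed on June 2, 2026. For the unified processing model, see the unified getting-started guide and its syntax trees explanation. For Markdown parsing, see remark and remark-parse. For MDX syntax, see the official MDX documentation. gray-matter can handle frontmatter. For HTML safety, consult both rehype-sanitize and the OWASP XSS Prevention Cheat Sheet. For Claude Code's working boundaries, read the Claude Code overview and settings pages.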

flowchart LR
  A["MDX file"] --> B["frontmatter"]
  B --> C["schema validation"]
  A --> D["remark / MDX AST"]
  D --> E["headings, code fences, links"]
  D --> F["rehype HTML pipeline"]
  F --> G["sanitize"]
  C --> H["locale and build checks"]
  E --> H
  G --> H

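Choose the Right Parser Before Claude Code Starts Editing

The first constraint to give Claude Code should be the toolchain. If you only say "parse the Markdown," the model tends to produce a short regex. A short regex can find a fixed string, but it cannot carry the structural judgments a published article requires.

Scenario	Recommended approach	Error-prone approach
Reading headings, links, code blocks	`remark-parse` plus AST traversal	Matching the raw text with `^##`
Handling `.mdx` with JSX	`remark-mdx` or the MDX compiler	Using only a Markdown parser
Outputting HTML	`remark-rehype` into rehype	Concatenating HTML strings by hand
Allowing raw HTML	`rehype-raw` followed by `rehype-sanitize`	Only turning on `allowDangerousHtml`
Reading frontmatter	`gray-matter` plus field validation	Splitting the YAML line by line

The value of an AST is that it distinguishes meaning. A `## fake heading` inside a code block should not enter the table of contents; attributes on an MDX component cannot be handled as ordinary paragraphs; and `tags: Claude Code, Markdown` in YAML is a string, not an array. Claude Code can write the check scripts, but you must require it to use a structured parser.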
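
The `tags` case can be caught with a small type check on the parsed frontmatter. The sketch below is a minimal illustration, assuming the frontmatter has already been parsed into a plain object (for example by gray-matter). `validateFrontmatter` and its exact rules are hypothetical and not part of any library.

```javascript
// Hypothetical helper: validates an already-parsed frontmatter object.
// The field names mirror the ones used in this article; adjust to your schema.
function validateFrontmatter(data) {
  const errors = [];

  for (const key of ["title", "description", "lang"]) {
    if (typeof data[key] !== "string" || data[key].trim() === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }

  // `tags: Claude Code, Markdown` parses as ONE string, not a list.
  if (!Array.isArray(data.tags)) {
    errors.push(`tags must be an array, got ${typeof data.tags}`);
  } else if (data.tags.some((tag) => typeof tag !== "string")) {
    errors.push("every tag must be a string");
  }

  return errors;
}

// What a YAML parser yields for `tags: Claude Code, Markdown`
const wrong = { title: "T", description: "D", lang: "zh", tags: "Claude Code, Markdown" };
// What it yields for `tags: ["Claude Code", "Markdown"]`
const right = { ...wrong, tags: ["Claude Code", "Markdown"] };

console.log(validateFrontmatter(wrong)); // one error, about tags
console.log(validateFrontmatter(right)); // no errors
```

Returning a list of messages instead of throwing lets the caller report every problem in one run, which matches how the audit script in this article collects its `errors` array.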

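4 Concrete Use Cases

The first use case is updating a published blog post. You have to handle the title, description, updatedDate, official links, internal links, code examples, and the commercial CTA together. For ClaudeCodeLab, the article can link to related posts such as CLAUDE.md best practices and Claude Code web scraping, but it must not casually change any other slug.

The second use case is componentizing MDX for a documentation site or help center. Callouts, tabs, pricing tables, FAQs, and configuration snippets all suit MDX, but once Markdown and JSX are mixed, regex becomes fragile.

The third use case is publishing in ten languages. If the Japanese canonical version is thorough while the Chinese, Korean, French, or Indonesian version is reduced to a summary, local readers and SEO both suffer directly. Every locale needs the full case studies, failure modes, copyable code, official links, internal links, CTA, and verification notes.

The fourth use case is commercial content operations. Gumroad product pages, training pages, email resources, and free-PDF delivery pages may all reuse Markdown. The closer a page sits to purchase and inquiry, the less code blocks, links, and XSS protection can be left to human memory.

A Copyable Minimal Setup

The examples below assume Node.js 18 or later and use ESM. Try them in a temporary directory first, then move them into the real repository.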

mkdir mdx-audit-demo
cd mdx-audit-demo
npm init -y
npm pkg set type=module
npm install unified remark-parse remark-mdx remark-gfm gray-matter
npm install unist-util-visit github-slugger
npm install remark-rehype rehype-raw rehype-sanitize rehype-stringify
mkdir tools

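Example 1: Check Frontmatter, Headings, Code Fences, and Links

This script reads frontmatter with gray-matter and the body with remark and the MDX parser. It checks the description length, required fields, code fence languages, internal links, and external links, and generates a stable slug for each heading.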

// tools/audit-mdx.mjs
import fs from "node:fs/promises";
import matter from "gray-matter";
import GithubSlugger from "github-slugger";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkMdx from "remark-mdx";
import remarkGfm from "remark-gfm";
import { visit } from "unist-util-visit";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/audit-mdx.mjs article.mdx");
}

const source = await fs.readFile(file, "utf8");
const { data, content } = matter(source);
const errors = [];
const links = { internal: [], external: [] };
const headings = [];
const codeBlocks = [];

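for (const key of ["title", "description", "pubDate", "heroImage", "lang"]) {
  const value = data[key];
  // gray-matter parses unquoted YAML dates (pubDate: 2026-06-02) into Date objects.
  const present =
    value instanceof Date || (typeof value === "string" && value.trim() !== "");
  if (!present) {
    errors.push(`frontmatter.${key} is required`);
  }
}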

if ([...String(data.description ?? "")].length > 120) {
  errors.push("description must be 120 characters or fewer");
}

if (!Array.isArray(data.tags) || data.tags.length === 0) {
  errors.push("frontmatter.tags must be a non-empty array");
}

const tree = unified()
  .use(remarkParse)
  .use(remarkMdx)
  .use(remarkGfm)
  .parse(content);

const slugger = new GithubSlugger();

visit(tree, (node) => {
  if (node.type === "heading") {
    const text = plainText(node);
    headings.push({ depth: node.depth, text, slug: slugger.slug(text) });
  }

  if (node.type === "code") {
    codeBlocks.push({ lang: node.lang || "", meta: node.meta || "" });
    if (!node.lang) errors.push("code fence is missing a language");
  }

  if (node.type === "link") {
    const url = String(node.url || "");
    if (url.startsWith("http")) links.external.push(url);
    if (url.startsWith("/")) links.internal.push(url);
  }
});

if (links.internal.length === 0) errors.push("missing internal link");
if (links.external.length === 0) errors.push("missing external link");

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log(JSON.stringify({ headings, codeBlocks, links }, null, 2));

function plainText(node) {
  if (typeof node.value === "string") return node.value;
  if (!Array.isArray(node.children)) return "";
  return node.children.map(plainText).join("");
}

Run it as follows. Start with a single file, and add the script to CI only after confirming the rules produce no false positives.

node tools/audit-mdx.mjs site/src/content/blog-zh/example.mdx
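
The description check in the script counts `[...String(value)].length` rather than `value.length`. The following is a minimal sketch of the difference, assuming the limit is meant in Unicode code points. `countCodePoints` is an illustrative name only.

```javascript
// String.prototype.length counts UTF-16 code units, so characters outside the
// Basic Multilingual Plane (many emoji, some rare CJK ideographs) count twice.
// Spreading a string iterates by code point instead.
function countCodePoints(text) {
  return [...String(text ?? "")].length;
}

const description = "用 AST 检查 MDX 🚀";

console.log(description.length);           // 15 UTF-16 code units
console.log(countCodePoints(description)); // 14 code points
```

Code points are still not the same as user-perceived characters: an emoji built from several code points counts more than once. If that matters for your limit, `Intl.Segmenter` with grapheme granularity is one option to evaluate.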

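Example 2: Convert Markdown to Safe HTML

If you don't need raw HTML, the safest choice is not to enable it. If authors must be allowed to write HTML, parse the raw HTML first and then sanitize it. Enabling allowDangerousHtml alone is not a security strategy.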

// tools/markdown-to-safe-html.mjs
import fs from "node:fs/promises";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkGfm from "remark-gfm";
import remarkRehype from "remark-rehype";
import rehypeRaw from "rehype-raw";
import rehypeSanitize, { defaultSchema } from "rehype-sanitize";
import rehypeStringify from "rehype-stringify";

const file = process.argv[2];
const markdown = await fs.readFile(file, "utf8");
const schema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    code: [["className", /^language-/]],
  },
};

const html = await unified()
  .use(remarkParse)
  .use(remarkGfm)
  .use(remarkRehype, { allowDangerousHtml: true })
  .use(rehypeRaw)
  .use(rehypeSanitize, schema)
  .use(rehypeStringify)
  .process(markdown);

console.log(String(html));

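Order matters. rehype-raw parses the raw HTML back into the HTML AST, and rehype-sanitize then removes any tags and attributes the schema does not allow. Without the second step, dangerous attributes can reach the page.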
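
As a complement to sanitizing (not a replacement for it), an audit script can also flag link URLs whose scheme is unexpected before the article is published. The sketch below assumes an allowlist of `http:`, `https:`, and `mailto:`. `isSafeHref` and the placeholder base URL are hypothetical names chosen for illustration.

```javascript
// Assumed allowlist; extend it only for schemes you deliberately support.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);
// Placeholder base so relative links such as "/blog/x" or "#top" can be parsed.
const BASE = "https://example.invalid";

function isSafeHref(href) {
  try {
    // The URL parser lowercases the scheme and strips surrounding whitespace,
    // so " JaVaScRiPt:..." is still recognized as javascript:.
    const { protocol } = new URL(String(href), BASE);
    return ALLOWED_PROTOCOLS.has(protocol);
  } catch {
    // Anything the URL parser rejects is treated as unsafe.
    return false;
  }
}

for (const href of ["/blog/x", "https://mdxjs.com/", "javascript:alert(1)"]) {
  console.log(href, isSafeHref(href));
}
```

Parsing with the built-in `URL` class avoids hand-written scheme matching, which is the same reasoning this article applies to Markdown structure.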

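Example 3: Check the Same Slug Across Ten Locales

Multilingual articles have to be reviewed together. The script below confirms that every locale exists, heroImage has not changed, updatedDate is correct, description is at most 120 characters, and the body contains both an internal link and an external link.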

// tools/check-locales.mjs
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

const slug = "claude-code-markdown-processing.mdx";
const expectedHero = "/images/hero/hero-077.png";
const locales = [
  ["ja", "site/src/content/blog"],
  ["en", "site/src/content/blog-en"],
  ["zh", "site/src/content/blog-zh"],
  ["ko", "site/src/content/blog-ko"],
  ["es", "site/src/content/blog-es"],
  ["fr", "site/src/content/blog-fr"],
  ["de", "site/src/content/blog-de"],
  ["pt", "site/src/content/blog-pt"],
  ["hi", "site/src/content/blog-hi"],
  ["id", "site/src/content/blog-id"],
];

const errors = [];

for (const [lang, dir] of locales) {
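  const file = path.join(dir, slug);
  // Record a missing locale as an error instead of crashing in readFileSync.
  if (!fs.existsSync(file)) {
    errors.push(`${lang}: file missing`);
    continue;
  }
  const source = fs.readFileSync(file, "utf8");
  const { data, content } = matter(source);
  if (data.lang !== lang) errors.push(`${lang}: lang mismatch`);
  if (data.heroImage !== expectedHero) errors.push(`${lang}: hero changed`);
  // gray-matter parses unquoted YAML dates into Date objects, so normalize first.
  const updated =
    data.updatedDate instanceof Date
      ? data.updatedDate.toISOString().slice(0, 10)
      : String(data.updatedDate ?? "");
  if (updated !== "2026-06-02") {
    errors.push(`${lang}: updatedDate mismatch`);
  }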
  if ([...String(data.description ?? "")].length > 120) {
    errors.push(`${lang}: description too long`);
  }
  if (!content.includes("https://")) errors.push(`${lang}: no external link`);
  if (!content.includes("](/")) errors.push(`${lang}: no internal link`);
}

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log("locale set is consistent");
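
A locale can pass these metadata checks and still be a thin summary. One rough way to catch that is to compare each locale's structure against the canonical article. The sketch below assumes each report has the `{ headings, codeBlocks, links }` shape printed by tools/audit-mdx.mjs. `findThinLocale`, the `report` demo helper, and the 0.8 ratio are hypothetical choices, not measured thresholds.

```javascript
// Compares structural counts of one locale against the canonical article.
function findThinLocale(canonical, locale, minRatio = 0.8) {
  const counters = {
    headings: (r) => r.headings.length,
    codeBlocks: (r) => r.codeBlocks.length,
    links: (r) => r.links.internal.length + r.links.external.length,
  };
  const problems = [];
  for (const [name, count] of Object.entries(counters)) {
    const expected = count(canonical);
    const actual = count(locale);
    if (actual < expected * minRatio) {
      problems.push(`${name}: ${actual} of ${expected}`);
    }
  }
  return problems;
}

// Demo data only: builds a fake audit report with the given counts.
const report = (h, c, i, e) => ({
  headings: Array(h).fill({ depth: 2 }),
  codeBlocks: Array(c).fill({ lang: "js" }),
  links: { internal: Array(i).fill("/x"), external: Array(e).fill("https://x") },
});

const ja = report(10, 5, 3, 7); // canonical
const zh = report(10, 5, 3, 7); // full translation
const ko = report(4, 1, 1, 2);  // thin summary

console.log(findThinLocale(ja, zh)); // []
console.log(findThinLocale(ja, ko)); // three problems
```

Counts are only a proxy for depth. A locale that passes can still read poorly, so this belongs next to human review rather than in place of it.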

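Concrete Failure Cases

Failure	Consequence	Defense
Reading headings with regex	A fake heading inside a code block enters the table of contents	Read only `heading` nodes
`tags` written as a string	Related articles and filters misbehave	Validate frontmatter types
Inconsistent slug generation	Anchor links break across languages	Use the same slugger
Publishing raw HTML directly	XSS risk reaches the page	Use `rehype-sanitize`
Not checking external links	Links still point to the old address after official docs move	Probe links before publishing
Prompt scope too broad	Files owned by parallel workers get modified	Specify `owned_files` explicitly

These failure cases belong directly in the task description you give Claude Code. Don't just say "quality must be high"; say "do not use regex-only heading parsing, do not modify other slugs, keep heroImage, keep description within 120 characters, and sanitize any raw HTML."

A Safe Prompt for Claude Code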

task: "Refresh one published MDX article"
owned_files:
  - "site/src/content/blog-zh/claude-code-markdown-processing.mdx"
preserve:
  - "slug path"
  - "heroImage"
  - "unrelated dirty files"
required:
  - "updatedDate: 2026-06-02"
  - "description <= 120 characters"
  - "AST-based Markdown checks"
  - "official external links"
  - "internal links and monetization CTA"
forbidden:
  - "regex-only heading parsing"
  - "raw HTML without sanitization"
  - "thin locale summaries"
verification:
  - "node scripts/check-code-fences.mjs"
  - "node scripts/check-updated-article-quality.mjs"
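
The `owned_files` boundary can also be checked after the fact. The sketch below assumes you already have the list of changed paths (for example from `git diff --name-only`) and the `owned_files` list from the prompt. `findOutOfScope` and `normalizePath` are hypothetical helpers, and they compare exact repository-relative paths only.

```javascript
// Normalizes separators and a leading "./" so equivalent paths compare equal.
function normalizePath(p) {
  return String(p).trim().replace(/\\/g, "/").replace(/^\.\//, "");
}

// Returns every changed file that is not listed in owned_files.
function findOutOfScope(changedFiles, ownedFiles) {
  const owned = new Set(ownedFiles.map(normalizePath));
  return changedFiles
    .map(normalizePath)
    .filter((file) => file !== "" && !owned.has(file));
}

const ownedFiles = ["site/src/content/blog-zh/claude-code-markdown-processing.mdx"];
const changedFiles = [
  "./site/src/content/blog-zh/claude-code-markdown-processing.mdx",
  "site/src/content/blog-ko/claude-code-markdown-processing.mdx",
];

// Only the blog-ko file is reported, because it is outside owned_files.
console.log(findOutOfScope(changedFiles, ownedFiles));
```

A non-empty result is a reason to stop and review the diff before committing, since it may mean a parallel worker's file was touched.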

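Pre-Publish Checks and CTA

Before publishing, run the scripts at least once, then read the article yourself. The scripts cover structure, fences, metadata, links, and article depth; the human reviewer judges whether the localization reads naturally, whether paragraphs are too long on mobile, and whether the CTA fits the context.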

node tools/audit-mdx.mjs site/src/content/blog-zh/claude-code-markdown-processing.mdx
node tools/check-locales.mjs
node scripts/check-code-fences.mjs
node scripts/check-updated-article-quality.mjs

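For individual learning, start with the free Claude Code cheatsheet. If you need reusable review and writing prompts, use the Claude Code prompt templates. If your team wants to standardize permissions, CI, the locale workflow, and publishing review, see Claude Code training and consulting.

Actual Verification Results

In this refresh, Masa first listed the common problems as failure conditions: description too long, updatedDate missing, heroImage changed, code fences without a language, locales reduced to thin summaries, and outdated official links. Only then was Claude Code asked to rewrite the article along the lines of the AST and frontmatter schema approach. In practice, review no longer asks only whether the article "reads well"; node scripts/check-code-fences.mjs and node scripts/check-updated-article-quality.mjs can demonstrate that the structure is intact. The key to raising the quality of published articles is not just writing more content, but first turning the constraints that must not break into scripts and prompts.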
