Safe Markdown and MDX Processing with Claude Code
Build safe Markdown/MDX workflows with Claude Code: AST parsing, frontmatter, XSS defense, links, and locale QA.
Why Markdown Processing Needs More Than Regex
A published Markdown or MDX article is not just text. It carries frontmatter, SEO metadata, headings, generated IDs, code fences, internal links, external references, locale-specific routes, and sometimes raw HTML. If you ask Claude Code to “clean up this article” without a processing contract, the result may read better while silently changing the slug, dropping a CTA, breaking a code fence, or leaving one locale as a thin summary.
The practical rule is simple: let Claude Code edit prose, but make the structure machine-checkable. Use an AST, or abstract syntax tree, for Markdown and MDX. Validate frontmatter as data. Treat HTML sanitization as a security boundary. Run locale and build checks before publishing.
I verified the core references on June 2, 2026. The unified guide explains the parse, transform, and stringify pipeline, while unified syntax trees explain why AST nodes are safer than line matching. Markdown parsing is covered by remark and remark-parse. MDX syntax is documented in the MDX docs. For frontmatter, use gray-matter. For raw HTML and XSS risk, compare rehype-sanitize with the OWASP XSS Prevention Cheat Sheet. Claude Code workflow boundaries are easier to enforce when you also read the official Claude Code overview and settings.
flowchart LR
A["MDX file"] --> B["frontmatter"]
B --> C["schema validation"]
A --> D["remark / MDX AST"]
D --> E["headings, fences, links"]
D --> F["rehype HTML pipeline"]
F --> G["sanitize"]
C --> H["locale and build checks"]
E --> H
G --> H
Choose the Parser by the Job
The first instruction to Claude Code should name the tool chain. A vague “parse Markdown” request often produces a quick regex. That works for a toy file and fails on real articles.
| Need | Better choice | Risky shortcut |
|---|---|---|
| Read headings, links, and code fences | remark-parse plus AST traversal | ^## regex on raw text |
| Handle JSX inside .mdx | remark-mdx or the MDX compiler | Markdown-only parser |
| Render HTML | remark-rehype into the rehype pipeline | String concatenation |
| Accept raw HTML | rehype-raw followed by rehype-sanitize | allowDangerousHtml alone |
| Read frontmatter | gray-matter and schema checks | Splitting YAML by hand |
Regex is still useful for narrow checks, such as finding an exact literal in a file. It should not be the source of truth for Markdown structure. A code fence can contain ## Not a heading; an MDX component can contain links as props; YAML can turn a list into a string if the author forgets brackets. AST and schema validation catch those cases before readers see them.
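To make the code-fence case concrete, here is a minimal sketch that runs a `^## ` regex over a tiny sample document. The sample text is made up for illustration. The regex reports two headings even though only one is real, which is the false positive that traversing heading nodes in an AST avoids.

```javascript
// A made-up sample: one real heading, plus a "## " line inside a code fence.
const fence = "`".repeat(3); // built at runtime so this snippet contains no literal fence
const sample = [
  "## Real heading",
  "",
  fence + "md",
  "## Not a heading",
  fence,
].join("\n");

// Naive line matching: every line that starts with "## " counts as a heading.
const regexHeadings = sample.match(/^## .+$/gm) ?? [];

console.log(regexHeadings);
// prints [ '## Real heading', '## Not a heading' ]
```

A Markdown parser would return the second line as the value of a code node, not as a heading.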
Four Practical Use Cases
The first use case is refreshing a published blog article. You need updated metadata, a stronger introduction, official links, internal links, working code, and a revenue CTA. For ClaudeCodeLab, that means connecting this topic to guides such as CLAUDE.md best practices and web scraping with Claude Code without changing unrelated slugs.
The second use case is a docs site that mixes Markdown and MDX components. Callouts, tabs, pricing cards, and live examples are useful, but they make regex parsing brittle. The checker must understand both Markdown nodes and MDX nodes.
The third use case is multilingual publishing. A strong Japanese canonical article does not help if English, Spanish, Indonesian, and other locales become short summaries. Each locale needs the same depth: use cases, pitfalls, snippets, official links, internal links, CTA, and verification notes.
The fourth use case is commercial content operations. Product pages, Gumroad landing pages, training pages, and email resources often reuse Markdown. Broken code fences and unsafe HTML reduce trust exactly where conversion matters. If an article sends readers to products or training, the content pipeline has to protect those links.
Copy-Paste Setup
The snippets below use Node.js 18 or newer. They are written as ES modules (.mjs files with top-level await) so they can run directly in a small tools folder.
mkdir mdx-audit-demo
cd mdx-audit-demo
npm init -y
npm pkg set type=module
npm install unified remark-parse remark-mdx remark-gfm gray-matter
npm install unist-util-visit github-slugger
npm install remark-rehype rehype-raw rehype-sanitize rehype-stringify
mkdir tools
Example 1: Audit Frontmatter, Headings, Fences, and Links
This script reads frontmatter with gray-matter, parses the body with remark and MDX support, then reports headings, code fences, and links. It fails if required metadata is missing, the description is too long, a code fence has no language, or the article lacks internal or external links.
// tools/audit-mdx.mjs
// Audits one MDX file: frontmatter fields, headings, code fences, and links.
import fs from "node:fs/promises";
import matter from "gray-matter";
import GithubSlugger from "github-slugger";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkMdx from "remark-mdx";
import remarkGfm from "remark-gfm";
import { visit } from "unist-util-visit";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/audit-mdx.mjs article.mdx");
}

const source = await fs.readFile(file, "utf8");
const { data, content } = matter(source);
const errors = [];
const links = { internal: [], external: [] };
const codeBlocks = [];
const headings = [];

// Required fields. An unquoted YAML date such as pubDate may be parsed into a
// Date object instead of a string, so a Date also counts as present.
for (const key of ["title", "description", "pubDate", "heroImage", "lang"]) {
  const value = data[key];
  const present =
    value instanceof Date || (typeof value === "string" && value.trim() !== "");
  if (!present) {
    errors.push(`frontmatter.${key} is required`);
  }
}

// Spread into an array so the length counts code points, not UTF-16 units.
if ([...String(data.description ?? "")].length > 120) {
  errors.push("description must be 120 characters or fewer");
}

if (!Array.isArray(data.tags) || data.tags.length === 0) {
  errors.push("frontmatter.tags must be a non-empty array");
}

// Parse only; no transform or stringify step is needed for an audit.
const tree = unified()
  .use(remarkParse)
  .use(remarkMdx)
  .use(remarkGfm)
  .parse(content);

const slugger = new GithubSlugger();

visit(tree, (node) => {
  if (node.type === "heading") {
    const text = plainText(node);
    headings.push({ depth: node.depth, text, slug: slugger.slug(text) });
  }

  if (node.type === "code") {
    codeBlocks.push({ lang: node.lang || "", meta: node.meta || "" });
    if (!node.lang) errors.push("code fence is missing a language");
  }

  if (node.type === "link") {
    const url = String(node.url || "");
    if (url.startsWith("http")) links.external.push(url);
    if (url.startsWith("/")) links.internal.push(url);
  }
});

if (links.internal.length === 0) errors.push("missing internal link");
if (links.external.length === 0) errors.push("missing external link");

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log(JSON.stringify({ headings, codeBlocks, links }, null, 2));

// Concatenate the text of a node and all of its descendants.
function plainText(node) {
  if (typeof node.value === "string") return node.value;
  if (!Array.isArray(node.children)) return "";
  return node.children.map(plainText).join("");
}
Run it against one file first, then wire it into CI only after the false positives are understood.
node tools/audit-mdx.mjs site/src/content/blog-en/example.mdx
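The audit above only counts Markdown link nodes, so a link passed to an MDX component as a prop is not reported. The sketch below shows one way to collect those as well. It assumes the JSX node shapes that remark-mdx produces (mdxJsxFlowElement and mdxJsxTextElement nodes carrying an attributes array of mdxJsxAttribute entries). The helper name, the prop names href and to, and the Card component are assumptions for illustration. The tree is hand-built so the snippet runs without dependencies; in the real script you would pass the parsed tree instead.

```javascript
// Hypothetical helper: collect string-valued link props from MDX JSX nodes.
// Assumes remark-mdx node shapes; the prop names are an assumption too.
const LINK_PROPS = new Set(["href", "to"]);

function collectJsxLinks(node, found = []) {
  const isJsx =
    node.type === "mdxJsxFlowElement" || node.type === "mdxJsxTextElement";
  if (isJsx) {
    for (const attr of node.attributes ?? []) {
      // Skip spread attributes and expression values such as href={url}.
      if (attr.type !== "mdxJsxAttribute") continue;
      if (LINK_PROPS.has(attr.name) && typeof attr.value === "string") {
        found.push({ component: node.name, prop: attr.name, url: attr.value });
      }
    }
  }
  for (const child of node.children ?? []) collectJsxLinks(child, found);
  return found;
}

// Hand-built stand-in for a tree that contains a <Card href="/pricing"> component.
const demoTree = {
  type: "root",
  children: [
    {
      type: "mdxJsxFlowElement",
      name: "Card",
      attributes: [{ type: "mdxJsxAttribute", name: "href", value: "/pricing" }],
      children: [{ type: "text", value: "Plans" }],
    },
  ],
};

console.log(collectJsxLinks(demoTree));
```

Expression-valued props are skipped on purpose: resolving them would mean evaluating code, which an audit should not do.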
Example 2: Convert Markdown to Safe HTML
If you never need raw HTML in Markdown, do not enable it. If the product requires raw HTML, sanitize after parsing. The unsafe pattern is to pass allowDangerousHtml and stop there.
// tools/markdown-to-safe-html.mjs
// Converts Markdown that may contain raw HTML into sanitized HTML.
import fs from "node:fs/promises";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkGfm from "remark-gfm";
import remarkRehype from "remark-rehype";
import rehypeRaw from "rehype-raw";
import rehypeSanitize, { defaultSchema } from "rehype-sanitize";
import rehypeStringify from "rehype-stringify";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/markdown-to-safe-html.mjs article.md");
}

const markdown = await fs.readFile(file, "utf8");

// Start from the default schema and allow only language-* classes on <code>.
const schema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    code: [["className", /^language-/]],
  },
};

const html = await unified()
  .use(remarkParse)
  .use(remarkGfm)
  // Keep raw HTML as raw nodes so rehype-raw can parse it in the next step.
  .use(remarkRehype, { allowDangerousHtml: true })
  .use(rehypeRaw)
  // Sanitize after rehype-raw so the parsed HTML is checked against the schema.
  .use(rehypeSanitize, schema)
  .use(rehypeStringify)
  .process(markdown);

console.log(String(html));
The important detail is the order. rehype-raw parses raw HTML into the HTML tree; rehype-sanitize then removes disallowed tags and attributes. Without the second step, user-authored HTML can carry unsafe attributes into the rendered page.
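One way to keep that ordering from regressing is a small tripwire that scans the pipeline's output for markers that should not survive sanitization. The sketch below is not a sanitizer and does not replace rehype-sanitize. The function name and the three patterns are assumptions, and the check can still raise a false positive when an attribute value happens to contain text such as onload=.

```javascript
// Hypothetical smoke check for HTML that has ALREADY been through the pipeline.
// It is a tripwire for regressions, not a replacement for rehype-sanitize.
const UNSAFE_MARKERS = [
  { name: "script tag", pattern: /<script\b/i },
  { name: "inline event handler", pattern: /<[^>]+\son[a-z]+\s*=/i },
  { name: "javascript: URL", pattern: /(?:href|src)\s*=\s*["']?\s*javascript:/i },
];

// Return the names of all markers found in the given HTML string.
function findUnsafeMarkers(html) {
  return UNSAFE_MARKERS.filter(({ pattern }) => pattern.test(html)).map(
    ({ name }) => name,
  );
}

console.log(findUnsafeMarkers('<p>Hello <code class="language-js">x</code></p>'));
// prints []
console.log(findUnsafeMarkers('<img src="x.png" onerror="alert(1)">'));
// prints [ 'inline event handler' ]
```

In CI, a non-empty result on the output of markdown-to-safe-html.mjs would suggest that the sanitize step was removed or reordered.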
Example 3: Check Locale Files for the Same Slug
For a ten-language site, run a small consistency script before review. This catches the common mistake where the canonical file is updated but one translation keeps the old hero image or is missing updatedDate.
// tools/check-locales.mjs
// Checks that every locale copy of one article has consistent metadata.
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

const slug = "claude-code-markdown-processing.mdx";
const expectedHero = "/images/hero/hero-077.png";
const expectedUpdated = "2026-06-02";
const locales = [
  ["ja", "site/src/content/blog"],
  ["en", "site/src/content/blog-en"],
  ["zh", "site/src/content/blog-zh"],
  ["ko", "site/src/content/blog-ko"],
  ["es", "site/src/content/blog-es"],
  ["fr", "site/src/content/blog-fr"],
  ["de", "site/src/content/blog-de"],
  ["pt", "site/src/content/blog-pt"],
  ["hi", "site/src/content/blog-hi"],
  ["id", "site/src/content/blog-id"],
];
const errors = [];

for (const [lang, dir] of locales) {
  const file = path.join(dir, slug);

  // Report a missing translation instead of crashing on the first one.
  if (!fs.existsSync(file)) {
    errors.push(`${lang}: file not found (${file})`);
    continue;
  }

  const source = fs.readFileSync(file, "utf8");
  const { data, content } = matter(source);

  // An unquoted YAML date may be parsed into a Date object, so normalize
  // both forms to YYYY-MM-DD before comparing.
  const updated =
    data.updatedDate instanceof Date
      ? data.updatedDate.toISOString().slice(0, 10)
      : String(data.updatedDate ?? "");

  if (data.lang !== lang) errors.push(`${lang}: lang mismatch`);
  if (data.heroImage !== expectedHero) errors.push(`${lang}: hero changed`);
  if (updated !== expectedUpdated) {
    errors.push(`${lang}: updatedDate mismatch`);
  }
  if ([...String(data.description ?? "")].length > 120) {
    errors.push(`${lang}: description too long`);
  }

  // Narrow literal checks; the AST audit remains the source of truth for links.
  if (!content.includes("https://")) errors.push(`${lang}: no external link`);
  if (!content.includes("](/")) errors.push(`${lang}: no internal link`);
}

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log("locale set is consistent");
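The script above checks metadata but not depth, and thin locale summaries are one of the failures this article warns about. A possible follow-up, sketched below, compares structural counts for each locale against the canonical article. It assumes the counts were already gathered by something like the audit in Example 1. The function name, the three metrics, and the 0.8 threshold are all assumptions. Counting headings, code blocks, and links is used here instead of character length because character counts differ widely between writing systems.

```javascript
// Hypothetical depth check: flag locales whose structure is much thinner
// than the canonical article. The 0.8 ratio is an arbitrary starting point.
function findThinLocales(statsByLang, canonicalLang, minRatio = 0.8) {
  const canonical = statsByLang[canonicalLang];
  if (!canonical) {
    throw new Error(`no stats for canonical locale ${canonicalLang}`);
  }

  const problems = [];
  for (const [lang, stats] of Object.entries(statsByLang)) {
    if (lang === canonicalLang) continue;
    for (const metric of ["headings", "codeBlocks", "links"]) {
      // Minimum count this locale needs for the given metric.
      const expected = Math.ceil(canonical[metric] * minRatio);
      if (stats[metric] < expected) {
        problems.push(`${lang}: ${metric} ${stats[metric]} < ${expected}`);
      }
    }
  }
  return problems;
}

// Made-up counts for illustration only.
const demoStats = {
  ja: { headings: 10, codeBlocks: 5, links: 12 },
  en: { headings: 10, codeBlocks: 5, links: 11 },
  es: { headings: 4, codeBlocks: 1, links: 3 },
};

console.log(findThinLocales(demoStats, "ja"));
```

With these sample counts, only the es locale is flagged, once per metric.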
Failure Modes to Show Claude Code Up Front
| Failure | Result | Guardrail |
|---|---|---|
| Regex reads headings | Code fence text enters the table of contents | Traverse heading nodes |
| tags becomes a string | Filters and related posts break | Validate frontmatter types |
| Duplicate headings | Anchor links point to the wrong section | Generate slugs consistently |
| Raw HTML is trusted | XSS risk through attributes or tags | Sanitize with a schema |
| External links are not checked | Official docs move silently | Probe before publishing |
| Prompt scope is broad | Other workers’ files are modified | Lock owned_files |
The failure examples matter because Claude Code responds better to explicit constraints than to taste-based review comments. “Make it high quality” is weak. “Do not parse headings with regex, preserve heroImage, keep description under 120 characters, and do not touch other slugs” is actionable.
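As an example of turning one table row into an executable constraint, the sketch below covers the duplicate-heading case. It takes heading records shaped like the ones the Example 1 audit prints (depth, text, slug) and reports repeated heading text. The function name and the sample records are assumptions for illustration.

```javascript
// Hypothetical check over { depth, text, slug } records from an audit.
function findDuplicateHeadings(headings) {
  const seen = new Map(); // normalized text -> slug of the first occurrence
  const duplicates = [];
  for (const heading of headings) {
    const key = heading.text.trim().toLowerCase();
    if (seen.has(key)) {
      duplicates.push({
        text: heading.text,
        first: seen.get(key),
        repeat: heading.slug,
      });
    } else {
      seen.set(key, heading.slug);
    }
  }
  return duplicates;
}

// Made-up records; the "-1" suffix mirrors how github-slugger numbers repeats.
const demoHeadings = [
  { depth: 2, text: "Setup", slug: "setup" },
  { depth: 2, text: "Usage", slug: "usage" },
  { depth: 3, text: "Setup", slug: "setup-1" },
];

console.log(findDuplicateHeadings(demoHeadings));
```

A repeated heading is not always a bug, so a check like this may fit better as a warning for the human pass than as a hard failure.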
Safe Prompt Contract
Use a prompt contract like this when refreshing published content in a busy repository.
task: "Refresh one published MDX article"
owned_files:
- "site/src/content/blog-en/claude-code-markdown-processing.mdx"
preserve:
- "slug path"
- "heroImage"
- "unrelated dirty files"
required:
- "updatedDate: 2026-06-02"
- "description <= 120 characters"
- "AST-based Markdown checks"
- "official external links"
- "internal links"
- "monetization CTA"
forbidden:
- "regex-only heading parsing"
- "raw HTML without sanitization"
- "thin locale summaries"
verification:
- "node scripts/check-code-fences.mjs"
- "node scripts/check-updated-article-quality.mjs"
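The owned_files lock can also be checked after Claude Code finishes. The sketch below takes a list of changed paths, for example the output of git diff --name-only, and reports anything outside the contract. The function names and the second sample path are hypothetical, and reading the contract file and calling git are left out.

```javascript
// Hypothetical scope check: every changed path must be listed in owned_files.
function findOutOfScopeChanges(changedFiles, ownedFiles) {
  const owned = new Set(ownedFiles.map(normalizePath));
  return changedFiles.map(normalizePath).filter((file) => !owned.has(file));
}

// Normalize separators and a leading "./" so equivalent paths compare equal.
function normalizePath(file) {
  return file.trim().replace(/\\/g, "/").replace(/^\.\//, "");
}

const ownedFiles = [
  "site/src/content/blog-en/claude-code-markdown-processing.mdx",
];

// Made-up example of what `git diff --name-only` might return after a run.
const changedFiles = [
  "./site/src/content/blog-en/claude-code-markdown-processing.mdx",
  "site/src/content/blog-en/claude-md-best-practices.mdx",
];

console.log(findOutOfScopeChanges(changedFiles, ownedFiles));
// prints [ 'site/src/content/blog-en/claude-md-best-practices.mdx' ]
```

An empty result means the run stayed inside the contract. Anything else is a reason to stop before committing.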
Publish Checks and CTA
Before publishing, run both local scripts and a human pass. The machine pass checks fences, metadata, links, and article depth. The human pass checks whether the examples fit the reader, whether paragraphs are short enough on mobile, and whether the CTA is natural.
node tools/audit-mdx.mjs site/src/content/blog-en/claude-code-markdown-processing.mdx
node tools/check-locales.mjs
node scripts/check-code-fences.mjs
node scripts/check-updated-article-quality.mjs
For monetization, keep the next step contextual. Individual users can start with the free Claude Code cheatsheet. Readers who want repeatable review and writing prompts can use Claude Code prompt templates. Teams that need permissions, CI checks, locale workflow, and review habits should use Claude Code training and consultation.
Hands-On Verification Note
For this refresh, Masa treated the article as a real content pipeline problem rather than a prose-only rewrite. The most useful checks were: description length, updatedDate, hero image preservation, code-fence languages, official external links, locale depth, and internal CTA links. The AST-based audit caught the category of mistakes that regex would miss, especially headings inside code blocks and MDX syntax near components. The final local commands were node scripts/check-code-fences.mjs and node scripts/check-updated-article-quality.mjs. The main lesson is that Claude Code becomes reliable when the article contract is executable, not when the prompt merely asks for better writing.
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.