Tips & Tricks (Updated: 6/2/2026)

Safe Markdown and MDX Processing with Claude Code

Build safe Markdown/MDX workflows with Claude Code: AST parsing, frontmatter, XSS defense, links, and locale QA.

Why Markdown Processing Needs More Than Regex

A published Markdown or MDX article is not just text. It carries frontmatter, SEO metadata, headings, generated IDs, code fences, internal links, external references, locale-specific routes, and sometimes raw HTML. If you ask Claude Code to “clean up this article” without a processing contract, the result may read better while silently changing the slug, dropping a CTA, breaking a code fence, or leaving one locale as a thin summary.

The practical rule is simple: let Claude Code edit prose, but make the structure machine-checkable. Use an AST, or abstract syntax tree, for Markdown and MDX. Validate frontmatter as data. Treat HTML sanitization as a security boundary. Run locale and build checks before publishing.

I verified the core references on June 2, 2026. The unified guide explains the parse, transform, and stringify pipeline, while unified syntax trees explain why AST nodes are safer than line matching. Markdown parsing is covered by remark and remark-parse. MDX syntax is documented in the MDX docs. For frontmatter, use gray-matter. For raw HTML and XSS risk, compare rehype-sanitize with the OWASP XSS Prevention Cheat Sheet. Claude Code workflow boundaries are easier to enforce when you also read the official Claude Code overview and settings.

flowchart LR
  A["MDX file"] --> B["frontmatter"]
  B --> C["schema validation"]
  A --> D["remark / MDX AST"]
  D --> E["headings, fences, links"]
  D --> F["rehype HTML pipeline"]
  F --> G["sanitize"]
  C --> H["locale and build checks"]
  E --> H
  G --> H

Choose the Parser by the Job

The first instruction to Claude Code should name the tool chain. A vague “parse Markdown” request often produces a quick regex. That works for a toy file and fails on real articles.
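To make that failure concrete, here is a small dependency-free sketch. The sample text and the `regexHeadings` helper are hypothetical; they only illustrate the shortcut and are not part of remark or any other library.

```javascript
// Hypothetical sample: one real heading plus a heading-like line inside a code fence.
const fence = "`".repeat(3); // built this way to avoid nesting literal fences
const sample = [
  "## Real heading",
  "",
  fence + "md",
  "## Not a heading",
  fence,
].join("\n");

// The risky shortcut: treat every line that starts with "## " as a heading.
function regexHeadings(markdown) {
  return [...markdown.matchAll(/^##\s+(.+)$/gm)].map((match) => match[1]);
}

console.log(regexHeadings(sample));
// prints [ 'Real heading', 'Not a heading' ]
```

The regex reports two headings even though a Markdown parser would treat the second line as the content of a code node, not a heading node.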

| Need | Better choice | Risky shortcut |
| --- | --- | --- |
| Read headings, links, and code fences | remark-parse plus AST traversal | ^## regex on raw text |
| Handle JSX inside .mdx | remark-mdx or the MDX compiler | Markdown-only parser |
| Render HTML | remark-rehype into the rehype pipeline | String concatenation |
| Accept raw HTML | rehype-raw followed by rehype-sanitize | allowDangerousHtml alone |
| Read frontmatter | gray-matter and schema checks | Splitting YAML by hand |

Regex is still useful for narrow checks, such as finding an exact literal in a file. It should not be the source of truth for Markdown structure. A code fence can contain ## Not a heading; an MDX component can contain links as props; YAML can turn a list into a string if the author forgets brackets. AST and schema validation catch those cases before readers see them.
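The YAML case is easy to reproduce: `tags: claude-code, mdx` parses as one string, while `tags: [claude-code, mdx]` parses as an array. The sketch below assumes the frontmatter has already been parsed into a plain object (for example, the `data` object that gray-matter returns). The `validateFrontmatter` helper and its rules are hypothetical, not a library API.

```javascript
// Hypothetical type check for already-parsed frontmatter.
// Field names follow the ones used in this article.
function validateFrontmatter(data) {
  const errors = [];
  if (typeof data.title !== "string" || data.title.trim() === "") {
    errors.push("title must be a non-empty string");
  }
  if (!Array.isArray(data.tags)) {
    errors.push(`tags must be an array, got ${typeof data.tags}`);
  } else if (data.tags.some((tag) => typeof tag !== "string")) {
    errors.push("every tag must be a string");
  }
  return errors;
}

// What a YAML parser yields for `tags: claude-code, mdx` (brackets forgotten).
console.log(validateFrontmatter({ title: "Demo", tags: "claude-code, mdx" }));
// prints [ 'tags must be an array, got string' ]

// What a YAML parser yields for `tags: [claude-code, mdx]`.
console.log(validateFrontmatter({ title: "Demo", tags: ["claude-code", "mdx"] }));
// prints []
```

A real project would more likely express these rules in its content schema; the point is only that the check runs on typed data, not on raw text.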

Four Practical Use Cases

The first use case is refreshing a published blog article. You need updated metadata, a stronger introduction, official links, internal links, working code, and a revenue CTA. For ClaudeCodeLab, that means connecting this topic to guides such as CLAUDE.md best practices and web scraping with Claude Code without changing unrelated slugs.

The second use case is a docs site that mixes Markdown and MDX components. Callouts, tabs, pricing cards, and live examples are useful, but they make regex parsing brittle. The checker must understand both Markdown nodes and MDX nodes.

The third use case is multilingual publishing. A strong Japanese canonical article does not help if English, Spanish, Indonesian, and other locales become short summaries. Each locale needs the same depth: use cases, pitfalls, snippets, official links, internal links, CTA, and verification notes.

The fourth use case is commercial content operations. Product pages, Gumroad landing pages, training pages, and email resources often reuse Markdown. Broken code fences and unsafe HTML reduce trust exactly where conversion matters. If an article sends readers to products or training, the content pipeline has to protect those links.

Copy-Paste Setup

The snippets below use Node.js 18 or newer. They are written as ES modules (.mjs files) so they can run directly in a small tools folder.

mkdir mdx-audit-demo
cd mdx-audit-demo
npm init -y
npm pkg set type=module
npm install unified remark-parse remark-mdx remark-gfm gray-matter
npm install unist-util-visit github-slugger
npm install remark-rehype rehype-raw rehype-sanitize rehype-stringify
mkdir tools

Example 1: Audit Frontmatter and Markdown Structure

This script reads frontmatter with gray-matter, parses the body with remark and MDX support, then reports headings, code fences, and links. It fails if required metadata is missing, the description is too long, a code fence has no language, or the article lacks internal or external links.

// tools/audit-mdx.mjs
import fs from "node:fs/promises";
import matter from "gray-matter";
import GithubSlugger from "github-slugger";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkMdx from "remark-mdx";
import remarkGfm from "remark-gfm";
import { visit } from "unist-util-visit";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/audit-mdx.mjs article.mdx");
}

const source = await fs.readFile(file, "utf8");
const { data, content } = matter(source);
const errors = [];
const links = { internal: [], external: [] };
const codeBlocks = [];
const headings = [];

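// gray-matter's YAML engine can parse an unquoted date (pubDate: 2026-06-02)
// into a Date object, so pubDate may be a non-empty string or a valid Date.
for (const key of ["title", "description", "pubDate", "heroImage", "lang"]) {
  const value = data[key];
  const isText = typeof value === "string" && value.trim() !== "";
  const isDate = key === "pubDate" && value instanceof Date && !Number.isNaN(value.getTime());
  if (!isText && !isDate) {
    errors.push(`frontmatter.${key} is required`);
  }
}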

if ([...String(data.description ?? "")].length > 120) {
  errors.push("description must be 120 characters or fewer");
}

if (!Array.isArray(data.tags) || data.tags.length === 0) {
  errors.push("frontmatter.tags must be a non-empty array");
}

const tree = unified()
  .use(remarkParse)
  .use(remarkMdx)
  .use(remarkGfm)
  .parse(content);

const slugger = new GithubSlugger();

visit(tree, (node) => {
  if (node.type === "heading") {
    const text = plainText(node);
    headings.push({ depth: node.depth, text, slug: slugger.slug(text) });
  }

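  if (node.type === "code") {
    codeBlocks.push({ lang: node.lang || "", meta: node.meta || "" });
    if (!node.lang) {
      // Line numbers are relative to the body, after frontmatter is stripped.
      const line = node.position?.start.line ?? "?";
      errors.push(`code fence at body line ${line} is missing a language`);
    }
  }

  if (node.type === "link") {
    const url = String(node.url || "");
    // Treat http(s) and protocol-relative ("//host") URLs as external.
    if (/^(https?:)?\/\//i.test(url)) links.external.push(url);
    else if (url.startsWith("/")) links.internal.push(url);
  }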
});

if (links.internal.length === 0) errors.push("missing internal link");
if (links.external.length === 0) errors.push("missing external link");

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log(JSON.stringify({ headings, codeBlocks, links }, null, 2));

function plainText(node) {
  if (typeof node.value === "string") return node.value;
  if (!Array.isArray(node.children)) return "";
  return node.children.map(plainText).join("");
}

Run it against one file first, then wire it into CI only after the false positives are understood.

node tools/audit-mdx.mjs site/src/content/blog-en/example.mdx

Example 2: Convert Markdown to Safe HTML

If you never need raw HTML in Markdown, do not enable it. If the product requires raw HTML, sanitize after parsing. The unsafe pattern is to pass allowDangerousHtml and stop there.

// tools/markdown-to-safe-html.mjs
import fs from "node:fs/promises";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkGfm from "remark-gfm";
import remarkRehype from "remark-rehype";
import rehypeRaw from "rehype-raw";
import rehypeSanitize, { defaultSchema } from "rehype-sanitize";
import rehypeStringify from "rehype-stringify";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/markdown-to-safe-html.mjs article.md");
}
const markdown = await fs.readFile(file, "utf8");
const schema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    code: [["className", /^language-/]],
  },
};

const html = await unified()
  .use(remarkParse)
  .use(remarkGfm)
  .use(remarkRehype, { allowDangerousHtml: true })
  .use(rehypeRaw)
  .use(rehypeSanitize, schema)
  .use(rehypeStringify)
  .process(markdown);

console.log(String(html));

The important detail is the order. rehype-raw parses raw HTML into the HTML tree; rehype-sanitize then removes disallowed tags and attributes. Without the second step, user-authored HTML can carry unsafe attributes into the rendered page.

Example 3: Check Locale Files for the Same Slug

For a ten-language site, run a small consistency script before review. This catches the common mistake where the canonical file is updated but one translation keeps the old hero image or is missing updatedDate.

// tools/check-locales.mjs
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

const slug = "claude-code-markdown-processing.mdx";
const expectedHero = "/images/hero/hero-077.png";
const locales = [
  ["ja", "site/src/content/blog"],
  ["en", "site/src/content/blog-en"],
  ["zh", "site/src/content/blog-zh"],
  ["ko", "site/src/content/blog-ko"],
  ["es", "site/src/content/blog-es"],
  ["fr", "site/src/content/blog-fr"],
  ["de", "site/src/content/blog-de"],
  ["pt", "site/src/content/blog-pt"],
  ["hi", "site/src/content/blog-hi"],
  ["id", "site/src/content/blog-id"],
];

const errors = [];

for (const [lang, dir] of locales) {
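  const file = path.join(dir, slug);
  // Report a missing translation instead of crashing on the first absent file.
  if (!fs.existsSync(file)) {
    errors.push(`${lang}: file not found (${file})`);
    continue;
  }
  const source = fs.readFileSync(file, "utf8");
  const { data, content } = matter(source);
  if (data.lang !== lang) errors.push(`${lang}: lang mismatch`);
  if (data.heroImage !== expectedHero) errors.push(`${lang}: hero changed`);
  // Unquoted YAML dates can arrive as Date objects, so normalize before comparing.
  const updated = data.updatedDate instanceof Date
    ? data.updatedDate.toISOString().slice(0, 10)
    : String(data.updatedDate ?? "");
  if (updated !== "2026-06-02") {
    errors.push(`${lang}: updatedDate mismatch`);
  }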
  if ([...String(data.description ?? "")].length > 120) {
    errors.push(`${lang}: description too long`);
  }
  if (!content.includes("https://")) errors.push(`${lang}: no external link`);
  if (!content.includes("](/")) errors.push(`${lang}: no internal link`);
}

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log("locale set is consistent");
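The script above checks metadata, not depth. A thin locale could be flagged by comparing structural counts against the canonical article, roughly as sketched below. The `findThinLocales` helper, the stats shape, and the 0.8 threshold are all assumptions for illustration; the counts would come from an AST audit such as the one in the first example, not from raw text matching.

```javascript
// Hypothetical depth check. Each stats object holds counts taken from an
// AST audit (headings, code blocks, links) for one locale.
function findThinLocales(canonical, localeStats, minRatio = 0.8) {
  const problems = [];
  for (const [lang, stats] of Object.entries(localeStats)) {
    for (const key of Object.keys(canonical)) {
      const expected = canonical[key];
      const actual = stats[key] ?? 0;
      // Flag a locale when a count falls below the allowed share of the canonical count.
      if (actual < expected * minRatio) {
        problems.push(`${lang}: ${key} ${actual} < ${expected}`);
      }
    }
  }
  return problems;
}

// Made-up numbers: the Spanish file looks like a short summary.
const canonicalStats = { headings: 10, codeBlocks: 6, links: 12 };
const thinLocales = findThinLocales(canonicalStats, {
  en: { headings: 10, codeBlocks: 6, links: 12 },
  es: { headings: 4, codeBlocks: 1, links: 3 },
});
console.log(thinLocales);
```

Counts are a coarse proxy. They catch a translation that dropped sections or snippets, but they do not replace reading the text.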

Failure Modes to Show Claude Code Up Front

| Failure | Result | Guardrail |
| --- | --- | --- |
| Regex reads headings | Code fence text enters the table of contents | Traverse heading nodes |
| tags becomes a string | Filters and related posts break | Validate frontmatter types |
| Duplicate headings | Anchor links point to the wrong section | Generate slugs consistently |
| Raw HTML is trusted | XSS risk through attributes or tags | Sanitize with a schema |
| External links are not checked | Official docs move silently | Probe before publishing |
| Prompt scope is broad | Other workers’ files are modified | Lock owned_files |

The failure examples matter because Claude Code responds better to explicit constraints than to taste-based review comments. “Make it high quality” is weak. “Do not parse headings with regex, preserve heroImage, keep description under 120 characters, and do not touch other slugs” is actionable.
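The duplicate-heading row is worth one small illustration. The audit script uses github-slugger for this; the sketch below is a simplified, hypothetical stand-in (`createSlugger` is not a library function) that only shows the idea of one shared slugger per document, so repeated headings receive distinct anchors.

```javascript
// Hypothetical, simplified slugger: one instance per document keeps anchors unique.
function createSlugger() {
  const used = new Set();
  return function slug(text) {
    const base = text
      .toLowerCase()
      .trim()
      .replace(/[^\p{L}\p{N}\s-]/gu, "") // drop punctuation, keep letters and digits
      .replace(/\s+/g, "-");
    let candidate = base;
    let suffix = 0;
    // Append -1, -2, ... until the anchor is not taken yet.
    while (used.has(candidate)) {
      suffix += 1;
      candidate = `${base}-${suffix}`;
    }
    used.add(candidate);
    return candidate;
  };
}

const slug = createSlugger();
console.log(slug("Setup")); // setup
console.log(slug("Setup")); // setup-1
console.log(slug("Safe Prompt Contract")); // safe-prompt-contract
```

Whatever slugger is used, the renderer and the checker must share the same one. Otherwise the audit reports anchors that the published page does not have.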

Safe Prompt Contract

Use a prompt contract like this when refreshing published content in a busy repository.

task: "Refresh one published MDX article"
owned_files:
  - "site/src/content/blog-en/claude-code-markdown-processing.mdx"
preserve:
  - "slug path"
  - "heroImage"
  - "unrelated dirty files"
required:
  - "updatedDate: 2026-06-02"
  - "description <= 120 characters"
  - "AST-based Markdown checks"
  - "official external links"
  - "internal links"
  - "monetization CTA"
forbidden:
  - "regex-only heading parsing"
  - "raw HTML without sanitization"
  - "thin locale summaries"
verification:
  - "node scripts/check-code-fences.mjs"
  - "node scripts/check-updated-article-quality.mjs"
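One way to make the owned_files line executable is to compare the changed paths (for example, the lines printed by `git diff --name-only`) against the list in the contract. The `findOutOfScopeChanges` helper and the second file path below are hypothetical, shown only as a minimal sketch of that check.

```javascript
// Hypothetical guard: every changed path must appear in owned_files.
function findOutOfScopeChanges(changedFiles, ownedFiles) {
  const owned = new Set(ownedFiles);
  return changedFiles.filter((file) => !owned.has(file));
}

const ownedFiles = [
  "site/src/content/blog-en/claude-code-markdown-processing.mdx",
];
const changedFiles = [
  "site/src/content/blog-en/claude-code-markdown-processing.mdx",
  "site/src/content/blog-en/another-article.mdx", // hypothetical unrelated file
];

const outOfScope = findOutOfScopeChanges(changedFiles, ownedFiles);
console.log(outOfScope);
// prints [ 'site/src/content/blog-en/another-article.mdx' ]
```

A non-empty result is a reason to stop and review before committing, since it means the edit reached beyond the files named in the contract.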

Publish Checks and CTA

Before publishing, run the local scripts and then do a human pass. The machine pass checks fences, metadata, links, and article depth. The human pass checks whether the examples fit the reader, whether paragraphs are short enough on mobile, and whether the CTA is natural.

node tools/audit-mdx.mjs site/src/content/blog-en/claude-code-markdown-processing.mdx
node tools/check-locales.mjs
node scripts/check-code-fences.mjs
node scripts/check-updated-article-quality.mjs

For monetization, keep the next step contextual. Individual users can start with the free Claude Code cheatsheet. Readers who want repeatable review and writing prompts can use Claude Code prompt templates. Teams that need permissions, CI checks, locale workflow, and review habits should use Claude Code training and consultation.

Hands-On Verification Note

For this refresh, Masa treated the article as a real content pipeline problem rather than a prose-only rewrite. The most useful checks were: description length, updatedDate, hero image preservation, code-fence languages, official external links, locale depth, and internal CTA links. The AST-based audit caught the category of mistakes that regex would miss, especially headings inside code blocks and MDX syntax near components. The final local commands were node scripts/check-code-fences.mjs and node scripts/check-updated-article-quality.mjs. The main lesson is that Claude Code becomes reliable when the article contract is executable, not when the prompt merely asks for better writing.

#Claude Code #Markdown #MDX #remark #content-ops

About the Author

Masa

Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.