A guide to safely processing Markdown/MDX with Claude Code

Why treating Markdown as plain text is a mistake

A published Markdown or MDX article is more than paragraphs. It carries frontmatter, an SEO description, a heading hierarchy, generated anchors, code fences, internal links, official external links, locale routes, and sometimes raw HTML. If you only tell Claude Code to “improve this article”, the prose may get better, but the slug can change, the CTA can disappear, the hero image can be swapped, or one language file can be reduced to a thin summary.

The safe approach is to keep writing and verification separate. Let Claude Code rewrite and localize the content, but have scripts check the structure. Read Markdown/MDX as an AST (abstract syntax tree). Validate frontmatter as data. Keep the XSS boundary in the HTML output explicit. And check all locale files as one set.

I verified the main references on 2 June 2026. The unified guide describes the parse, transform, and stringify pipeline. The syntax trees guide explains why an AST is safer than raw line matching. For Markdown, see remark and remark-parse. For MDX, the MDX docs are the official source. gray-matter is useful for frontmatter. For HTML safety, read rehype-sanitize alongside the OWASP XSS Prevention Cheat Sheet. To understand the scope of Claude Code, see the Claude Code overview and settings.

flowchart LR
  A["MDX file"] --> B["frontmatter"]
  B --> C["schema validation"]
  A --> D["remark / MDX AST"]
  D --> E["headings, fences, links"]
  D --> F["rehype pipeline"]
  F --> G["sanitize"]
  C --> H["locale and build checks"]
  E --> H
  G --> H

Choose the parser first, then have Claude Code edit

The first instruction to Claude Code should be about the toolchain. “Parse the Markdown” is too vague. A regex can look fine in a small demo, but in a real article it also treats a fake heading inside a code block as a heading.
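
To make the failure concrete, here is a minimal sketch of the regex shortcut. The sample text and the `naiveHeadings` helper are hypothetical and exist only for illustration; they are not part of any library.

```javascript
// Hypothetical sample: one real heading plus a "## ..." line inside a code fence.
const FENCE = "`".repeat(3); // built at runtime so this snippet stays easy to embed
const sample = [
  "## Real heading",
  "",
  FENCE + "md",
  "## fake heading",
  FENCE,
].join("\n");

// Naive approach: treat every line that starts with "## " as a heading.
function naiveHeadings(markdown) {
  return [...markdown.matchAll(/^##\s+(.+)$/gm)].map((match) => match[1]);
}

// The line inside the fence is reported too, although it is code, not structure.
console.log(naiveHeadings(sample));
```

The regex has no notion of "inside a fence", so it returns both lines. An AST parser exposes the fenced text as a `code` node, which is why a traversal that reads only `heading` nodes avoids this class of false positive.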

Need	Better choice	Risky shortcut
Reading headings, links, code fences	`remark-parse` and AST traversal	`^##` regex on raw text
JSX support in `.mdx`	`remark-mdx` or the MDX compiler	A Markdown-only parser
Producing HTML output	rehype pipeline via `remark-rehype`	Building HTML by string concatenation
Allowing raw HTML	`rehype-raw` followed by `rehype-sanitize`	`allowDangerousHtml` alone
Reading frontmatter	`gray-matter` and schema checks	Splitting YAML manually

An AST separates meaning from raw text. A `## fake heading` written inside a code fence should not end up in the table of contents. A URL inside an MDX component prop is not necessarily an editorial link. In YAML, `tags: Claude Code, Markdown` is a string, not an array. A parser and a schema catch errors like these early.
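
The `tags` case can be sketched as a type check on the parsed frontmatter. The two objects below are hypothetical stand-ins for what a YAML parser returns for `tags: Claude Code, Markdown` and `tags: [Claude Code, Markdown]`; `checkTags` is an illustrative helper, not a library function.

```javascript
// Hypothetical parsed frontmatter for the two YAML spellings.
const broken = { title: "Guide", tags: "Claude Code, Markdown" };
const fixed = { title: "Guide", tags: ["Claude Code", "Markdown"] };

// Return a list of problems so the caller can collect errors across fields.
function checkTags(data) {
  if (!Array.isArray(data.tags)) {
    return [`tags must be an array, got ${typeof data.tags}`];
  }
  const bad = data.tags.filter(
    (tag) => typeof tag !== "string" || tag.trim() === ""
  );
  return bad.length > 0 ? ["tags must contain only non-empty strings"] : [];
}

console.log(checkTags(broken));
console.log(checkTags(fixed));
```

Checking the type, rather than only checking that the field exists, is what catches the comma-separated string: the field is present and non-empty, but anything that iterates over it would iterate over characters.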

4 practical use cases

The first use case is refreshing a published blog post. The title, description, updatedDate, official links, internal links, code examples, and CTA are all updated together. At ClaudeCodeLab this flow connects to internal articles such as CLAUDE.md best practices and Claude Code web scraping, but unrelated slugs should not be touched.

The second use case is MDX components in a docs site. Callouts, tabs, pricing cards, FAQs, and live examples are valuable, but mixing Markdown and JSX makes a regex-based checker fragile.

The third use case is multilingual publishing. If the Japanese canonical article is strong but the Hindi, English, Spanish, or Indonesian versions shrink to summaries, both local readers and SEO suffer. Every locale should have examples, pitfalls, runnable snippets, official links, internal links, a CTA, and a verification note.

The fourth use case is commercial content ops. Gumroad pages, training pages, free PDF delivery, and email resources often reuse Markdown. Where a purchase or consultation is close, code fences, links, and HTML safety are directly tied to trust.

Copy-paste setup

The snippets below assume Node.js 18+ and ESM modules. Run them in a demo folder first, then add them to your repo.

mkdir mdx-audit-demo
cd mdx-audit-demo
npm init -y
npm pkg set type=module
npm install unified remark-parse remark-mdx remark-gfm gray-matter
npm install unist-util-visit github-slugger
npm install remark-rehype rehype-raw rehype-sanitize rehype-stringify
mkdir tools

Example 1: auditing frontmatter, headings, fences, and links

This script reads frontmatter with gray-matter and parses the body into an AST with remark plus the MDX parser. It checks required fields, description length, code fence languages, internal links, and external links.

// tools/audit-mdx.mjs
import fs from "node:fs/promises";
import matter from "gray-matter";
import GithubSlugger from "github-slugger";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkMdx from "remark-mdx";
import remarkGfm from "remark-gfm";
import { visit } from "unist-util-visit";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/audit-mdx.mjs article.mdx");
}

const source = await fs.readFile(file, "utf8");
const { data, content } = matter(source);
const errors = [];
const links = { internal: [], external: [] };
const headings = [];
const codeBlocks = [];

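// gray-matter's YAML parser turns an unquoted date into a Date, so accept that for pubDate.
for (const key of ["title", "description", "pubDate", "heroImage", "lang"]) {
  const value = data[key];
  const isText = typeof value === "string" && value.trim() !== "";
  const isDate = key === "pubDate" && value instanceof Date && !Number.isNaN(value.getTime());
  if (!isText && !isDate) {
    errors.push(`frontmatter.${key} is required`);
  }
}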

if ([...String(data.description ?? "")].length > 120) {
  errors.push("description must be 120 characters or fewer");
}

if (!Array.isArray(data.tags) || data.tags.length === 0) {
  errors.push("frontmatter.tags must be a non-empty array");
}

const tree = unified()
  .use(remarkParse)
  .use(remarkMdx)
  .use(remarkGfm)
  .parse(content);

const slugger = new GithubSlugger();

visit(tree, (node) => {
  if (node.type === "heading") {
    const text = plainText(node);
    headings.push({ depth: node.depth, text, slug: slugger.slug(text) });
  }

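  if (node.type === "code") {
    codeBlocks.push({ lang: node.lang || "", meta: node.meta || "" });
    // The line is counted within the body, after frontmatter has been stripped.
    const line = node.position?.start.line ?? "?";
    if (!node.lang) errors.push(`code fence at body line ${line} is missing a language`);
  }

  if (node.type === "link") {
    const url = String(node.url || "");
    // Match the scheme explicitly; "//host/path" is protocol-relative, not internal.
    if (/^https?:\/\//.test(url)) links.external.push(url);
    else if (url.startsWith("/") && !url.startsWith("//")) links.internal.push(url);
  }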
});

if (links.internal.length === 0) errors.push("missing internal link");
if (links.external.length === 0) errors.push("missing external link");

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log(JSON.stringify({ headings, codeBlocks, links }, null, 2));

function plainText(node) {
  if (typeof node.value === "string") return node.value;
  if (!Array.isArray(node.children)) return "";
  return node.children.map(plainText).join("");
}

node tools/audit-mdx.mjs site/src/content/blog-hi/example.mdx
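
One detail of the description check is worth spelling out: the script counts with `[...text].length`, which counts Unicode code points, not UTF-16 code units and not visible characters. The sketch below illustrates the difference; `codePointLength` is a hypothetical helper name that mirrors the expression used in the audit.

```javascript
// Count Unicode code points, the same way the audit's spread-based check does.
function codePointLength(text) {
  return [...String(text ?? "")].length;
}

const withEmoji = "Docs \u{1F44D}"; // "Docs " followed by a thumbs-up emoji
console.log(withEmoji.length);           // UTF-16 code units: 7
console.log(codePointLength(withEmoji)); // code points: 6

// Devanagari "ki" (U+0915 + U+093F) is one visible syllable but two code points.
const ki = "\u0915\u093F";
console.log(codePointLength(ki));        // 2
```

For locales such as Hindi this means a 120 code point limit allows fewer visible syllables than 120. Whether that is the intended contract depends on how the site's SEO tooling counts, which this article does not specify.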

Example 2: converting Markdown to safe HTML

If you do not need raw HTML, do not enable it. If the product requires raw HTML, sanitize it immediately after it is parsed. allowDangerousHtml on its own is not a security measure; it only lets raw HTML pass through.

// tools/markdown-to-safe-html.mjs
import fs from "node:fs/promises";
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkGfm from "remark-gfm";
import remarkRehype from "remark-rehype";
import rehypeRaw from "rehype-raw";
import rehypeSanitize, { defaultSchema } from "rehype-sanitize";
import rehypeStringify from "rehype-stringify";

const file = process.argv[2];
if (!file) {
  throw new Error("Usage: node tools/markdown-to-safe-html.mjs article.md");
}
const markdown = await fs.readFile(file, "utf8");
const schema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    code: [["className", /^language-/]],
  },
};

const html = await unified()
  .use(remarkParse)
  .use(remarkGfm)
  .use(remarkRehype, { allowDangerousHtml: true })
  .use(rehypeRaw)
  .use(rehypeSanitize, schema)
  .use(rehypeStringify)
  .process(markdown);

console.log(String(html));

Order matters. rehype-raw parses raw HTML back into the HTML tree. rehype-sanitize then removes unsafe tags and attributes. Without the second step, risky content can reach the rendered DOM.

Example 3: checking all 10 locale files

This script checks the same slug across all locales. It confirms that heroImage is preserved, verifies updatedDate and description length, and checks that at least one internal and one external link is present (a simple substring check, not an AST audit).

// tools/check-locales.mjs
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

const slug = "claude-code-markdown-processing.mdx";
const expectedHero = "/images/hero/hero-077.png";
const locales = [
  ["ja", "site/src/content/blog"],
  ["en", "site/src/content/blog-en"],
  ["zh", "site/src/content/blog-zh"],
  ["ko", "site/src/content/blog-ko"],
  ["es", "site/src/content/blog-es"],
  ["fr", "site/src/content/blog-fr"],
  ["de", "site/src/content/blog-de"],
  ["pt", "site/src/content/blog-pt"],
  ["hi", "site/src/content/blog-hi"],
  ["id", "site/src/content/blog-id"],
];

const errors = [];

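for (const [lang, dir] of locales) {
  const file = path.join(dir, slug);
  // Report a missing locale file instead of crashing on the first one.
  if (!fs.existsSync(file)) {
    errors.push(`${lang}: missing file ${file}`);
    continue;
  }
  const source = fs.readFileSync(file, "utf8");
  const { data, content } = matter(source);
  // Unquoted YAML dates are parsed into Date objects, so normalize before comparing.
  const updated = data.updatedDate instanceof Date
    ? data.updatedDate.toISOString().slice(0, 10)
    : String(data.updatedDate ?? "");
  if (data.lang !== lang) errors.push(`${lang}: lang mismatch`);
  if (data.heroImage !== expectedHero) errors.push(`${lang}: hero changed`);
  if (updated !== "2026-06-02") {
    errors.push(`${lang}: updatedDate mismatch`);
  }
  if ([...String(data.description ?? "")].length > 120) {
    errors.push(`${lang}: description too long`);
  }
  if (!content.includes("https://")) errors.push(`${lang}: no external link`);
  if (!content.includes("](/")) errors.push(`${lang}: no internal link`);
}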

if (errors.length > 0) {
  console.error(errors.map((error) => `- ${error}`).join("\n"));
  process.exit(1);
}

console.log("locale set is consistent");
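
The locale check above looks at metadata and link presence, not at body depth. One possible way to flag a thin locale is to compare heading outlines. The sketch below assumes each outline has the shape of the `headings` array printed by tools/audit-mdx.mjs; the sample data and the `outlineDiff` helper are hypothetical.

```javascript
// Hypothetical outlines, shaped like the `headings` array from the audit script.
const canonical = [
  { depth: 2, text: "Setup" },
  { depth: 3, text: "Example 1" },
  { depth: 3, text: "Example 2" },
  { depth: 2, text: "Failure modes" },
];
const thinSummary = [
  { depth: 2, text: "Setup" },
  { depth: 2, text: "Failure modes" },
];

// Compare only heading depths: translated text differs, structure should not.
function outlineDiff(reference, localized) {
  const expected = reference.map((heading) => heading.depth).join(",");
  const actual = localized.map((heading) => heading.depth).join(",");
  return expected === actual
    ? []
    : [`heading outline differs: expected [${expected}], got [${actual}]`];
}

console.log(outlineDiff(canonical, canonical));
console.log(outlineDiff(canonical, thinSummary));
```

This is deliberately strict: any locale that legitimately adds or drops a section would fail. A looser rule, such as a minimum heading count, may fit some sites better.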

Concrete failure modes

Failure	Result	Guardrail
Reading headings with a regex	A fake heading in a code block lands in the TOC	Read only `heading` nodes
`tags` becomes a string	Filters and related posts break	Validate frontmatter types
Slug generation differs	Anchor links break across locales	Use the same slugger
Trusting raw HTML	XSS risk reaches the page	Sanitize with a schema
Not checking external links	You do not notice when official docs move	Test before publishing
Keeping prompt scope broad	Files owned by other workers can get changed	Pin `owned_files`

Write these failures into the prompt. “Make it high quality” is a weak instruction. “Do not parse headings with regex only, preserve heroImage, keep the description at 120 characters or fewer, sanitize raw HTML, do not touch unrelated slugs” is far more actionable.

A safe prompt for Claude Code

task: "Refresh one published MDX article"
owned_files:
  - "site/src/content/blog-hi/claude-code-markdown-processing.mdx"
preserve:
  - "slug path"
  - "heroImage"
  - "unrelated dirty files"
required:
  - "updatedDate: 2026-06-02"
  - "description <= 120 characters"
  - "AST-based Markdown checks"
  - "official external links"
  - "internal links and monetization CTA"
forbidden:
  - "regex-only heading parsing"
  - "raw HTML without sanitization"
  - "thin locale summaries"
verification:
  - "node scripts/check-code-fences.mjs"
  - "node scripts/check-updated-article-quality.mjs"
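
The `owned_files` list only helps if something enforces it after the edit. A minimal sketch of such a check follows, assuming the changed paths arrive as text in the one-path-per-line format that `git diff --name-only` produces. The second path in the sample and the `outOfScope` helper are hypothetical.

```javascript
// Hypothetical inputs: owned_files from the prompt, and changed paths
// as `git diff --name-only` would list them (one per line).
const ownedFiles = [
  "site/src/content/blog-hi/claude-code-markdown-processing.mdx",
];
const diffOutput = [
  "site/src/content/blog-hi/claude-code-markdown-processing.mdx",
  "site/src/content/blog-en/claude-code-web-scraping.mdx",
].join("\n");

// Return every changed path that is not explicitly owned by this task.
function outOfScope(diffText, owned) {
  const allowed = new Set(owned);
  return diffText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !allowed.has(line));
}

console.log(outOfScope(diffOutput, ownedFiles));
```

An empty result means the edit stayed inside its scope. Anything else is a candidate for review or revert before the verification commands run.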

Publish checks and CTA

Before publishing, run both the scripts and a human review. The scripts check metadata, code fences, links, and body depth. The human review looks at localization, mobile readability, search intent, and CTA fit.

node tools/audit-mdx.mjs site/src/content/blog-hi/claude-code-markdown-processing.mdx
node tools/check-locales.mjs
node scripts/check-code-fences.mjs
node scripts/check-updated-article-quality.mjs

For an individual workflow, start with the free Claude Code cheatsheet. If you want repeatable review and writing prompts, use the Claude Code prompt templates. If you need to design team permissions, CI, a locale workflow, and editorial review, see Claude Code training and consultation.

Practical verification result

In this refresh, Masa treated the article as a content pipeline task, not a prose task. The checks covered description length, updatedDate, heroImage preservation, code fence languages, official links, locale depth, and the CTA. The AST-based audit catches cases a regex misses, such as headings inside code blocks and syntax next to MDX components. The final local commands were node scripts/check-code-fences.mjs and node scripts/check-updated-article-quality.mjs. The biggest lesson is that Claude Code becomes reliable when the article contract is executable, not when the prompt merely says “improve the quality”.