Generate XML Sitemaps with Claude Code

A sitemap is a crawlable inventory, not an indexing guarantee

When Claude Code helps you publish many articles, docs, or product pages, the weak point is often not the page template. It is the list of URLs that search engines can reliably discover. A good XML sitemap tells Google which canonical URLs matter, when they were meaningfully updated, and how translated versions relate to each other.

The important word is “meaningfully.” Google’s current sitemap guidance says it ignores priority and changefreq, and uses lastmod only when the value is consistently accurate. Google also deprecated the old sitemap ping endpoint, so modern workflows should rely on robots.txt, Search Console submission, and verification checks instead of https://www.google.com/ping?sitemap=....

This guide shows two practical paths: Astro’s official sitemap integration and a dependency-free Node.js generator for multilingual content collections. For a broader SEO workflow, pair this with Claude Code SEO optimization and your deployment checks in Claude Code CI/CD setup.

What to follow from the official specs

Rule	Practical decision
Canonical URLs	Include absolute URLs such as `https://example.com/blog/post/`, not relative paths
File limits	Split after 50,000 URLs or 50 MB uncompressed
Encoding	Save as UTF-8 and XML-escape URL values
`lastmod`	Use the real date of a significant content, structured data, or link change
`priority` / `changefreq`	Safe to omit for Google because Google ignores them
Multilingual pages	Use reciprocal `hreflang` entries, including self references
Submission	Use `robots.txt` and Google Search Console; do not keep ping scripts

Sources worth bookmarking are Google’s sitemap guide, Google’s sitemap ping deprecation note, Google’s localized versions guide, and sitemaps.org.

Use case 1: Astro pages and blog routes

For a typical Astro site, start with the official integration. It generates sitemap files during astro build and can add localized URL relationships when your route structure is predictable.

npx astro add sitemap

// astro.config.mjs
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://claudecodelab.com',
  integrations: [
    sitemap({
      filter: (page) => !page.includes('/draft/') && !page.includes('/preview/'),
      i18n: {
        defaultLocale: 'ja',
        locales: {
          ja: 'ja',
          en: 'en',
          zh: 'zh-CN',
          ko: 'ko',
          es: 'es',
          fr: 'fr',
          de: 'de',
          pt: 'pt-BR',
          hi: 'hi',
          id: 'id',
        },
      },
    }),
  ],
});

The common failure here is a wrong site value. Do not leave localhost, a preview domain, or mixed http and https URLs in the generated sitemap. Google crawls the URLs exactly as listed, so your sitemap should match the canonical URLs you expect to rank.
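One way to catch a wrong site value before deploying is to scan the generated file for `<loc>` values that do not share the expected origin. The following is a minimal sketch, not part of the Astro integration: `findForeignLocs` is a hypothetical helper name, and the regex-based extraction is a rough check rather than a real XML parser.

```javascript
// Hypothetical helper: list <loc> values whose origin differs from the expected one.
function findForeignLocs(xml, expectedOrigin) {
  const locs = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1].trim());
  return locs.filter((loc) => {
    try {
      // Undo the most common XML escape before parsing the URL.
      return new URL(loc.replaceAll('&amp;', '&')).origin !== expectedOrigin;
    } catch {
      return true; // relative or malformed values are reported too
    }
  });
}

// Sample input with one good URL, one http URL, and one localhost URL.
const sample = `<urlset>
  <url><loc>https://example.com/blog/a/</loc></url>
  <url><loc>http://example.com/blog/b/</loc></url>
  <url><loc>http://localhost:4321/blog/c/</loc></url>
</urlset>`;

console.log(findForeignLocs(sample, 'https://example.com'));
```

An empty result means every listed URL uses the canonical scheme and host. Anything else is worth failing the build over.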

Use case 2: Node generator for multilingual MDX content

Use a custom generator when content lives in collections such as blog, blog-en, and blog-zh, or when you need updatedDate to become lastmod. The following script uses only Node.js built-ins and writes public/sitemap.xml.

// scripts/generate-sitemap.mjs
import { mkdir, readdir, readFile, stat, writeFile } from 'node:fs/promises';
import path from 'node:path';

const SITE_URL = (process.env.SITE_URL ?? 'https://example.com').replace(/\/$/, '');
const OUT_DIR = 'public';
const OUT_FILE = path.join(OUT_DIR, 'sitemap.xml');

const collections = [
  { dir: 'site/src/content/blog', prefix: '/blog', hreflang: 'ja' },
  { dir: 'site/src/content/blog-en', prefix: '/en/blog', hreflang: 'en' },
  { dir: 'site/src/content/blog-zh', prefix: '/zh/blog', hreflang: 'zh-CN' },
  { dir: 'site/src/content/blog-ko', prefix: '/ko/blog', hreflang: 'ko' },
  { dir: 'site/src/content/blog-es', prefix: '/es/blog', hreflang: 'es' },
  { dir: 'site/src/content/blog-fr', prefix: '/fr/blog', hreflang: 'fr' },
  { dir: 'site/src/content/blog-de', prefix: '/de/blog', hreflang: 'de' },
  { dir: 'site/src/content/blog-pt', prefix: '/pt/blog', hreflang: 'pt-BR' },
  { dir: 'site/src/content/blog-hi', prefix: '/hi/blog', hreflang: 'hi' },
  { dir: 'site/src/content/blog-id', prefix: '/id/blog', hreflang: 'id' },
];

function escapeXml(value) {
  return String(value).replace(/[<>&'"]/g, (char) => ({
    '<': '&lt;',
    '>': '&gt;',
    '&': '&amp;',
    "'": '&apos;',
    '"': '&quot;',
  })[char]);
}

async function* walk(dir) {
  let items;
  try {
    items = await readdir(dir, { withFileTypes: true });
  } catch (error) {
    if (error.code === 'ENOENT') return;
    throw error;
  }

  for (const item of items) {
    const fullPath = path.join(dir, item.name);
    if (item.isDirectory()) {
      yield* walk(fullPath);
    } else if (/\.(md|mdx)$/.test(item.name)) {
      yield fullPath;
    }
  }
}

function frontmatterOf(source) {
  // Accept both LF and CRLF line endings so Windows-authored files still match.
  return source.match(/^---\r?\n([\s\S]*?)\r?\n---/)?.[1] ?? '';
}

function dateField(frontmatter, key) {
  return frontmatter.match(new RegExp(`^${key}:\\s*["']?(\\d{4}-\\d{2}-\\d{2})`, 'm'))?.[1];
}

function routeSlug(collectionDir, filePath) {
  return path
    .relative(collectionDir, filePath)
    .replace(/\\/g, '/')
    .replace(/\.(md|mdx)$/, '')
    .replace(/\/index$/, '');
}

function encodeRoute(slug) {
  return slug.split('/').map(encodeURIComponent).join('/');
}

async function collectEntries() {
  const bySlug = new Map();

  for (const collection of collections) {
    for await (const filePath of walk(collection.dir)) {
      const source = await readFile(filePath, 'utf8');
      const frontmatter = frontmatterOf(source);
      if (/^draft:\s*true\s*$/m.test(frontmatter)) continue;

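      const info = await stat(filePath);
      const slug = routeSlug(collection.dir, filePath);
      // Prefer explicit frontmatter dates. File mtime is a last resort: on a
      // fresh CI checkout it usually reflects checkout time, not the real edit.
      const lastmod =
        dateField(frontmatter, 'updatedDate') ??
        dateField(frontmatter, 'pubDate') ??
        info.mtime.toISOString().slice(0, 10);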

      const route = `${collection.prefix}/${encodeRoute(slug)}/`;
      const variant = {
        loc: `${SITE_URL}${route}`,
        hreflang: collection.hreflang,
        lastmod,
      };

      const variants = bySlug.get(slug) ?? [];
      variants.push(variant);
      bySlug.set(slug, variants);
    }
  }

  return [...bySlug.values()].flatMap((variants) =>
    variants.map((variant) => ({
      ...variant,
      alternates: variants.map(({ hreflang, loc }) => ({ hreflang, loc })),
    })),
  );
}

function buildSitemap(entries) {
  const urls = entries.map((entry) => `  <url>
    <loc>${escapeXml(entry.loc)}</loc>
    <lastmod>${entry.lastmod}</lastmod>
${entry.alternates.map((alt) => `    <xhtml:link rel="alternate" hreflang="${escapeXml(alt.hreflang)}" href="${escapeXml(alt.loc)}" />`).join('\n')}
  </url>`).join('\n');

  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
${urls}
</urlset>
`;
}

const entries = await collectEntries();
if (entries.length === 0) {
  throw new Error('No public URLs were found for the sitemap.');
}

await mkdir(OUT_DIR, { recursive: true });
await writeFile(OUT_FILE, buildSitemap(entries), 'utf8');
console.log(`Wrote ${entries.length} URLs to ${OUT_FILE}.`);

Run it as a dedicated command. Because the file lands in public/, run it before astro build if you want Astro to copy it into the build output:

SITE_URL=https://claudecodelab.com node scripts/generate-sitemap.mjs

Use case 3: Large sites and split sitemaps

A single sitemap file is fine for a small blog. Larger sites should split by content type or by chunks. This keeps files under the official limit and makes Search Console debugging easier.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-06-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-06-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-06-03</lastmod>
  </sitemap>
</sitemapindex>

Ask Claude Code to log URL counts per file and fail the job before any file crosses 50,000 URLs. For commerce, course, or documentation sites, splitting pages, articles, and products also helps you see which section has crawl or indexing problems.
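The chunk-and-check step could look something like the sketch below. It assumes entries shaped like `{ loc, lastmod }`. The function names and the `sitemap-<section>-<n>.xml` file naming are illustrative choices, not anything the sitemap protocol prescribes.

```javascript
const MAX_URLS_PER_FILE = 50_000; // per-file URL limit from the sitemap protocol

// Minimal XML escaping using numeric character references.
const esc = (value) =>
  String(value).replace(/[<>&'"]/g, (c) => `&#${c.charCodeAt(0)};`);

// Split a list into groups of at most `size` items.
function chunk(items, size = MAX_URLS_PER_FILE) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Turn { section: entries[] } into a flat list of planned sitemap files.
function planFiles(sections) {
  const files = [];
  for (const [section, entries] of Object.entries(sections)) {
    chunk(entries).forEach((urls, i) => {
      // ISO dates sort lexicographically, so a string comparison finds the latest.
      const lastmod = urls.reduce((max, e) => (e.lastmod > max ? e.lastmod : max), '');
      files.push({ name: `sitemap-${section}-${i + 1}.xml`, urls, lastmod });
    });
  }
  return files;
}

// Log the URL count per file and fail if any file is over the limit.
function reportCounts(files) {
  for (const { name, urls } of files) {
    console.log(`${name}: ${urls.length} URLs`);
    if (urls.length > MAX_URLS_PER_FILE) {
      throw new Error(`${name} has ${urls.length} URLs, above the ${MAX_URLS_PER_FILE} limit`);
    }
  }
}

// Build the sitemap index that points at every planned file.
function buildSitemapIndex(baseUrl, files) {
  const items = files
    .map(({ name, lastmod }) =>
      `  <sitemap>\n    <loc>${esc(`${baseUrl}/${name}`)}</loc>\n    <lastmod>${esc(lastmod)}</lastmod>\n  </sitemap>`)
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${items}\n</sitemapindex>\n`;
}

// Usage with synthetic data: 120,001 product URLs become three files.
const demoEntries = Array.from({ length: 120_001 }, (_, i) => ({
  loc: `https://example.com/products/${i}/`,
  lastmod: '2026-06-03',
}));
const plannedFiles = planFiles({ products: demoEntries });
reportCounts(plannedFiles);
console.log(buildSitemapIndex('https://example.com', plannedFiles));
```

The 50 MB size limit is not checked here. A fuller job would also measure each file's byte length after writing it.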

robots.txt, Search Console, and verification

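Add the sitemap or sitemap index to robots.txt. The Node generator above writes sitemap.xml, while the Astro integration's entry point is sitemap-index.xml, so point the Sitemap line at whichever file you actually deploy: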

User-agent: *
Allow: /

Sitemap: https://claudecodelab.com/sitemap.xml

Then submit it once in Google Search Console. For deployment checks, verify that the public URL returns HTTP 200 and looks like a sitemap.

// scripts/verify-sitemap.mjs
const sitemapUrl = process.env.SITEMAP_URL ?? 'https://example.com/sitemap.xml';
const response = await fetch(sitemapUrl);

if (!response.ok) {
  throw new Error(`Sitemap request failed: HTTP ${response.status}`);
}

const xml = await response.text();
if (!xml.includes('<urlset') && !xml.includes('<sitemapindex')) {
  throw new Error('The response does not look like a sitemap XML file.');
}

console.log(`Verified ${sitemapUrl}. Size: ${Buffer.byteLength(xml, 'utf8')} bytes`);
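The same deployment check can also confirm that robots.txt actually advertises the sitemap. The sketch below parses the text of a robots.txt file and returns every URL on a Sitemap line. `sitemapsInRobots` is a hypothetical helper, and the case-insensitive match is a deliberately lenient choice; fetching the file is left to the caller.

```javascript
// Return every URL declared on a "Sitemap:" line in a robots.txt body.
function sitemapsInRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, '').trim()) // drop comments and padding
    .map((line) => line.match(/^sitemap:\s*(\S+)$/i)?.[1])
    .filter(Boolean);
}

const robotsSample = [
  'User-agent: *',
  'Allow: /',
  '',
  'Sitemap: https://example.com/sitemap.xml',
].join('\n');

const declared = sitemapsInRobots(robotsSample);
if (!declared.includes('https://example.com/sitemap.xml')) {
  throw new Error('robots.txt does not declare the expected sitemap URL.');
}
console.log(`robots.txt declares: ${declared.join(', ')}`);
```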

Pitfalls to catch before publishing

The biggest mistake is setting every lastmod to the build date. That makes the file look fresh while giving Google unreliable update signals. Use updatedDate or the real content modification date.
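A rough way to spot the build-date pattern is to look at how lastmod values are distributed. In the sketch below, the 90% share and the 20-URL minimum are arbitrary illustrative thresholds, and both function names are hypothetical. A small site that really was updated in one day would trip it too, so treat the result as a warning, not proof.

```javascript
// Summarize how <lastmod> dates are distributed across a sitemap.
function lastmodSummary(xml) {
  const dates = [...xml.matchAll(/<lastmod>([^<]+)<\/lastmod>/g)]
    .map((m) => m[1].trim().slice(0, 10)); // compare by calendar day only
  const counts = new Map();
  for (const date of dates) counts.set(date, (counts.get(date) ?? 0) + 1);
  const [topDate, topCount] =
    [...counts.entries()].sort((a, b) => b[1] - a[1])[0] ?? [null, 0];
  return {
    total: dates.length,
    distinct: counts.size,
    topDate,
    topShare: dates.length ? topCount / dates.length : 0,
  };
}

// Assumed rule of thumb: warn when nearly every URL carries the same date.
function looksLikeBuildDate(summary, { minUrls = 20, threshold = 0.9 } = {}) {
  return summary.total >= minUrls && summary.topShare >= threshold;
}

// Synthetic example: 30 URLs that all share one date.
const stamped = Array.from({ length: 30 }, () => '<lastmod>2026-06-03</lastmod>').join('\n');
const summary = lastmodSummary(stamped);
if (looksLikeBuildDate(summary)) {
  console.warn(`Warning: ${summary.total} URLs share lastmod ${summary.topDate}.`);
}
```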

Another common mistake is including drafts, noindex pages, redirect sources, or duplicate URLs. A sitemap should list the canonical URLs you want in search results. It should not be a dump of every route your app can render.
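Duplicate URLs can be caught mechanically. The sketch below groups `<loc>` values that differ only by trailing slash, host casing, or fragment. Whether those variants really are the same page depends on your routing, so the normalization is an assumption to adjust, and `findDuplicateLocs` is a hypothetical helper name.

```javascript
// Build a comparison key. Assumes /a and /a/ resolve to the same page.
function normalizeLoc(loc) {
  const url = new URL(loc); // the URL parser lowercases the host
  const pathname = url.pathname.endsWith('/') ? url.pathname : `${url.pathname}/`;
  return `${url.origin}${pathname}${url.search}`; // fragment is dropped
}

// Return groups of URLs that collapse to the same key.
function findDuplicateLocs(locs) {
  const groups = new Map();
  for (const loc of locs) {
    const key = normalizeLoc(loc);
    groups.set(key, [...(groups.get(key) ?? []), loc]);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

const duplicateGroups = findDuplicateLocs([
  'https://example.com/blog/a/',
  'https://example.com/blog/a',
  'https://EXAMPLE.com/blog/a/',
  'https://example.com/blog/b/',
]);
console.log(duplicateGroups);
```

Drafts, noindex pages, and redirect sources cannot be detected from the URL alone. Those need a check against frontmatter or live responses.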

For multilingual sites, missing return links are easy to overlook. Each language version should list itself and every alternate version. If Japanese points to English, English should point back to Japanese and list the same set of alternates.
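A reciprocity check over the generator's entries might look like the sketch below. It assumes the entry shape produced by collectEntries() in the Node example, `{ loc, alternates: [{ hreflang, loc }] }`. The function name and message wording are illustrative.

```javascript
// Report missing self references and missing return links between alternates.
function findHreflangProblems(entries) {
  const byLoc = new Map(entries.map((entry) => [entry.loc, entry]));
  const problems = [];

  for (const entry of entries) {
    const targets = entry.alternates.map((alt) => alt.loc);

    if (!targets.includes(entry.loc)) {
      problems.push(`${entry.loc}: missing self reference`);
    }

    for (const target of targets) {
      if (target === entry.loc) continue;
      const other = byLoc.get(target);
      if (!other) {
        problems.push(`${entry.loc}: alternate ${target} is not in the sitemap`);
      } else if (!other.alternates.some((alt) => alt.loc === entry.loc)) {
        problems.push(`${target}: no return link to ${entry.loc}`);
      }
    }
  }
  return problems;
}

// Example: the Japanese page lists English, but English does not link back.
const ja = 'https://example.com/blog/post/';
const en = 'https://example.com/en/blog/post/';
const hreflangProblems = findHreflangProblems([
  { loc: ja, alternates: [{ hreflang: 'ja', loc: ja }, { hreflang: 'en', loc: en }] },
  { loc: en, alternates: [{ hreflang: 'en', loc: en }] },
]);
console.log(hreflangProblems);
```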

Finally, escape XML values. An & in a query string must be written as &amp; inside the sitemap. The Node example includes escapeXml() for that reason.

Monetization and the final check

A sitemap will not monetize a site by itself, but it protects the discovery layer for pages that do monetize: tutorials, product comparisons, lead magnets, and consultation pages. After you fix the sitemap, review internal links and CTAs so readers can move naturally from free articles to Claude Code training or related resources.

In Masa’s ClaudeCodeLab workflow, the practical result was clearest after removing stale ping code, tying lastmod to updatedDate, and grouping ten locale versions with reciprocal hreflang. Review became simpler because the sitemap reflected the same dates and slugs that editors checked in MDX frontmatter.
