Generate XML Sitemaps with Claude Code
Build Astro and Node sitemap generation with hreflang, lastmod, robots.txt, and Search Console checks.
A sitemap is a crawlable inventory, not an indexing guarantee
When Claude Code helps you publish many articles, docs, or product pages, the weak point is often not the page template. It is the list of URLs that search engines can reliably discover. A good XML sitemap tells Google which canonical URLs matter, when they were meaningfully updated, and how translated versions relate to each other.
The important word is “meaningfully.” Google’s current sitemap guidance says it ignores priority and changefreq, and uses lastmod only when the value is consistently accurate. Google also deprecated the old sitemap ping endpoint, so modern workflows should rely on robots.txt, Search Console submission, and verification checks instead of https://www.google.com/ping?sitemap=....
This guide shows two practical paths: Astro’s official sitemap integration and a dependency-free Node.js generator for multilingual content collections. For a broader SEO workflow, pair this with Claude Code SEO optimization and your deployment checks in Claude Code CI/CD setup.
What to follow from the official specs
| Rule | Practical decision |
|---|---|
| Canonical URLs | Include absolute URLs such as https://example.com/blog/post/, not relative paths |
| File limits | Split before any file exceeds 50,000 URLs or 50 MB uncompressed |
| Encoding | Save as UTF-8 and XML-escape URL values |
| lastmod | Use the real date of a significant content, structured data, or link change |
| priority / changefreq | Safe to omit for Google because Google ignores them |
| Multilingual pages | Use reciprocal hreflang entries, including self references |
| Submission | Use robots.txt and Google Search Console; do not keep ping scripts |
Sources worth bookmarking are Google’s sitemap guide, Google’s sitemap ping deprecation note, Google’s localized versions guide, and sitemaps.org.
Use case 1: Astro pages and blog routes
For a typical Astro site, start with the official integration. It generates sitemap files during astro build and can add localized URL relationships when your route structure is predictable.
npx astro add sitemap
// astro.config.mjs
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://claudecodelab.com',
  integrations: [
    sitemap({
      // Keep draft and preview routes out of the sitemap.
      filter: (page) => !page.includes('/draft/') && !page.includes('/preview/'),
      i18n: {
        defaultLocale: 'ja',
        // Keys are URL path prefixes, values are hreflang codes.
        locales: {
          ja: 'ja',
          en: 'en',
          zh: 'zh-CN',
          ko: 'ko',
          es: 'es',
          fr: 'fr',
          de: 'de',
          pt: 'pt-BR',
          hi: 'hi',
          id: 'id',
        },
      },
    }),
  ],
});
The common failure here is a wrong site value. Do not leave localhost, a preview domain, or mixed http and https URLs in the generated sitemap. Google crawls the URLs exactly as listed, so your sitemap should match the canonical URLs you expect to rank.
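A wrong site value is easy to catch mechanically. The following is a minimal sketch, not part of the Astro integration: findForeignLocs is a hypothetical helper that scans sitemap XML for loc values whose origin differs from the one you expect. The sample XML is made up for illustration; in practice you would read the generated file from your build output.

```javascript
// Hypothetical helper: list every <loc> whose origin differs from the expected site.
function findForeignLocs(xml, expectedOrigin) {
  const expected = new URL(expectedOrigin).origin;
  const locs = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((match) => match[1].trim());
  return locs.filter((loc) => {
    try {
      return new URL(loc).origin !== expected;
    } catch {
      return true; // relative or malformed URLs are reported as well
    }
  });
}

// Made-up sample: one good URL, one http URL, one localhost URL.
const sample = `<urlset>
  <url><loc>https://claudecodelab.com/blog/a/</loc></url>
  <url><loc>http://claudecodelab.com/blog/b/</loc></url>
  <url><loc>http://localhost:4321/blog/c/</loc></url>
</urlset>`;

console.log(findForeignLocs(sample, 'https://claudecodelab.com'));
// [ 'http://claudecodelab.com/blog/b/', 'http://localhost:4321/blog/c/' ]
```

Failing the build when this list is non-empty catches the http/https mix and leftover preview domains before a crawler sees them.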
Use case 2: Node generator for multilingual MDX content
Use a custom generator when content lives in collections such as blog, blog-en, and blog-zh, or when you need updatedDate to become lastmod. The following script uses only Node.js built-ins and writes public/sitemap.xml.
// scripts/generate-sitemap.mjs
import { mkdir, readdir, readFile, stat, writeFile } from 'node:fs/promises';
import path from 'node:path';
const SITE_URL = (process.env.SITE_URL ?? 'https://example.com').replace(/\/$/, '');
const OUT_DIR = 'public';
const OUT_FILE = path.join(OUT_DIR, 'sitemap.xml');
const collections = [
  { dir: 'site/src/content/blog', prefix: '/blog', hreflang: 'ja' },
  { dir: 'site/src/content/blog-en', prefix: '/en/blog', hreflang: 'en' },
  { dir: 'site/src/content/blog-zh', prefix: '/zh/blog', hreflang: 'zh-CN' },
  { dir: 'site/src/content/blog-ko', prefix: '/ko/blog', hreflang: 'ko' },
  { dir: 'site/src/content/blog-es', prefix: '/es/blog', hreflang: 'es' },
  { dir: 'site/src/content/blog-fr', prefix: '/fr/blog', hreflang: 'fr' },
  { dir: 'site/src/content/blog-de', prefix: '/de/blog', hreflang: 'de' },
  { dir: 'site/src/content/blog-pt', prefix: '/pt/blog', hreflang: 'pt-BR' },
  { dir: 'site/src/content/blog-hi', prefix: '/hi/blog', hreflang: 'hi' },
  { dir: 'site/src/content/blog-id', prefix: '/id/blog', hreflang: 'id' },
];
// Replace the five XML special characters with their entity references.
function escapeXml(value) {
  return String(value).replace(/[<>&'"]/g, (char) => ({
    '<': '&lt;',
    '>': '&gt;',
    '&': '&amp;',
    "'": '&apos;',
    '"': '&quot;',
  })[char]);
}
// Recursively yield .md and .mdx files; a missing collection directory is skipped.
async function* walk(dir) {
  let items;
  try {
    items = await readdir(dir, { withFileTypes: true });
  } catch (error) {
    if (error.code === 'ENOENT') return;
    throw error;
  }
  for (const item of items) {
    const fullPath = path.join(dir, item.name);
    if (item.isDirectory()) {
      yield* walk(fullPath);
    } else if (/\.(md|mdx)$/.test(item.name)) {
      yield fullPath;
    }
  }
}
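// Extract the YAML frontmatter block; \r?\n also accepts files saved with CRLF line endings.
function frontmatterOf(source) {
  return source.match(/^---\r?\n([\s\S]*?)\r?\n---/)?.[1] ?? '';
}
// Read a YYYY-MM-DD value from a frontmatter key such as updatedDate.
function dateField(frontmatter, key) {
  return frontmatter.match(new RegExp(`^${key}:\\s*["']?(\\d{4}-\\d{2}-\\d{2})`, 'm'))?.[1];
}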
function routeSlug(collectionDir, filePath) {
  return path
    .relative(collectionDir, filePath)
    .replace(/\\/g, '/')
    .replace(/\.(md|mdx)$/, '')
    .replace(/\/index$/, '');
}
function encodeRoute(slug) {
  return slug.split('/').map(encodeURIComponent).join('/');
}
async function collectEntries() {
  // Group language variants by slug so each cluster shares one set of hreflang links.
  const bySlug = new Map();
  for (const collection of collections) {
    for await (const filePath of walk(collection.dir)) {
      const source = await readFile(filePath, 'utf8');
      const frontmatter = frontmatterOf(source);
      if (/^draft:\s*true\s*$/m.test(frontmatter)) continue;
      const info = await stat(filePath);
      const slug = routeSlug(collection.dir, filePath);
      // Prefer explicit frontmatter dates. The mtime fallback is a last resort:
      // on a fresh CI checkout it usually reflects checkout time, not the edit date.
      const lastmod =
        dateField(frontmatter, 'updatedDate') ??
        dateField(frontmatter, 'pubDate') ??
        info.mtime.toISOString().slice(0, 10);
      const route = `${collection.prefix}/${encodeRoute(slug)}/`;
      const variant = {
        loc: `${SITE_URL}${route}`,
        hreflang: collection.hreflang,
        lastmod,
      };
      const variants = bySlug.get(slug) ?? [];
      variants.push(variant);
      bySlug.set(slug, variants);
    }
  }
  // Every variant lists the whole cluster, itself included, as alternates.
  return [...bySlug.values()].flatMap((variants) =>
    variants.map((variant) => ({
      ...variant,
      alternates: variants.map(({ hreflang, loc }) => ({ hreflang, loc })),
    })),
  );
}
function buildSitemap(entries) {
  const urls = entries.map((entry) => `  <url>
    <loc>${escapeXml(entry.loc)}</loc>
    <lastmod>${entry.lastmod}</lastmod>
${entry.alternates.map((alt) => `    <xhtml:link rel="alternate" hreflang="${escapeXml(alt.hreflang)}" href="${escapeXml(alt.loc)}" />`).join('\n')}
  </url>`).join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
${urls}
</urlset>
`;
}
const entries = await collectEntries();
if (entries.length === 0) {
  throw new Error('No public URLs were found for the sitemap.');
}
await mkdir(OUT_DIR, { recursive: true });
await writeFile(OUT_FILE, buildSitemap(entries), 'utf8');
console.log(`Wrote ${entries.length} URLs to ${OUT_FILE}.`);
Run it before the build, so that the file written to public/ is copied into the build output, or as a dedicated command:
SITE_URL=https://claudecodelab.com node scripts/generate-sitemap.mjs
Use case 3: Large sites and split sitemaps
A single sitemap file is fine for a small blog. Larger sites should split by content type or by chunks. This keeps files under the official limit and makes Search Console debugging easier.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-06-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-06-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-06-03</lastmod>
  </sitemap>
</sitemapindex>
Ask Claude Code to log URL counts per file and fail the job before any file crosses 50,000 URLs. For commerce, course, or documentation sites, splitting pages, articles, and products also helps you see which section has crawl or indexing problems.
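The count check can be sketched in a few lines. This is an illustration under assumptions: chunkUrls, planSitemapFiles, and assertWithinLimit are hypothetical names, the sitemap-blog-N.xml naming is invented, and the demo URLs are generated only to exercise the limit. It checks URL counts only; the 50 MB size limit would need a separate byte check on the written files.

```javascript
const MAX_URLS_PER_FILE = 50000; // per-file URL limit from the sitemap protocol

// Hypothetical helper: split a flat URL list into chunks that stay within the limit.
function chunkUrls(urls, size = MAX_URLS_PER_FILE) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Plan one file per chunk so the job can log a count for each file.
function planSitemapFiles(urls, baseName = 'sitemap-blog', size = MAX_URLS_PER_FILE) {
  return chunkUrls(urls, size).map((chunk, index) => ({
    file: `${baseName}-${index + 1}.xml`,
    count: chunk.length,
    urls: chunk,
  }));
}

// Log counts and fail if any file is over the limit (useful for hand-built groups).
function assertWithinLimit(files, limit = MAX_URLS_PER_FILE) {
  for (const { file, count } of files) {
    console.log(`${file}: ${count} URLs`);
    if (count > limit) {
      throw new Error(`${file} has ${count} URLs (limit ${limit}).`);
    }
  }
}

// Made-up demo data: 120,001 URLs end up in three files.
const demoUrls = Array.from({ length: 120001 }, (_, i) => `https://example.com/blog/post-${i}/`);
const files = planSitemapFiles(demoUrls);
assertWithinLimit(files);
```

Each planned file would then become one sitemap entry in the index shown above.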
robots.txt, Search Console, and verification
Add the sitemap or sitemap index to robots.txt:
User-agent: *
Allow: /
Sitemap: https://claudecodelab.com/sitemap.xml
Then submit it once in Google Search Console. For deployment checks, verify that the public URL returns HTTP 200 and looks like a sitemap.
// scripts/verify-sitemap.mjs
const sitemapUrl = process.env.SITEMAP_URL ?? 'https://example.com/sitemap.xml';
const response = await fetch(sitemapUrl);
if (!response.ok) {
  throw new Error(`Sitemap request failed: HTTP ${response.status}`);
}
const xml = await response.text();
if (!xml.includes('<urlset') && !xml.includes('<sitemapindex')) {
  throw new Error('The response does not look like a sitemap XML file.');
}
// Buffer.byteLength reports UTF-8 bytes; xml.length would count UTF-16 code units.
console.log(`Verified ${sitemapUrl}. Size: ${Buffer.byteLength(xml, 'utf8')} bytes`);
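The robots.txt entry can be checked in the same spirit. The sketch below assumes you have already fetched robots.txt as text; sitemapUrlsFromRobots is a hypothetical helper, and the parsing is deliberately simple (one directive per line, comments stripped, field name matched case-insensitively).

```javascript
// Hypothetical helper: collect the URLs declared by Sitemap: lines in robots.txt.
function sitemapUrlsFromRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, '').trim()) // drop comments and padding
    .map((line) => line.match(/^sitemap:\s*(\S+)/i)?.[1])
    .filter(Boolean);
}

const robotsTxt = [
  'User-agent: *',
  'Allow: /',
  'Sitemap: https://claudecodelab.com/sitemap.xml',
].join('\n');

const declared = sitemapUrlsFromRobots(robotsTxt);
console.log(declared); // [ 'https://claudecodelab.com/sitemap.xml' ]

if (!declared.includes('https://claudecodelab.com/sitemap.xml')) {
  throw new Error('robots.txt does not declare the expected sitemap.');
}
```

A deployment check could feed each declared URL into the verification script above.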
Pitfalls to catch before publishing
The biggest mistake is setting every lastmod to the build date. That makes the file look fresh while giving Google unreliable update signals. Use updatedDate or the real content modification date.
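One rough way to spot this pattern is to measure how many entries share the build date. The helper below is a hypothetical sketch, the sample entries are made up, and any threshold you alert on is a judgment call rather than a rule from Google.

```javascript
// Hypothetical check: what fraction of entries carry the build date as lastmod?
function buildDateShare(entries, buildDate) {
  if (entries.length === 0) return 0;
  const hits = entries.filter((entry) => entry.lastmod === buildDate).length;
  return hits / entries.length;
}

// Made-up sample data for illustration.
const suspicious = [
  { loc: 'https://example.com/blog/a/', lastmod: '2026-06-03' },
  { loc: 'https://example.com/blog/b/', lastmod: '2026-06-03' },
  { loc: 'https://example.com/blog/c/', lastmod: '2026-06-03' },
];
const plausible = [
  { loc: 'https://example.com/blog/a/', lastmod: '2026-05-12' },
  { loc: 'https://example.com/blog/b/', lastmod: '2026-06-03' },
  { loc: 'https://example.com/blog/c/', lastmod: '2025-11-30' },
  { loc: 'https://example.com/blog/d/', lastmod: '2026-01-08' },
];

console.log(buildDateShare(suspicious, '2026-06-03')); // 1
console.log(buildDateShare(plausible, '2026-06-03')); // 0.25
```

A share close to 1 on a site with years of content suggests lastmod is being stamped at build time instead of read from frontmatter.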
Another common mistake is including drafts, noindex pages, redirect sources, or duplicate URLs. A sitemap should list the canonical URLs you want in search results. It should not be a dump of every route your app can render.
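Duplicates are the easiest of these to detect automatically. As a minimal sketch, the hypothetical findDuplicateLocs below groups URLs that differ only by a trailing slash or that appear more than once. It assumes absolute URLs and does not detect drafts, noindex pages, or redirects, which need page-level checks.

```javascript
// Hypothetical helper: group URLs that collapse to the same origin + path + query.
function findDuplicateLocs(locs) {
  const groups = new Map();
  for (const loc of locs) {
    const url = new URL(loc);
    // Ignore trailing slashes so /blog/a and /blog/a/ land in the same group.
    const key = `${url.origin}${url.pathname.replace(/\/+$/, '')}${url.search}`;
    groups.set(key, [...(groups.get(key) ?? []), loc]);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// Made-up sample data for illustration.
const locs = [
  'https://example.com/blog/a/',
  'https://example.com/blog/a',
  'https://example.com/blog/b/',
  'https://example.com/blog/b/',
  'https://example.com/blog/c/',
];

console.log(findDuplicateLocs(locs));
// [
//   [ 'https://example.com/blog/a/', 'https://example.com/blog/a' ],
//   [ 'https://example.com/blog/b/', 'https://example.com/blog/b/' ]
// ]
```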
For multilingual sites, missing return links are easy to miss. Each language version should list itself and every alternate version. If Japanese points to English, English should point back to Japanese with the same cluster.
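The reciprocity rule can be expressed as a small check. This sketch assumes entries shaped like the ones the Node generator builds (loc plus an alternates array of hreflang and loc pairs); findHreflangProblems is a hypothetical name, and the sample cluster is deliberately broken to show one finding.

```javascript
// Hypothetical check: every entry lists itself, and every alternate links back.
function findHreflangProblems(entries) {
  const byLoc = new Map(entries.map((entry) => [entry.loc, entry]));
  const problems = [];
  for (const entry of entries) {
    const altLocs = new Set(entry.alternates.map((alt) => alt.loc));
    if (!altLocs.has(entry.loc)) {
      problems.push(`${entry.loc} is missing a self reference`);
    }
    for (const alt of entry.alternates) {
      if (alt.loc === entry.loc) continue;
      const target = byLoc.get(alt.loc);
      if (!target) {
        problems.push(`${entry.loc} points to ${alt.loc}, which is not in the sitemap`);
      } else if (!target.alternates.some((back) => back.loc === entry.loc)) {
        problems.push(`${alt.loc} does not link back to ${entry.loc}`);
      }
    }
  }
  return problems;
}

// Made-up cluster: Japanese points to English, but English forgot the return link.
const ja = { hreflang: 'ja', loc: 'https://example.com/blog/post/' };
const en = { hreflang: 'en', loc: 'https://example.com/en/blog/post/' };
const broken = [
  { ...ja, alternates: [ja, en] },
  { ...en, alternates: [en] },
];

console.log(findHreflangProblems(broken));
// [ 'https://example.com/en/blog/post/ does not link back to https://example.com/blog/post/' ]
```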
Finally, escape XML values. An & in a query string must be written as &amp; inside the sitemap. The Node example includes escapeXml() for that reason.
Monetization and the final check
A sitemap will not monetize a site by itself, but it protects the discovery layer for pages that do monetize: tutorials, product comparisons, lead magnets, and consultation pages. After you fix the sitemap, review internal links and CTAs so readers can move naturally from free articles to Claude Code training or related resources.
In Masa’s ClaudeCodeLab workflow, the practical result was clearest after removing stale ping code, tying lastmod to updatedDate, and grouping ten locale versions with reciprocal hreflang. Review became simpler because the sitemap reflected the same dates and slugs that editors checked in MDX frontmatter.
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.