Build a Design System with Claude Code: Design Tokens, Storybook, and CI
Use Claude Code for design tokens, React/TypeScript components, Storybook, accessibility, visual tests, and CI.
A design system is an operating model, not just a component gallery
When teams start a design system, they often rush to build buttons, cards, and forms. That is useful, but the real value is a repeatable way to change color, spacing, typography, states, reviews, and tests without breaking product screens.
Claude Code is a good fit for this work because it can read the existing codebase, edit several files, run Storybook and tests, and report the diff. It is not a replacement for product judgment, brand decisions, or final accessibility review. The best results come when you give it a bounded task and a clear review checklist.
This guide covers design tokens, React/TypeScript components, Storybook, accessibility, CI visual/a11y checks, realistic Figma integration boundaries, and the level of task detail that works well with Claude Code.
For related reading, see design token management with Claude Code, Storybook development with Claude Code, and accessibility work with Claude Code.
Target Architecture
The source of truth in this workflow is tokens.json. Figma remains essential for design work, but code needs a reviewable contract that CI can validate.
flowchart LR
  Figma["Figma Variables"]
  Tokens["tokens.json"]
  Build["token build script"]
  CSS["CSS variables"]
  TS["TypeScript token map"]
  Components["React components"]
  Storybook["Storybook stories"]
  CI["Visual and a11y CI"]
  Figma -->|review input| Tokens
  Tokens --> Build
  Build --> CSS
  Build --> TS
  CSS --> Components
  TS --> Components
  Components --> Storybook
  Storybook --> CI
Design tokens are named design decisions: colors, spacing, radii, typography, and component states stored as data. A component should avoid raw values like #2563eb when a semantic token such as action.background.primary would explain the purpose.
For current references, check the Design Tokens Community Group, Claude Code docs, Claude Code security guidance, Storybook accessibility testing, Storybook visual tests, Playwright accessibility testing, and the Figma REST API docs.
The Right Task Size for Claude Code
Claude Code performs best when the boundaries are explicit and the output can be verified. A vague prompt such as “make a design system” usually produces a large, hard-to-review diff. A scoped request such as “migrate only Button, keep the existing public API, add Storybook states, and run a11y tests” is much safer.
| Work area | Good Claude Code scope | Human decision |
|---|---|---|
| Tokens | Extract repeated colors and spacing from CSS | Brand meaning and token names |
| Components | Implement typed Button, Input, and Alert primitives | Public API and product semantics |
| Storybook | Add variants, states, and interaction stories | Which states matter in real workflows |
| Accessibility | Detect missing labels, focus issues, and axe violations | Final screen reader and UX judgment |
| CI | Add visual and a11y checks to pull requests | Failure policy and exception process |
Use a short project rule before asking Claude Code to edit files:
Design system task rules:
- Edit only src/components, src/styles, .storybook, tests, scripts, and tokens.json.
- Do not change brand colors without listing old and new token names.
- Every new component needs TypeScript props, keyboard behavior, Storybook stories, and a11y notes.
- Run npm run tokens:build, npm run test:storybook, npm run test:a11y, and npm run test:visual before reporting done.
- If focus behavior changes, include manual review steps.
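The first rule can also be checked mechanically after a session. The following is a minimal sketch, not a Claude Code feature: `ALLOWED_PREFIXES`, `ALLOWED_FILES`, and `findOutOfScope` are hypothetical names, and the input is assumed to be a list of changed paths such as the output of `git diff --name-only`.

```typescript
// Allowed edit areas, copied from the task rules above.
const ALLOWED_PREFIXES: readonly string[] = [
  "src/components/",
  "src/styles/",
  ".storybook/",
  "tests/",
  "scripts/",
];
const ALLOWED_FILES: readonly string[] = ["tokens.json"];

// Returns the changed paths that fall outside the allowed areas.
function findOutOfScope(changedPaths: readonly string[]): string[] {
  return changedPaths.filter((path) => {
    // Normalize Windows separators and a leading "./" before comparing.
    const normalized = path.replace(/\\/g, "/").replace(/^\.\//, "");
    if (ALLOWED_FILES.includes(normalized)) return false;
    return !ALLOWED_PREFIXES.some((prefix) => normalized.startsWith(prefix));
  });
}

// Sample input (hypothetical file names).
const outOfScope = findOutOfScope([
  "src/components/Button.tsx",
  "tokens.json",
  "src/pages/Checkout.tsx",
  "package.json",
]);
console.log(outOfScope); // ["src/pages/Checkout.tsx", "package.json"]
```

A non-empty result does not have to block the change, but it is a cue to ask why the file was touched before accepting the diff.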
Security is part of the design system workflow. Do not paste Figma tokens, npm tokens, CI secrets, or private customer screenshots into prompts. Keep Claude Code permissions narrow, review commands before approval, and treat large snapshot updates as human-approved changes.
Minimal Setup
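This example assumes a React and TypeScript app styled with Tailwind CSS utility classes; the component code in this guide uses Tailwind's arbitrary-value syntax, such as bg-[var(--token)]. Adapt the commands to your package manager if needed.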
npm install class-variance-authority clsx tailwind-merge
npm install -D @storybook/react-vite @storybook/addon-a11y @storybook/test-runner @playwright/test @axe-core/playwright concurrently http-server wait-on
npx storybook init
npx playwright install chromium
Add scripts that make the workflow reproducible locally and in CI:
{
  "scripts": {
    "tokens:build": "node scripts/build-tokens.mjs",
    "storybook": "storybook dev -p 6006",
    "build-storybook": "storybook build",
    "test:storybook": "test-storybook --url http://127.0.0.1:6006",
    "test:a11y": "playwright test tests/a11y.spec.ts",
    "test:visual": "playwright test tests/button.visual.spec.ts"
  }
}
Make Design Tokens the Contract
Split tokens into primitive, semantic, and component layers. Primitive tokens store raw values, semantic tokens describe meaning, and component tokens capture UI-specific state.
{
  "primitive": {
    "color": {
      "blue": {
        "50": { "$type": "color", "$value": "#eff6ff" },
        "600": { "$type": "color", "$value": "#2563eb" },
        "700": { "$type": "color", "$value": "#1d4ed8" }
      },
      "gray": {
        "50": { "$type": "color", "$value": "#f9fafb" },
        "200": { "$type": "color", "$value": "#e5e7eb" },
        "900": { "$type": "color", "$value": "#111827" }
      },
      "red": {
        "600": { "$type": "color", "$value": "#dc2626" },
        "700": { "$type": "color", "$value": "#b91c1c" }
      },
      "white": { "$type": "color", "$value": "#ffffff" }
    },
    "space": {
      "2": { "$type": "dimension", "$value": "0.5rem" },
      "3": { "$type": "dimension", "$value": "0.75rem" },
      "4": { "$type": "dimension", "$value": "1rem" },
      "6": { "$type": "dimension", "$value": "1.5rem" }
    },
    "radius": {
      "md": { "$type": "dimension", "$value": "0.375rem" },
      "lg": { "$type": "dimension", "$value": "0.5rem" }
    }
  },
  "semantic": {
    "color": {
      "surface": { "$type": "color", "$value": "{primitive.color.white}" },
      "text": { "$type": "color", "$value": "{primitive.color.gray.900}" },
      "border": { "$type": "color", "$value": "{primitive.color.gray.200}" },
      "focus": { "$type": "color", "$value": "{primitive.color.blue.600}" }
    }
  },
  "component": {
    "button": {
      "primary": {
        "background": { "$type": "color", "$value": "{primitive.color.blue.600}" },
        "backgroundHover": { "$type": "color", "$value": "{primitive.color.blue.700}" },
        "text": { "$type": "color", "$value": "{primitive.color.white}" }
      },
      "danger": {
        "background": { "$type": "color", "$value": "{primitive.color.red.600}" },
        "backgroundHover": { "$type": "color", "$value": "{primitive.color.red.700}" },
        "text": { "$type": "color", "$value": "{primitive.color.white}" }
      }
    }
  }
}
Generate CSS variables and a TypeScript token map from that file:
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

const source = JSON.parse(readFileSync("tokens.json", "utf8"));

// Look up a token by dotted path, e.g. "primitive.color.blue.600".
function getToken(path) {
  const node = path.split(".").reduce((current, key) => current?.[key], source);
  if (!node || typeof node.$value === "undefined") {
    throw new Error(`Unknown token reference: ${path}`);
  }
  return node.$value;
}

// Follow "{alias}" references until a literal value is reached.
// `seen` guards against circular aliases, which would otherwise recurse forever.
function resolveValue(value, seen = []) {
  if (typeof value === "string" && value.startsWith("{") && value.endsWith("}")) {
    const path = value.slice(1, -1);
    if (seen.includes(path)) {
      throw new Error(`Circular token reference: ${[...seen, path].join(" -> ")}`);
    }
    return resolveValue(getToken(path), [...seen, path]);
  }
  return value;
}

// Flatten the token tree into { "primitive-color-blue-600": "#2563eb", ... }.
function walk(node, pathParts = [], result = {}) {
  // Skip non-object entries such as a group-level "$type" or "$description" string.
  if (!node || typeof node !== "object") {
    return result;
  }
  if (typeof node.$value !== "undefined") {
    result[pathParts.join("-")] = resolveValue(node.$value);
    return result;
  }
  for (const [key, value] of Object.entries(node)) {
    walk(value, [...pathParts, key], result);
  }
  return result;
}

const flat = walk(source);

// Emit one CSS custom property per flattened token.
const css = [
  ":root {",
  ...Object.entries(flat).map(([name, value]) => `  --${name}: ${value};`),
  "}",
  ""
].join("\n");

mkdirSync(dirname("src/styles/tokens.css"), { recursive: true });
mkdirSync(dirname("src/tokens.ts"), { recursive: true });
writeFileSync("src/styles/tokens.css", css);
writeFileSync("src/tokens.ts", `export const tokens = ${JSON.stringify(flat, null, 2)} as const;\n`);
console.log(`Generated ${Object.keys(flat).length} tokens.`);
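The primitive, semantic, and component split is easier to keep if a check rejects references that point the wrong way. The sketch below assumes one possible policy: primitives alias nothing, semantic tokens alias primitives, and component tokens alias semantic or primitive tokens, which is consistent with the sample tokens.json. `ALLOWED_TARGETS`, `checkLayerReferences`, and the `badge` token in the sample are hypothetical.

```typescript
type TokenNode = { [key: string]: TokenNode | string | number };

// Which layers a token in a given layer may alias (an assumed policy).
const ALLOWED_TARGETS: Record<string, readonly string[]> = {
  primitive: [],
  semantic: ["primitive"],
  component: ["semantic", "primitive"],
};

// Walks the token tree and collects aliases that cross layers in a disallowed direction.
function checkLayerReferences(
  node: TokenNode,
  path: string[] = [],
  errors: string[] = []
): string[] {
  const value = node["$value"];
  if (value !== undefined) {
    if (typeof value === "string" && value.startsWith("{") && value.endsWith("}")) {
      const layer = path[0] ?? "";
      const targetLayer = value.slice(1, -1).split(".")[0] ?? "";
      const allowed = ALLOWED_TARGETS[layer] ?? [];
      if (!allowed.includes(targetLayer)) {
        errors.push(`${path.join(".")} must not reference ${targetLayer}`);
      }
    }
    return errors;
  }
  for (const [key, child] of Object.entries(node)) {
    if (typeof child === "object" && child !== null) {
      checkLayerReferences(child, [...path, key], errors);
    }
  }
  return errors;
}

// Sample tree: the last token aliases another component token, which the policy rejects.
const sample: TokenNode = {
  primitive: { color: { white: { $type: "color", $value: "#ffffff" } } },
  semantic: { color: { surface: { $type: "color", $value: "{primitive.color.white}" } } },
  component: { badge: { text: { $type: "color", $value: "{component.button.primary.text}" } } },
};
const layerErrors = checkLayerReferences(sample);
console.log(layerErrors); // ["component.badge.text must not reference component"]
```

Whether component tokens may alias primitives directly is a team decision; tightening the policy only requires editing `ALLOWED_TARGETS`.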
Ask Claude Code to extract candidates first, not to rewrite every UI at once. A good prompt is: “Find repeated raw colors and spacing values, map them to proposed tokens, and return a report before editing.”
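To make that report concrete, the sketch below counts repeated hex colors in a block of CSS text. It is a rough illustration of the kind of inventory to request, not what Claude Code runs internally. `countRawColors` and `tokenCandidates` are hypothetical names, and the pattern is deliberately simple: it would also match an ID selector such as `#fade`.

```typescript
// Counts hex color literals in a block of CSS or class-name text.
function countRawColors(sourceText: string): Map<string, number> {
  const counts = new Map<string, number>();
  const hexPattern = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;
  for (const match of sourceText.match(hexPattern) ?? []) {
    const color = match.toLowerCase(); // treat #2563EB and #2563eb as the same value
    counts.set(color, (counts.get(color) ?? 0) + 1);
  }
  return counts;
}

// Keeps only values that repeat, most frequent first.
function tokenCandidates(sourceText: string, minCount = 2): Array<[string, number]> {
  return Array.from(countRawColors(sourceText))
    .filter(([, count]) => count >= minCount)
    .sort((a, b) => b[1] - a[1]);
}

// Sample stylesheet (hypothetical selectors).
const sampleCss = `
.btn { background: #2563EB; color: #ffffff; }
.link { color: #2563eb; }
.card { border: 1px solid #e5e7eb; background: #ffffff; }
.badge { background: #2563eb; }
`;
const candidates = tokenCandidates(sampleCss);
console.log(candidates); // [["#2563eb", 3], ["#ffffff", 2]]
```

The counts only show where repetition exists. Deciding which repeated value deserves a semantic name remains a human call.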
Build Typed React Components
The component layer should be boring and predictable. This Button includes variants, sizes, loading state, disabled behavior, and visible focus treatment.
import { forwardRef, type ButtonHTMLAttributes } from "react";
import { cva, type VariantProps } from "class-variance-authority";
import { clsx, type ClassValue } from "clsx";
import { twMerge } from "tailwind-merge";

function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

const buttonVariants = cva(
  [
    "inline-flex items-center justify-center gap-2 rounded-md font-medium",
    "transition-colors focus-visible:outline-none focus-visible:ring-2",
    "focus-visible:ring-[var(--semantic-color-focus)] focus-visible:ring-offset-2",
    "disabled:pointer-events-none disabled:opacity-50"
  ],
  {
    variants: {
      variant: {
        primary: [
          "bg-[var(--component-button-primary-background)]",
          "text-[var(--component-button-primary-text)]",
          "hover:bg-[var(--component-button-primary-backgroundHover)]"
        ],
        secondary: "border border-[var(--semantic-color-border)] bg-[var(--semantic-color-surface)] text-[var(--semantic-color-text)] hover:bg-gray-50",
        danger: [
          "bg-[var(--component-button-danger-background)]",
          "text-[var(--component-button-danger-text)]",
          "hover:bg-[var(--component-button-danger-backgroundHover)]"
        ]
      },
      size: {
        sm: "h-8 px-3 text-sm",
        md: "h-10 px-4 text-sm",
        lg: "h-12 px-6 text-base"
      }
    },
    defaultVariants: {
      variant: "primary",
      size: "md"
    }
  }
);

export interface ButtonProps
  extends ButtonHTMLAttributes<HTMLButtonElement>,
    VariantProps<typeof buttonVariants> {
  loading?: boolean;
}

export const Button = forwardRef<HTMLButtonElement, ButtonProps>(function Button(
  { className, variant, size, loading = false, disabled, children, ...props },
  ref
) {
  return (
    <button
      ref={ref}
      className={cn(buttonVariants({ variant, size }), className)}
      disabled={disabled || loading}
      aria-busy={loading || undefined}
      {...props}
    >
      {loading ? (
        <span
          aria-hidden="true"
          className="h-4 w-4 animate-spin rounded-full border-2 border-current border-r-transparent"
        />
      ) : null}
      <span>{children}</span>
    </button>
  );
});
The review question is not “does the button look nice?” The review question is “is this API stable enough for many product teams to use?”
Turn Storybook into a Specification
Every state that matters should exist as a story. If it is not in Storybook, it is difficult to review, test, or discuss.
import type { Meta, StoryObj } from "@storybook/react";
import { Button } from "./Button";

const meta = {
  title: "Design System/Button",
  component: Button,
  parameters: {
    layout: "centered",
    a11y: {
      test: "error"
    }
  },
  argTypes: {
    variant: {
      control: "select",
      options: ["primary", "secondary", "danger"]
    },
    size: {
      control: "select",
      options: ["sm", "md", "lg"]
    },
    loading: { control: "boolean" },
    disabled: { control: "boolean" }
  }
} satisfies Meta<typeof Button>;

export default meta;

type Story = StoryObj<typeof meta>;

export const Primary: Story = {
  args: {
    children: "Save changes",
    variant: "primary"
  }
};

export const Danger: Story = {
  args: {
    children: "Delete",
    variant: "danger"
  }
};

export const Loading: Story = {
  args: {
    children: "Saving",
    loading: true
  }
};

export const AllStates: Story = {
  render: () => (
    <div className="flex flex-wrap items-center gap-3">
      <Button variant="primary" size="sm">Small</Button>
      <Button variant="primary" size="md">Medium</Button>
      <Button variant="primary" size="lg">Large</Button>
      <Button variant="secondary">Secondary</Button>
      <Button variant="danger">Danger</Button>
      <Button disabled>Disabled</Button>
      <Button loading>Loading</Button>
    </div>
  )
};
Tell Claude Code to preserve existing stories, add missing states, and explain any story ID changes. This keeps visual snapshots and a11y reports reviewable.
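Story IDs matter because the test URLs in this guide depend on them: the title "Design System/Button" and the export `AllStates` become `design-system-button--all-states`. The sketch below approximates that convention so a rename can be checked before snapshots break. It is an approximation rather than Storybook's own implementation, and `sanitize`, `splitExportName`, `approximateStoryId`, and `storyUrl` are hypothetical helpers. Treat the IDs reported by your Storybook build as authoritative.

```typescript
// Lowercase and replace runs of non-alphanumeric characters with a single dash.
function sanitize(part: string): string {
  return part
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Split a camel-case export name such as "AllStates" into "All States".
function splitExportName(exportName: string): string {
  return exportName.replace(/([a-z0-9])([A-Z])/g, "$1 $2");
}

// Approximates the "<title>--<story>" ID format used in the iframe URLs.
function approximateStoryId(title: string, exportName: string): string {
  return `${sanitize(title)}--${sanitize(splitExportName(exportName))}`;
}

// Builds the iframe URL that a Playwright test would visit.
function storyUrl(baseUrl: string, title: string, exportName: string): string {
  return `${baseUrl}/iframe.html?id=${approximateStoryId(title, exportName)}`;
}

console.log(approximateStoryId("Design System/Button", "AllStates"));
// design-system-button--all-states
console.log(storyUrl("http://127.0.0.1:6006", "Design System/Button", "Primary"));
// http://127.0.0.1:6006/iframe.html?id=design-system-button--primary
```

If a proposed title or export rename changes the output of this function, expect the a11y and visual test paths to need the same update.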
Run Visual and A11y Checks in CI
Automated accessibility checks do not replace manual review, but they catch obvious violations early. Playwright plus axe is a practical baseline.
import { expect, test } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

const storyPaths = [
  "/iframe.html?id=design-system-button--primary",
  "/iframe.html?id=design-system-button--danger",
  "/iframe.html?id=design-system-button--loading",
  "/iframe.html?id=design-system-button--all-states"
];

for (const storyPath of storyPaths) {
  test(`a11y ${storyPath}`, async ({ page }) => {
    await page.goto(`http://127.0.0.1:6006${storyPath}`);
    const results = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"])
      .analyze();
    expect(results.violations).toEqual([]);
  });
}
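Use screenshots sparingly at first. Start with high-value stories that represent common UI states. Commit the baseline images and generate them on the same operating system as CI: Playwright's default snapshot file names include the browser and platform, and a run with no baseline fails.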
import { expect, test } from "@playwright/test";

test("button all states visual snapshot", async ({ page }) => {
  await page.goto("http://127.0.0.1:6006/iframe.html?id=design-system-button--all-states");
  await expect(page).toHaveScreenshot("button-all-states.png", {
    fullPage: true,
    animations: "disabled"
  });
});
Then wire it into GitHub Actions:
name: design-system-quality
on:
  pull_request:
    paths:
      - "tokens.json"
      - "scripts/build-tokens.mjs"
      - "src/components/**"
      - "src/styles/**"
      - ".storybook/**"
      - "tests/**"
      - "package.json"
      - "package-lock.json"
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run tokens:build
      - run: npm run build-storybook
      - run: npx playwright install --with-deps chromium
      - run: >
          npx concurrently -k -s first -n server,tests
          "npx http-server storybook-static -p 6006"
          "npx wait-on http://127.0.0.1:6006 && npm run test:storybook && npm run test:a11y && npm run test:visual"
When CI fails, give Claude Code the failing story ID, axe violation, changed files, and visual diff. Avoid dumping secrets or entire logs into the prompt.
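A compact summary is usually enough context. The sketch below condenses axe violations into a few lines that can be pasted into a prompt instead of a full log. The interfaces declare only the fields used here (axe results carry more), `summarizeViolations` is a hypothetical helper, and the sample violation is illustrative data rather than real output.

```typescript
// Minimal shape of the fields used below; real axe results include more.
interface ViolationNode {
  target: unknown[];
  html: string;
}
interface Violation {
  id: string;
  impact?: string | null;
  help: string;
  nodes: ViolationNode[];
}

// Produces one header line plus one line per violation.
function summarizeViolations(storyId: string, violations: Violation[]): string {
  if (violations.length === 0) {
    return `${storyId}: no axe violations`;
  }
  const lines = violations.map((violation) => {
    const targets = violation.nodes.map((node) => node.target.join(" ")).join(", ");
    return `- ${violation.id} (${violation.impact ?? "unknown"}): ${violation.help} [${targets}]`;
  });
  return [`${storyId}: ${violations.length} axe violation(s)`, ...lines].join("\n");
}

// Illustrative input, shaped like an entry in results.violations.
const summary = summarizeViolations("design-system-button--danger", [
  {
    id: "color-contrast",
    impact: "serious",
    help: "Text must have sufficient contrast",
    nodes: [{ target: ["button"], html: "<button>Delete</button>" }],
  },
]);
console.log(summary);
// design-system-button--danger: 1 axe violation(s)
// - color-contrast (serious): Text must have sufficient contrast [button]
```

The summary leaves out the `html` field, which keeps story markup that might contain private content out of the prompt.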
A Realistic Boundary for Figma Integration
Figma Variables are a strong input to token work, but automatic two-way sync is usually too risky at the beginning. Unapproved experiments, old component names, and private design notes can leak into production tokens.
| Area | Good automation | Avoid |
|---|---|---|
| Figma Variables | Export and compare with tokens.json | Blindly overwrite production tokens |
| Figma Components | Collect state and prop candidates | Auto-decide React APIs |
| Figma comments | Summarize unresolved questions | Infer final design intent |
| Storybook links | Attach story URLs to design review | Treat Storybook as design approval |
Use Claude Code to create a review report first:
Read figma-tokens-export.json and tokens.json.
Create a markdown report with:
1. tokens that exist in Figma but not in code
2. tokens that exist in code but not in Figma
3. value differences for matching semantic tokens
Do not edit tokens.json. Do not rename tokens. Mark risky differences around focus, danger, and text color.
The goal is not synchronization for its own sake. The goal is a safe, reviewable diff.
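The three report sections map onto a small comparison function. This is a minimal sketch that assumes both sides have already been flattened into name-to-value maps like the build script's output. The real shape of a Figma export will differ and needs its own flattening step, and `diffTokens`, `RISKY_FRAGMENTS`, and the sample values are hypothetical.

```typescript
type FlatTokens = Record<string, string>;

interface ChangedToken {
  name: string;
  figma: string;
  code: string;
  risky: boolean;
}
interface TokenDiff {
  onlyInFigma: string[];
  onlyInCode: string[];
  changed: ChangedToken[];
}

// Name fragments flagged for closer review, taken from the prompt above.
const RISKY_FRAGMENTS: readonly string[] = ["focus", "danger", "text"];

function diffTokens(figma: FlatTokens, code: FlatTokens): TokenDiff {
  const diff: TokenDiff = { onlyInFigma: [], onlyInCode: [], changed: [] };
  const names = new Set<string>([...Object.keys(figma), ...Object.keys(code)]);
  for (const name of Array.from(names).sort()) {
    const figmaValue = figma[name];
    const codeValue = code[name];
    if (figmaValue === undefined) {
      diff.onlyInCode.push(name);
    } else if (codeValue === undefined) {
      diff.onlyInFigma.push(name);
    } else if (
      name.startsWith("semantic-") && // value comparison covers semantic tokens only
      figmaValue.toLowerCase() !== codeValue.toLowerCase()
    ) {
      const risky = RISKY_FRAGMENTS.some((fragment) => name.includes(fragment));
      diff.changed.push({ name, figma: figmaValue, code: codeValue, risky });
    }
  }
  return diff;
}

// Sample maps (hypothetical values).
const tokenDiff = diffTokens(
  { "semantic-color-focus": "#1D4ED8", "semantic-color-text": "#111827", "semantic-color-accent": "#7c3aed" },
  { "semantic-color-focus": "#2563eb", "semantic-color-text": "#111827", "semantic-color-border": "#e5e7eb" }
);
console.log(JSON.stringify(tokenDiff, null, 2));
```

The function only reads its inputs, which matches the report-only stance: nothing is written back to tokens.json.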
Four Practical Use Cases
The first use case is a SaaS admin UI. Buttons, forms, tables, and modals have many states. Ask Claude Code to inventory current usage, create compatibility props, and migrate one screen at a time.
The second use case is a white-label product. Primitive brand colors vary per customer, while semantic tokens stay stable. Claude Code can generate per-brand CSS variables and Storybook theme switches.
The third use case is legacy CSS cleanup. Claude Code can find repeated raw values, cluster them into token candidates, and produce a migration table. Do not replace everything in one commit; use visual snapshots to control risk.
The fourth use case is a marketing or inquiry funnel. Consistent CTA buttons, pricing cards, and form states help visitors trust the site and make conversion experiments easier to run.
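For the white-label case, per-brand overrides only take effect if semantic variables point at primitive variables instead of carrying inlined values. The sketch below shows one way to emit that shape, under these assumptions: semantic variables are written as `var(--...)` references (the build script in this guide inlines resolved values instead), the brand is selected with a `data-brand` attribute, and the brand name `acme` and its color are made up. `aliasToVar` and `buildBrandCss` are hypothetical helpers.

```typescript
// Converts a token alias such as "{primitive.color.blue.600}" into a CSS var() reference.
function aliasToVar(alias: string): string {
  const path = alias.slice(1, -1); // drop the surrounding braces
  return `var(--${path.split(".").join("-")})`;
}

// Emits a block that overrides primitive variables for one brand.
function buildBrandCss(brand: string, primitives: Record<string, string>): string {
  const declarations = Object.entries(primitives).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return [`[data-brand="${brand}"] {`, ...declarations, "}"].join("\n");
}

// Shared semantic layer: stays identical for every brand.
const semanticBlock = [
  ":root {",
  `  --semantic-color-focus: ${aliasToVar("{primitive.color.blue.600}")};`,
  "}",
].join("\n");

// Brand layer: only primitive values change.
const acmeBlock = buildBrandCss("acme", { "primitive-color-blue-600": "#7c3aed" });

console.log([semanticBlock, acmeBlock].join("\n\n"));
```

Because custom properties resolve on the element where they are declared, this layout assumes `data-brand` is set on the root `<html>` element and that the brand block loads after the base token file.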
Failure Cases to Avoid
Do not let primitive tokens spread directly through components. If components depend on blue-600, a later brand change becomes a search-and-replace problem. Prefer semantic or component tokens.
Do not treat Storybook as complete unless it runs in CI. A component catalog that can silently break is documentation, not a safety net.
Do not over-expand visual tests. Animations, dates, external fonts, and random IDs create noisy snapshots. Freeze dynamic content and start with the most important stories.
Do not assume an axe pass means the component is accessible. Automated checks miss context, copy quality, keyboard flow quality, and screen reader comprehension.
Do not ask Claude Code for a huge migration. Work component by component, require tests, and review file scope before accepting changes.
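The first failure case, primitives leaking into components, is the easiest to catch with a script. The sketch below flags class strings that use `--primitive-*` variables or palette utility classes directly. The patterns are assumptions: they cover only the color families in the sample tokens.json and a handful of utility prefixes, and `findPrimitiveLeaks` is a hypothetical helper.

```typescript
// Direct use of a primitive CSS variable, e.g. var(--primitive-color-blue-600).
const PRIMITIVE_VAR = /var\(--primitive-[a-zA-Z0-9-]+\)/g;
// Palette utility classes such as bg-gray-50 or text-blue-600.
const PALETTE_CLASS = /\b(?:bg|text|border|ring)-(?:gray|red|blue)-\d{2,3}\b/g;

// Returns every primitive-level reference found in a class string.
function findPrimitiveLeaks(classNames: string): string[] {
  return [
    ...(classNames.match(PRIMITIVE_VAR) ?? []),
    ...(classNames.match(PALETTE_CLASS) ?? []),
  ];
}

// The secondary variant from the Button example in this guide.
const secondaryClasses =
  "border border-[var(--semantic-color-border)] bg-[var(--semantic-color-surface)] " +
  "text-[var(--semantic-color-text)] hover:bg-gray-50";

console.log(findPrimitiveLeaks(secondaryClasses)); // ["bg-gray-50"]
console.log(findPrimitiveLeaks("bg-[var(--component-button-primary-background)]")); // []
```

Run against the Button in this guide, the check reports the secondary variant's hover color, a reasonable candidate for a semantic or component token.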
Review Checklist
Before merging, check the following:
- Token names express meaning, not only appearance
- Component props are minimal and stable
- disabled, loading, error, focus, and hover states exist in Storybook
- Keyboard-only operation works
- ARIA is present where needed and not added where native HTML already works
- Visual snapshot changes were reviewed by a human
- Figma differences are saved as a review artifact
- Claude Code only edited the requested file areas
- No secrets or private customer data appear in prompts, logs, stories, or screenshots
Add this checklist to your project instructions so Claude Code can reuse it in later sessions.
Verification Points Before You Try This
Confirm that tokens.json generates CSS variables and TypeScript constants, Button stories render all states, and CI can reproduce Storybook build, accessibility checks, and visual snapshots. Keep Figma integration in report-only mode until the team agrees on the source of truth.
If your team needs help with design system implementation, Storybook adoption, accessibility review, or a Claude Code workflow for UI refactoring, the training and consultation page is the best next step.
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.