Advanced (Updated: 6/1/2026)

Build a Design System with Claude Code: Design Tokens, Storybook, and CI

Use Claude Code for design tokens, React/TypeScript components, Storybook, accessibility, visual tests, and CI.

When teams start a design system, they often rush to build buttons, cards, and forms. That is useful, but the real value is a repeatable way to change color, spacing, typography, and component states, backed by reviews and tests, without breaking product screens.

Claude Code is a good fit for this work because it can read the existing codebase, edit several files, run Storybook and tests, and report the diff. It is not a replacement for product judgment, brand decisions, or final accessibility review. The best results come when you give it a bounded task and a clear review checklist.

This guide covers design tokens, React/TypeScript components, Storybook, accessibility, CI visual/a11y checks, realistic Figma integration boundaries, and the level of task detail that works well with Claude Code.

For related reading, see design token management with Claude Code, Storybook development with Claude Code, and accessibility work with Claude Code.

Target Architecture

The source of truth in this workflow is tokens.json. Figma remains essential for design work, but code needs a reviewable contract that CI can validate.

flowchart LR
  Figma["Figma Variables"]
  Tokens["tokens.json"]
  Build["token build script"]
  CSS["CSS variables"]
  TS["TypeScript token map"]
  Components["React components"]
  Storybook["Storybook stories"]
  CI["Visual and a11y CI"]

  Figma -->|review input| Tokens
  Tokens --> Build
  Build --> CSS
  Build --> TS
  CSS --> Components
  TS --> Components
  Components --> Storybook
  Storybook --> CI

Design tokens are named design decisions: colors, spacing, radii, typography, and component states stored as data. A component should avoid raw values like #2563eb when a named token such as component.button.primary.background would explain the purpose.

For current references, check the Design Tokens Community Group, Claude Code docs, Claude Code security guidance, Storybook accessibility testing, Storybook visual tests, Playwright accessibility testing, and the Figma REST API docs.

The Right Task Size for Claude Code

Claude Code performs best when the boundaries are explicit and the output can be verified. A vague prompt such as “make a design system” usually produces a large, hard-to-review diff. A scoped request such as “migrate only Button, keep the existing public API, add Storybook states, and run a11y tests” is much safer.

| Work area | Good Claude Code scope | Human decision |
| --- | --- | --- |
| Tokens | Extract repeated colors and spacing from CSS | Brand meaning and token names |
| Components | Implement typed Button, Input, and Alert primitives | Public API and product semantics |
| Storybook | Add variants, states, and interaction stories | Which states matter in real workflows |
| Accessibility | Detect missing labels, focus issues, and axe violations | Final screen reader and UX judgment |
| CI | Add visual and a11y checks to pull requests | Failure policy and exception process |

Use a short project rule before asking Claude Code to edit files:

Design system task rules:
- Edit only src/components, src/styles, .storybook, tests, scripts, and tokens.json.
- Do not change brand colors without listing old and new token names.
- Every new component needs TypeScript props, keyboard behavior, Storybook stories, and a11y notes.
- Run npm run tokens:build, npm run test:storybook, npm run test:a11y, and npm run test:visual before reporting done.
- If focus behavior changes, include manual review steps.
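
The first rule can also be checked mechanically after a session. The following is a minimal sketch, assuming you collect the changed paths yourself (for example from git diff --name-only). The function name and the allow-list constants are hypothetical and simply mirror the rule above.

```typescript
// Hypothetical allow-list that mirrors the "Edit only ..." task rule.
const ALLOWED_PREFIXES = [
  "src/components/",
  "src/styles/",
  ".storybook/",
  "tests/",
  "scripts/",
];
const ALLOWED_FILES = ["tokens.json"];

// Returns every changed path that falls outside the agreed edit scope.
function findOutOfScopeFiles(changedPaths: string[]): string[] {
  return changedPaths.filter((changedPath) => {
    // Normalize Windows separators and a leading "./" before comparing.
    const normalized = changedPath.replace(/\\/g, "/").replace(/^\.\//, "");
    if (ALLOWED_FILES.includes(normalized)) return false;
    return !ALLOWED_PREFIXES.some((prefix) => normalized.startsWith(prefix));
  });
}
```

An empty result means the diff stayed inside the requested areas. Anything returned is a prompt for a human to ask why that file changed.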

Security is part of the design system workflow. Do not paste Figma tokens, npm tokens, CI secrets, or private customer screenshots into prompts. Keep Claude Code permissions narrow, review commands before approval, and treat large snapshot updates as human-approved changes.
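
One way to reduce the chance of pasting a credential is to pass any log excerpt through a redaction step first. This is a sketch only: the token prefixes below are assumptions about what GitHub, npm, and Figma tokens commonly look like, and should be verified against each provider's documentation. The function name is hypothetical, and a regex pass is not a substitute for reading what you paste.

```typescript
// Illustrative patterns only; extend the list for the providers you actually use.
const SECRET_PATTERNS: RegExp[] = [
  /\bghp_[A-Za-z0-9]{20,}\b/g, // assumed GitHub personal access token shape
  /\bnpm_[A-Za-z0-9]{20,}\b/g, // assumed npm access token shape
  /\bfigd_[A-Za-z0-9_-]{20,}/g, // assumed Figma personal access token shape
];

// Replaces anything that looks like a known token with a placeholder.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (current, pattern) => current.replace(pattern, "[REDACTED]"),
    text
  );
}
```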

Minimal Setup

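This example assumes a React and TypeScript app styled with Tailwind CSS utility classes, since the Button below relies on tailwind-merge and arbitrary-value classes such as bg-[var(--token)]. Adapt the commands to your package manager if needed.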

npm install class-variance-authority clsx tailwind-merge
npm install -D @storybook/react-vite @storybook/addon-a11y @storybook/test-runner @playwright/test @axe-core/playwright concurrently http-server wait-on
npx storybook init
npx playwright install chromium

Add scripts that make the workflow reproducible locally and in CI:

{
  "scripts": {
    "tokens:build": "node scripts/build-tokens.mjs",
    "storybook": "storybook dev -p 6006",
    "build-storybook": "storybook build",
    "test:storybook": "test-storybook --url http://127.0.0.1:6006",
    "test:a11y": "playwright test tests/a11y.spec.ts",
    "test:visual": "playwright test tests/button.visual.spec.ts"
  }
}

Make Design Tokens the Contract

Split tokens into primitive, semantic, and component layers. Primitive tokens store raw values, semantic tokens describe meaning, and component tokens capture UI-specific state.

{
  "primitive": {
    "color": {
      "blue": {
        "50": { "$type": "color", "$value": "#eff6ff" },
        "600": { "$type": "color", "$value": "#2563eb" },
        "700": { "$type": "color", "$value": "#1d4ed8" }
      },
      "gray": {
        "50": { "$type": "color", "$value": "#f9fafb" },
        "200": { "$type": "color", "$value": "#e5e7eb" },
        "900": { "$type": "color", "$value": "#111827" }
      },
      "red": {
        "600": { "$type": "color", "$value": "#dc2626" },
        "700": { "$type": "color", "$value": "#b91c1c" }
      },
      "white": { "$type": "color", "$value": "#ffffff" }
    },
    "space": {
      "2": { "$type": "dimension", "$value": "0.5rem" },
      "3": { "$type": "dimension", "$value": "0.75rem" },
      "4": { "$type": "dimension", "$value": "1rem" },
      "6": { "$type": "dimension", "$value": "1.5rem" }
    },
    "radius": {
      "md": { "$type": "dimension", "$value": "0.375rem" },
      "lg": { "$type": "dimension", "$value": "0.5rem" }
    }
  },
  "semantic": {
    "color": {
      "surface": { "$type": "color", "$value": "{primitive.color.white}" },
      "text": { "$type": "color", "$value": "{primitive.color.gray.900}" },
      "border": { "$type": "color", "$value": "{primitive.color.gray.200}" },
      "focus": { "$type": "color", "$value": "{primitive.color.blue.600}" }
    }
  },
  "component": {
    "button": {
      "primary": {
        "background": { "$type": "color", "$value": "{primitive.color.blue.600}" },
        "backgroundHover": { "$type": "color", "$value": "{primitive.color.blue.700}" },
        "text": { "$type": "color", "$value": "{primitive.color.white}" }
      },
      "danger": {
        "background": { "$type": "color", "$value": "{primitive.color.red.600}" },
        "backgroundHover": { "$type": "color", "$value": "{primitive.color.red.700}" },
        "text": { "$type": "color", "$value": "{primitive.color.white}" }
      }
    }
  }
}

Generate CSS variables and a TypeScript token map from that file. Save the script as scripts/build-tokens.mjs, the path the tokens:build script points to:

import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

const source = JSON.parse(readFileSync("tokens.json", "utf8"));

function getToken(path) {
  const node = path.split(".").reduce((current, key) => current?.[key], source);
  if (!node || typeof node.$value === "undefined") {
    throw new Error(`Unknown token reference: ${path}`);
  }
  return node.$value;
}

function resolveValue(value) {
  if (typeof value === "string" && value.startsWith("{") && value.endsWith("}")) {
    return resolveValue(getToken(value.slice(1, -1)));
  }
  return value;
}

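function walk(node, pathParts = [], result = {}) {
  // Ignore anything that is not a group or token object (for example a group-level "$type" string).
  if (!node || typeof node !== "object") {
    return result;
  }

  // A node with $value is a token: record it under its dash-joined path.
  if (typeof node.$value !== "undefined") {
    result[pathParts.join("-")] = resolveValue(node.$value);
    return result;
  }

  for (const [key, value] of Object.entries(node)) {
    // Keys starting with "$" are metadata such as $type or $description, not child tokens.
    if (key.startsWith("$")) continue;
    walk(value, [...pathParts, key], result);
  }

  return result;
}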

const flat = walk(source);
const css = [
  ":root {",
  ...Object.entries(flat).map(([name, value]) => `  --${name}: ${value};`),
  "}",
  ""
].join("\n");

mkdirSync(dirname("src/styles/tokens.css"), { recursive: true });
mkdirSync(dirname("src/tokens.ts"), { recursive: true });
writeFileSync("src/styles/tokens.css", css);
writeFileSync("src/tokens.ts", `export const tokens = ${JSON.stringify(flat, null, 2)} as const;\n`);

console.log(`Generated ${Object.keys(flat).length} tokens.`);
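
The resolveValue function above follows aliases recursively, so a token that references itself, directly or through a chain, would recurse until the stack overflows. Below is a sketch of one way to guard against that, written in TypeScript and assuming the same {path.to.token} alias syntax. The names TokenTree, lookupValue, and resolveAlias are hypothetical and not part of the script above.

```typescript
type TokenTree = { [key: string]: TokenTree | string | number };

// Looks up a dotted path and returns that token's $value, or throws.
function lookupValue(tree: TokenTree, path: string): string | number {
  let node: TokenTree | string | number | undefined = tree;
  for (const key of path.split(".")) {
    node = typeof node === "object" ? node[key] : undefined;
  }
  const value = typeof node === "object" ? node.$value : undefined;
  if (value === undefined || typeof value === "object") {
    throw new Error(`Unknown token reference: ${path}`);
  }
  return value;
}

// Follows alias chains and fails fast when a path repeats within one chain.
function resolveAlias(
  tree: TokenTree,
  value: string | number,
  seen: string[] = []
): string | number {
  const match = typeof value === "string" ? /^\{(.+)\}$/.exec(value) : null;
  if (!match) return value; // literal value, nothing to resolve
  const path = match[1];
  if (seen.includes(path)) {
    throw new Error(`Circular token reference: ${seen.concat(path).join(" -> ")}`);
  }
  return resolveAlias(tree, lookupValue(tree, path), seen.concat(path));
}
```

Reporting the full chain in the error message makes the failure easy to hand back to Claude Code or to a reviewer.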

Ask Claude Code to extract candidates first, not to rewrite every UI at once. A good prompt is: “Find repeated raw colors and spacing values, map them to proposed tokens, and return a report before editing.”
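
The report that prompt asks for can be approximated with a small counter. This is a rough sketch, assuming you already have the stylesheet contents as a string. The function name is hypothetical, and the regex is deliberately simple.

```typescript
interface RawValueCount {
  value: string;
  count: number;
}

// Counts hex colors that appear at least minCount times in a stylesheet string.
// Limitations: 3-digit and 6-digit forms are counted separately, and an ID
// selector made only of hex characters (such as "#add") is counted as a color.
function countRawColors(cssText: string, minCount = 2): RawValueCount[] {
  const counts = new Map<string, number>();
  const matches = cssText.match(/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g) ?? [];
  for (const match of matches) {
    const value = match.toLowerCase();
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count >= minCount)
    .map(([value, count]) => ({ value, count }))
    .sort((a, b) => b.count - a.count || a.value.localeCompare(b.value));
}
```

Values that repeat are token candidates. Values that appear once are often one-off decisions that deserve a human look before they get a name.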

Build Typed React Components

The component layer should be boring and predictable. This Button includes variants, sizes, loading state, disabled behavior, and visible focus treatment.

import { forwardRef, type ButtonHTMLAttributes } from "react";
import { cva, type VariantProps } from "class-variance-authority";
import { clsx, type ClassValue } from "clsx";
import { twMerge } from "tailwind-merge";

function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

const buttonVariants = cva(
  [
    "inline-flex items-center justify-center gap-2 rounded-md font-medium",
    "transition-colors focus-visible:outline-none focus-visible:ring-2",
    "focus-visible:ring-[var(--semantic-color-focus)] focus-visible:ring-offset-2",
    "disabled:pointer-events-none disabled:opacity-50"
  ],
  {
    variants: {
      variant: {
        primary: [
          "bg-[var(--component-button-primary-background)]",
          "text-[var(--component-button-primary-text)]",
          "hover:bg-[var(--component-button-primary-backgroundHover)]"
        ],
        secondary: "border border-[var(--semantic-color-border)] bg-[var(--semantic-color-surface)] text-[var(--semantic-color-text)] hover:bg-gray-50",
        danger: [
          "bg-[var(--component-button-danger-background)]",
          "text-[var(--component-button-danger-text)]",
          "hover:bg-[var(--component-button-danger-backgroundHover)]"
        ]
      },
      size: {
        sm: "h-8 px-3 text-sm",
        md: "h-10 px-4 text-sm",
        lg: "h-12 px-6 text-base"
      }
    },
    defaultVariants: {
      variant: "primary",
      size: "md"
    }
  }
);

export interface ButtonProps
  extends ButtonHTMLAttributes<HTMLButtonElement>,
    VariantProps<typeof buttonVariants> {
  loading?: boolean;
}

export const Button = forwardRef<HTMLButtonElement, ButtonProps>(function Button(
  { className, variant, size, loading = false, disabled, children, ...props },
  ref
) {
  return (
    <button
      ref={ref}
      className={cn(buttonVariants({ variant, size }), className)}
      disabled={disabled || loading}
      aria-busy={loading || undefined}
      {...props}
    >
      {loading ? (
        <span
          aria-hidden="true"
          className="h-4 w-4 animate-spin rounded-full border-2 border-current border-r-transparent"
        />
      ) : null}
      <span>{children}</span>
    </button>
  );
});

The review question is not “does the button look nice?” The review question is “is this API stable enough for many product teams to use?”

Turn Storybook into a Specification

Every state that matters should exist as a story. If it is not in Storybook, it is difficult to review, test, or discuss.

import type { Meta, StoryObj } from "@storybook/react";
import { Button } from "./Button";

const meta = {
  title: "Design System/Button",
  component: Button,
  parameters: {
    layout: "centered",
    a11y: {
      test: "error"
    }
  },
  argTypes: {
    variant: {
      control: "select",
      options: ["primary", "secondary", "danger"]
    },
    size: {
      control: "select",
      options: ["sm", "md", "lg"]
    },
    loading: { control: "boolean" },
    disabled: { control: "boolean" }
  }
} satisfies Meta<typeof Button>;

export default meta;
type Story = StoryObj<typeof meta>;

export const Primary: Story = {
  args: {
    children: "Save changes",
    variant: "primary"
  }
};

export const Danger: Story = {
  args: {
    children: "Delete",
    variant: "danger"
  }
};

export const Loading: Story = {
  args: {
    children: "Saving",
    loading: true
  }
};

export const AllStates: Story = {
  render: () => (
    <div className="flex flex-wrap items-center gap-3">
      <Button variant="primary" size="sm">Small</Button>
      <Button variant="primary" size="md">Medium</Button>
      <Button variant="primary" size="lg">Large</Button>
      <Button variant="secondary">Secondary</Button>
      <Button variant="danger">Danger</Button>
      <Button disabled>Disabled</Button>
      <Button loading>Loading</Button>
    </div>
  )
};

Tell Claude Code to preserve existing stories, add missing states, and explain any story ID changes. This keeps visual snapshots and a11y reports reviewable.
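
Story IDs are derived from the title and the export name, which is why renaming either one changes URLs and snapshot names. The sketch below approximates that derivation so a rename can be checked before it lands. It is not Storybook's actual implementation, and the function names are hypothetical, so confirm the result against the ID shown in the Storybook URL.

```typescript
// Lowercases and replaces every run of non-alphanumeric characters with "-".
function sanitizeSegment(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Approximates a story ID: the title is sanitized as-is, while the export
// name is first split on camelCase boundaries (AllStates -> All States).
function approximateStoryId(title: string, exportName: string): string {
  const spacedName = exportName.replace(/([a-z0-9])([A-Z])/g, "$1 $2");
  return `${sanitizeSegment(title)}--${sanitizeSegment(spacedName)}`;
}
```

For the meta above, approximateStoryId("Design System/Button", "AllStates") yields design-system-button--all-states.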

Run Visual and A11y Checks in CI

Automated accessibility checks do not replace manual review, but they catch obvious violations early. Playwright plus axe is a practical baseline. The test:a11y script expects this file at tests/a11y.spec.ts:

import { expect, test } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

const storyPaths = [
  "/iframe.html?id=design-system-button--primary",
  "/iframe.html?id=design-system-button--danger",
  "/iframe.html?id=design-system-button--loading",
  "/iframe.html?id=design-system-button--all-states"
];

for (const storyPath of storyPaths) {
  test(`a11y ${storyPath}`, async ({ page }) => {
    await page.goto(`http://127.0.0.1:6006${storyPath}`);

    const results = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"])
      .analyze();

    expect(results.violations).toEqual([]);
  });
}
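
Some checks can run before a browser is involved. Because button text and background are both tokens, their contrast can be computed from tokens.json directly. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas and assumes colors are written as #rrggbb. The function names are hypothetical.

```typescript
// Converts "#rrggbb" to relative luminance per the WCAG 2.x definition.
function relativeLuminance(hex: string): number {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Expected #rrggbb, got: ${hex}`);
  const [r, g, b] = match.slice(1).map((channel) => {
    const srgb = parseInt(channel, 16) / 255;
    // Linearize the sRGB channel value.
    return srgb <= 0.03928 ? srgb / 12.92 : Math.pow((srgb + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const first = relativeLuminance(foreground);
  const second = relativeLuminance(background);
  const lighter = Math.max(first, second);
  const darker = Math.min(first, second);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA asks for at least 4.5:1 for normal-size text.
function meetsAaForText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A call such as meetsAaForText("#ffffff", "#2563eb") checks the primary button pair from the sample token file. It covers only the resting state, so hover, disabled, and focus-ring colors need their own pairs.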

Use screenshots sparingly at first. Start with high-value stories that represent common UI states. The test:visual script expects this file at tests/button.visual.spec.ts:

import { expect, test } from "@playwright/test";

test("button all states visual snapshot", async ({ page }) => {
  await page.goto("http://127.0.0.1:6006/iframe.html?id=design-system-button--all-states");
  await expect(page).toHaveScreenshot("button-all-states.png", {
    fullPage: true,
    animations: "disabled"
  });
});

Then wire it into GitHub Actions:

name: design-system-quality

on:
  pull_request:
    paths:
      - "tokens.json"
      - "scripts/build-tokens.mjs"
      - "src/components/**"
      - "src/styles/**"
      - ".storybook/**"
      - "tests/**"
      - "package.json"
      - "package-lock.json"

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run tokens:build
      - run: npm run build-storybook
      - run: npx playwright install --with-deps chromium
      - run: >
          npx concurrently -k -s first -n server,tests
          "npx http-server storybook-static -p 6006"
          "npx wait-on http://127.0.0.1:6006 && npm run test:storybook && npm run test:a11y && npm run test:visual"

When CI fails, give Claude Code the failing story ID, axe violation, changed files, and visual diff. Avoid dumping secrets or entire logs into the prompt.
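
A short summary is usually enough context. The sketch below condenses axe results into a few lines per story. The interfaces cover only the fields used here (the real axe-core result objects carry more), and the function name is hypothetical.

```typescript
// Minimal subset of the axe-core violation shape used by this summary.
interface AxeNodeLike {
  target: unknown[];
}
interface AxeViolationLike {
  id: string;
  impact?: string | null;
  help: string;
  nodes: AxeNodeLike[];
}

// Produces one header line plus one line per violation, with a capped
// number of affected selectors so the prompt stays small.
function summarizeViolations(
  storyId: string,
  violations: AxeViolationLike[],
  maxTargets = 3
): string {
  if (violations.length === 0) return `${storyId}: no axe violations`;
  const lines = violations.map((violation) => {
    const targets = violation.nodes
      .slice(0, maxTargets)
      .map((node) => node.target.join(" "))
      .join(", ");
    return `- ${violation.id} (${violation.impact ?? "unknown"}): ${violation.help} [${targets}]`;
  });
  return [`${storyId}: ${violations.length} axe violation(s)`].concat(lines).join("\n");
}
```

In the Playwright test above, results.violations could be passed to this function and the returned string attached to the failure message.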

A Realistic Boundary for Figma Integration

Figma Variables are a strong input to token work, but automatic two-way sync is usually too risky at the beginning. Unapproved experiments, old component names, and private design notes can leak into production tokens.

| Area | Good automation | Avoid |
| --- | --- | --- |
| Figma Variables | Export and compare with tokens.json | Blindly overwrite production tokens |
| Figma Components | Collect state and prop candidates | Auto-decide React APIs |
| Figma comments | Summarize unresolved questions | Infer final design intent |
| Storybook links | Attach story URLs to design review | Treat Storybook as design approval |

Use Claude Code to create a review report first:

Read figma-tokens-export.json and tokens.json.
Create a markdown report with:
1. tokens that exist in Figma but not in code
2. tokens that exist in code but not in Figma
3. value differences for matching semantic tokens
Do not edit tokens.json. Do not rename tokens. Mark risky differences around focus, danger, and text color.

The goal is not synchronization for its own sake. The goal is a safe, reviewable diff.
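
The three report sections map onto a plain comparison of two name-to-value maps. The sketch below assumes both sides have already been flattened to the same naming scheme with aliases resolved. The shape of a real Figma export will differ and needs its own adapter, and all names here are hypothetical.

```typescript
type FlatTokens = Record<string, string>;

interface TokenDiff {
  onlyInFigma: string[];
  onlyInCode: string[];
  changed: { name: string; figma: string; code: string }[];
}

// Compares values case-insensitively so "#2563EB" and "#2563eb" match.
function normalizeValue(value: string): string {
  return value.trim().toLowerCase();
}

function hasToken(tokens: FlatTokens, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(tokens, name);
}

// Read-only comparison: it reports differences and never edits either side.
function diffTokens(figma: FlatTokens, code: FlatTokens): TokenDiff {
  const diff: TokenDiff = { onlyInFigma: [], onlyInCode: [], changed: [] };
  for (const name of Object.keys(figma).sort()) {
    if (!hasToken(code, name)) {
      diff.onlyInFigma.push(name);
    } else if (normalizeValue(figma[name]) !== normalizeValue(code[name])) {
      diff.changed.push({ name, figma: figma[name], code: code[name] });
    }
  }
  for (const name of Object.keys(code).sort()) {
    if (!hasToken(figma, name)) diff.onlyInCode.push(name);
  }
  return diff;
}
```

Keeping the comparison pure makes it safe to run in CI and easy to review, since the output is data rather than a change to tokens.json.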

Four Practical Use Cases

The first use case is a SaaS admin UI. Buttons, forms, tables, and modals have many states. Ask Claude Code to inventory current usage, create compatibility props, and migrate one screen at a time.

The second use case is a white-label product. Primitive brand colors vary per customer, while semantic tokens stay stable. Claude Code can generate per-brand CSS variables and Storybook theme switches.
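
A per-brand override can be as small as one scoped CSS block. The sketch below assumes a data-brand attribute on a root element as the switch, which is a convention chosen for illustration and not something the setup above defines. The function name is hypothetical.

```typescript
// Builds a scoped block that overrides primitive variables for one brand.
function brandCss(brand: string, primitives: Record<string, string>): string {
  // Keep the brand id to a safe character set before placing it in a selector.
  if (!/^[a-z0-9-]+$/.test(brand)) {
    throw new Error(`Unexpected brand id: ${brand}`);
  }
  const declarations = Object.keys(primitives)
    .sort()
    .map((name) => `  --${name}: ${primitives[name]};`);
  return [`[data-brand="${brand}"] {`].concat(declarations, ["}"]).join("\n");
}
```

One caveat: the build script shown earlier resolves aliases to literal values, so semantic and component variables would not follow a primitive override. For this pattern to cascade, those variables would need to be emitted as var(--primitive-...) references instead.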

The third use case is legacy CSS cleanup. Claude Code can find repeated raw values, cluster them into token candidates, and produce a migration table. Do not replace everything in one commit; use visual snapshots to control risk.
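
A migration table can start as an exact-match lookup from raw values to existing token names. The sketch below assumes a flat name-to-value map like the one the token build produces. It lists every candidate instead of picking one, because choosing the right name is a human decision. The names are hypothetical.

```typescript
interface MigrationRow {
  rawValue: string;
  candidates: string[]; // empty when no existing token has this value
}

// Maps each raw value to all tokens that resolve to the same value.
function buildMigrationRows(
  rawValues: string[],
  flatTokens: Record<string, string>
): MigrationRow[] {
  const namesByValue = new Map<string, string[]>();
  for (const name of Object.keys(flatTokens).sort()) {
    const key = flatTokens[name].toLowerCase();
    namesByValue.set(key, (namesByValue.get(key) ?? []).concat(name));
  }
  return rawValues.map((rawValue) => ({
    rawValue,
    candidates: namesByValue.get(rawValue.toLowerCase()) ?? [],
  }));
}
```

Rows with no candidates are the interesting ones: they are either new tokens or values that should be rounded to an existing step.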

The fourth use case is a marketing or inquiry funnel. Consistent CTA buttons, pricing cards, and form states help visitors trust the site and make conversion experiments easier to run.

Failure Cases to Avoid

Do not let primitive tokens spread directly through components. If components depend on blue-600, a later brand change becomes a search-and-replace problem. Prefer semantic or component tokens.
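
This kind of leak is easy to miss in review. The secondary Button variant earlier in this guide, for example, uses hover:bg-gray-50. The sketch below scans a source string for three leak shapes. The patterns are illustrative, the palette list covers only the hues in the sample token file, and the names are hypothetical.

```typescript
interface Leak {
  label: string;
  match: string;
}

// Illustrative patterns; widen the utility prefixes and hue names for your codebase.
const LEAK_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "raw hex color", pattern: /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g },
  { label: "primitive CSS variable", pattern: /var\(--primitive-[a-zA-Z0-9-]+\)/g },
  {
    label: "palette utility class",
    pattern: /\b(?:bg|text|border|ring)-(?:gray|blue|red)-\d{2,3}\b/g,
  },
];

// Returns every place where component code bypasses semantic or component tokens.
function findPrimitiveLeaks(source: string): Leak[] {
  const leaks: Leak[] = [];
  for (const { label, pattern } of LEAK_PATTERNS) {
    for (const match of source.match(pattern) ?? []) {
      leaks.push({ label, match });
    }
  }
  return leaks;
}
```

Run over component files only, a check like this turns the guideline into something CI or Claude Code can report on.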

Do not treat Storybook as complete unless it runs in CI. A component catalog that can silently break is documentation, not a safety net.

Do not over-expand visual tests. Animations, dates, external fonts, and random IDs create noisy snapshots. Freeze dynamic content and start with the most important stories.

Do not assume an axe pass means the component is accessible. Automated checks miss context, copy quality, keyboard flow quality, and screen reader comprehension.

Do not ask Claude Code for a huge migration. Work component by component, require tests, and review file scope before accepting changes.

Review Checklist

Before merging, check the following:

  • Token names express meaning, not only appearance
  • Component props are minimal and stable
  • disabled, loading, error, focus, and hover states exist in Storybook
  • Keyboard-only operation works
  • ARIA is present where needed and not added where native HTML already works
  • Visual snapshot changes were reviewed by a human
  • Figma differences are saved as a review artifact
  • Claude Code only edited the requested file areas
  • No secrets or private customer data appear in prompts, logs, stories, or screenshots

Add this checklist to your project instructions so Claude Code can reuse it in later sessions.

Verification Points Before You Try This

Confirm that tokens.json generates CSS variables and TypeScript constants, Button stories render all states, and CI can reproduce Storybook build, accessibility checks, and visual snapshots. Keep Figma integration in report-only mode until the team agrees on the source of truth.

If your team needs help with design system implementation, Storybook adoption, accessibility review, or a Claude Code workflow for UI refactoring, the training and consultation page is the best next step.


About the Author

Masa

Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.