Feature Flags with Claude Code: Safe Rollouts, Experiments, and Kill Switches

Start with the Operating Rule, Not the Toggle

A feature flag is a runtime switch: code can be deployed while a capability stays off, rolls out gradually, or turns off quickly when something breaks. The beginner mistake is not the if statement. The real mistake is treating every flag the same. A release flag, an experiment flag, and a kill switch have different lifetimes, owners, metrics, and cleanup rules.

Claude Code can generate the UI branch in seconds. Production work needs more structure: a safe default, a targeting context, server/client boundaries, exposure logging, guardrail metrics, rollback instructions, and a date when short-lived flags are removed. In Masa’s site and small SaaS workflows, the best prompt is not “add feature flags.” It is “show me what fails, what I can switch off, and how we will know whether the rollout is healthy.”

Use primary docs for the mental model. OpenFeature separates the application-facing evaluation API from the provider behind it and uses an evaluation context for user, app, and environment data. LaunchDarkly documents flag use cases such as release flags, experiment flags, and kill switches. Unleash documents a lifecycle that moves flags through Define, Develop, Production, Cleanup, and Archived states. Anthropic’s Claude Code guidance also emphasizes giving the agent a verification path.

Primary references used for this refresh:

Separate Release Flags, Experiments, and Kill Switches

Classify flags by lifetime before writing code. A release flag hides unfinished work, rolls it out to a growing audience, then disappears after 100% rollout. An experiment flag tests a hypothesis and must record exposure plus outcome metrics. A kill switch is a longer-lived safety control for external API failures, cost spikes, slow recommendation services, or risky automation.

Use case	Flag type	Success metric	Failure action
Roll out a new SaaS checkout to 25% of Pro accounts	Release	Checkout completion, payment error rate	Turn off `checkout_v2_release`
Compare pricing page CTA copy	Experiment	Signup start rate, paid-intent clicks	Stop experiment and serve control
Move a blog affiliate block into the article body	Experiment	Product clicks, read completion	Return block to article footer
Disable recommendations during a vendor incident	Kill switch	p95 latency, 5xx rate	Turn off `recommendations_enabled`

This article pairs well with A/B testing with Claude Code and analytics implementation with Claude Code, because flags without measurement become guesswork. For monetized sites, the natural CTA is not only a button click: protect affiliate revenue, AdSense quality, read completion, and paid consultation intent. If you want a packaged workflow, see the ClaudeCodeLab products or book a consultation.

A Minimal Config and Evaluator

Start with a small, vendor-neutral pattern. Your app should evaluate a flag by key, default value, and context; the backing provider can later become LaunchDarkly, Unleash, OpenFeature, a JSON file, or an internal service. The following snippet is intentionally small enough to paste into flag-demo.ts and run with npx tsx flag-demo.ts.

type FlagValue = boolean | string | number;
type FlagKind = "release" | "experiment" | "kill_switch";
type Plan = "free" | "pro" | "enterprise";
type Role = "user" | "admin";
type Operator = "equals" | "in";

type FlagContext = {
  targetingKey: string;
  plan: Plan;
  country: string;
  role: Role;
  appVersion: string;
};

type FlagRule = {
  attribute: keyof Omit<FlagContext, "targetingKey">;
  operator: Operator;
  values: string[];
  value: FlagValue;
  percentage?: number;
};

type FlagConfig = {
  key: string;
  kind: FlagKind;
  enabled: boolean;
  defaultValue: FlagValue;
  offValue: FlagValue;
  owner: string;
  removeAfter?: string;
  rules: FlagRule[];
};

const registry: Record<string, FlagConfig> = {
  checkout_v2_release: {
    key: "checkout_v2_release",
    kind: "release",
    enabled: true,
    defaultValue: false,
    offValue: false,
    owner: "growth-platform",
    removeAfter: "2026-07-15",
    rules: [
      {
        attribute: "role",
        operator: "equals",
        values: ["admin"],
        value: true,
      },
      {
        attribute: "plan",
        operator: "in",
        values: ["pro", "enterprise"],
        value: true,
        percentage: 25,
      },
    ],
  },
  pricing_copy_2026_06: {
    key: "pricing_copy_2026_06",
    kind: "experiment",
    enabled: true,
    defaultValue: "control",
    offValue: "control",
    owner: "monetization",
    removeAfter: "2026-06-30",
    rules: [
      {
        attribute: "country",
        operator: "in",
        values: ["JP", "US", "DE"],
        value: "simple",
        percentage: 50,
      },
    ],
  },
  recommendations_enabled: {
    key: "recommendations_enabled",
    kind: "kill_switch",
    enabled: true,
    defaultValue: true,
    offValue: false,
    owner: "sre",
    rules: [],
  },
};

function bucketFor(flagKey: string, targetingKey: string): number {
  const input = `${flagKey}:${targetingKey}`;
  let hash = 0;

  for (const char of input) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  }

  return hash % 100;
}

function ruleMatches(
  flagKey: string,
  rule: FlagRule,
  context: FlagContext,
): boolean {
  const actual = String(context[rule.attribute]);
  const matched =
    rule.operator === "equals"
      ? actual === rule.values[0]
      : rule.values.includes(actual);

  if (!matched) return false;
  if (rule.percentage === undefined) return true;

  return bucketFor(flagKey, context.targetingKey) < rule.percentage;
}

export function evaluateFlag<T extends FlagValue = FlagValue>(
  key: string,
  context: FlagContext,
): T {
  const flag = registry[key];
  if (!flag) return false as T;
  if (!flag.enabled) return flag.offValue as T;

  for (const rule of flag.rules) {
    if (ruleMatches(flag.key, rule, context)) {
      return rule.value as T;
    }
  }

  return flag.defaultValue as T;
}

const demoContexts: FlagContext[] = [
  {
    targetingKey: "user_001",
    plan: "pro",
    country: "JP",
    role: "user",
    appVersion: "1.8.0",
  },
  {
    targetingKey: "user_002",
    plan: "free",
    country: "BR",
    role: "admin",
    appVersion: "1.8.0",
  },
];

for (const context of demoContexts) {
  console.log(context.targetingKey, {
    checkout: evaluateFlag<boolean>("checkout_v2_release", context),
    pricingCopy: evaluateFlag<string>("pricing_copy_2026_06", context),
    recommendations: evaluateFlag<boolean>(
      "recommendations_enabled",
      context,
    ),
  });
}

The important details are boring on purpose. Unknown flags fail closed. Percentage rollout uses a stable targetingKey, not Math.random(). Each temporary flag has an owner and removeAfter date. The registry can later move to a control plane, but the application contract stays small.

Keep Server Evaluation and Client Display Separate

Evaluate anything related to billing, authorization, quota, inventory, or backend cost on the server. Client-side flags are fine for already-authorized UI copy, layout, onboarding hints, or low-risk visual changes. Do not ship secret targeting rules to the browser and do not rely on hidden buttons as access control.

type User = {
  id: string;
  plan: "free" | "pro" | "enterprise";
  role: "user" | "admin";
};

type RequestLike = {
  headers: {
    get(name: string): string | null;
  };
};

export function buildFlagContext(
  user: User,
  request: RequestLike,
): FlagContext {
  return {
    targetingKey: user.id,
    plan: user.plan,
    role: user.role,
    country: request.headers.get("x-country") ?? "US",
    appVersion: process.env.NEXT_PUBLIC_APP_VERSION ?? "dev",
  };
}

export function getServerFlagSnapshot(context: FlagContext) {
  return {
    checkoutV2: evaluateFlag<boolean>("checkout_v2_release", context),
    pricingCopy: evaluateFlag<string>("pricing_copy_2026_06", context),
  };
}

type PricingFlags = {
  pricingCopy: string;
};

export function PricingCta({ flags }: { flags: PricingFlags }) {
  const label =
    flags.pricingCopy === "simple"
      ? "Start with the free plan"
      : "Start free trial";

  return <a href="/signup">{label}</a>;
}

This keeps the React component dumb. The server produces a small snapshot, and the client renders it. When prompting Claude Code, say that permission and billing checks must stay server-side and that client code may only consume a pre-evaluated snapshot.

Roll Out with Observability, Not Hope

A safe rollout is not simply “start at 1%.” It is a plan with a ramp schedule, metrics, and a rollback threshold. Unleash gradual rollouts combine percentage, stickiness, and constraints. LaunchDarkly guarded rollouts connect rollouts to metrics and can pause or roll back when regressions are detected. Even if you use a small internal evaluator, copy that operating model.

Track three layers. Exposure tells you who saw which flag value. The primary metric tells you whether the intended behavior improved. Guardrails tell you whether you damaged speed, errors, revenue quality, support load, or trust.

type FlagExposure = {
  flagKey: string;
  value: FlagValue;
  targetingKey: string;
  route: string;
  evaluatedAt: string;
};

export function trackFlagExposure(event: FlagExposure) {
  console.log(
    JSON.stringify({
      event_name: "feature_flag_exposure",
      ...event,
    }),
  );
}

For checkout, watch 5xx rate, payment failures, and support tickets. For a monetized blog, do not look only at affiliate clicks; watch read completion, bounce rate, Core Web Vitals, and paid-intent clicks. For AI features, watch token spend, p95 latency, and per-user quotas. A flag that raises clicks while lowering buyer quality is not a win.

Concrete Failure Modes

The first failure is random assignment on every page load. If a user can refresh from A to B, exposure and conversion data are corrupted. Use a stable targeting key.

The second failure is client-only entitlement. Hiding a premium button in React does not protect the backend API. Flags can shape UX, but they are not authorization.

The third failure is an unsafe default. A missing release flag should usually return false. If a typo serves true, you just launched by accident.

The fourth failure is never deleting temporary flags. Six months later, checkout_v2_release becomes mystery logic. Move release and experiment flags into cleanup as soon as the decision is made.

The fifth failure is over-nested rules. Parent flags, child flags, and overlapping percentage rollouts make it hard to explain who actually sees the feature. Keep dependencies rare and documented.

Safer Claude Code Prompts

Claude Code can read files, edit code, run tests, and continue through a task. Give it the safety boundaries and verification commands up front.

Add a feature flag workflow to this repository.
The first flag is checkout_v2_release for a staged rollout.

Constraints:
- Evaluate billing and authorization flags on the server.
- Unknown release flags must return false.
- Use a stable targetingKey for percentage rollout.
- Include owner and removeAfter in the flag registry.
- Do not modify unrelated files.

Required output:
- Minimal flag registry and evaluateFlag function
- Exposure event type
- At least three product use cases
- Failure examples and rollback steps
- Test commands that were run

Use a separate review prompt before merging:

Review this feature flag implementation.
Focus on defaults, server/client boundaries, stable bucketing,
missing exposure events, cleanup dates, and rollback behavior.
List findings by severity and point to exact files.

These prompts move Claude Code away from a decorative toggle and toward an operating system your team can maintain.

Cleanup Is Part of Shipping

Every feature flag starts aging the day it is created. Release flags should disappear after full rollout. Experiment flags should be removed after the winner is selected. Kill switches can stay, but they need an owner, a runbook, and alerts. Put owner, removeAfter, metrics, and the planned removal PR into your pull request template.

I verified the evaluator pattern in this article as a runnable TypeScript demo. The same targetingKey lands in the same bucket, unknown flags return the safe fallback, and the kill switch has an explicit off value. Masa’s practical note from monetized content work is simple: when a flag touches revenue, measure quality as well as clicks. Start with one release flag, one experiment flag, and one kill switch before introducing a full platform.

Feature Flags with Claude Code: Safe Rollouts, Experiments, and Kill Switches

Start with the Operating Rule, Not the Toggle

Separate Release Flags, Experiments, and Kill Switches

A Minimal Config and Evaluator

Keep Server Evaluation and Client Display Separate

Roll Out with Observability, Not Hope

Concrete Failure Modes

Safer Claude Code Prompts

Cleanup Is Part of Shipping

Free PDF: Claude Code Cheatsheet

Level up your Claude Code workflow

Related Posts

Claude Code Permission Safety Ladder: Expand Access Without Losing Control

Claude Code Small PR Proof Pack: Make Tiny Changes Reviewable

Claude Code Review Gate Before Commit: Diff, Tests, Public URL, and CTA Checks

Related Products

50 Battle-Tested Claude Code Prompt Templates

The Complete Claude Code Setup & Configuration Guide