Claude Code A/B Testing for SaaS and Blog Monetization

Start with the Hypothesis, Not the Toggle

A/B testing is not just showing two versions of a page. For a SaaS product or a monetized blog, it is a controlled way to ask whether one change improves a business outcome without damaging the rest of the funnel. Claude Code can generate the toggle quickly, but a useful experiment needs more: a hypothesis, random assignment, a typed event schema, guardrail metrics, sample-size discipline, rollout rules, privacy checks, and a rollback path.

Here is the plain-language vocabulary. A variant is one version being tested. Exposure is the moment a user first sees a variant and you record it. A guardrail metric is a number that must not get worse, such as page speed, paid-intent clicks, bounce rate, or JavaScript errors. A false positive is a result that looks like a win only because of random noise, repeated peeking, or a biased sample.

Give Claude Code the business question before asking for code:

Build an A/B testing workflow for a Next.js App Router SaaS/blog.
The goal is monetization, not vanity clicks.

Experiment id: pricing_page_offer_2026_06
Hypothesis: changing the pricing CTA from "Start free trial" to "Start with the free plan" will increase signup starts without reducing paid-intent clicks.
Primary metric: signup_start_rate
Guardrails: purchase_link_click_rate, p75 LCP, JavaScript error rate
Required output: event schema, server-side assignment, cookie/localStorage caveats, BigQuery-style SQL, Playwright verification, rollout and rollback checklist.

Use at least three concrete cases so the model does not produce a generic demo. For SaaS, test pricing copy, onboarding steps, or trial-to-paid prompts. For a blog, test affiliate block placement, email capture copy, or an AdSense-adjacent layout that must not reduce read completion. For a consulting or template funnel, test the order of a free checklist, product card, and booking CTA. For related implementation context, see feature flags with Claude Code and analytics implementation with Claude Code.

Use case	Primary metric	Guardrails	Common failure
SaaS pricing CTA	Signup start rate	Paid-intent clicks, errors, LCP	More signups, lower buyer quality
Blog affiliate block	Product link click rate	Read completion, bounce, speed	Revenue block appears too early and hurts trust
Newsletter form	Completed subscriptions	Spam rate, unsubscribe rate	Count goes up while list quality drops
Onboarding screen	First success rate	Support tickets, activation quality	Short-term completion hides later churn

Freeze the Event Schema Before UI Work

The most expensive A/B testing mistake is discovering after launch that the data cannot be joined. If the same click is tracked as button_click, ctaClicked, and signup_click, your analysis becomes manual cleanup. Ask Claude Code for a typed event contract first. If you use Google Analytics, read the official GA4 event reference and the Google tag parameter reference before naming events and parameters.

// lib/experiment-events.ts
export type ExperimentId = "pricing_page_offer_2026_06";
export type VariantId = "control" | "free_plan_copy";

export type ExperimentEvent =
  | {
      event_name: "experiment_exposure";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      page_path: string;
    }
  | {
      event_name: "cta_click";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      cta_id: "pricing_primary" | "article_bottom" | "sidebar_offer";
      page_path: string;
    }
  | {
      event_name: "purchase_link_click";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      product_id: string;
      value_usd: number;
      page_path: string;
    }
  | {
      event_name: "guardrail_metric";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      metric_name: "lcp_ms" | "js_error" | "bounce";
      value: number;
      page_path: string;
    };

declare global {
  interface Window {
    gtag?: (command: "event", name: string, params: Record<string, unknown>) => void;
  }
}

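export function trackExperimentEvent(event: ExperimentEvent) {
  // gtag only exists in the browser; skip during server rendering.
  if (typeof window === "undefined") return;

  // Send event_name as the gtag event name instead of repeating it as a parameter.
  // The remaining fields already include experiment_id, variant, anonymous_id, and page_path.
  const { event_name, ...params } = event;
  window.gtag?.("event", event_name, params);
}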

Do not put email addresses, names, company names, or free-form user input in these events. If your market requires consent for analytics or advertising storage, initialize consent before sending tags. Google documents this in its official consent mode guide. The practical rule is simple: consent state is part of the experiment setup, not a patch you add after the dashboard is live.
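
One way to enforce the no-personal-data rule is a small check that runs before any event is sent. The sketch below is illustrative only: the function name findPiiViolations, the blocked key list, and the email pattern are assumptions, and a pattern match is not a complete PII detector.

```typescript
// Hypothetical guard: flag analytics parameters that look like personal data.
// BLOCKED_KEYS and EMAIL_PATTERN are illustrative assumptions, not a full PII policy.
type EventParams = Record<string, string | number | boolean>;

const BLOCKED_KEYS = ["email", "name", "full_name", "company", "phone"];
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/;

export function findPiiViolations(params: EventParams): string[] {
  const violations: string[] = [];

  for (const key of Object.keys(params)) {
    const value = params[key];

    if (BLOCKED_KEYS.indexOf(key.toLowerCase()) !== -1) {
      // The key itself signals personal data, whatever the value is.
      violations.push(`blocked key: ${key}`);
    } else if (typeof value === "string" && EMAIL_PATTERN.test(value)) {
      // Free-form strings sometimes carry an email address by accident.
      violations.push(`email-like value in: ${key}`);
    }
  }

  return violations;
}
```

A caller could refuse to send the event, or fail a unit test, whenever the returned list is not empty.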

Assign Variants on the Server

Client-only assignment with localStorage is tempting because it is easy. It also creates real problems: first-paint flicker, different variants before and after login, private-browsing resets, blocked storage, and unreliable bot behavior. MDN describes localStorage as origin-scoped storage that persists across browser sessions, but that does not make it a good source of truth for first render. See MDN localStorage.

For a Next.js App Router app, a Route Handler is a small, copy-pasteable starting point. The current Next.js docs describe route.ts as a file convention for custom request handlers using Web Request and Response APIs. NextResponse can set cookies; its API is documented in the official NextResponse reference. If you do need edge request rewriting, note that Next.js 16 renamed Middleware to Proxy; use the official proxy.js reference.

// app/api/experiments/assign/route.ts
import { NextRequest, NextResponse } from "next/server";

export const runtime = "edge";

type Variant = "control" | "free_plan_copy";

const EXPERIMENTS = {
  pricing_page_offer_2026_06: {
    cookieName: "ab_pricing_page_offer_2026_06",
    variants: [
      { id: "control", weight: 50 },
      { id: "free_plan_copy", weight: 50 },
    ] satisfies Array<{ id: Variant; weight: number }>,
  },
};

function hashToBucket(input: string) {
  let hash = 2166136261;
  for (let index = 0; index < input.length; index += 1) {
    hash ^= input.charCodeAt(index);
    hash = Math.imul(hash, 16777619);
  }
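  // Reinterpret the signed 32-bit result as unsigned before bucketing;
  // Math.abs would fold negative hashes onto positive ones.
  return (hash >>> 0) % 100;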
}

function chooseVariant(experimentId: keyof typeof EXPERIMENTS, anonymousId: string): Variant {
  const experiment = EXPERIMENTS[experimentId];
  const bucket = hashToBucket(`${experimentId}:${anonymousId}`);
  let cumulative = 0;

  for (const variant of experiment.variants) {
    cumulative += variant.weight;
    if (bucket < cumulative) return variant.id;
  }

  return experiment.variants[0].id;
}

export async function GET(request: NextRequest) {
  const experimentId = request.nextUrl.searchParams.get("experiment");

  if (experimentId !== "pricing_page_offer_2026_06") {
    return NextResponse.json({ error: "Unknown experiment" }, { status: 404 });
  }

  const experiment = EXPERIMENTS[experimentId];
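  // Honor the test override only outside production so visitors cannot pick their own variant.
  const testAnonymousId =
    process.env.NODE_ENV === "production" ? null : request.headers.get("x-test-anonymous-id");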
  const existingCookie = request.cookies.get(experiment.cookieName)?.value;
  const anonymousId = testAnonymousId ?? existingCookie ?? crypto.randomUUID();
  const variant = chooseVariant(experimentId, anonymousId);

  const response = NextResponse.json({
    experimentId,
    variant,
    anonymousId,
  });

  response.cookies.set(experiment.cookieName, anonymousId, {
    httpOnly: true,
    sameSite: "lax",
    secure: process.env.NODE_ENV === "production",
    path: "/",
    maxAge: 60 * 60 * 24 * 30,
  });

  return response;
}
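
Before trusting a hash-based split, it can help to check that the buckets are roughly balanced. The following is a minimal sketch, assuming an unsigned FNV-1a style hash like the one in the route above; controlShare and the synthetic id format are hypothetical names used only for this check.

```typescript
// Self-contained copy of an FNV-1a style hash mapped to buckets 0..99.
function hashToBucket(input: string): number {
  let hash = 2166136261;
  for (let index = 0; index < input.length; index += 1) {
    hash ^= input.charCodeAt(index);
    hash = Math.imul(hash, 16777619);
  }
  // Treat the 32-bit result as unsigned so every bucket is reachable evenly.
  return (hash >>> 0) % 100;
}

// Share of synthetic ids that land below the 50% boundary (the "control" side).
export function controlShare(experimentId: string, sampleSize: number): number {
  let control = 0;
  for (let i = 0; i < sampleSize; i += 1) {
    if (hashToBucket(`${experimentId}:synthetic-user-${i}`) < 50) control += 1;
  }
  return control / sampleSize;
}

const share = controlShare("pricing_page_offer_2026_06", 10000);
console.log(`control share: ${(share * 100).toFixed(1)}%`);
```

A share far from 50% on a large synthetic sample would suggest a problem in the hashing or the weights, and is worth investigating before real traffic is exposed.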

Cookies also need caveats. MDN’s secure cookie configuration guide covers Secure, HttpOnly, and SameSite. A SaaS app can often use a stable hashed account or user id after login. A public blog may only have a short-lived anonymous cookie. Cross-device identity, consent, ad platform policy, and regional privacy rules should be handled before the experiment starts.

Separate the Experiment from the Rollout

A winning test is still a release risk. Put the experiment behind a feature flag so you can change allocation without redeploying. Vercel users can evaluate whether Vercel Flags fits their stack, but a config file is enough for a first controlled test.

# config/experiments.yaml
experiments:
  pricing_page_offer_2026_06:
    status: running
    owner: masa
    hypothesis: "Free-plan copy increases signup starts without hurting paid intent."
    allocation_percent: 50
    variants:
      control: 50
      free_plan_copy: 50
    primary_metric: signup_start_rate
    guardrails:
      - purchase_link_click_rate
      - p75_lcp_ms
      - js_error_rate
    rollback:
      if_js_error_rate_increases_by: 0.02
      if_p75_lcp_ms_worse_by_ms: 300
      action: "set allocation_percent to 0 and keep logging exposure for audit"

The rollback rule matters because teams get emotionally attached to the new version. If errors increase, LCP gets worse, or paid-intent clicks drop, stop the exposure and keep logging enough state to audit what happened. Roll out from 10% to 50% to 100% only after the primary metric and guardrails are stable.
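
The rollback rule can be written as a pure function so it is testable and not left to judgment in the moment. This is a sketch under stated assumptions: the thresholds are read as absolute differences between treatment and control, and the paid-intent threshold (purchaseRateRelativeDrop) is hypothetical because the YAML above does not define one.

```typescript
// Guardrail readings for one variant; names loosely mirror config/experiments.yaml.
interface GuardrailSnapshot {
  jsErrorRate: number;           // share of sessions with a JavaScript error (0..1)
  p75LcpMs: number;              // 75th percentile LCP in milliseconds
  purchaseLinkClickRate: number; // paid-intent clicks per exposed user (0..1)
}

interface RollbackThresholds {
  jsErrorRateIncrease: number;      // absolute increase, e.g. 0.02
  p75LcpWorseByMs: number;          // e.g. 300
  purchaseRateRelativeDrop: number; // assumption: relative drop, e.g. 0.1 for 10%
}

// Returns the guardrails that were breached; an empty list means "keep running".
export function rollbackReasons(
  control: GuardrailSnapshot,
  treatment: GuardrailSnapshot,
  thresholds: RollbackThresholds,
): string[] {
  const reasons: string[] = [];

  if (treatment.jsErrorRate - control.jsErrorRate > thresholds.jsErrorRateIncrease) {
    reasons.push("js_error_rate");
  }
  if (treatment.p75LcpMs - control.p75LcpMs > thresholds.p75LcpWorseByMs) {
    reasons.push("p75_lcp_ms");
  }
  if (control.purchaseLinkClickRate > 0) {
    const relativeDrop =
      (control.purchaseLinkClickRate - treatment.purchaseLinkClickRate) /
      control.purchaseLinkClickRate;
    if (relativeDrop > thresholds.purchaseRateRelativeDrop) {
      reasons.push("purchase_link_click_rate");
    }
  }

  return reasons;
}
```

If the list is not empty, the action from the config applies: set the allocation to 0 and keep logging exposure for audit.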

Analyze from Exposure and Avoid False Positives

Use exposure as the denominator. Users who never saw a variant should not be counted. Users who saw multiple variants should be excluded or investigated. The following query is intentionally modest: it summarizes conversion and guardrails without pretending to replace statistical review. BigQuery’s official SAFE_DIVIDE documentation is useful for avoiding divide-by-zero failures in dashboards.

-- BigQuery Standard SQL
WITH exposure_raw AS (
  SELECT
    anonymous_id,
    experiment_id,
    ARRAY_AGG(variant ORDER BY event_timestamp LIMIT 1)[OFFSET(0)] AS variant,
    MIN(event_timestamp) AS first_exposed_at,
    COUNT(DISTINCT variant) AS variant_count
  FROM `project.dataset.events`
  WHERE event_name = 'experiment_exposure'
    AND experiment_id = 'pricing_page_offer_2026_06'
  GROUP BY anonymous_id, experiment_id
),
exposure AS (
  SELECT anonymous_id, experiment_id, variant, first_exposed_at
  FROM exposure_raw
  WHERE variant_count = 1
),
events_after_exposure AS (
  SELECT
    e.variant,
    e.anonymous_id,
    ev.event_name,
    -- Column names follow the event contract: guardrail_metric carries
    -- metric_name and value; purchase_link_click carries value_usd.
    ev.metric_name,
    ev.value,
    ev.value_usd
  FROM exposure e
  LEFT JOIN `project.dataset.events` ev
    ON ev.anonymous_id = e.anonymous_id
   AND ev.experiment_id = e.experiment_id
   AND ev.event_timestamp >= e.first_exposed_at
)
SELECT
  variant,
  COUNT(DISTINCT anonymous_id) AS exposed_users,
  COUNT(DISTINCT IF(event_name = 'cta_click', anonymous_id, NULL)) AS cta_users,
  SAFE_DIVIDE(
    COUNT(DISTINCT IF(event_name = 'cta_click', anonymous_id, NULL)),
    COUNT(DISTINCT anonymous_id)
  ) AS cta_click_rate,
  COUNT(DISTINCT IF(event_name = 'purchase_link_click', anonymous_id, NULL)) AS purchase_intent_users,
  SAFE_DIVIDE(
    COUNT(DISTINCT IF(event_name = 'purchase_link_click', anonymous_id, NULL)),
    COUNT(DISTINCT anonymous_id)
  ) AS purchase_intent_rate,
  -- Approximate p75 LCP from guardrail_metric events where metric_name = 'lcp_ms'.
  APPROX_QUANTILES(
    IF(event_name = 'guardrail_metric' AND metric_name = 'lcp_ms', value, NULL),
    100
  )[SAFE_OFFSET(75)] AS p75_lcp_ms,
  -- Revenue proxy comes from purchase_link_click, the only event that carries value_usd.
  SUM(IF(event_name = 'purchase_link_click', IFNULL(value_usd, 0), 0)) AS revenue_proxy_usd
FROM events_after_exposure
GROUP BY variant
ORDER BY variant;
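
The same exposure-first logic can be mirrored in TypeScript so it is unit-testable against a handful of hand-written events. This is a minimal sketch, assuming all events belong to a single experiment; RawEvent, VariantSummary, and summarizeByExposure are hypothetical names, not part of any analytics SDK.

```typescript
interface RawEvent {
  anonymousId: string;
  variant: string;
  eventName: string;
  timestamp: number; // epoch milliseconds
}

interface VariantSummary {
  exposedUsers: number;
  ctaUsers: number;
  ctaClickRate: number;
}

export function summarizeByExposure(events: RawEvent[]): Record<string, VariantSummary> {
  // Step 1: first exposure per user, plus every variant that user was exposed to.
  const firstExposure = new Map<string, RawEvent>();
  const variantsSeen = new Map<string, Set<string>>();
  for (const e of events) {
    if (e.eventName !== "experiment_exposure") continue;
    const previous = firstExposure.get(e.anonymousId);
    if (!previous || e.timestamp < previous.timestamp) firstExposure.set(e.anonymousId, e);
    const seen = variantsSeen.get(e.anonymousId) ?? new Set<string>();
    seen.add(e.variant);
    variantsSeen.set(e.anonymousId, seen);
  }

  // Step 2: users who clicked the CTA at or after their first exposure.
  const clicked = new Set<string>();
  for (const e of events) {
    const exposure = firstExposure.get(e.anonymousId);
    if (exposure && e.eventName === "cta_click" && e.timestamp >= exposure.timestamp) {
      clicked.add(e.anonymousId);
    }
  }

  // Step 3: aggregate per variant, excluding users who saw more than one variant.
  const summary: Record<string, VariantSummary> = {};
  firstExposure.forEach((exposure, id) => {
    if ((variantsSeen.get(id)?.size ?? 0) !== 1) return;
    const row = (summary[exposure.variant] ??= { exposedUsers: 0, ctaUsers: 0, ctaClickRate: 0 });
    row.exposedUsers += 1;
    if (clicked.has(id)) row.ctaUsers += 1;
    row.ctaClickRate = row.ctaUsers / row.exposedUsers;
  });

  return summary;
}
```

Feeding it a user with two variants, a click before exposure, and a click with no exposure at all is a quick way to confirm the exclusion rules match the SQL.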

Sample size must be planned before launch. If you peek every day and stop as soon as the new version looks good, you increase the chance of a false positive. The same problem appears when you test many variants, slice by many segments, change the primary metric after the fact, or start the experiment on the same day as a paid campaign. Ask Claude Code to produce a pre-launch decision record: minimum sample, observation window, exclusion rules, guardrails, and the exact date when results can be reviewed.
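
A rough minimum sample can be estimated before launch with the textbook normal approximation for comparing two proportions: n per variant = (z_alpha + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2. The sketch below hard-codes z-scores for a two-sided alpha of 0.05 and 80% power; those settings and the function name sampleSizePerVariant are assumptions, and the result is a planning aid, not a replacement for statistical review.

```typescript
// z-scores for a two-sided alpha of 0.05 and 80% power (assumed settings).
const Z_ALPHA = 1.96;
const Z_BETA = 0.8416;

// Approximate users needed per variant to detect a change from baselineRate to expectedRate.
export function sampleSizePerVariant(baselineRate: number, expectedRate: number): number {
  const inRange = (p: number) => p > 0 && p < 1;
  if (!inRange(baselineRate) || !inRange(expectedRate)) {
    throw new RangeError("Rates must be strictly between 0 and 1.");
  }
  if (baselineRate === expectedRate) {
    throw new RangeError("Rates must differ; a zero effect needs an infinite sample.");
  }

  // Sum of the binomial variances of the two arms.
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  const delta = expectedRate - baselineRate;

  return Math.ceil((Math.pow(Z_ALPHA + Z_BETA, 2) * variance) / (delta * delta));
}

// Example: baseline signup start rate of 5%, hoping to detect 6%.
console.log(sampleSizePerVariant(0.05, 0.06));
```

Under these assumptions, detecting a lift from 5% to 6% needs a little over 8,000 exposed users per variant, which is a useful reality check for low-traffic blogs before any code is written.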

Verify the Implementation with Playwright

Before publishing, check the mechanics. The same anonymous id should receive the same variant. Unknown experiment ids should fail. The monetization CTA should render once. Playwright documents the test and expect APIs in its official test reference, and its assertions guide explains auto-retrying web assertions.

// tests/experiments.spec.ts
import { test, expect } from "@playwright/test";

test.describe("pricing_page_offer_2026_06", () => {
  test("keeps assignment stable for the same anonymous id", async ({ request, baseURL }) => {
    const url = `${baseURL}/api/experiments/assign?experiment=pricing_page_offer_2026_06`;
    const headers = { "x-test-anonymous-id": "demo-user-42" };

    const first = await request.get(url, { headers });
    const second = await request.get(url, { headers });

    expect(first.ok()).toBeTruthy();
    expect(second.ok()).toBeTruthy();
    expect(await first.json()).toMatchObject(await second.json());
  });

  test("rejects unknown experiments", async ({ request, baseURL }) => {
    const response = await request.get(`${baseURL}/api/experiments/assign?experiment=missing`);

    expect(response.status()).toBe(404);
  });

  test("renders one monetization CTA on the pricing page", async ({ page }) => {
    await page.goto("/pricing?e2e_anonymous_id=demo-user-42");

    await expect(page.getByTestId("pricing-cta")).toBeVisible();
    await expect(page.getByTestId("pricing-cta")).toHaveCount(1);
  });
});

This test does not prove revenue lift. It proves that the experiment plumbing is not broken. That distinction is important: broken assignment can make any later statistical result meaningless.

Privacy, Consent, and Operational Discipline

A monetization experiment touches user behavior data, so keep the rules explicit. Do not send personal data in event parameters. Respect consent before analytics or ad-related storage. Define what happens when cookies are blocked. Document the owner, start date, end date, hypothesis, metrics, guardrails, and rollback steps. Keep the result even if the test loses; the loss prevents the same idea from being repeated later.
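
Writing the blocked-cookie and no-consent behavior down as code makes the rule reviewable. The policy in this sketch (serve control and record nothing when consent is missing or no id is available) is an assumption for illustration; the right policy for a given market is a legal and product decision. The names decideExposure, VisitorContext, and ExposureDecision are hypothetical.

```typescript
type Variant = "control" | "free_plan_copy";
type ConsentState = "granted" | "denied" | "unknown";

interface VisitorContext {
  consent: ConsentState;
  anonymousId: string | null; // null when the cookie is missing or blocked
}

interface ExposureDecision {
  variant: Variant;
  trackExposure: boolean;
  reason: "assigned" | "no_consent" | "no_cookie";
}

// Assumed policy: visitors without consent or without a stable id see control
// and are left out of the experiment data entirely.
export function decideExposure(
  visitor: VisitorContext,
  assign: (anonymousId: string) => Variant,
): ExposureDecision {
  if (visitor.consent !== "granted") {
    return { variant: "control", trackExposure: false, reason: "no_consent" };
  }
  if (visitor.anonymousId === null) {
    return { variant: "control", trackExposure: false, reason: "no_cookie" };
  }
  return { variant: assign(visitor.anonymousId), trackExposure: true, reason: "assigned" };
}
```

Because untracked visitors never produce an exposure event, they stay out of the denominator automatically, which keeps the analysis consistent with the exposure-first rule.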

For blogs, the strongest short-term revenue treatment is sometimes the one that damages trust. More ads can create more impressions while lowering read completion and search performance. For SaaS, softer copy can increase free signups while lowering paid conversion. That is why guardrails are not optional. They protect the business from a local optimum.

Masa’s practical result from using this workflow was not a magical lift number; it was fewer rework cycles. Once the event table and assignment test were written before the UI copy, Claude Code stopped inventing event names and the rollout conversation became concrete. In a small CTA test, the Playwright stable-id check caught a localStorage flicker before publication, which pushed the implementation back to server-side assignment.

Use Claude Code for code, types, tests, SQL drafts, and documentation. Keep legal judgment, consent policy, and the final statistical decision with humans. For implementation help beyond this article, review the Claude Code training page or the product templates and connect the experiment to a real monetization funnel.
