Claude Code A/B Testing for SaaS and Blog Monetization
Build safer A/B tests with Claude Code: hypothesis design, server assignment, event schema, SQL analysis, and rollback.
Start with the Hypothesis, Not the Toggle
A/B testing is not just showing two versions of a page. For a SaaS product or a monetized blog, it is a controlled way to ask whether one change improves a business outcome without damaging the rest of the funnel. Claude Code can generate the toggle quickly, but a useful experiment needs more: a hypothesis, random assignment, a typed event schema, guardrail metrics, sample-size discipline, rollout rules, privacy checks, and a rollback path.
Here is the plain-language vocabulary. A variant is one version being tested. Exposure is the moment a user first sees a variant and you record it. A guardrail metric is a number that must not get worse, such as page speed, paid-intent clicks, bounce rate, or JavaScript errors. A false positive is a result that looks like a win only because of random noise, repeated peeking, or a biased sample.
Give Claude Code the business question before asking for code:
Build an A/B testing workflow for a Next.js App Router SaaS/blog.
The goal is monetization, not vanity clicks.
Experiment id: pricing_page_offer_2026_06
Hypothesis: changing the pricing CTA from "Start free trial" to "Start with the free plan" will increase signup starts without reducing paid-intent clicks.
Primary metric: signup_start_rate
Guardrails: purchase_link_click_rate, p75 LCP, JavaScript error rate
Required output: event schema, server-side assignment, cookie/localStorage caveats, BigQuery-style SQL, Playwright verification, rollout and rollback checklist.
Use at least three concrete cases so the model does not produce a generic demo. For SaaS, test pricing copy, onboarding steps, or trial-to-paid prompts. For a blog, test affiliate block placement, email capture copy, or an AdSense-adjacent layout that must not reduce read completion. For a consulting or template funnel, test the order of a free checklist, product card, and booking CTA. For related implementation context, see feature flags with Claude Code and analytics implementation with Claude Code.
| Use case | Primary metric | Guardrails | Common failure |
|---|---|---|---|
| SaaS pricing CTA | Signup start rate | Paid-intent clicks, errors, LCP | More signups, lower buyer quality |
| Blog affiliate block | Product link click rate | Read completion, bounce, speed | Revenue block appears too early and hurts trust |
| Newsletter form | Completed subscriptions | Spam rate, unsubscribe rate | Count goes up while list quality drops |
| Onboarding screen | First success rate | Support tickets, activation quality | Short-term completion hides later churn |
Freeze the Event Schema Before UI Work
The most expensive A/B testing mistake is discovering after launch that the data cannot be joined. If the same click is tracked as button_click, ctaClicked, and signup_click, your analysis becomes manual cleanup. Ask Claude Code for a typed event contract first. If you use Google Analytics, read the official GA4 event reference and the Google tag parameter reference before naming events and parameters.
// lib/experiment-events.ts
export type ExperimentId = "pricing_page_offer_2026_06";
export type VariantId = "control" | "free_plan_copy";

// One discriminated union keeps event names and parameters consistent.
export type ExperimentEvent =
  | {
      event_name: "experiment_exposure";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      page_path: string;
    }
  | {
      event_name: "cta_click";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      cta_id: "pricing_primary" | "article_bottom" | "sidebar_offer";
      page_path: string;
    }
  | {
      event_name: "purchase_link_click";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      product_id: string;
      value_usd: number;
      page_path: string;
    }
  | {
      event_name: "guardrail_metric";
      experiment_id: ExperimentId;
      variant: VariantId;
      anonymous_id: string;
      metric_name: "lcp_ms" | "js_error" | "bounce";
      value: number;
      page_path: string;
    };

declare global {
  interface Window {
    gtag?: (command: "event", name: string, params: Record<string, unknown>) => void;
  }
}

export function trackExperimentEvent(event: ExperimentEvent) {
  // gtag only exists in the browser; skip during server rendering.
  if (typeof window === "undefined") return;
  // Use event_name as the event name and send every other field as a parameter.
  const { event_name, ...params } = event;
  window.gtag?.("event", event_name, params);
}
Do not put email addresses, names, company names, or free-form user input in these events. If your market requires consent for analytics or advertising storage, initialize consent before sending tags. Google documents this in its official consent mode guide. The practical rule is simple: consent state is part of the experiment setup, not a patch you add after the dashboard is live.
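The rule above can be enforced in code before anything reaches the tag. The following is a minimal sketch, not a complete PII scanner: `ConsentState`, `findPiiViolations`, `shouldSendEvent`, and the deny-list of keys are hypothetical names chosen for illustration, and the email pattern is a rough heuristic that only catches obvious mistakes.

```typescript
// Hypothetical pre-send guard; none of these names come from gtag or GA4.
type ConsentState = "granted" | "denied" | "unknown";
type EventParams = Record<string, string | number | boolean>;

// Rough email heuristic: flags obvious leaks, not every possible format.
const EMAIL_LIKE = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Assumed deny-list of parameter keys that should never be sent.
const FORBIDDEN_KEYS = new Set(["email", "name", "company", "company_name", "free_text"]);

export function findPiiViolations(params: EventParams): string[] {
  const violations: string[] = [];
  for (const [key, value] of Object.entries(params)) {
    if (FORBIDDEN_KEYS.has(key)) violations.push(`forbidden key: ${key}`);
    if (typeof value === "string" && EMAIL_LIKE.test(value)) {
      violations.push(`email-like value in: ${key}`);
    }
  }
  return violations;
}

export function shouldSendEvent(consent: ConsentState, params: EventParams): boolean {
  // Treat "unknown" like "denied": no consent decision means no event.
  if (consent !== "granted") return false;
  return findPiiViolations(params).length === 0;
}
```

Calling `shouldSendEvent` inside the tracking function keeps the consent check and the PII check in one place, so a new event type cannot skip them by accident.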
Assign Variants on the Server
Client-only assignment with localStorage is tempting because it is easy. It also creates real problems: first-paint flicker, different variants before and after login, private-browsing resets, blocked storage, and unreliable bot behavior. MDN describes localStorage as origin-scoped storage that persists across browser sessions, but that does not make it a good source of truth for first render. See MDN localStorage.
For a Next.js App Router app, a Route Handler is a small, copy-pasteable starting point. The current Next.js docs describe route.ts as a file convention for custom request handlers using Web Request and Response APIs. NextResponse can set cookies; its API is documented in the official NextResponse reference. If you do need edge request rewriting, note that Next.js 16 renamed Middleware to Proxy; use the official proxy.js reference.
// app/api/experiments/assign/route.ts
import { NextRequest, NextResponse } from "next/server";

export const runtime = "edge";

type Variant = "control" | "free_plan_copy";

const EXPERIMENTS = {
  pricing_page_offer_2026_06: {
    cookieName: "ab_pricing_page_offer_2026_06",
    // Weights are percentages and should sum to 100.
    variants: [
      { id: "control", weight: 50 },
      { id: "free_plan_copy", weight: 50 },
    ] satisfies Array<{ id: Variant; weight: number }>,
  },
};

// FNV-1a style string hash, reduced to a bucket from 0 to 99.
function hashToBucket(input: string) {
  let hash = 2166136261;
  for (let index = 0; index < input.length; index += 1) {
    hash ^= input.charCodeAt(index);
    hash = Math.imul(hash, 16777619);
  }
  return Math.abs(hash) % 100;
}

function chooseVariant(experimentId: keyof typeof EXPERIMENTS, anonymousId: string): Variant {
  const experiment = EXPERIMENTS[experimentId];
  // Salting with the experiment id keeps buckets independent across experiments.
  const bucket = hashToBucket(`${experimentId}:${anonymousId}`);
  let cumulative = 0;
  for (const variant of experiment.variants) {
    cumulative += variant.weight;
    if (bucket < cumulative) return variant.id;
  }
  // Fallback if the weights sum to less than 100.
  return experiment.variants[0].id;
}

export async function GET(request: NextRequest) {
  const experimentId = request.nextUrl.searchParams.get("experiment");
  if (experimentId !== "pricing_page_offer_2026_06") {
    return NextResponse.json({ error: "Unknown experiment" }, { status: 404 });
  }

  const experiment = EXPERIMENTS[experimentId];
  // Test-only override. Gate or strip this header in production so visitors
  // cannot choose their own bucket.
  const testAnonymousId = request.headers.get("x-test-anonymous-id");
  const existingCookie = request.cookies.get(experiment.cookieName)?.value;
  const anonymousId = testAnonymousId ?? existingCookie ?? crypto.randomUUID();
  const variant = chooseVariant(experimentId, anonymousId);

  const response = NextResponse.json({
    experimentId,
    variant,
    anonymousId,
  });

  // The cookie stores the anonymous id, not the variant; the variant is
  // recomputed from the id on every request.
  response.cookies.set(experiment.cookieName, anonymousId, {
    httpOnly: true,
    sameSite: "lax",
    secure: process.env.NODE_ENV === "production",
    path: "/",
    maxAge: 60 * 60 * 24 * 30, // 30 days
  });

  return response;
}
Cookies also need caveats. MDN’s secure cookie configuration guide covers Secure, HttpOnly, and SameSite. A SaaS app can often use a stable hashed account or user id after login. A public blog may only have a short-lived anonymous cookie. Cross-device identity, consent, ad platform policy, and regional privacy rules should be handled before the experiment starts.
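For the logged-in case, one way to get a stable id without putting the raw account id in a cookie or an event is to hash it with a secret salt. This is a sketch under assumptions: `assignmentIdForUser` and `resolveAssignmentId` are hypothetical helpers, the salt is assumed to come from server-side configuration, and `node:crypto` requires the Node.js runtime (an edge runtime would need Web Crypto instead).

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable assignment id from an account id.
// The salt is assumed to be a server-side secret, never shipped to the browser.
export function assignmentIdForUser(userId: string, salt: string): string {
  return createHash("sha256").update(`${salt}:${userId}`).digest("hex").slice(0, 32);
}

// Prefer the hashed account id after login; fall back to the anonymous cookie.
// Returns null when neither exists, e.g. when cookies are blocked.
export function resolveAssignmentId(opts: {
  userId?: string;
  cookieId?: string;
  salt: string;
}): string | null {
  if (opts.userId) return assignmentIdForUser(opts.userId, opts.salt);
  return opts.cookieId ?? null;
}
```

Switching from the cookie id to the account id at login can move a visitor into a different bucket, so decide before launch whether logged-in traffic belongs in the same experiment or is excluded from it.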
Separate the Experiment from the Rollout
A winning test is still a release risk. Put the experiment behind a feature flag so you can change allocation without redeploying. Vercel users can evaluate whether Vercel Flags fits their stack. A config file is enough for a first controlled test, although changing a file that ships with the build still means a deploy.
# config/experiments.yaml
experiments:
  pricing_page_offer_2026_06:
    status: running
    owner: masa
    hypothesis: "Free-plan copy increases signup starts without hurting paid intent."
    allocation_percent: 50
    variants:
      control: 50
      free_plan_copy: 50
    primary_metric: signup_start_rate
    guardrails:
      - purchase_link_click_rate
      - p75_lcp_ms
      - js_error_rate
    rollback:
      if_js_error_rate_increases_by: 0.02
      if_p75_lcp_ms_worse_by_ms: 300
      action: "set allocation_percent to 0 and keep logging exposure for audit"
The rollback rule matters because teams get emotionally attached to the new version. If errors increase, LCP gets worse, or paid-intent clicks drop, stop the exposure and keep logging enough state to audit what happened. Roll out from 10% to 50% to 100% only after the primary metric and guardrails are stable.
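The rollback rule is easier to trust when it is a function rather than a judgment call. The sketch below assumes the two thresholds from the config are absolute differences between treatment and control. `maxPurchaseClickRateDrop` is a hypothetical threshold added for illustration, since the config lists paid-intent clicks as a guardrail but gives no number for it.

```typescript
interface GuardrailSnapshot {
  jsErrorRate: number;           // fraction of sessions with a JS error, e.g. 0.01
  p75LcpMs: number;              // 75th percentile LCP in milliseconds
  purchaseLinkClickRate: number; // fraction of exposed users clicking a purchase link
}

interface RollbackRule {
  maxJsErrorRateIncrease: number;   // assumed absolute, mirrors 0.02 in the config
  maxP75LcpWorseMs: number;         // mirrors 300 in the config
  maxPurchaseClickRateDrop: number; // hypothetical; not present in the config
}

type Decision = { action: "continue" } | { action: "rollback"; reasons: string[] };

export function evaluateGuardrails(
  control: GuardrailSnapshot,
  treatment: GuardrailSnapshot,
  rule: RollbackRule,
): Decision {
  const reasons: string[] = [];
  if (treatment.jsErrorRate - control.jsErrorRate > rule.maxJsErrorRateIncrease) {
    reasons.push("js_error_rate increased beyond threshold");
  }
  if (treatment.p75LcpMs - control.p75LcpMs > rule.maxP75LcpWorseMs) {
    reasons.push("p75_lcp_ms worsened beyond threshold");
  }
  if (control.purchaseLinkClickRate - treatment.purchaseLinkClickRate > rule.maxPurchaseClickRateDrop) {
    reasons.push("purchase_link_click_rate dropped beyond threshold");
  }
  return reasons.length > 0 ? { action: "rollback", reasons } : { action: "continue" };
}
```

A scheduled job could run this against the latest guardrail numbers and set allocation to 0 when the action is `rollback`, while exposure logging stays on for the audit.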
Analyze from Exposure and Avoid False Positives
Use exposure as the denominator. Users who never saw a variant should not be counted. Users who saw multiple variants should be excluded or investigated. The following query is intentionally modest: it summarizes conversion and guardrails without pretending to replace statistical review. BigQuery’s official SAFE_DIVIDE documentation is useful for avoiding divide-by-zero failures in dashboards.
-- BigQuery Standard SQL
WITH exposure_raw AS (
  SELECT
    anonymous_id,
    experiment_id,
    -- Variant of the earliest exposure event.
    ARRAY_AGG(variant ORDER BY event_timestamp LIMIT 1)[OFFSET(0)] AS variant,
    MIN(event_timestamp) AS first_exposed_at,
    COUNT(DISTINCT variant) AS variant_count
  FROM `project.dataset.events`
  WHERE event_name = 'experiment_exposure'
    AND experiment_id = 'pricing_page_offer_2026_06'
  GROUP BY anonymous_id, experiment_id
),
exposure AS (
  -- Keep only users who saw exactly one variant.
  SELECT anonymous_id, experiment_id, variant, first_exposed_at
  FROM exposure_raw
  WHERE variant_count = 1
),
events_after_exposure AS (
  SELECT
    e.variant,
    e.anonymous_id,
    ev.event_name,
    ev.value_usd,
    ev.metric_name,
    ev.value
  FROM exposure e
  LEFT JOIN `project.dataset.events` ev
    ON ev.anonymous_id = e.anonymous_id
    AND ev.experiment_id = e.experiment_id
    AND ev.event_timestamp >= e.first_exposed_at
)
SELECT
  variant,
  COUNT(DISTINCT anonymous_id) AS exposed_users,
  COUNT(DISTINCT IF(event_name = 'cta_click', anonymous_id, NULL)) AS cta_users,
  SAFE_DIVIDE(
    COUNT(DISTINCT IF(event_name = 'cta_click', anonymous_id, NULL)),
    COUNT(DISTINCT anonymous_id)
  ) AS cta_click_rate,
  COUNT(DISTINCT IF(event_name = 'purchase_link_click', anonymous_id, NULL)) AS purchase_intent_users,
  SAFE_DIVIDE(
    COUNT(DISTINCT IF(event_name = 'purchase_link_click', anonymous_id, NULL)),
    COUNT(DISTINCT anonymous_id)
  ) AS purchase_intent_rate,
  -- Guardrail events carry metric_name and value in the event schema.
  AVG(IF(event_name = 'guardrail_metric' AND metric_name = 'lcp_ms', value, NULL)) AS avg_lcp_ms,
  -- value_usd is sent on purchase_link_click events, not on guardrail events.
  SUM(IF(event_name = 'purchase_link_click' AND value_usd IS NOT NULL, value_usd, 0)) AS revenue_proxy_usd
FROM events_after_exposure
GROUP BY variant
ORDER BY variant;
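Once the query returns one row per variant, the gap between two rates can be checked against noise. The sketch below is a fixed-horizon two-proportion z-test with a 95% interval for the difference. It is an illustration rather than a replacement for statistical review, it is only meaningful when run once at the planned review date, and `ArmCounts` and `compareArms` are hypothetical names. The inputs correspond to `exposed_users` and a converted-user count such as `cta_users`.

```typescript
interface ArmCounts {
  exposedUsers: number;   // denominator: users with an exposure event
  convertedUsers: number; // numerator: distinct users with the target event
}

export function compareArms(control: ArmCounts, treatment: ArmCounts) {
  if (control.exposedUsers <= 0 || treatment.exposedUsers <= 0) {
    throw new Error("Both arms need at least one exposed user");
  }
  const n1 = control.exposedUsers;
  const n2 = treatment.exposedUsers;
  const p1 = control.convertedUsers / n1;
  const p2 = treatment.convertedUsers / n2;
  const diff = p2 - p1;

  // Pooled standard error for the z statistic (null hypothesis: equal rates).
  const pooled = (control.convertedUsers + treatment.convertedUsers) / (n1 + n2);
  const sePooled = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  const z = sePooled === 0 ? 0 : diff / sePooled;

  // Unpooled standard error for the interval around the observed difference.
  const seUnpooled = Math.sqrt((p1 * (1 - p1)) / n1 + (p2 * (1 - p2)) / n2);
  const margin = 1.96 * seUnpooled;

  return {
    diff,
    z,
    ci95Low: diff - margin,
    ci95High: diff + margin,
    // Two-sided test at the 5% level; valid only for a single planned look.
    significantAt5Pct: Math.abs(z) > 1.96,
  };
}
```

An interval that excludes zero on the primary metric still says nothing about the guardrails, which need their own comparison.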
Sample size must be planned before launch. If you peek every day and stop as soon as the new version looks good, you increase the chance of a false positive. The same problem appears when you test many variants, slice by many segments, change the primary metric after the fact, or start the experiment on the same day as a paid campaign. Ask Claude Code to produce a pre-launch decision record: minimum sample, observation window, exclusion rules, guardrails, and the exact date when results can be reviewed.
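The minimum sample in that decision record can be estimated before launch with the standard normal-approximation formula for comparing two proportions. This is a sketch that assumes a two-sided 5% significance level and 80% power, which is where the hard-coded z values come from. `sampleSizePerArm` and `estimateDays` are hypothetical helper names.

```typescript
const Z_ALPHA_TWO_SIDED_5PCT = 1.959964; // z for a two-sided test at alpha = 0.05
const Z_POWER_80PCT = 0.841621;          // z for 80% power

// Users needed in EACH arm to detect a change from baselineRate to targetRate.
export function sampleSizePerArm(baselineRate: number, targetRate: number): number {
  for (const rate of [baselineRate, targetRate]) {
    if (!(rate > 0 && rate < 1)) throw new Error("Rates must be between 0 and 1");
  }
  if (baselineRate === targetRate) throw new Error("Rates must differ");

  const mean = (baselineRate + targetRate) / 2;
  const nullTerm = Z_ALPHA_TWO_SIDED_5PCT * Math.sqrt(2 * mean * (1 - mean));
  const altTerm =
    Z_POWER_80PCT *
    Math.sqrt(baselineRate * (1 - baselineRate) + targetRate * (1 - targetRate));
  const delta = targetRate - baselineRate;
  return Math.ceil((nullTerm + altTerm) ** 2 / delta ** 2);
}

// Days of traffic needed, given how many users are exposed per day in total.
export function estimateDays(perArm: number, dailyExposedUsers: number, arms = 2): number {
  if (dailyExposedUsers <= 0) throw new Error("dailyExposedUsers must be positive");
  return Math.ceil((perArm * arms) / dailyExposedUsers);
}
```

Detecting a lift from a 10% to a 12% signup start rate needs on the order of 3,800 exposed users per arm under these assumptions. The smaller the expected lift, the faster that number grows, which is why the review date belongs in the record before launch.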
Verify the Implementation with Playwright
Before publishing, check the mechanics. The same anonymous id should receive the same variant. Unknown experiment ids should fail. The monetization CTA should render once. Playwright documents the test and expect APIs in its official test reference, and its assertions guide explains auto-retrying web assertions.
// tests/experiments.spec.ts
import { test, expect } from "@playwright/test";

test.describe("pricing_page_offer_2026_06", () => {
  test("keeps assignment stable for the same anonymous id", async ({ request, baseURL }) => {
    const url = `${baseURL}/api/experiments/assign?experiment=pricing_page_offer_2026_06`;
    // The route handler reads this header as a test-only id override.
    const headers = { "x-test-anonymous-id": "demo-user-42" };

    const first = await request.get(url, { headers });
    const second = await request.get(url, { headers });

    expect(first.ok()).toBeTruthy();
    expect(second.ok()).toBeTruthy();
    expect(await first.json()).toMatchObject(await second.json());
  });

  test("rejects unknown experiments", async ({ request, baseURL }) => {
    const response = await request.get(`${baseURL}/api/experiments/assign?experiment=missing`);
    expect(response.status()).toBe(404);
  });

  test("renders one monetization CTA on the pricing page", async ({ page }) => {
    // The e2e_anonymous_id query parameter is not handled by the route handler
    // shown earlier; the pricing page has to read it in test builds.
    await page.goto("/pricing?e2e_anonymous_id=demo-user-42");
    await expect(page.getByTestId("pricing-cta")).toBeVisible();
    await expect(page.getByTestId("pricing-cta")).toHaveCount(1);
  });
});
This test does not prove revenue lift. It proves that the experiment plumbing is not broken. That distinction is important: broken assignment can make any later statistical result meaningless.
Privacy, Consent, and Operational Discipline
A monetization experiment touches user behavior data, so keep the rules explicit. Do not send personal data in event parameters. Respect consent before analytics or ad-related storage. Define what happens when cookies are blocked. Document the owner, start date, end date, hypothesis, metrics, guardrails, and rollback steps. Keep the result even if the test loses; the loss prevents the same idea from being repeated later.
For blogs, the strongest short-term revenue treatment is sometimes the one that damages trust. More ads can create more impressions while lowering read completion and search performance. For SaaS, softer copy can increase free signups while lowering paid conversion. That is why guardrails are not optional. They protect the business from a local optimum.
Masa’s practical result from using this workflow was not a magical lift number; it was fewer rework cycles. Once the event table and assignment test were written before the UI copy, Claude Code stopped inventing event names and the rollout conversation became concrete. In a small CTA test, the Playwright stable-id check caught a localStorage flicker before publication, which pushed the implementation back to server-side assignment.
Use Claude Code for code, types, tests, SQL drafts, and documentation. Keep legal judgment, consent policy, and the final statistical decision with humans. For implementation help beyond this article, review the Claude Code training page or the product templates and connect the experiment to a real monetization funnel.
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.