Claude Code API Rate Limiting Guide: 429, Redis, and Cloudflare

API rate limiting means deciding how many requests the same client may send in a short period, then asking that client to slow down when it crosses the line. It is like handing out numbered tickets at a busy counter. You are not closing the service for everyone; you are stopping one user, bot, script, or integration from consuming more than its fair share.

Claude Code can build endpoints, authentication checks, and tests quickly. The trap is that a working API without rate limits can still be unsafe for production. Login attempts, search pages, AI generation, SMS verification, email sending, and webhook retries all have a real cost. When Masa tested a small contact form, duplicate submits looked harmless until the mail provider quota started disappearing during QA. The bug was not the form alone. The missing design was: “how often may this action happen, and what happens after the limit?”

This guide turns rate limiting into a practical Claude Code workflow. You will get a design checklist, a no-dependency Node.js demo, a Redis-backed Express implementation, client retry code, Cloudflare placement advice, security failure cases, and a consulting CTA. For adjacent foundations, read production API development with Claude Code, Claude Code security best practices, and the Cloudflare Workers guide.

Keep the official references open while adapting the code: Cloudflare Rate limiting rules, the OWASP API Security Top 10 (2023) entries API4:2023 Unrestricted Resource Consumption and API6:2023 Unrestricted Access to Sensitive Business Flows, plus MDN’s 429 Too Many Requests reference.

Decide What You Are Protecting

Beginners often start with a number such as “60 requests per minute.” That is backwards. First decide what is at risk. Rate limiting protects server capacity, external API spend, inventory, password reset flows, email quotas, AI credits, lead quality, and business rules.

flowchart LR
  A["Request"] --> B["Identify client"]
  B --> C["Check policy"]
  C -->|allowed| D["Run handler"]
  C -->|too many| E["Return 429 + Retry-After"]
  D --> F["Log count and cost"]

Here are realistic starting points:

Use case	Limit key	Starting point	What it protects
Login, OTP, password reset	IP + account id	5 attempts / 10 min	Brute force, SMS cost
Search and list APIs	User id + path	60 requests / min	Database load, scraping
AI or image generation	User id + plan	10 per day for free plans	LLM spend, free tier
Webhook receivers	Sender + event id	Allow short retry bursts	Duplicate processing, queues

Do not rely on IP alone. In a company, school, or mobile carrier network, many legitimate users can share one IP address. Attackers can also rotate IP addresses. For authenticated APIs, combine user id, API key, organization id, plan, endpoint, and sometimes operation type.
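One way to picture a combined key is a small builder that prefers the most specific authenticated identity and only falls back to IP. This is a minimal sketch: `buildLimitKey` and its field names (`userId`, `apiKeyId`, `orgId`, `operation`) are hypothetical, and the `|` separator is chosen only because IPv6 addresses already contain colons.

```javascript
// Hypothetical helper: builds a limiter key from whatever identity is available.
// Field names are assumptions for illustration, not part of any framework.
function buildLimitKey({ userId, apiKeyId, orgId, ip }, { method, route, operation }) {
  // Prefer the most specific authenticated identity; fall back to IP last.
  const who = userId
    ? `user:${userId}`
    : apiKeyId
      ? `key:${apiKeyId}`
      : `ip:${ip ?? "unknown"}`;

  const parts = ["rl", who];
  if (orgId) parts.push(`org:${orgId}`);
  parts.push(`${method}:${route}`); // use the route pattern, not the raw URL
  if (operation) parts.push(`op:${operation}`);
  return parts.join("|");
}

console.log(buildLimitKey(
  { userId: "u_123", orgId: "o_9", ip: "203.0.113.7" },
  { method: "POST", route: "/api/generate", operation: "image" },
));
// rl|user:u_123|org:o_9|POST:/api/generate|op:image
```

For login-style endpoints, where the table suggests IP plus account id, the same builder could be called with both values so that neither a shared IP nor a single targeted account exhausts the other's budget.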

Give Claude Code a Real Specification

“Add rate limiting” is too vague. Tell Claude Code the algorithm, keying strategy, response shape, headers, tests, logs, and local-vs-production storage. This prompt is a useful starting point:

Add rate limiting to the existing API.

Requirements:
- Scope: POST /api/contact and POST /api/login
- If authenticated, key by userId; otherwise key by IP
- 429 JSON body: { "error": "rate_limited", "retryAfter": seconds }
- Return Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining
- Tests must cover allowed requests, limit reached, and recovery after time passes
- Use Redis in production and an in-memory store locally
- Make limits configurable through environment variables

After implementation, report the verification commands and any unverified risks.

This makes the acceptance criteria reviewable. It also prevents the common failure where Claude Code adds middleware but forgets the client contract. Pair this with the API testing automation guide so 429 behavior is tested just like success responses.
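The "configurable through environment variables" requirement is worth reviewing closely, because `Number("6O")` is `NaN` and comparisons against `NaN` are always false, so a typo can leave a limiter that never blocks. The loader below is a sketch under assumptions: `loadRateLimitConfig` and `readPositiveInt` are hypothetical names, the variable names match the Express example later in this guide, and the fallback values are illustrative.

```javascript
// Hypothetical config loader: fails fast on invalid values instead of
// passing NaN or a negative number into the limiter.
function readPositiveInt(value, fallback, name) {
  if (value === undefined || value === "") return fallback;
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${value}"`);
  }
  return parsed;
}

function loadRateLimitConfig(env = process.env) {
  return {
    limit: readPositiveInt(env.RATE_LIMIT_REQUESTS, 10, "RATE_LIMIT_REQUESTS"),
    windowMs: readPositiveInt(env.RATE_LIMIT_WINDOW_MS, 60_000, "RATE_LIMIT_WINDOW_MS"),
  };
}

console.log(loadRateLimitConfig({ RATE_LIMIT_REQUESTS: "30" }));
// { limit: 30, windowMs: 60000 }
```

Calling the loader once at startup means a bad value stops the deploy rather than surfacing later as missing 429 responses.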

Runnable Minimal Example: Node.js 429 Server

Save this as rate-limit-demo.mjs and run it with Node.js 20 or newer. It uses a token bucket: the bucket refills at a steady rate, and every request consumes one token. That allows a short burst while controlling the long-term average.

import http from "node:http";

class TokenBucket {
  constructor({ capacity, refillPerSecond }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.updatedAt = Date.now();
  }

  take(now = Date.now()) {
    const elapsed = (now - this.updatedAt) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond,
    );
    this.updatedAt = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, remaining: Math.floor(this.tokens), retryAfter: 0 };
    }

    const missing = 1 - this.tokens;
    const retryAfter = Math.ceil(missing / this.refillPerSecond);
    return { allowed: false, remaining: 0, retryAfter };
  }
}

// Demo only: this map grows with every new key. A real service should evict idle buckets.
const buckets = new Map();

function clientKey(req) {
  return req.headers["x-api-key"] ?? req.socket.remoteAddress ?? "anonymous";
}

function checkLimit(req) {
  const key = clientKey(req);
  if (!buckets.has(key)) {
    buckets.set(key, new TokenBucket({ capacity: 5, refillPerSecond: 1 }));
  }
  return buckets.get(key).take();
}

const server = http.createServer((req, res) => {
  if (req.url !== "/api/demo") {
    res.writeHead(404, { "content-type": "application/json" });
    res.end(JSON.stringify({ error: "not_found" }));
    return;
  }

  const result = checkLimit(req);
  res.setHeader("X-RateLimit-Limit", "5");
  res.setHeader("X-RateLimit-Remaining", String(result.remaining));

  if (!result.allowed) {
    res.writeHead(429, {
      "content-type": "application/json",
      "Retry-After": String(result.retryAfter),
    });
    res.end(JSON.stringify({
      error: "rate_limited",
      retryAfter: result.retryAfter,
    }));
    return;
  }

  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ ok: true, remaining: result.remaining }));
});

server.listen(3000, () => {
  console.log("Listening on http://localhost:3000/api/demo");
});

node rate-limit-demo.mjs

In another terminal:

for i in 1 2 3 4 5 6 7; do
  curl -i http://localhost:3000/api/demo
done

On Windows PowerShell:

1..7 | ForEach-Object {
  curl.exe -i http://localhost:3000/api/demo
}

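The bucket holds five tokens and refills one per second, so in a fast loop the sixth request should return 429 Too Many Requests; a slower loop may let one more through before the limit is hit. MDN notes that a 429 response can include Retry-After; that header is the difference between a client that politely waits and a client that keeps hammering the server.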

Production Shape: Redis Sliding Window

The in-memory version is good for learning, but it breaks when you run multiple API instances. Server A may think the user has no tokens left while server B still has a full bucket. Redis gives all instances one shared counter.

This Express example uses a Redis sorted set as a sliding window. A sliding window counts the last 60 seconds from “now” instead of resetting every minute on the clock, which avoids sharp boundary bursts. Install the dependencies, start Redis, and save the server code as redis-rate-limit-server.mjs.

npm init -y
npm i express ioredis
docker run --rm --name redis-rate-limit -p 6379:6379 redis:7-alpine

import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis(process.env.REDIS_URL ?? "redis://127.0.0.1:6379");

const limitScript = `
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local member = ARGV[4]

redis.call("ZREMRANGEBYSCORE", key, 0, now - window_ms)

local count = redis.call("ZCARD", key)
if count >= limit then
  local oldest = redis.call("ZRANGE", key, 0, 0, "WITHSCORES")[2]
  local retry_ms = math.max(1, oldest + window_ms - now)
  return {0, 0, retry_ms}
end

redis.call("ZADD", key, now, member)
redis.call("PEXPIRE", key, window_ms)
return {1, limit - count - 1, 0}
`;

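async function rateLimit(req, res, next) {
  // Demo only: the raw bearer token is used as the identity without being verified.
  // In production, key on the authenticated user id (hashed if sensitive) so secrets
  // never land in Redis keys and clients cannot dodge limits by inventing tokens.
  const user = req.get("authorization")?.replace(/^Bearer\s+/i, "");
  // Behind a proxy, req.ip is the proxy address unless Express "trust proxy" is set.
  const identity = user || req.ip || "anonymous";
  const key = `rl:${identity}:${req.path}`;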
  const limit = Number(process.env.RATE_LIMIT_REQUESTS ?? 10);
  const windowMs = Number(process.env.RATE_LIMIT_WINDOW_MS ?? 60000);
  const now = Date.now();
  const member = `${now}:${Math.random()}`;

  const [allowed, remaining, retryMs] = await redis.eval(
    limitScript,
    1,
    key,
    limit,
    windowMs,
    now,
    member,
  );

  res.setHeader("X-RateLimit-Limit", String(limit));
  res.setHeader("X-RateLimit-Remaining", String(remaining));

  if (allowed === 1) return next();

  const retryAfter = Math.ceil(Number(retryMs) / 1000);
  res.setHeader("Retry-After", String(retryAfter));
  res.status(429).json({ error: "rate_limited", retryAfter });
}

app.use(rateLimit);

app.get("/api/search", (req, res) => {
  res.json({ data: ["claude-code", "rate-limit"], at: new Date().toISOString() });
});

app.listen(3000, () => {
  console.log("API ready on http://localhost:3000/api/search");
});

node redis-rate-limit-server.mjs

for i in $(seq 1 12); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/search
done

When asking Claude Code for a production version, also specify the Redis failure mode. A marketing contact form might fail open for a short time. Login, payment, or AI-credit endpoints may need to fail closed. That is a business decision, not a library default.
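The failure-mode decision can be made explicit in code rather than left to whatever the Redis client does on error. The wrapper below is a sketch, not part of ioredis or Express: `withFailureMode`, the `degraded` flag, and `fallbackRetryAfter` are hypothetical names, and `check` stands for any async limiter function that rejects when the store is unreachable.

```javascript
// Hypothetical wrapper: `check` resolves to { allowed, remaining, retryAfter }
// or rejects when the backing store (for example Redis) cannot be reached.
function withFailureMode(check, { failOpen, fallbackRetryAfter = 5, onError = console.error }) {
  return async function guardedCheck(key) {
    try {
      return await check(key);
    } catch (err) {
      onError(`rate limiter store failed for ${key}: ${err.message}`);
      return failOpen
        ? { allowed: true, remaining: 0, retryAfter: 0, degraded: true }
        : { allowed: false, remaining: 0, retryAfter: fallbackRetryAfter, degraded: true };
    }
  };
}

// Simulated outage to show both policies side by side.
const brokenStore = async () => { throw new Error("ECONNREFUSED"); };
const contactFormCheck = withFailureMode(brokenStore, { failOpen: true });
const loginCheck = withFailureMode(brokenStore, { failOpen: false });

contactFormCheck("rl|contact").then((r) => console.log("contact allowed:", r.allowed)); // contact allowed: true
loginCheck("rl|login").then((r) => console.log("login allowed:", r.allowed)); // login allowed: false
```

The `degraded` flag lets the route decide how to respond. A fail-closed route may prefer a 503 over a 429 so clients can tell an outage apart from their own overuse.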

Clients Must Respect Retry-After

A server-side limiter is only half the design. SDKs, batch jobs, and webhook senders should read Retry-After and wait before retrying.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

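async function fetchWithRateLimit(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    if (attempt === maxRetries) break; // do not wait if no retry will follow

    await res.body?.cancel(); // discard the unread 429 body before retrying

    // Retry-After is either seconds or an HTTP-date. Number() alone turns a date
    // into NaN, which makes setTimeout fire immediately, so handle both forms.
    const header = res.headers.get("retry-after");
    const seconds = Number(header ?? "1");
    const dateMs = Date.parse(header ?? "");
    const waitMs = Number.isFinite(seconds)
      ? Math.max(1, seconds) * 1000
      : Number.isNaN(dateMs) ? 1000 : Math.max(1000, dateMs - Date.now());
    console.log(`429 received. Waiting ${waitMs}ms before retry.`);
    await sleep(waitMs);
  }

  throw new Error("Rate limit retry budget exhausted");
}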

for (let i = 0; i < 8; i += 1) {
  const res = await fetchWithRateLimit("http://localhost:3000/api/demo");
  console.log(i + 1, res.status, await res.text());
}

For external API clients, tell Claude Code: “On 429, use exponential backoff, prefer Retry-After if present, cap retries, and log the final failure.” Infinite retry loops are a production incident waiting to happen.
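That instruction can be sketched as a single delay function: use Retry-After when the server sends it, otherwise fall back to capped exponential backoff with jitter. The names `parseRetryAfterMs` and `retryDelayMs`, and the `baseMs` and `capMs` defaults, are assumptions for illustration rather than values from any specification.

```javascript
// Returns the server-requested wait in ms, or null if the header is absent or unparseable.
function parseRetryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null || headerValue.trim() === "") return null;
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const dateMs = Date.parse(headerValue); // HTTP-date form
  return Number.isNaN(dateMs) ? null : Math.max(0, dateMs - now);
}

// attempt is zero-based. `random` and `now` are injectable so tests stay deterministic.
function retryDelayMs(attempt, headerValue, options = {}) {
  const { baseMs = 500, capMs = 30_000, random = Math.random, now = Date.now() } = options;
  const fromServer = parseRetryAfterMs(headerValue, now);
  if (fromServer !== null) return fromServer; // the server's answer wins

  // Full jitter: pick a random delay below the capped exponential ceiling.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

console.log(retryDelayMs(0, "3")); // 3000
console.log(retryDelayMs(2, null, { random: () => 0.5 })); // 1000
```

The server value is returned uncapped on purpose. If it exceeds what the caller is willing to wait, giving up and logging the failure is usually kinder to the server than retrying early.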

Cloudflare at the Edge, App Logic by User

Cloudflare Rate Limiting Rules are strong at rejecting obvious traffic spikes before they reach your origin. The official docs describe expressions, periods, request thresholds, mitigation timeouts, and actions. They are a good fit for login pages, public search APIs, admin routes, AI generation entry points, and known bot patterns.

Cloudflare alone is not enough for product limits. Free-vs-paid plan quotas, organization usage, per-user AI credits, refund abuse, and invitation abuse require application data. In practice, use layers:

Layer	Role	Example
Cloudflare/WAF	Stop obvious bursts and bot traffic early	Limit `/api/login` by IP
Application	Enforce user, organization, plan, and operation rules	Free users get 10 generations/day
Queue/worker	Smooth expensive asynchronous work	Email, image generation, PDF jobs
Billing/monitoring	Detect cost anomalies	SMS spend and LLM usage alerts
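The application-layer row is the one Cloudflare cannot enforce for you, because it depends on plan data. A minimal sketch of a per-user daily quota follows. `consumeDailyQuota` and `DAILY_QUOTA` are hypothetical names, the free-tier number comes from the table above, the `pro` value is invented for illustration, and the in-memory map stands in for a shared store.

```javascript
// Hypothetical plan quotas. Only the free-tier value appears in this guide.
const DAILY_QUOTA = { free: 10, pro: 500 };

// In-memory usage for the sketch. Entries for past days are never removed here;
// a shared store with a TTL would handle that in production.
const usage = new Map();

function consumeDailyQuota(userId, plan, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // UTC calendar day, e.g. 2025-01-31
  const key = `${userId}|${day}`;
  const limit = DAILY_QUOTA[plan] ?? DAILY_QUOTA.free;
  const used = usage.get(key) ?? 0;

  if (used >= limit) {
    // Seconds until the next UTC midnight, when the quota resets.
    const nextDay = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
    return { allowed: false, remaining: 0, retryAfter: Math.ceil((nextDay - now.getTime()) / 1000) };
  }

  usage.set(key, used + 1);
  return { allowed: true, remaining: limit - used - 1, retryAfter: 0 };
}

const at = new Date("2025-01-31T23:59:30Z");
let last;
for (let i = 0; i < 11; i += 1) last = consumeDailyQuota("u_1", "free", at);
console.log(last); // { allowed: false, remaining: 0, retryAfter: 30 }
```

Resetting on the UTC day is a simplification. A product may prefer a rolling 24-hour window or the user's billing timezone, which is exactly the kind of rule that belongs in the specification.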

OWASP API4 treats unlimited CPU, memory, file size, and third-party-service consumption as a security risk. OWASP API6 covers automated abuse of sensitive business flows such as purchasing, reservation, posting, and referrals. Rate limiting is therefore not just anti-DDoS plumbing. It is revenue protection against free-tier exhaustion, resale abuse, spam, SMS cost spikes, and automated account attacks.

Failure Cases to Avoid

The first mistake is one global limit for everything. A profile read endpoint and a password reset endpoint should not share the same threshold. Judge by cost and risk per operation.

The second mistake is an undefined 429 response. If one route returns HTML, another returns plain text, and another returns JSON, clients become brittle. Standardize the JSON body, Retry-After, and rate-limit headers.
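One way to keep the contract uniform is to route every 429 through a single helper. The sketch below assumes a Node.js `http.ServerResponse` (Express responses extend it); `sendRateLimited` is a hypothetical name, and the body shape and header names are the ones used elsewhere in this guide.

```javascript
// Hypothetical helper: the only place in the codebase that writes a 429.
function sendRateLimited(res, { limit, retryAfter }) {
  // Retry-After must be a whole number of seconds; never advertise 0.
  const seconds = Math.max(1, Math.ceil(retryAfter));
  const body = JSON.stringify({ error: "rate_limited", retryAfter: seconds });
  res.writeHead(429, {
    "content-type": "application/json",
    "Retry-After": String(seconds),
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": "0",
  });
  res.end(body);
}
```

With one helper, a contract test can assert the status, headers, and body once instead of once per route.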

The third mistake is counting only successful requests. Failed logins, invalid payloads, and password reset attempts for unknown accounts can still cost money and expose abuse signals. Often failures need stricter limits than successes.

The fourth mistake is storing personal data in limiter keys. Do not put raw email addresses or phone numbers into Redis keys and logs. Hash them when needed and keep TTLs short.
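A keyed hash is one way to do that. The sketch below uses HMAC-SHA-256 from Node's built-in `node:crypto`; `hashedLimitKey` and the `LIMITER_KEY_SECRET` environment variable are hypothetical names, and the fallback secret is for local experiments only.

```javascript
import { createHmac } from "node:crypto";

// Assumption: the secret comes from a hypothetical LIMITER_KEY_SECRET variable.
// A keyed hash means someone who can read Redis keys cannot confirm a guessed
// email address by hashing it themselves.
const secret = process.env.LIMITER_KEY_SECRET ?? "dev-only-secret";

function hashedLimitKey(scope, identifier) {
  // Normalize first so "Ada@Example.com " and "ada@example.com" share one counter.
  const normalized = identifier.trim().toLowerCase();
  const digest = createHmac("sha256", secret).update(normalized).digest("hex");
  return `rl|${scope}|${digest.slice(0, 32)}`;
}

const a = hashedLimitKey("password-reset", "Ada@example.com ");
const b = hashedLimitKey("password-reset", "ada@example.com");
console.log(a === b); // true
console.log(a.includes("example.com")); // false
```

The same hashed value should be what appears in logs, so an incident review can correlate events without exposing the address.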

The fifth mistake is slow tests. A 60-second window should not make CI sleep for 60 seconds. Design the limiter so tests can inject now.
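The token bucket demo already follows this idea, since `take(now)` accepts a timestamp. Applied to a whole limiter, clock injection might look like the sketch below. `FixedWindowLimiter` is a simplified stand-in written for this illustration, not the token bucket or the Redis script from this guide, and in a real suite the checks would live in your test runner.

```javascript
// Minimal fixed-window limiter used only to illustrate clock injection.
class FixedWindowLimiter {
  constructor({ limit, windowMs, now = () => Date.now() }) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injected clock; defaults to real time
    this.windows = new Map(); // key -> { startedAt, count }
  }

  check(key) {
    const t = this.now();
    let w = this.windows.get(key);
    if (!w || t - w.startedAt >= this.windowMs) {
      w = { startedAt: t, count: 0 }; // start a fresh window
      this.windows.set(key, w);
    }
    if (w.count >= this.limit) {
      const retryAfter = Math.ceil((w.startedAt + this.windowMs - t) / 1000);
      return { allowed: false, retryAfter };
    }
    w.count += 1;
    return { allowed: true, retryAfter: 0 };
  }
}

// Fake clock: the test moves time forward instead of sleeping.
let fakeNow = 0;
const limiter = new FixedWindowLimiter({ limit: 2, windowMs: 60_000, now: () => fakeNow });

console.log(limiter.check("u1").allowed); // true
console.log(limiter.check("u1").allowed); // true
console.log(limiter.check("u1")); // { allowed: false, retryAfter: 60 }
fakeNow += 60_000; // jump one full window ahead
console.log(limiter.check("u1").allowed); // true
```

The allowed, blocked, and recovered states are all exercised in microseconds, which keeps a 60-second window from turning into a 60-second CI step.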

The final mistake is blocking legitimate infrastructure. Search bots, uptime checks, internal monitoring, payment webhooks, and verified partner callbacks may need separate policies. Exceptions should be narrow and logged, not broad bypasses.

Claude Code Review Checklist

After implementation, ask Claude Code to review these points:

Is every 429 response the same JSON shape?
Are Retry-After and remaining-count headers set?
Is the key strategy correct for IP, user id, API key, and organization id?
Is Redis failure behavior explicit?
Are authentication failures, validation failures, and external API failures counted where needed?
Do tests cover allowed, blocked, and recovered states?
Are admin, monitoring, webhook, and crawler exceptions narrow?

This is not only code quality. For AI, SMS, email, and payment-backed products, limiter mistakes show up directly in the bill.

Consulting CTA

ClaudeCodeLab covers API implementation, security review, rate limiting, billing safeguards, and monitoring in Claude Code training and consulting. With an existing Next.js, Express, Cloudflare Workers, or AWS API Gateway setup, the useful work is to decide which operation is allowed, for whom, how often, and how the proof appears in tests and logs.

For solo projects, start with the Node.js demo, then move to Redis when you deploy more than one instance. For teams, add the Claude Code prompt, review checklist, environment variables, and runbook so changing a threshold later does not become guesswork.

I tested the examples in this article as a small hands-on check. The in-memory server returned 429 after repeated requests, and the Redis version returned Retry-After once the configured 10-request window was exceeded. Adding the client wait logic stopped wasteful immediate retries. The lesson was simple: a rate limit is only production-ready when the response, retry behavior, logs, and exceptions are verified together.
