API Rate Limiting with Claude Code: 429, Redis, and Cloudflare Guide

API rate limiting means capping how many requests a single user, script, or client can send in a short period, and making them wait once they exceed that cap. It does not shut the service down. It only ensures that no single client consumes the entire capacity, SMS quota, email quota, or AI credit on its own.

Claude Code builds endpoints, auth checks, and tests quickly. But a working API is not always a production-ready API. Login attempts, search APIs, AI generation, SMS OTPs, email sending, and webhook retries all carry real cost. Masa once tested a small contact form. Duplicate submits first looked like a minor UX issue, but during QA they burned through the email provider quota quickly. The root cause was not the form; it was a missing rule: how many times is this action allowed?

This guide walks through a practical workflow with Claude Code: design first, then a dependency-free Node.js demo, a Redis + Express implementation, a client that respects Retry-After, Cloudflare edge placement, the abuse-prevention angle, concrete failure cases, and a consulting CTA. For related basics, read Claude Code API development, Claude Code security best practices, and the Cloudflare Workers guide.

Keep the official references at hand: Cloudflare Rate limiting rules, OWASP API Security 2023 API4: Unrestricted Resource Consumption and API6: Unrestricted Access to Sensitive Business Flows, and MDN's 429 Too Many Requests.

Decide what to protect first

A common beginner mistake is to write “60 requests per minute” straight away. Understand the risk first. Rate limiting protects servers, databases, external API costs, inventory, password reset flows, email quota, AI credits, lead quality, and business rules.

flowchart LR
  A["Request"] --> B["Identify client"]
  B --> C["Check policy"]
  C -->|allowed| D["Run handler"]
  C -->|too many| E["Return 429 + Retry-After"]
  D --> F["Log count and cost"]

Practical use cases:

Use case	Limit key	Starting point	Protection
Login, OTP, password reset	IP + account id	5 attempts / 10 min	Brute force, SMS cost
Search/list API	User id + path	60 / min	DB load, scraping
AI/image generation	User id + plan	Free plan 10/day	LLM cost, free tier
Webhook receiver	Sender + event id	Short bursts allowed	Duplicate processing

Do not rely on IP alone. In an office, school, or mobile network, many legitimate users can share one IP, and attackers can rotate IPs. For an authenticated API, it is better to build the key from a combination of user id, API key, organization id, plan, endpoint, and operation type.
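As a rough illustration of combining those signals, the sketch below builds a limiter key from whichever identity is available. The function name `buildLimitKey`, the field names, and the `rl|...` key layout are assumptions made for this example, not part of any library.

```javascript
// Hypothetical helper: builds a limiter key from several identity signals.
// Field names (userId, apiKeyId, orgId, ip, plan) are illustrative assumptions.
function buildLimitKey({ userId, apiKeyId, orgId, ip, method, path, plan }) {
  // Prefer the most specific authenticated identity; fall back to IP last.
  const identity = userId
    ? `user:${userId}`
    : apiKeyId
      ? `key:${apiKeyId}`
      : orgId
        ? `org:${orgId}`
        : `ip:${ip ?? "unknown"}`;

  // Scope the counter to one operation so a cheap read never shares
  // a budget with an expensive write.
  const operation = `${(method ?? "GET").toUpperCase()}:${path ?? "/"}`;
  return ["rl", plan ?? "anon", identity, operation].join("|");
}

console.log(buildLimitKey({ userId: "u_42", plan: "free", method: "post", path: "/api/generate" }));
console.log(buildLimitKey({ ip: "203.0.113.7", method: "POST", path: "/api/login" }));
```

For flows such as login, where the table pairs IP with account id, one option is to check two keys (one per signal) rather than a single combined key; which is appropriate depends on the abuse pattern you expect.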

Give Claude Code a clear specification

“Add rate limiting” is too vague. Spell out the algorithm, key strategy, 429 response, headers, tests, logs, and local/production storage. You can copy and paste this prompt:

Add rate limiting to the existing API.

Requirements:
- Scope: POST /api/contact and POST /api/login
- If authenticated, key by userId; otherwise key by IP
- 429 JSON body: { "error": "rate_limited", "retryAfter": seconds }
- Return Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining
- Tests must cover allowed requests, limit reached, and recovery after time passes
- Use Redis in production and an in-memory store locally
- Make limits configurable through environment variables

After implementation, report the verification commands and any unverified risks.

This gives Claude Code acceptance criteria. It will not just add middleware; it will also define the client contract. Pin down the 429 behavior in tests, as in the API testing guide.
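The requirement to make limits configurable through environment variables can be sketched as a small loader that rejects bad values early. The helper names `readPositiveInt` and `loadLimitConfig` are hypothetical; the variable names and defaults mirror the ones used in the Redis example later in this guide.

```javascript
// Hypothetical loader: reads limits from an env-like object and fails fast on bad input.
function readPositiveInt(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadLimitConfig(env) {
  return {
    limit: readPositiveInt(env, "RATE_LIMIT_REQUESTS", 10),
    windowMs: readPositiveInt(env, "RATE_LIMIT_WINDOW_MS", 60000),
  };
}

// In a real service you would pass process.env instead of a literal object.
console.log(loadLimitConfig({}));
console.log(loadLimitConfig({ RATE_LIMIT_REQUESTS: "5" }));
```

Failing at startup on a typo such as `RATE_LIMIT_REQUESTS=abc` is usually easier to debug than a limiter that silently compares against NaN.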

Runnable minimal example: Node.js 429 server

Save this as rate-limit-demo.mjs and run it with Node.js 20+. It uses a token bucket: tokens flow into the bucket at a fixed rate and each request spends one token. Short bursts stay allowed while the long-term average stays under control.

import http from "node:http";

class TokenBucket {
  constructor({ capacity, refillPerSecond }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.updatedAt = Date.now();
  }

  take(now = Date.now()) {
    const elapsed = (now - this.updatedAt) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond,
    );
    this.updatedAt = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, remaining: Math.floor(this.tokens), retryAfter: 0 };
    }

    const missing = 1 - this.tokens;
    const retryAfter = Math.ceil(missing / this.refillPerSecond);
    return { allowed: false, remaining: 0, retryAfter };
  }
}

const buckets = new Map();

function clientKey(req) {
  return req.headers["x-api-key"] ?? req.socket.remoteAddress ?? "anonymous";
}

function checkLimit(req) {
  const key = clientKey(req);
  if (!buckets.has(key)) {
    buckets.set(key, new TokenBucket({ capacity: 5, refillPerSecond: 1 }));
  }
  return buckets.get(key).take();
}

const server = http.createServer((req, res) => {
  if (req.url !== "/api/demo") {
    res.writeHead(404, { "content-type": "application/json" });
    res.end(JSON.stringify({ error: "not_found" }));
    return;
  }

  const result = checkLimit(req);
  res.setHeader("X-RateLimit-Limit", "5");
  res.setHeader("X-RateLimit-Remaining", String(result.remaining));

  if (!result.allowed) {
    res.writeHead(429, {
      "content-type": "application/json",
      "Retry-After": String(result.retryAfter),
    });
    res.end(JSON.stringify({
      error: "rate_limited",
      retryAfter: result.retryAfter,
    }));
    return;
  }

  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ ok: true, remaining: result.remaining }));
});

server.listen(3000, () => {
  console.log("Listening on http://localhost:3000/api/demo");
});

node rate-limit-demo.mjs

In a second terminal:

for i in 1 2 3 4 5 6 7; do
  curl -i http://localhost:3000/api/demo
done

Windows PowerShell:

1..7 | ForEach-Object {
  curl.exe -i http://localhost:3000/api/demo
}

The 6th or 7th request should return 429 Too Many Requests. MDN notes that a 429 response may include Retry-After, which tells the client when to try again.

Handle multiple instances with Redis

The in-memory implementation is fine for learning, but it breaks down across multiple API servers. Server A may say 0 remaining while Server B says 5 are still left. Redis provides a shared counter.

This Express example builds a sliding window with a Redis sorted set. A sliding window counts “the last 60 seconds from now”, which reduces the boundary problem of a fixed per-minute reset. Save the server code as redis-rate-limit-server.mjs.

npm init -y
npm i express ioredis
docker run --rm --name redis-rate-limit -p 6379:6379 redis:7-alpine

import { createHash } from "node:crypto";
import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis(process.env.REDIS_URL ?? "redis://127.0.0.1:6379");

// Runs atomically in Redis: trim old hits, count, then either reject or record the hit.
const limitScript = `
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local member = ARGV[4]

-- Drop hits that have aged out of the window.
redis.call("ZREMRANGEBYSCORE", key, 0, now - window_ms)

local count = redis.call("ZCARD", key)
if count >= limit then
  -- The oldest remaining hit decides when the next slot opens.
  local oldest = tonumber(redis.call("ZRANGE", key, 0, 0, "WITHSCORES")[2])
  local retry_ms = math.max(1, oldest + window_ms - now)
  return {0, 0, retry_ms}
end

redis.call("ZADD", key, now, member)
redis.call("PEXPIRE", key, window_ms)
return {1, limit - count - 1, 0}
`;

async function rateLimit(req, res, next) {
  try {
    const token = req.get("authorization")?.replace(/^Bearer\s+/i, "");
    // Hash the token so the raw credential never appears in Redis keys or logs.
    // req.ip is only the real client address if "trust proxy" is configured behind a proxy.
    const identity = token
      ? createHash("sha256").update(token).digest("hex")
      : req.ip || "anonymous";
    const key = `rl:${identity}:${req.path}`;
    const limit = Number(process.env.RATE_LIMIT_REQUESTS ?? 10);
    const windowMs = Number(process.env.RATE_LIMIT_WINDOW_MS ?? 60000);
    const now = Date.now();
    // A unique member keeps two hits in the same millisecond from overwriting each other.
    const member = `${now}:${Math.random()}`;

    const [allowed, remaining, retryMs] = await redis.eval(
      limitScript,
      1,
      key,
      limit,
      windowMs,
      now,
      member,
    );

    res.setHeader("X-RateLimit-Limit", String(limit));
    res.setHeader("X-RateLimit-Remaining", String(remaining));

    if (allowed === 1) return next();

    const retryAfter = Math.ceil(Number(retryMs) / 1000);
    res.setHeader("Retry-After", String(retryAfter));
    res.status(429).json({ error: "rate_limited", retryAfter });
  } catch (err) {
    // Redis is unreachable or the script failed: pass the error to Express
    // instead of leaving a rejected promise unhandled.
    next(err);
  }
}

app.use(rateLimit);

app.get("/api/search", (req, res) => {
  res.json({ data: ["claude-code", "rate-limit"], at: new Date().toISOString() });
});

app.listen(3000, () => {
  console.log("API ready on http://localhost:3000/api/search");
});

node redis-rate-limit-server.mjs

for i in $(seq 1 12); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/search
done
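To see why a sliding window avoids the fixed-minute boundary problem, here is a minimal in-memory sketch of the same idea with the clock passed in explicitly. `createSlidingWindow` is a hypothetical helper for illustration; it runs in a single process and is not a replacement for the Redis script.

```javascript
// In-memory sliding window log, for illustration only (single process, idle keys never evicted).
function createSlidingWindow({ limit, windowMs }) {
  const hits = new Map(); // key -> array of timestamps in ms, oldest first

  return function check(key, now) {
    // Keep only timestamps newer than (now - windowMs).
    const recent = (hits.get(key) ?? []).filter((t) => t > now - windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return { allowed: false, remaining: 0, retryAfterMs: recent[0] + windowMs - now };
    }
    recent.push(now);
    hits.set(key, recent);
    return { allowed: true, remaining: limit - recent.length, retryAfterMs: 0 };
  };
}

const slidingCheck = createSlidingWindow({ limit: 3, windowMs: 60_000 });

// Three requests at 0:59 use the whole budget.
for (let i = 0; i < 3; i += 1) slidingCheck("client-a", 59_000);

// At 1:00 a fixed per-minute counter would have reset; the sliding window has not.
console.log(slidingCheck("client-a", 60_000));

// Just after 1:59 the 0:59 hits have aged out, so the request is allowed again.
console.log(slidingCheck("client-a", 119_001));
```

With a fixed per-minute counter, the same client could send its full budget at 0:59 and again at 1:00, doubling the effective rate around the boundary.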

Also state the Redis failure mode in the production prompt. A contact form can fail open for a short time. Login, payment, and AI credit endpoints should fail closed. This is a business risk decision.
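One way to make that decision explicit is a small policy lookup that the middleware consults when the store throws. The route list, the function name `decideOnStoreError`, and the choice of 503 with a 5 second Retry-After are assumptions for this sketch; pick values that match your own risk tolerance.

```javascript
// Hypothetical policy table: which routes may skip limiting while the store is down.
const FAILURE_MODES = new Map([
  ["/api/contact", "open"],
  ["/api/login", "closed"],
  ["/api/generate", "closed"],
]);

function decideOnStoreError(path, modes = FAILURE_MODES) {
  // Unknown routes default to "closed" so a new endpoint is never unprotected by accident.
  const mode = modes.get(path) ?? "closed";
  if (mode === "open") {
    return { allowed: true, degraded: true };
  }
  // Assumed response for fail closed: 503 plus a short Retry-After.
  return { allowed: false, degraded: true, status: 503, retryAfter: 5 };
}

console.log(decideOnStoreError("/api/contact"));
console.log(decideOnStoreError("/api/login"));
```

The `degraded` flag is there so the caller can log or alert whenever a request was decided without the limiter.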

Clients should respect Retry-After

A server-side limit is only half of the design. SDKs, batch jobs, and webhook senders should read Retry-After and wait.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

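// Retries on 429, waiting for the number of seconds the server sends in Retry-After.
async function fetchWithRateLimit(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    if (attempt === maxRetries) break; // no retries left, so skip the final wait

    // Retry-After can be missing or an HTTP-date; Number() then yields 0 or NaN,
    // so fall back to 1 second instead of retrying immediately.
    const parsed = Number(res.headers.get("retry-after"));
    const retryAfter = Number.isFinite(parsed) ? Math.max(1, parsed) : 1;
    const waitMs = retryAfter * 1000;
    console.log(`429 received. Waiting ${waitMs}ms before retry.`);
    await sleep(waitMs);
  }

  throw new Error("Rate limit retry budget exhausted");
}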

for (let i = 0; i < 8; i += 1) {
  const res = await fetchWithRateLimit("http://localhost:3000/api/demo");
  console.log(i + 1, res.status, await res.text());
}

For an external API client, tell Claude Code: use exponential backoff on 429, give priority to Retry-After when present, set max retries, and log the final failure. Infinite retries turn into an incident.
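The wait-time rule described above can be isolated into a pure function, which also makes it easy to test. This is a sketch under stated assumptions: `computeWaitMs`, `baseMs`, and `capMs` are hypothetical names and defaults, and jitter is left out to keep the output deterministic.

```javascript
// Hypothetical helper: decides how long to wait before the next retry.
function computeWaitMs({ attempt, retryAfter, now = Date.now(), baseMs = 500, capMs = 30_000 }) {
  // Retry-After takes priority. It may be delta-seconds ("120") or an HTTP-date.
  if (retryAfter != null && retryAfter !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) {
      return Math.max(0, dateMs - now);
    }
  }
  // No usable header: exponential backoff (attempt 0 -> baseMs, 1 -> 2x, 2 -> 4x), capped.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

console.log(computeWaitMs({ attempt: 0, retryAfter: "2" }));
console.log(computeWaitMs({ attempt: 3, retryAfter: null }));
```

The server-provided value is returned uncapped on purpose; if it is longer than the caller is willing to wait, giving up and logging the failure is usually safer than retrying early. Adding random jitter to the backoff branch is a common refinement when many clients retry at once.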

Cloudflare at the edge, user rules in the app

Cloudflare Rate Limiting Rules are strong at stopping obvious spikes before they reach the origin. The official docs explain expressions, periods, thresholds, mitigation timeout, and actions. They work well for login, public search, admin routes, AI generation entry points, and bot patterns.

But Cloudflare does not know your product limits. Free vs paid quota, organization usage, per-user AI credits, refund abuse, and invitation abuse are decided from application data. Practical layering:

Layer	Role	Example
Cloudflare/WAF	Stop obvious bursts and bots	Limit `/api/login` by IP
Application	User, org, plan, operation rules	Free user 10 generations/day
Queue/worker	Smooth out heavy async work	Email, image, PDF
Billing/monitoring	Detect cost anomalies	SMS and LLM alerts
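The Application row is the part an edge rule cannot express. A minimal in-memory sketch of a per-plan daily quota might look like the following; `createDailyQuota` is a hypothetical helper, only the Free value of 10 per day comes from the table, and the `pro` figure is invented for illustration.

```javascript
// Hypothetical plan limits; only "free: 10" comes from the table, "pro" is made up.
const DAILY_GENERATIONS = { free: 10, pro: 200 };

function createDailyQuota(limits = DAILY_GENERATIONS) {
  const used = new Map(); // "userId|YYYY-MM-DD" -> count

  return function consume(userId, plan, now = new Date()) {
    const day = now.toISOString().slice(0, 10); // UTC calendar day
    const key = `${userId}|${day}`;
    // Unknown plans get no quota rather than an unlimited one.
    const limit = Object.hasOwn(limits, plan) ? limits[plan] : 0;
    const count = used.get(key) ?? 0;
    if (count >= limit) return { allowed: false, remaining: 0 };
    used.set(key, count + 1);
    return { allowed: true, remaining: limit - count - 1 };
  };
}

const consume = createDailyQuota();
const noon = new Date("2024-01-01T12:00:00Z");
for (let i = 0; i < 10; i += 1) consume("u_1", "free", noon);
console.log(consume("u_1", "free", noon)); // 11th call on the same day
console.log(consume("u_1", "free", new Date("2024-01-02T00:00:01Z"))); // next UTC day
```

In production the counter would live in a shared store such as Redis or the application database, and the day boundary might follow the customer's billing timezone instead of UTC.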

OWASP API4 treats unrestricted consumption of CPU, memory, file size, and third-party services as a security risk. OWASP API6 covers automated abuse of sensitive business flows such as purchases, reservations, posting, and referrals. So rate limiting is not only DDoS protection; it is revenue protection too.

Common pitfalls

The first mistake is a single limit for every API. Reading a profile and resetting a password do not carry the same risk. Set limits per endpoint and per operation.

The second mistake is inconsistent 429 responses. If some are HTML, some plain text, and some JSON, clients become brittle. Standardize the JSON body, Retry-After, and headers.
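One way to keep the 429 shape consistent is to build it in a single place and have every handler use that. The helper name `buildRateLimitedResponse` is an assumption; the body and header names follow the prompt earlier in this guide.

```javascript
// Hypothetical helper: the only place that knows what a 429 looks like.
function buildRateLimitedResponse({ limit, retryAfterSeconds }) {
  // Round up and never advertise less than 1 second.
  const retryAfter = Math.max(1, Math.ceil(retryAfterSeconds));
  return {
    status: 429,
    headers: {
      "content-type": "application/json",
      "Retry-After": String(retryAfter),
      "X-RateLimit-Limit": String(limit),
      "X-RateLimit-Remaining": "0",
    },
    body: JSON.stringify({ error: "rate_limited", retryAfter }),
  };
}

console.log(buildRateLimitedResponse({ limit: 10, retryAfterSeconds: 12.3 }));
```

A plain object like this can be written out with `res.writeHead(status, headers)` and `res.end(body)` in node:http, or adapted to whatever framework the project uses.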

The third mistake is counting only successful requests. Failed logins, invalid payloads, and password resets for unknown emails also carry cost and are attack signals. Failures often need a stricter limit.

The fourth mistake is putting personal data into keys in plain text. Do not store an email or phone number in a Redis key or log. If you need it, hash it and keep the TTL short.

The fifth mistake is actually waiting 60 seconds in tests. Inject now into the function and advance time in the test.
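A minimal sketch of that idea, using a simple fixed-window counter rather than the token bucket from the demo: the limiter takes a `clock` function, and the test moves a fake clock forward instead of sleeping. `createFixedWindow` and `fakeNow` are hypothetical names for this example.

```javascript
// Illustrative fixed-window limiter whose time source is injected.
function createFixedWindow({ limit, windowMs, clock }) {
  let windowStart = clock();
  let count = 0;

  return function take() {
    const now = clock();
    // Start a fresh window once the current one has fully elapsed.
    if (now - windowStart >= windowMs) {
      windowStart = now;
      count = 0;
    }
    if (count >= limit) return false;
    count += 1;
    return true;
  };
}

// Fake clock: the test advances time instead of waiting.
let fakeNow = 0;
const take = createFixedWindow({ limit: 2, windowMs: 60_000, clock: () => fakeNow });

console.log(take(), take(), take()); // true true false
fakeNow += 60_000; // "wait" one minute instantly
console.log(take()); // true
```

The same pattern applies to the `take(now = Date.now())` signature in the token bucket demo: pass an explicit `now` from the test and the recovery case runs in microseconds.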

The last mistake is blocking legitimate infrastructure. Keep narrow exceptions for search bots, uptime checks, internal monitoring, payment webhooks, and partner callbacks.

Claude Code review checklist

After the implementation, have Claude Code check the following:

Do all 429 responses share the same JSON shape?
Are Retry-After and the remaining headers present?
Is the key design across IP, user id, API key, and organization correct?
Is the Redis failure behavior explicit?
Are auth failures, validation failures, and external API failures counted?
Do tests cover the allowed, blocked, and recovered states?
Are the exceptions for admin, monitoring, webhooks, and crawlers too broad?

This is not just about code quality. In products that involve AI, SMS, email, or payments, a limiter mistake shows up directly on the bill.

Consulting CTA

ClaudeCodeLab works on API implementation, security review, rate limiting, billing safeguards, and monitoring design through its Claude Code training and consulting. In an existing Next.js, Express, Cloudflare Workers, or AWS API Gateway project, the most important job is to turn “which action, for which identity, how many times” into code, tests, and logs.

For a solo project, run the Node.js demo first, then adopt Redis once you run multiple instances. In a team, keep the prompt, review checklist, environment variables, and runbook together so that changing a limit later is not guesswork.

I ran the examples in this article. The in-memory server returns 429 after repeated requests. The Redis version returns a 429 with Retry-After once the 10-request window is exceeded. The client wait logic stopped immediate retries. The lesson is clear: a rate limit is production-ready only when the response, retry, logs, and exceptions are verified together.
