Claude Code × Amazon Bedrock Guide: Run Claude Safely in Production on AWS

Calling the Anthropic API directly is usually the fastest way to prototype with Claude. Production on AWS is different: you have to account for API key handling, IAM boundaries, audit logs, billing attribution, data retention, retries, and cost controls. That is where running Claude through Amazon Bedrock becomes useful.

Amazon Bedrock is AWS’s managed foundation-model service. In practical terms, it lets your application invoke Claude through AWS identity, authorization, logging, billing, and operational controls instead of treating the model as a separate external API. Claude Code is still valuable here, but only when you ask it to generate reviewable infrastructure-aware code: IAM, model invocation, guardrails, logging, retry behavior, and budget limits.

Masa’s hard lesson from real work was simple: a Bedrock demo can work in an hour, but production review asks harder questions. Which role invokes the model? Which model IDs are allowed? What happens on throttling? Are prompts stored? Who pays for each request? This guide turns those questions into code and prompts you can reuse.

This article follows AWS documentation as of June 3, 2026. Bedrock model IDs, Regions, quotas, and pricing can change, so verify against official docs before deployment.

Production Shape

Do not start by asking Claude Code to “add Bedrock chat.” Start with the operating shape: caller, runtime, model, guardrails, logs, and cost attribution.

flowchart LR
  U["User / Admin UI"] --> A["API Gateway or ALB"]
  A --> R["Lambda or ECS task"]
  R --> G["Input validation and budget guard"]
  G --> B["Amazon Bedrock Runtime<br/>Converse / ConverseStream"]
  B --> C["Claude model"]
  R --> L["App logs<br/>CloudWatch Logs"]
  B --> M["Model invocation logs<br/>CloudWatch Logs or S3"]
  R --> K["Knowledge Bases<br/>optional RAG"]
  R --> Q["Cost Explorer / CUR<br/>IAM principal attribution"]

The Converse API is Bedrock’s unified chat-style interface across supported models. Guardrails evaluate user input and model output against the safety policies you configure. Model invocation logging can send invocation data to CloudWatch Logs or S3. IAM principal attribution helps assign Bedrock inference cost to the IAM users and roles that made the calls.
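A minimal sketch of making that shape explicit in code, assuming configuration comes from environment variables. loadBedrockConfig and BEDROCK_MAX_TOKENS_CAP are hypothetical names; AWS_REGION, BEDROCK_MODEL_ID, BEDROCK_GUARDRAIL_ID, and BEDROCK_GUARDRAIL_VERSION match the variables used later in this guide. Validating once at startup means a missing model ID fails the deploy instead of the first user request.

```typescript
// Hypothetical startup config that names the runtime's operating shape in one place.
type BedrockConfig = {
  region: string;
  modelId: string;
  guardrail?: { id: string; version: string };
  maxTokensCap: number;
};

type Env = Record<string, string | undefined>;

// Collect every problem instead of stopping at the first, so one deploy log shows them all.
export function loadBedrockConfig(env: Env): { config?: BedrockConfig; errors: string[] } {
  const errors: string[] = [];

  const modelId = env.BEDROCK_MODEL_ID?.trim();
  if (!modelId) errors.push("BEDROCK_MODEL_ID is required");

  // BEDROCK_MAX_TOKENS_CAP is a hypothetical variable; 1200 matches the server cap used later.
  const maxTokensCap = Number(env.BEDROCK_MAX_TOKENS_CAP ?? "1200");
  if (!Number.isInteger(maxTokensCap) || maxTokensCap <= 0) {
    errors.push("BEDROCK_MAX_TOKENS_CAP must be a positive integer");
  }

  // Assumption: production should pin a published guardrail version instead of DRAFT.
  const guardrailId = env.BEDROCK_GUARDRAIL_ID?.trim();
  const guardrailVersion = env.BEDROCK_GUARDRAIL_VERSION?.trim();
  if (guardrailId && !guardrailVersion && env.NODE_ENV === "production") {
    errors.push("Set BEDROCK_GUARDRAIL_VERSION in production instead of relying on DRAFT");
  }

  if (errors.length > 0 || !modelId) return { errors };

  return {
    config: {
      region: env.AWS_REGION ?? "us-east-1",
      modelId,
      guardrail: guardrailId ? { id: guardrailId, version: guardrailVersion || "DRAFT" } : undefined,
      maxTokensCap,
    },
    errors,
  };
}
```

Calling it once at module load and throwing when errors is non-empty is one way to make a misconfigured deploy fail fast.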

Where This Fits

The first strong use case is internal document Q&A. Store runbooks, product specs, and support procedures in S3-backed Knowledge Bases, retrieve the relevant chunks, and let Claude draft the answer with citations. This is RAG, or retrieval-augmented generation: the model answers using context retrieved from your own data.

The second use case is support drafting. Claude can turn a customer question, plan details, and response templates into a first draft for an operator to approve. Keep a human review step until your policy, guardrails, and measurement are mature.

The third use case is an engineering operations assistant. It can summarize CloudWatch logs, draft deployment checklists, prepare incident notes, or turn a runbook into a task list. Claude Code helps because it can update the API, Lambda, IAM policy, tests, and docs in one reviewable change.

The fourth use case is content operations for a site like ClaudeCodeLab. Bedrock can run article QA, description length checks, internal-link suggestions, and code-block review under AWS billing and IAM controls. Pair this with the Claude Code API cost guide and the verification receipt workflow if content quality affects revenue.

Setup

You need Node.js 20 or newer, AWS CLI credentials, access to Bedrock in your AWS account, and an Anthropic model available in your target Region. Depending on your account, Anthropic models can require first-time use details, AWS Marketplace permissions, and a valid payment method.

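Do not hard-code a model ID from a blog post. List what your account can use, check whether the model supports on-demand invocation or must be called through an inference profile, and put the chosen ID in an environment variable.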

export AWS_REGION=us-east-1

aws bedrock list-foundation-models \
  --region "$AWS_REGION" \
  --query "modelSummaries[?providerName=='Anthropic'].[modelId,modelName]" \
  --output table

export BEDROCK_MODEL_ID="anthropic.claude-sonnet-4-20250514-v1:0"
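aws bedrock get-foundation-model \
  --region "$AWS_REGION" \
  --model-identifier "$BEDROCK_MODEL_ID" \
  --query "modelDetails.{input:inputModalities,output:outputModalities,streaming:responseStreamingSupported,inferenceTypes:inferenceTypesSupported}"

# If inferenceTypes does not include ON_DEMAND, invoke the model through an inference profile ID.
aws bedrock list-inference-profiles \
  --region "$AWS_REGION" \
  --query "inferenceProfileSummaries[].[inferenceProfileId,inferenceProfileName]" \
  --output table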
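The same check can run at application startup. A sketch, assuming you fetch model summaries at boot (for example with ListFoundationModelsCommand and byProvider: "Anthropic" from @aws-sdk/client-bedrock) and pass the modelSummaries array in. checkModelChoice is a hypothetical helper, and stripping one leading segment to recover the base ID assumes inference profile IDs add a single geographic prefix such as us. in front of the base model ID.

```typescript
// Subset of the fields in each modelSummaries entry returned by
// `aws bedrock list-foundation-models` / ListFoundationModels.
type ModelSummary = {
  modelId: string;
  providerName?: string;
  inferenceTypesSupported?: string[];
};

type ModelCheck = { ok: boolean; reason: string };

// Assumption: an inference profile ID adds one leading segment (for example "us.")
// in front of the base foundation-model ID.
function baseModelId(modelId: string): string {
  return modelId.startsWith("anthropic.") ? modelId : modelId.slice(modelId.indexOf(".") + 1);
}

export function checkModelChoice(summaries: ModelSummary[], modelId: string): ModelCheck {
  const base = baseModelId(modelId);
  const match = summaries.find((summary) => summary.modelId === base);
  if (!match) {
    return { ok: false, reason: `${base} is not listed for this account and Region` };
  }
  if (match.providerName !== undefined && match.providerName !== "Anthropic") {
    return { ok: false, reason: `${base} is not an Anthropic model` };
  }

  const usesProfile = base !== modelId;
  // When the field is absent, assume on-demand invocation is allowed.
  const onDemand = match.inferenceTypesSupported?.includes("ON_DEMAND") ?? true;
  if (!usesProfile && !onDemand) {
    return { ok: false, reason: `${base} does not list ON_DEMAND; use an inference profile ID` };
  }
  return { ok: true, reason: usesProfile ? `inference profile for ${base}` : `${base} on demand` };
}
```

Failing startup on ok: false keeps a typo in BEDROCK_MODEL_ID from surfacing later as a ValidationException on the first user request.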

Create a small TypeScript project.

mkdir bedrock-claude-lab
cd bedrock-claude-lab
npm init -y
npm install @aws-sdk/client-bedrock @aws-sdk/client-bedrock-runtime @aws-sdk/client-bedrock-agent-runtime
npm install --save-dev typescript tsx @types/node
npx tsc --init --module NodeNext --moduleResolution NodeNext --target ES2022
mkdir -p src/lambda

Add the module type and scripts to package.json.

{
  "type": "module",
  "scripts": {
    "chat": "tsx src/chat.ts",
    "stream": "tsx src/stream.ts",
    "typecheck": "tsc --noEmit"
  }
}

IAM Baseline

Converse still relies on model invocation permissions. Non-streaming calls need bedrock:InvokeModel; streaming calls need bedrock:InvokeModelWithResponseStream. Keep app logging permissions separate from Bedrock model invocation logging settings.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBedrockModelsForStartupCheck",
      "Effect": "Allow",
      "Action": ["bedrock:ListFoundationModels", "bedrock:GetFoundationModel"],
      "Resource": "*"
    },
    {
      "Sid": "InvokeOnlyApprovedClaudeModels",
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:us-east-1:123456789012:inference-profile/*"
      ]
    },
    {
      "Sid": "ApplyApprovedGuardrail",
      "Effect": "Allow",
      "Action": ["bedrock:ApplyGuardrail"],
      "Resource": "arn:aws:bedrock:us-east-1:123456789012:guardrail/your-guardrail-id"
    }
  ]
}

Replace the account ID, Regions, guardrail ID, and model pattern. If you use cross-Region inference, include the inference profile and destination model resources required by your profile. For deeper least-privilege review, use the Claude Code AWS IAM guide.
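The IAM policy is the real enforcement point, but a small app-side check that mirrors its Resource patterns can catch a wrong BEDROCK_MODEL_ID at startup with a clearer message than AccessDeniedException. A sketch, assuming the hypothetical APPROVED_MODEL_PATTERNS list is kept in sync with the policy by hand and that your inference profile IDs use a us. prefix. It also encodes the Converse-to-IAM action mapping described above.

```typescript
// Hypothetical allowlist mirroring the IAM Resource patterns; keep both in sync.
const APPROVED_MODEL_PATTERNS = ["anthropic.claude-*", "us.anthropic.claude-*"];

// Convert an IAM-style pattern ("*" matches any run of characters) into an anchored RegExp.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.split("*").map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^${escaped.join(".*")}$`);
}

export function isApprovedModel(modelId: string, patterns: string[] = APPROVED_MODEL_PATTERNS): boolean {
  return patterns.some((pattern) => patternToRegExp(pattern).test(modelId));
}

// Converse needs InvokeModel; ConverseStream needs InvokeModelWithResponseStream.
export function requiredAction(streaming: boolean): string {
  return streaming ? "bedrock:InvokeModelWithResponseStream" : "bedrock:InvokeModel";
}
```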

Invoke Claude

The key production detail is requestMetadata. AWS documents it as a way to filter model invocation logs, so attach a request ID, the feature name, and a caller identifier.

// src/bedrock-client.ts
import { randomUUID } from "node:crypto";
import {
  BedrockRuntimeClient,
  ConverseCommand,
  ConverseStreamCommand,
  type ConverseCommandInput,
} from "@aws-sdk/client-bedrock-runtime";

export const AWS_REGION = process.env.AWS_REGION ?? "us-east-1";
export const BEDROCK_MODEL_ID =
  process.env.BEDROCK_MODEL_ID ?? "anthropic.claude-sonnet-4-20250514-v1:0";

export const bedrock = new BedrockRuntimeClient({
  region: AWS_REGION,
  maxAttempts: Number(process.env.BEDROCK_MAX_ATTEMPTS ?? "3"),
});

type AskClaudeInput = {
  prompt: string;
  system?: string;
  maxTokens?: number;
  temperature?: number;
  userId?: string;
  feature?: string;
};

function optionalGuardrail(): ConverseCommandInput["guardrailConfig"] | undefined {
  const guardrailIdentifier = process.env.BEDROCK_GUARDRAIL_ID;
  if (!guardrailIdentifier) return undefined;

  return {
    guardrailIdentifier,
    guardrailVersion: process.env.BEDROCK_GUARDRAIL_VERSION ?? "DRAFT",
    trace: "enabled",
  };
}

export async function askClaude(input: AskClaudeInput) {
  const requestId = randomUUID();
  const startedAt = Date.now();

  const response = await bedrock.send(
    new ConverseCommand({
      modelId: BEDROCK_MODEL_ID,
      system: input.system ? [{ text: input.system }] : undefined,
      messages: [{ role: "user", content: [{ text: input.prompt }] }],
      inferenceConfig: {
        maxTokens: input.maxTokens ?? 800,
        temperature: input.temperature ?? 0.2,
      },
      guardrailConfig: optionalGuardrail(),
      requestMetadata: {
        requestId,
        feature: input.feature ?? "local-cli",
        userId: input.userId ?? "anonymous",
      },
    })
  );

  const text =
    response.output?.message?.content
      ?.map((block: { text?: string }) => block.text ?? "")
      .join("") ?? "";

  console.log(
    JSON.stringify({
      level: "info",
      event: "bedrock_converse",
      requestId,
      modelId: BEDROCK_MODEL_ID,
      latencyMs: Date.now() - startedAt,
      stopReason: response.stopReason,
      usage: response.usage,
      metrics: response.metrics,
    })
  );

  return { text, usage: response.usage, stopReason: response.stopReason, requestId };
}

export async function streamClaude(prompt: string) {
  const response = await bedrock.send(
    new ConverseStreamCommand({
      modelId: BEDROCK_MODEL_ID,
      messages: [{ role: "user", content: [{ text: prompt }] }],
      inferenceConfig: { maxTokens: 1200, temperature: 0.2 },
      guardrailConfig: optionalGuardrail(),
      requestMetadata: { feature: "stream-cli", requestId: randomUUID() },
    })
  );

  if (!response.stream) throw new Error("Bedrock did not return a stream.");

  for await (const event of response.stream) {
    const text = event.contentBlockDelta?.delta?.text;
    if (text) process.stdout.write(text);

    if (event.metadata?.usage) {
      process.stderr.write(`\nusage=${JSON.stringify(event.metadata.usage)}\n`);
    }
  }
}
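The client above sends userId straight into requestMetadata, and nothing records a caller category. A hedged alternative is a small builder that adds the category and pseudonymizes the user before the value reaches invocation logs. buildRequestMetadata, pseudonymizeUser, the category names, and the 256-character limit are assumptions; check the Converse API reference for the exact key and value constraints.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Assumption: a conservative value limit; verify against the Converse API reference.
const MAX_METADATA_VALUE_LENGTH = 256;

// Keep letters, digits, and a few separators; replace anything else with "_".
function cleanValue(value: string): string {
  return value.replace(/[^A-Za-z0-9_:.\/@+=,-]/g, "_").slice(0, MAX_METADATA_VALUE_LENGTH);
}

// Salted hash so logs can be grouped per user without storing the raw ID.
export function pseudonymizeUser(userId: string, salt: string): string {
  return createHash("sha256").update(`${salt}:${userId}`).digest("hex").slice(0, 16);
}

export function buildRequestMetadata(input: {
  feature: string;
  callerCategory: "internal" | "customer" | "batch"; // hypothetical categories
  salt: string;
  userId?: string;
  requestId?: string;
}): Record<string, string> {
  return {
    requestId: cleanValue(input.requestId ?? randomUUID()),
    feature: cleanValue(input.feature),
    callerCategory: input.callerCategory,
    userHash: input.userId ? pseudonymizeUser(input.userId, input.salt) : "anonymous",
  };
}
```

The result can be passed as requestMetadata in ConverseCommand; keep the salt in a secret store rather than in code.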

// src/chat.ts
import { askClaude } from "./bedrock-client.js";

const prompt = process.argv.slice(2).join(" ").trim();
if (!prompt) {
  console.error('Usage: npm run chat -- "Summarize Amazon Bedrock in three bullets"');
  process.exit(1);
}

const result = await askClaude({
  prompt,
  system: "You are a concise AWS assistant. If you are unsure, say what to verify.",
  maxTokens: 600,
  feature: "developer-chat",
});

console.log(result.text);

// src/stream.ts
import { streamClaude } from "./bedrock-client.js";

const prompt = process.argv.slice(2).join(" ").trim();
if (!prompt) {
  console.error('Usage: npm run stream -- "Write a deployment checklist"');
  process.exit(1);
}

await streamClaude(prompt);

Run it.

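export AWS_REGION=us-east-1
# Some Claude models can only be invoked through an inference profile; use the ID your account lists.
export BEDROCK_MODEL_ID="us.anthropic.claude-sonnet-4-20250514-v1:0"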

npm run chat -- "Explain Amazon Bedrock in three short lines"
npm run stream -- "Create a production Bedrock checklist"
npm run typecheck
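Beyond the printed answer, check stopReason in the JSON log line. A hedged helper for turning it into an API-level outcome follows. The status names and user messages are hypothetical; the stop reason strings are values the Converse API documents, and the default branch covers any value added later.

```typescript
type StopOutcome = {
  status: "complete" | "truncated" | "blocked" | "needs_tool" | "unknown";
  userMessage?: string;
};

// Map a Converse stopReason to something the API layer can act on.
export function interpretStopReason(stopReason: string | undefined): StopOutcome {
  switch (stopReason) {
    case "end_turn":
    case "stop_sequence":
      return { status: "complete" };
    case "max_tokens":
      // The server-side maxTokens cap was hit; the answer is likely cut off.
      return { status: "truncated", userMessage: "The answer was cut off by the length limit." };
    case "guardrail_intervened":
    case "content_filtered":
      return { status: "blocked", userMessage: "This request could not be answered." };
    case "tool_use":
      return { status: "needs_tool" };
    default:
      return { status: "unknown" };
  }
}
```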

Lambda Pattern

For Lambda, initialize the client outside the handler, validate input, cap maxTokens on the server, and return retryable failures clearly.

// src/lambda/assistant-handler.ts
import { askClaude } from "../bedrock-client.js";

type ApiEvent = {
  body?: string | null;
  requestContext?: { requestId?: string };
};

const headers = { "content-type": "application/json; charset=utf-8" };

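export const handler = async (event: ApiEvent) => {
  // Parse separately so malformed JSON returns 400 instead of falling into the 500 path.
  let body: { prompt?: unknown; maxTokens?: unknown; userId?: unknown };
  try {
    const parsed: unknown = JSON.parse(event.body ?? "{}");
    body = parsed && typeof parsed === "object" ? (parsed as typeof body) : {};
  } catch {
    return { statusCode: 400, headers, body: JSON.stringify({ error: "body must be valid JSON" }) };
  }

  const prompt = typeof body.prompt === "string" ? body.prompt.trim() : "";
  if (!prompt || prompt.length > 8000) {
    return {
      statusCode: 400,
      headers,
      body: JSON.stringify({ error: "prompt is required and must be <= 8000 chars" }),
    };
  }

  // Cap maxTokens on the server no matter what the client sends.
  const requested =
    typeof body.maxTokens === "number" && body.maxTokens > 0 ? Math.floor(body.maxTokens) : 800;

  try {
    const result = await askClaude({
      prompt,
      maxTokens: Math.min(requested, 1200),
      // In production, prefer an identity from the authorizer over a client-supplied userId.
      userId: typeof body.userId === "string" ? body.userId : "anonymous",
      feature: "support-assistant",
    });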

    return {
      statusCode: 200,
      headers,
      body: JSON.stringify({
        text: result.text,
        usage: result.usage,
        stopReason: result.stopReason,
        requestId: result.requestId,
      }),
    };
  } catch (error) {
    const name =
      typeof error === "object" && error && "name" in error ? String(error.name) : "UnknownError";
    const retryable = ["ThrottlingException", "ServiceUnavailableException", "InternalServerException"].includes(name);

    console.error(JSON.stringify({ level: "error", event: "assistant_failed", name, retryable }));

    return {
      statusCode: retryable ? 503 : 500,
      headers,
      body: JSON.stringify({ error: retryable ? "Please retry later" : "Generation failed" }),
    };
  }
};

Do not retry ValidationException blindly; the request is malformed or outside allowed constraints. Retry ThrottlingException, ServiceUnavailableException, and temporary service errors with exponential backoff and jitter. For the wider serverless implementation, continue with the Claude Code AWS Lambda guide.
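A sketch of that backoff using full jitter (a random delay between zero and an exponentially growing ceiling). The BedrockRuntimeClient in this guide already retries according to maxAttempts, so wrapping it in withRetry multiplies attempts; either lower BEDROCK_MAX_ATTEMPTS when you wrap, or reserve this for work the SDK does not retry, such as a whole multi-step request. withRetry, backoffDelayMs, and the delay defaults are hypothetical.

```typescript
// Error names this guide treats as transient. ValidationException is deliberately absent.
const RETRYABLE_ERRORS = new Set(["ThrottlingException", "ServiceUnavailableException", "InternalServerException"]);

export function isRetryable(error: unknown): boolean {
  return typeof error === "object" && error !== null && "name" in error && RETRYABLE_ERRORS.has(String(error.name));
}

// Full jitter: a random delay between 0 and min(capMs, baseMs * 2^attempt).
export function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000, random: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Attempts are counted from 0; maxAttempts includes the first call.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Rethrow non-retryable errors immediately, and retryable ones once attempts run out.
      if (!isRetryable(error) || attempt + 1 >= maxAttempts) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```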

RAG With Knowledge Bases

For internal document chat, start with Bedrock Knowledge Bases before building your own vector stack. It gives you retrieval and generated answers with citations. AWS notes that guardrails apply to input and generated output, not to the retrieved references themselves, so restrict sensitive source data with S3, KMS, IAM, and data classification.

// src/rag.ts
import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const agentRuntime = new BedrockAgentRuntimeClient({
  region: process.env.AWS_REGION ?? "us-east-1",
});

export async function askKnowledgeBase(question: string) {
  const knowledgeBaseId = process.env.BEDROCK_KNOWLEDGE_BASE_ID;
  const modelArn = process.env.BEDROCK_GENERATION_MODEL_ARN;

  if (!knowledgeBaseId || !modelArn) {
    throw new Error("Set BEDROCK_KNOWLEDGE_BASE_ID and BEDROCK_GENERATION_MODEL_ARN");
  }

  const response = await agentRuntime.send(
    new RetrieveAndGenerateCommand({
      input: { text: question },
      retrieveAndGenerateConfiguration: {
        type: "KNOWLEDGE_BASE",
        knowledgeBaseConfiguration: {
          knowledgeBaseId,
          modelArn,
          retrievalConfiguration: {
            vectorSearchConfiguration: { numberOfResults: 5 },
          },
        },
      },
    })
  );

  const sources =
    response.citations
      ?.flatMap((citation) => citation.retrievedReferences ?? [])
      .map((reference) => reference.location?.s3Location?.uri)
      .filter((uri): uri is string => Boolean(uri)) ?? [];

  return { answer: response.output?.text ?? "", sources };
}
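The map and filter chain in askKnowledgeBase keeps duplicates when several retrieved chunks come from the same document. A sketch that deduplicates in first-cited order and renders numbered sources, assuming the structural Citation type below matches the fields that function already reads. uniqueSources and formatAnswer are hypothetical helpers.

```typescript
// Structural subset of the RetrieveAndGenerate citations field.
type Citation = {
  retrievedReferences?: { location?: { s3Location?: { uri?: string } } }[];
};

// A Set keeps insertion order, so sources appear in the order they were first cited.
export function uniqueSources(citations: Citation[] | undefined): string[] {
  const seen = new Set<string>();
  for (const citation of citations ?? []) {
    for (const reference of citation.retrievedReferences ?? []) {
      const uri = reference.location?.s3Location?.uri;
      if (uri) seen.add(uri);
    }
  }
  return [...seen];
}

export function formatAnswer(answer: string, sources: string[]): string {
  if (sources.length === 0) {
    return `${answer}\n\nNo sources were cited; verify before relying on this answer.`;
  }
  const numbered = sources.map((uri, index) => `[${index + 1}] ${uri}`).join("\n");
  return `${answer}\n\nSources:\n${numbered}`;
}
```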

Logging And Cost

Use two logging layers. Application logs should include requestId, feature, caller category, model ID, usage, latency, and stop reason. Avoid storing raw prompts unless your privacy and retention policy explicitly allows it.
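A minimal sketch of an application log record that follows that rule: it keeps the fields listed above and replaces the raw prompt with its length and a truncated SHA-256 hash. buildInvocationLog and its field names are hypothetical; callerCategory is whatever scheme your app defines.

```typescript
import { createHash } from "node:crypto";

type Usage = { inputTokens?: number; outputTokens?: number };

// Hypothetical log shape: enough to debug and attribute cost, without raw prompt text.
export function buildInvocationLog(entry: {
  requestId: string;
  feature: string;
  callerCategory: string;
  modelId: string;
  prompt: string;
  startedAt: number;
  finishedAt: number;
  stopReason?: string;
  usage?: Usage;
}) {
  return {
    level: "info",
    event: "bedrock_converse",
    requestId: entry.requestId,
    feature: entry.feature,
    callerCategory: entry.callerCategory,
    modelId: entry.modelId,
    latencyMs: entry.finishedAt - entry.startedAt,
    stopReason: entry.stopReason ?? "unknown",
    inputTokens: entry.usage?.inputTokens ?? 0,
    outputTokens: entry.usage?.outputTokens ?? 0,
    // Length and a short hash let you spot repeats and outliers without keeping the text.
    // A hash of a short, guessable prompt can still be brute-forced, so treat it as sensitive.
    promptChars: entry.prompt.length,
    promptSha256: createHash("sha256").update(entry.prompt).digest("hex").slice(0, 12),
  };
}
```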

Bedrock model invocation logging can send invocation data to CloudWatch Logs or S3. It is useful for audit and debugging, but it can also store sensitive inputs and outputs. Decide retention, encryption, S3 lifecycle, and access controls before enabling it broadly.

For cost control, cap maxTokens in the API layer, choose models by task complexity, log usage, and group costs with IAM principal attribution. Prompt caching can reduce latency and input-token cost for long repeated contexts, but it only helps when the stable prefix is large enough and reused often.
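One way to apply prompt caching, assuming your model supports it through Converse: keep the long, stable context (policies, runbooks, templates) in the system prompt, end it with a cachePoint block, and put the per-request question in messages. buildCachedSystem and the four-characters-per-token estimate are hypothetical; the minimum cacheable prefix depends on the model, so it is a parameter rather than a constant.

```typescript
// Structural subset of Converse system content blocks: text, or a cache checkpoint.
type SystemBlock = { text: string } | { cachePoint: { type: "default" } };

// Rough heuristic (assumption): about 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Add a checkpoint after the stable prefix only when it is long enough to be worth caching.
export function buildCachedSystem(stablePrefix: string, minCacheTokens: number): SystemBlock[] {
  const blocks: SystemBlock[] = [{ text: stablePrefix }];
  if (estimateTokens(stablePrefix) >= minCacheTokens) {
    blocks.push({ cachePoint: { type: "default" } });
  }
  return blocks;
}
```

Pass the result as system in ConverseCommand, then compare the cache read and write token counts reported in the response usage (cacheReadInputTokens and cacheWriteInputTokens in recent SDK versions; verify against your SDK types) to confirm the prefix is actually reused.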

Prompt For Claude Code

claude -p "
Add Claude invocation through Amazon Bedrock to this repository.

Requirements:
- Use AWS SDK v3 Converse API
- Read model ID from BEDROCK_MODEL_ID
- Default AWS_REGION to us-east-1
- Cap maxTokens at 1200 on the server
- Add requestMetadata with requestId, feature, and userId
- Add guardrailConfig only when BEDROCK_GUARDRAIL_ID is set
- Log usage, latencyMs, and stopReason as JSON
- Do not retry ValidationException
- Treat ThrottlingException and ServiceUnavailableException as retryable
- Document a least-privilege IAM policy in README
- Mock the Bedrock client in tests; do not call the real API

Show the plan before editing, then report typecheck and test results.
"

Pitfalls

The first pitfall is skipping model-access verification. Test list-foundation-models and one small Converse request before asking Claude Code to wire the app.
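Once that single real request works, application tests should stop calling Bedrock, as the Claude Code prompt above also requires. A minimal sketch of one way to make that easy, assuming you are willing to depend on a function rather than on BedrockRuntimeClient directly. makeAsk and fakeConverse are hypothetical names, and the types are structural subsets of the Converse shapes used in this guide, not the SDK's own types.

```typescript
// Structural subsets of the Converse request and response used in this guide.
type ConverseInput = {
  modelId: string;
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
  inferenceConfig?: { maxTokens?: number; temperature?: number };
};
type ConverseOutput = {
  output?: { message?: { content?: { text?: string }[] } };
  stopReason?: string;
};
type ConverseFn = (input: ConverseInput) => Promise<ConverseOutput>;

// The app depends on a function, not on the SDK client, so tests can swap it.
// Hypothetical production wiring (not executed here):
//   const converse: ConverseFn = (input) => bedrock.send(new ConverseCommand(input));
export function makeAsk(converse: ConverseFn, modelId: string, maxTokensCap = 1200) {
  return async (prompt: string, maxTokens = 800) => {
    const response = await converse({
      modelId,
      messages: [{ role: "user", content: [{ text: prompt }] }],
      inferenceConfig: { maxTokens: Math.min(maxTokens, maxTokensCap), temperature: 0.2 },
    });
    const text = response.output?.message?.content?.map((block) => block.text ?? "").join("") ?? "";
    return { text, stopReason: response.stopReason };
  };
}

// A fake that records every request so tests can assert on what would have been sent.
export function fakeConverse(reply: string) {
  const calls: ConverseInput[] = [];
  const converse: ConverseFn = async (input) => {
    calls.push(input);
    return { output: { message: { content: [{ text: reply }] } }, stopReason: "end_turn" };
  };
  return { converse, calls };
}
```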

The second is freezing a model ID from an article. Bedrock model IDs and regional support change. Put the model ID in configuration and validate it at startup.

The third is treating Guardrails as a full correctness guarantee. Use them for safety policies, not as a replacement for human review, domain validation, or authorization.

The fourth is over-logging. Full prompts help debugging, but they can also leak personal or internal data into long-lived logs.

The fifth is retrying the wrong errors. Validation failures need code or input fixes; throttling and temporary service availability can be retried.

The sixth is trusting frontend limits for cost control. Enforce token limits and feature quotas on the API side, then monitor with Cost Explorer, CUR, and budgets.
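A sketch of an API-side guard for that, assuming token limits per feature per UTC day. FeatureTokenQuota and estimateCostUsd are hypothetical; the in-memory counter only illustrates the logic, and prices are passed in rather than hard-coded because they vary by model and Region.

```typescript
// Hypothetical per-feature daily token quota. In-memory state is per process, so a Lambda
// fleet would need a shared store (for example a DynamoDB counter) for a real limit.
export class FeatureTokenQuota {
  private readonly used = new Map<string, number>();
  private readonly limits: Map<string, number>;
  private readonly today: () => string;

  constructor(
    dailyLimits: Record<string, number>,
    today: () => string = () => new Date().toISOString().slice(0, 10),
  ) {
    this.limits = new Map(Object.entries(dailyLimits));
    this.today = today;
  }

  private key(feature: string): string {
    return `${this.today()}:${feature}`;
  }

  // Check before calling Bedrock; pass the input estimate plus maxTokens as the worst case.
  canSpend(feature: string, estimatedTokens: number): boolean {
    const limit = this.limits.get(feature);
    if (limit === undefined) return false; // features without a budget are denied by default
    return (this.used.get(this.key(feature)) ?? 0) + estimatedTokens <= limit;
  }

  // Record actual usage (for example the token counts from the Converse response).
  record(feature: string, tokens: number): void {
    const key = this.key(feature);
    this.used.set(key, (this.used.get(key) ?? 0) + tokens);
  }
}

// Prices are inputs, not constants: take them from the Bedrock pricing page for your model and Region.
export function estimateCostUsd(
  usage: { inputTokens: number; outputTokens: number },
  pricePer1kTokens: { input: number; output: number },
): number {
  return (usage.inputTokens / 1000) * pricePer1kTokens.input + (usage.outputTokens / 1000) * pricePer1kTokens.output;
}
```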

Monetization Path

Bedrock content becomes valuable when it helps readers move from demo to production: IAM, logs, cost controls, guardrails, reviews, and team rollout. For reusable templates, start with ClaudeCodeLab products. For a repository-specific rollout with CLAUDE.md, IAM review, verification receipts, and CI gates, use Claude Code training and consultation.

What Happened In Practice

After testing this pattern, the biggest improvement came from the prompt given to Claude Code, not from adding more code. Specifying “model ID from env”, “log usage”, “do not retry ValidationException”, “guardrails are optional by env”, and “document IAM in README” produced smaller diffs and easier reviews. Bedrock success depends less on memorizing SDK calls and more on stating production constraints before code generation starts.

