Harness Engineering Guide: Learn to Build AI Agents with Claude Code

Writing a good prompt is no longer enough to build an AI agent. In 2026 the real focus is on harness engineering. In beginner-friendly terms, a harness is the agent's scaffold: the structure that lets the model work safely and usefully. Just as a test harness in software testing provides an environment to run, check, and control code, an AI harness surrounds the LLM with tools, context, permissions, verification, and a control loop.

Claude Code is a good example for understanding this topic. It is not just a chat UI. It has layers such as file tools, shell commands, CLAUDE.md, hooks, permission modes, subagents, and memory. This article covers why harness engineering is trending, how to build a runnable Node.js mini harness, how the policy JSON works, and how to apply the approach to use cases such as content automation, code review, SaaS integration, cloud operations, and security boundaries.

Why harness engineering became necessary

An LLM can reason about the next step, but reasoning alone does not get real work done. The agent has to read the workspace, select the relevant context, run tools, interpret failures, block risky actions, and verify the output. That external machinery is the harness.

This is why Claude Code feels strong. The Claude Agent SDK documentation says the SDK can load Claude Code style filesystem-based features, project instructions, skills, hooks, and permissions. The permissions docs explain allow, deny, permission modes, and runtime callbacks. The prompt caching docs describe reusing long static context. All of these are harness, not prompt.

The OODA loop makes this clear:

Stage	Task	Owner
Observe	read files, logs, URLs, tickets, API state	Harness
Orient	compress and organize the context	Harness
Decide	choose the next action	LLM
Act	run a tool, write a file, call an API	Harness

Three of the four stages belong mainly to the harness. That is why prompt engineering alone does not produce a production-grade agent.
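The split in the table can be made concrete with a small sketch. Everything here is hypothetical: the workspace is an in-memory object, and decide() is a hard-coded stand-in for the LLM call, so only the shape of the loop is meaningful.

```javascript
// Hypothetical sketch: one function per OODA stage. Only decide() belongs to the LLM.
function observe(workspace) {
  // Harness: collect raw state (here, an in-memory map of file name -> text).
  return Object.entries(workspace).map(([name, text]) => ({ name, text }));
}

function orient(observations, maxChars = 80) {
  // Harness: truncate each observation so the context stays small.
  return observations.map((o) => `${o.name}: ${o.text.slice(0, maxChars)}`).join("\n");
}

function decide(context) {
  // LLM stand-in: a real harness would send `context` to the model here.
  return context.includes("TODO")
    ? { action: "write_file", path: "notes.md", content: "Resolve TODO items." }
    : { action: "done" };
}

function act(workspace, decision) {
  // Harness: execute the chosen action against the workspace.
  if (decision.action === "write_file") workspace[decision.path] = decision.content;
  return decision.action;
}

const workspace = { "README.md": "# Demo\nTODO: add usage section." };
const decision = decide(orient(observe(workspace)));
console.log(act(workspace, decision)); // prints "write_file"
```

Swapping decide() for a real model call leaves the other three functions unchanged, which is the point of the table.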

What a practical harness should include

Before work starts, the harness makes four decisions: what the agent may read, what output will be produced, how success will be verified, and which actions are automatic, ask-first, or forbidden.
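One way to pin those four decisions down is a small task spec that the harness consults before every action. This is a minimal sketch: the field names (readable, output, verify, actions) and the action names are assumptions made for illustration, not a Claude Code format.

```javascript
// Hypothetical task spec covering the four decisions.
const taskSpec = {
  readable: ["posts/", "style-guide.md"],        // what the agent may read
  output: "posts/new-article.mdx",               // what artifact is produced
  verify: ["frontmatter", "code-fences", "build"], // how success is checked
  actions: {
    automatic: ["read_file", "grep"],
    askFirst: ["write_file", "deploy"],
    forbidden: ["delete_file", "force_push"]
  }
};

// Classify an action. Anything not listed is treated as forbidden (default deny).
function gate(spec, action) {
  if (spec.actions.forbidden.includes(action)) return "forbidden";
  if (spec.actions.askFirst.includes(action)) return "ask-first";
  if (spec.actions.automatic.includes(action)) return "automatic";
  return "forbidden";
}

console.log(gate(taskSpec, "grep"));   // prints "automatic"
console.log(gate(taskSpec, "deploy")); // prints "ask-first"
```

Default deny is a design choice here: a tool added later stays blocked until someone classifies it.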

In a blog workflow this means: read existing slugs, avoid duplicate topics, write MDX, check frontmatter, validate code fences, add official and internal links, place the /products/ and /training/ CTAs, build the site, and check the public URL. The prompt is only one part of this workflow.

layer	example	pitfall
context	project rules, style guide, old failures	stale assumptions linger
tools	read, grep, write, test, API call	too many tools confuse the model
policy	allow, ask, deny, sandbox	a destructive action runs unattended
verification	test, diff, screenshot, public URL	output looks right but is broken
memory	reusable preferences	a temporary note becomes a permanent rule

concept diagram

The harness is the control layer that sits before and after the model. The prompt matters, but policy, context, tools, the permission gate, and the verification loop are what make the workflow reliable.

flowchart LR
  A["Goal"] --> B["Harness policy"]
  B --> C["Context"]
  B --> D["Tools"]
  B --> E["Permissions"]
  C --> F["LLM decision"]
  D --> F
  E --> G["Safe action"]
  F --> G
  G --> H["Verification"]
  H --> I["Artifact"]
  H --> B

runnable Node.js mini harness

The example below is small, but it includes a model, two tools, a policy, a loop, a path boundary, and readable errors. Set ANTHROPIC_API_KEY first.

mkdir harness-demo
cd harness-demo
npm init -y
npm install @anthropic-ai/sdk
node -e "const fs=require('node:fs');fs.mkdirSync('sandbox',{recursive:true});fs.writeFileSync('sandbox/README.md','# Demo\nShip a safer agent workflow.\nKeep writes inside sandbox.\n');"

Save policy.json:

{
  "workspace": "./sandbox",
  "maxSteps": 6,
  "tools": {
    "read_file": {
      "allow": true,
      "risk": "Read UTF-8 text only inside workspace"
    },
    "write_file": {
      "allow": true,
      "risk": "Write UTF-8 text only inside workspace"
    }
  }
}

Save mini-harness.mjs:

import Anthropic from "@anthropic-ai/sdk";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import path from "node:path";

const client = new Anthropic();
const policy = JSON.parse(await readFile(new URL("./policy.json", import.meta.url), "utf8"));
const model = process.env.ANTHROPIC_MODEL || "claude-sonnet-4-6";
const workspace = path.resolve(policy.workspace);

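// Resolve a tool-supplied path and reject anything outside the workspace.
// This is a lexical check only: it does not resolve symlinks.
function safePath(requestedPath) {
  const resolved = path.resolve(workspace, requestedPath);
  // Appending path.sep keeps a sibling like "sandbox-other" from passing as "sandbox".
  const inside = resolved === workspace || resolved.startsWith(workspace + path.sep);
  if (!inside) {
    throw new Error(`Path escapes workspace: ${requestedPath}. Use a path under ${policy.workspace}.`);
  }
  return resolved;
}

// Throw unless policy.json explicitly allows this tool (missing entries are denied).
function ensureAllowed(toolName) {
  const rule = policy.tools?.[toolName];
  if (!rule?.allow) {
    throw new Error(`Tool '${toolName}' is not allowed by policy.json.`);
  }
}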

const tools = [
  {
    name: "read_file",
    description: "Read a UTF-8 text file from the allowed workspace.",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
      additionalProperties: false
    }
  },
  {
    name: "write_file",
    description: "Write a UTF-8 text file inside the allowed workspace.",
    input_schema: {
      type: "object",
      properties: {
        path: { type: "string" },
        content: { type: "string" }
      },
      required: ["path", "content"],
      additionalProperties: false
    }
  }
];

async function executeTool(name, input) {
  ensureAllowed(name);
  if (name === "read_file") {
    return await readFile(safePath(input.path), "utf8");
  }
  if (name === "write_file") {
    const target = safePath(input.path);
    await mkdir(path.dirname(target), { recursive: true });
    await writeFile(target, input.content, "utf8");
    return `written ${input.path}`;
  }
  throw new Error(`Unknown tool: ${name}`);
}

async function run(goal) {
  const messages = [{ role: "user", content: goal }];

  for (let step = 0; step < policy.maxSteps; step++) {
    const response = await client.messages.create({
      model,
      max_tokens: 1200,
      tools,
      system: "You are a careful file assistant. Use tools when needed. Keep writes under policy workspace.",
      messages
    });

    messages.push({ role: "assistant", content: response.content });
    const toolUses = response.content.filter((block) => block.type === "tool_use");

    if (toolUses.length === 0) {
      const text = response.content
        .filter((block) => block.type === "text")
        .map((block) => block.text)
        .join("\n");
      console.log(text);
      return;
    }

    const results = [];
    for (const toolUse of toolUses) {
      try {
        const output = await executeTool(toolUse.name, toolUse.input);
        results.push({ type: "tool_result", tool_use_id: toolUse.id, content: String(output).slice(0, 8000) });
      } catch (error) {
        results.push({
          type: "tool_result",
          tool_use_id: toolUse.id,
          is_error: true,
          content: error instanceof Error ? error.message : String(error)
        });
      }
    }
    messages.push({ role: "user", content: results });
  }

  throw new Error(`Max steps reached: ${policy.maxSteps}`);
}

const goal = process.argv.slice(2).join(" ") || "Read README.md and write summary.md with three bullet points.";
await run(goal);

Run it:

node mini-harness.mjs

Even this small example shows the core pattern: tool schema, policy, sandbox path, max steps, readable tool errors, and a concrete artifact. Add grep, tests, an approval UI, SaaS APIs, and hooks, and it starts to become a Claude Code style harness.
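As one illustration of extending the mini harness, a grep-style tool could look roughly like this. The tool name grep_file and the helper grepText are hypothetical. To wire it in, you would also add a grep_file entry to policy.json and a branch in executeTool that reads the file through safePath and passes the text to grepText.

```javascript
// Hypothetical third tool, in the same schema shape as read_file and write_file.
const grepTool = {
  name: "grep_file",
  description: "Return lines of a workspace text file that contain a literal string.",
  input_schema: {
    type: "object",
    properties: { path: { type: "string" }, query: { type: "string" } },
    required: ["path", "query"],
    additionalProperties: false
  }
};

// Pure helper: find matching lines with 1-based line numbers, capped at maxMatches
// so a broad query cannot flood the model's context.
function grepText(text, query, maxMatches = 20) {
  return text
    .split("\n")
    .map((line, index) => ({ line: index + 1, text: line }))
    .filter((entry) => entry.text.includes(query))
    .slice(0, maxMatches);
}

console.log(grepText("# Demo\nShip a safer agent workflow.\nKeep writes inside sandbox.", "sandbox"));
```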

Five concrete use cases

1. content automation

A weak prompt says "write a blog post." A strong harness reads existing posts, blocks duplicate topics, writes MDX, checks frontmatter and code fences, adds official and internal links, inserts the /products/ and /training/ CTAs, builds the site, and checks the public URL. The pitfall is producing shallow translations and repeated content in the rush to publish fast.
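The duplicate-topic step is the easiest part of this workflow to make deterministic. A minimal sketch, assuming slugs are lowercase ASCII joined by hyphens (the helper names are hypothetical, and non-ASCII titles would need a different slug rule):

```javascript
// Turn a title into a lowercase, hyphen-separated ASCII slug.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every non-alphanumeric run into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

// True when the proposed title maps to a slug that already exists.
function isDuplicateTopic(title, existingSlugs) {
  return existingSlugs.includes(slugify(title));
}

const existing = ["claude-code-permissions-guide", "harness-engineering-guide"];
console.log(isDuplicateTopic("Harness Engineering Guide!", existing)); // prints true
```

An exact slug match only catches literal repeats. Near-duplicate topics would still need a similarity check or a human look.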

2. code review

A review harness reads the git diff, test output, changed files, and project rules. The output should be findings-first: bugs, risks, regressions, and missing tests come first. The pitfall is a model that only summarizes the changes and misses the actual defect.
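Findings-first ordering can be enforced by the harness rather than requested in the prompt. The sketch below assumes the model returns review items tagged with a kind field. That field and its values are hypothetical.

```javascript
// Hypothetical ordering: defects first, narrative summary last.
const SEVERITY_ORDER = ["bug", "risk", "regression", "missing-test", "summary"];

// Return a new array sorted findings-first. Unknown kinds sort to the end.
function findingsFirst(items) {
  const rank = (kind) => {
    const i = SEVERITY_ORDER.indexOf(kind);
    return i === -1 ? SEVERITY_ORDER.length : i;
  };
  return [...items].sort((a, b) => rank(a.kind) - rank(b.kind));
}

const review = findingsFirst([
  { kind: "summary", text: "Refactors the upload handler." },
  { kind: "missing-test", text: "No test covers the empty-file case." },
  { kind: "bug", text: "Null check removed in parseHeader." }
]);
console.log(review.map((item) => item.kind)); // prints [ 'bug', 'missing-test', 'summary' ]
```

A harness could go one step further and reject a review that contains only summary items, which addresses the pitfall directly.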

3. SaaS integration

In Notion, HubSpot, Stripe, or a CRM, read-only lookups, dry-run mutations, and approved writes should be separate paths. Classifying consultation leads and drafting a CRM update is fine, but a production write should happen only after human approval. The risk is writing a wrong customer note or billing change immediately.
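The dry-run versus approved-write split might be sketched as follows. This is not any vendor's API: the store is a plain object standing in for a CRM, and applyCrmUpdate, leadId, and approvedBy are hypothetical names.

```javascript
// Apply a CRM note update in one of two modes. Dry-run is the default.
function applyCrmUpdate(store, update, options = {}) {
  const { mode = "dry-run", approvedBy = null } = options;
  const before = store[update.leadId] ?? null;

  if (mode === "dry-run") {
    // Show what would change without touching the store.
    return { applied: false, before, after: update.note };
  }
  if (mode === "write") {
    if (!approvedBy) {
      throw new Error("Write blocked: human approval is required. Re-run with approvedBy set.");
    }
    store[update.leadId] = update.note;
    return { applied: true, approvedBy, before, after: update.note };
  }
  throw new Error(`Unknown mode '${mode}'. Use 'dry-run' or 'write'.`);
}

const crm = {};
const draft = { leadId: "lead-1", note: "Interested in team training." };
console.log(applyCrmUpdate(crm, draft));                                        // draft only
console.log(applyCrmUpdate(crm, draft, { mode: "write", approvedBy: "alice" })); // real write
```

Returning before and after in both modes gives the reviewer the same diff they would later find in an audit log.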

4. cloud operations

A cloud workflow needs checks before and after the deploy command: environment variables, build result, diff, target environment, rollback plan, health endpoint, and public URL. The pitfall is fixing the wrong root cause after looking only at the last line of the log. A retry limit and a log summary are essential.
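The retry limit and log summary can both be small pure functions. In this sketch the helper names are hypothetical, the error pattern is a crude assumption (any line containing "error" or "fail"), and the check passed to withRetryLimit is synchronous for brevity. A real health check would be asynchronous.

```javascript
// Summarize a log by its first error lines instead of its last line.
function summarizeLog(logText, maxLines = 5) {
  const lines = logText.split("\n");
  const errors = lines.filter((line) => /error|fail/i.test(line));
  return { totalLines: lines.length, errorCount: errors.length, firstErrors: errors.slice(0, maxLines) };
}

// Run a check up to maxAttempts times and report how many attempts were used.
function withRetryLimit(check, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (check(attempt)) return { ok: true, attempts: attempt };
  }
  return { ok: false, attempts: maxAttempts };
}

const log = "build start\nERROR: missing env API_URL\nretrying\nError: deploy failed\nexit 1";
console.log(summarizeLog(log).firstErrors[0]); // prints "ERROR: missing env API_URL"
```

In this sample log the last line is only "exit 1", while the first error names the missing variable, which is the likelier root cause.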

5. security boundaries

Security boundaries are designed at the start. Read access can be somewhat broad, but write access should be limited to the workspace. Shell commands should be on an allow-list. rm, force push, production DB writes, billing changes, and secrets access should be deny or ask-first. The harness exists not to trust the model, but to prevent trusting it too much.
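A shell allow-list with deny and ask-first outcomes might look like the sketch below. The policy contents and the checkCommand helper are assumptions for illustration. Prefix matching is deliberately crude, so treat this as a starting point rather than a complete sandbox.

```javascript
// Hypothetical shell policy: exact commands or command prefixes.
const shellPolicy = {
  allow: ["ls", "cat", "grep", "git status", "git diff", "npm test"],
  deny: ["rm", "git push --force"]
};

// Return "allow", "deny", or "ask" for a command line.
function checkCommand(commandLine, policy = shellPolicy) {
  const cmd = commandLine.trim().replace(/\s+/g, " ");
  const matches = (prefix) => cmd === prefix || cmd.startsWith(prefix + " ");
  // Reject chaining, pipes, substitution, and redirection outright.
  if (/[;&|`$<>]/.test(cmd)) return "deny";
  if (policy.deny.some(matches)) return "deny";
  if (policy.allow.some(matches)) return "allow";
  return "ask"; // anything unlisted goes to a human
}

console.log(checkCommand("git diff HEAD~1"));          // prints "allow"
console.log(checkCommand("ls; rm -rf ."));             // prints "deny"
console.log(checkCommand("curl https://example.com")); // prints "ask"
```

Checking deny before allow means a broad allow entry can never override a deny rule.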

What to learn from Claude Code

The first lesson is context layering. Keep stable project rules in CLAUDE.md or an equivalent config, current-session progress in the plan, and reusable preferences in memory. This keeps temporary decisions from turning into permanent rules.
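The layering idea can be sketched as three separate arrays that are joined only at request time. The layer names and section headings below are hypothetical, not a Claude Code file format.

```javascript
// Join the three layers into one context string, labeling each layer.
function buildContext({ projectRules, sessionPlan, memory }) {
  return [
    "## Project rules (stable)", ...projectRules,
    "## Session plan (temporary)", ...sessionPlan,
    "## Memory (reusable preferences)", ...memory
  ].join("\n");
}

// At session end, drop the plan. Rules and memory carry over unchanged.
function endSession(layers) {
  return { ...layers, sessionPlan: [] };
}

const layers = {
  projectRules: ["Write articles in MDX."],
  sessionPlan: ["Skip the build step today, CI is down."],
  memory: ["Prefer short sentences."]
};
console.log(buildContext(endSession(layers)));
```

Because the one-off note about CI lives only in sessionPlan, it disappears with the session instead of becoming a standing rule.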

The second lesson is hooks. Deterministic work such as formatting, lint, tests, link checks, and screenshot verification should run as commands. Claude can read the failure and fix it, but the check itself stays deterministic.
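To show the shape of a deterministic check without depending on a specific hook configuration, here is a sketch that uses in-process functions as stand-ins for the commands a real hook would run. The check names and the runHooks helper are hypothetical.

```javascript
// Each check returns true on success or a readable message on failure.
const checks = [
  { name: "no-tabs", run: (text) => !text.includes("\t") || "Tab character found. Use spaces." },
  { name: "ends-with-newline", run: (text) => text.endsWith("\n") || "File must end with a newline." }
];

// Run every check and collect failures. The same input always gives the same result.
function runHooks(text, hooks = checks) {
  const failures = [];
  for (const hook of hooks) {
    const result = hook.run(text);
    if (result !== true) failures.push({ hook: hook.name, message: result });
  }
  return { ok: failures.length === 0, failures };
}

console.log(runHooks("const a = 1;\n").ok); // prints true
console.log(runHooks("a\tb").failures);     // two failures with readable messages
```

The failure messages are what the model reads and fixes, while the pass or fail decision never depends on the model.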

The third lesson is delegation. Send long logs, broad searches, multilingual translation, and large refactors to a subagent or a separate stage. The main context should hold decisions, not noise.

common pitfall

Too many tools reduce accuracy. Start with 5 to 10 focused tools.

Unreadable errors block self-repair. Instead of "Error: failed", say what is missing and what the next step is.
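A readable tool error can follow a fixed template of what happened, what is missing, and what to do next. The ToolError class and requireField helper below are hypothetical names used only to illustrate that template.

```javascript
// Error whose message always states what is missing and the next step.
class ToolError extends Error {
  constructor(what, missing, nextStep) {
    super(`${what} Missing: ${missing}. Next step: ${nextStep}`);
    this.name = "ToolError";
  }
}

// Return input[field], or throw a ToolError the model can act on.
function requireField(input, field) {
  if (input[field] === undefined) {
    throw new ToolError(
      "Tool input is incomplete.",
      `'${field}'`,
      `call the tool again with '${field}' set`
    );
  }
  return input[field];
}

try {
  requireField({ content: "hello" }, "path");
} catch (error) {
  console.log(error.message);
  // prints "Tool input is incomplete. Missing: 'path'. Next step: call the tool again with 'path' set"
}
```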

Without prompt caching, long static context is sent and processed from scratch on every call. Separate fixed context from dynamic context.
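With the Anthropic Messages API, that separation can be expressed by putting the static rules in a system text block marked with cache_control and keeping the changing goal in messages. The sketch below only builds the request object. The buildRequest helper and the sample rules are assumptions. The prompt caching docs also specify a minimum cacheable length, so a short block like this one may not actually be cached.

```javascript
// Build Messages API params with the static part marked cacheable.
function buildRequest(staticRules, dynamicGoal, model) {
  return {
    model,
    max_tokens: 1200,
    system: [
      // Static block: identical on every call, so it is a caching candidate.
      { type: "text", text: staticRules, cache_control: { type: "ephemeral" } }
    ],
    // Dynamic block: changes per call and stays outside the cached prefix.
    messages: [{ role: "user", content: dynamicGoal }]
  };
}

const request = buildRequest(
  "You are a careful file assistant. Keep writes under the policy workspace.",
  "Read README.md and write summary.md with three bullet points.",
  process.env.ANTHROPIC_MODEL || "claude-sonnet-4-6" // same default as the mini harness
);
console.log(JSON.stringify(request.system, null, 2));
```

The resulting object can be passed to client.messages.create in place of the inline parameters used in the mini harness.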

Without verification, polished but broken output can get published. Articles need frontmatter and code fence checks, code needs tests, cloud needs health checks, and SaaS needs audit logs.
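For articles, the frontmatter and code fence checks can be a few lines of string handling. This is a minimal sketch, assuming frontmatter is delimited by lines containing only three hyphens and that fences are lines starting with three backticks. The verifyArticle helper is a hypothetical name.

```javascript
const FENCE = "`".repeat(3); // three backticks, built this way to keep the example readable

// Return a list of readable problems. An empty list means the article passed.
function verifyArticle(mdx) {
  const problems = [];
  const lines = mdx.split("\n");

  // Frontmatter must open on line 1 and be closed by a later "---" line.
  if (lines[0] !== "---" || lines.indexOf("---", 1) === -1) {
    problems.push("Frontmatter block is missing or not closed with '---'.");
  }

  // Every opening fence needs a closing fence, so the count must be even.
  const fenceCount = lines.filter((line) => line.trimStart().startsWith(FENCE)).length;
  if (fenceCount % 2 !== 0) {
    problems.push(`Unbalanced code fences: found ${fenceCount} fence lines.`);
  }
  return problems;
}

const good = `---\ntitle: Demo\n---\n\nText\n${FENCE}js\nconsole.log(1);\n${FENCE}\n`;
console.log(verifyArticle(good)); // prints []
```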

Permission drift is dangerous. Do not let a temporary convenience become a permanent risk.

Next steps

To start with safety, read the Claude Code permissions guide. For project rules, CLAUDE.md best practices is useful. To split up heavy work, see Claude Code subagent patterns, and for cost control read Claude Code token optimization.

For quick reference, keep the free Claude Code Quick Reference Cheatsheet at hand. For templates and playbooks, see /products/. If you want to design team workflow, permissions, review gates, verification, and the revenue path together, start a consultation from /training/.

What I verified

In a multilingual content workflow like ClaudeCodeLab, the biggest benefit of a harness is that failures become visible. A prompt can produce an article, but the harness tells you whether body depth, code fences, frontmatter, links, CTAs, and the public URL are correct. That reduces blind trust in the output and makes the workflow easier to operate.

Summary

Harness engineering is the discipline of deciding what the model sees, what it does, where it stops, and how the result is verified. Claude Code is a good example because its strength is not just the model but the scaffold built around it. Run the mini harness above, then add one boundary and one verification step for your own use case.
