Build an AI Agent That Actually Works

Something you can use today.

Jun 03, 2026

The Weekly Build

Most people ask a model to "write the agent" or "set up RAG". They get a plausible description, not something that actually runs. The fix is to force the model into a workflow: name the moving parts, request a minimal working plan first, then expand only after you see something testable.

This is the same pattern behind skill-based coding assistants, agent harnesses, RAG engines and browser automation tools. Small loops, concrete interfaces, measurable outcomes.

Here is the prompt that changes the conversation:

You are building an AI agent for me that can complete ONE job reliably.

Job: <describe the job in one sentence>
Inputs I will provide: <list inputs>
Outputs I need: <list outputs>
Tools I want to use: <firecrawl | browser-use | ragflow | daytona | graphify | composio skills | other>
Environment: <Claude Code, Cursor, local CLI, ChatGPT, Gemini>
Hard constraints: <time limit, cost limit, no external calls, must cite sources, etc.>

Step 1. Ask only the clarifying questions you need. For anything that is still ambiguous, propose a default and label it as a default.
Step 2. Produce a minimal architecture with interfaces:
- Tool calls allowed
- Data flow from input to output
- Failure modes and what to do when each fails
Step 3. Generate a minimal working version plan:
- File or module list
- Configuration needed
- A single test case that proves it works
Step 4. Then provide implementation instructions in this order:
- Setup commands
- Configuration values to set
- Code or exact CLI commands to run
- How to verify the test case
Step 5. Finally provide an "expansion backlog":
- 3 improvements you would add next, each with a reason and a verification step.

Rules:
- Do not write speculative code. If unsure, show a placeholder and a verification step.
- Keep the first run small and testable.
- End by asking me to confirm I can run the test case.

Why this works

This prompt blocks the most common failure mode: big-bang agent planning, where the model designs the whole system up front and nothing is runnable. By requiring interfaces, a minimal architecture, and a single test case, you force the model to deliver something you can run and verify. It also makes tool choice explicit, so the model has to wire up the tools you named instead of falling back on generic advice.
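To make "interfaces, failure modes, and a single test case" concrete, here is a minimal sketch of the kind of first version the prompt is pushing for. Everything in it is hypothetical: `run_agent`, `ToolError` and `fake_fetch` are illustrative names, and the fetch tool is a stand-in function rather than the API of any tool listed in the prompt. Treat it as one possible shape, not the shape.

```python
from typing import Callable


class ToolError(Exception):
    """Raised by a tool when it cannot complete its call (hypothetical)."""


def run_agent(url: str, fetch: Callable[[str], str], max_chars: int = 200) -> dict:
    """One job: fetch a page and return a short structured result.

    The tool is passed in as a plain callable, so the interface is explicit
    and the real tool can be swapped for a fake one in the test.
    """
    try:
        text = fetch(url)
    except ToolError as exc:
        # Failure mode 1: the tool call failed. Report it instead of guessing.
        return {"ok": False, "url": url, "error": str(exc)}

    cleaned = text.strip()
    if not cleaned:
        # Failure mode 2: the tool succeeded but returned nothing useful.
        return {"ok": False, "url": url, "error": "empty page"}

    return {"ok": True, "url": url, "summary": cleaned[:max_chars]}


def fake_fetch(url: str) -> str:
    """Stand-in for a real fetch tool, used only by the test case."""
    if url.endswith("/missing"):
        raise ToolError("404 not found")
    return "  Hello from the test page.  "


def test_single_case() -> None:
    """The single test case that proves the happy path works."""
    result = run_agent("https://example.com/page", fake_fetch)
    assert result == {
        "ok": True,
        "url": "https://example.com/page",
        "summary": "Hello from the test page.",
    }


if __name__ == "__main__":
    test_single_case()
    print("test case passed")
```

The point of the fake tool is that you can run the test before any real service is wired in. Once it passes, replacing `fake_fetch` with a real tool call is a natural first item for the expansion backlog.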

How to adapt it

Swap the Job and Tools fields and keep the rest of the structure. For RAG, set Inputs to "query plus documents" and Outputs to "answer with source snippets". For web agents, set Outputs to "a structured result" and add a hard constraint that rules out unsafe actions. For coding assistants, set Environment to "Claude Code or Cursor" and require a single failing test first, then the change that makes it pass.
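If you reuse the prompt often, it can help to keep the fields as data and render the header from them. The sketch below is one way to do that, assuming Python 3.9 or later. `AgentBrief` and `render` are hypothetical names, and the example values are made up apart from the RAG inputs and outputs suggested above. It only builds the header; the Step 1 to Step 5 text and the rules are pasted in unchanged after it.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBrief:
    """The fields at the top of the prompt (hypothetical helper)."""

    job: str
    inputs: list[str]
    outputs: list[str]
    tools: list[str]
    environment: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the header block of the prompt with the fields filled in."""
        lines = [
            "You are building an AI agent for me that can complete ONE job reliably.",
            "",
            f"Job: {self.job}",
            f"Inputs I will provide: {', '.join(self.inputs)}",
            f"Outputs I need: {', '.join(self.outputs)}",
            f"Tools I want to use: {', '.join(self.tools)}",
            f"Environment: {self.environment}",
            f"Hard constraints: {', '.join(self.constraints) or 'none'}",
        ]
        return "\n".join(lines)


# Example: the RAG adaptation, with illustrative values for the other fields.
rag_brief = AgentBrief(
    job="Answer questions about our internal handbook.",
    inputs=["query", "documents"],
    outputs=["answer with source snippets"],
    tools=["ragflow"],
    environment="local CLI",
    constraints=["must cite sources"],
)

if __name__ == "__main__":
    print(rag_brief.render())
```

Keeping one brief per task also gives you a record of which constraints you set when you compare runs later.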

This Week's Challenge

Take fifteen minutes right now. Open Claude, ChatGPT or Gemini. Paste the prompt above. Fill in the fields for a real task you have been meaning to automate.

Start small. One job. One input. One output.

You should end with a minimal architecture, a file list, and a single test case you can run. If the model produces anything you cannot verify, send it back and ask for a verification step. The goal is not a perfect agent. The goal is something you can run and test this week.

Try it and reply with what happened. I read every one.

Forward this to one person who should be using AI better than they are.

Gareth, founder of The Anthropic Stack (theanthropicstack.com)
