AI Agent Best Practices

<aside> 🦥

Sources

State of Agent Engineering – LangChain

Effective context engineering for AI agents – Anthropic

Demystifying evals for AI agents – Anthropic

LLM Prompt Injection Prevention – OWASP

</aside>

🦥 Sloth's Simple Version

<aside> 🦥

Making an agent look smart in a demo is easy. Making one that works reliably when real people use it is the actual game. In a survey of 1,300+ teams, the #1 thing blocking agents from production was quality, not cost. These habits are how you get there.

</aside>

1. Start Stupidly Narrow

<aside> 🦥

Give the agent one clear job with clear inputs, outputs, and limits. Agents fall apart when the task is vague, that is when they hallucinate and wander off. Nail one task, then expand.

</aside>

2. Context Engineering Is The Whole Job

<aside> 🦥

When an agent fails, it is usually not a dumb model. It is the wrong context. Your job is to put the smallest set of high-signal info in front of the model at each step. Four moves to remember:

Write: save notes and memory outside the context window.
Select: pull in only the relevant docs, files, or tools.
Compress: summarize old steps so the window stays lean.
Isolate: split work across subagents that each get their own clean context.

Sloth rule: context is like milk, best served fresh and condensed. 🥛

</aside>

3. Give Fewer, Better Tools

<aside> 🦥

A few well-described tools beat a pile of overlapping ones. Write clear tool names and descriptions, use formats the model already knows, and design tools that are hard to use wrong. If the agent keeps picking the wrong tool, fix the descriptions before you blame the model.

</aside>

4. Keep A Human In The Loop

<aside> 🦥

Require approval before risky actions (deleting files, sending emails, spending money).
Set step limits and budgets so a confused agent cannot loop forever.
Add clear stop conditions so it knows when the job is actually done. </aside>

5. Evaluate, Do Not Just Vibe-Check

<aside> 🦥

You cannot improve what you do not measure. Build evals (test cases your agent runs against):

Deterministic checks for exact things (did it return valid JSON? the right value?).
LLM-as-judge for fuzzy things (was the tone right? did it follow the steps?).

Then add observability: trace every step so you can see where it went sideways. Nearly 9 in 10 teams running agents in production do this. It is table stakes.

</aside>

6. Respect The Token Bill

<aside> 🦥

Agentic loops can burn tokens fast, and plenty of teams have been shocked by the bill. Use cheaper and faster models for easy steps, keep context short, and do not let the agent re-read everything on every single turn.

</aside>

7. Security Is Not Optional

<aside> 🦥

Prompt injection is the #1 security risk for LLM apps. Untrusted text (a web page, an email, a file) can secretly tell your agent to do bad things. Defend yourself:

Least privilege: never hand an agent blanket shell, file, or account access.
Separate instructions from data, and validate every input.
Validate outputs and add a kill switch for anything that touches the real world. </aside>

🦥 Common Mistakes (Save Yourself The Pain)

<aside> 🦥

Reaching for a multi-agent system when one good agent would do.
Stuffing the entire codebase or every doc into context.
No evals, no logging, then wondering why it is flaky.
Letting it run fully autonomous with zero guardrails.
Blaming the model when the real fix is better context and clearer tools. </aside>

Sources

🦥 Sloth's Simple Version

1. Start Stupidly Narrow

2. Context Engineering Is The Whole Job

3. Give Fewer, Better Tools

4. Keep A Human In The Loop

5. Evaluate, Do Not Just Vibe-Check

6. Respect The Token Bill

7. Security Is Not Optional

🦥 Common Mistakes (Save Yourself The Pain)

🦥 The Takeaway