References: Building trustworthy agents
Source material
Source curriculum (structural mirror, cited as further study):
• Microsoft, "Building Trustworthy AI Agents" (AI Agents for Beginners, Lesson 06)
  Author: Microsoft Cloud Advocates
  Lesson page: https://github.com/microsoft/ai-agents-for-beginners/tree/main/06-building-trustworthy-agents
  License: MIT
Clawdemy's lessons are original prose that follows the pedagogical arc of this source. We do not reproduce or transcribe it; we cite it as the recommended companion. All rights to the original materials remain with their authors.
Source-scope note: Microsoft Lesson 06 mixes trustworthiness (the agent's own reliability) with security (adversarial threats), and its content leans toward the adversarial side (threat understanding, input filtering, access control). This lesson deliberately holds a clean boundary: it covers the agent's OWN failure modes here, and reserves the adversarial material (prompt injection, tool abuse, exfiltration) for the next lesson on securing agents. The six-mode failure taxonomy here is Clawdemy framing, supplied to keep that boundary clean and to deliver this lesson's stated capability. The parts of Lesson 06 that fit trustworthiness directly (human-in-the-loop, conversation/turn limits, input validation) are reflected here as guardrails.
Read this next
- Building Trustworthy AI Agents (Microsoft) by Microsoft Cloud Advocates. The practitioner version, strong on human-in-the-loop design and the threat-aware side of trust, with runnable samples. MIT-licensed. Read it alongside this lesson’s failure-mode taxonomy for both halves of the picture, and note that much of its threat content maps to our next lesson on securing agents.
Going deeper
A short, durable list.
- A Practical Guide to Building Agents (OpenAI). A practitioner guide that covers guardrails, human oversight, and safe action-taking for agents. Good on the same blast-radius reasoning this lesson uses for high-stakes actions.
- Building Effective Agents (Anthropic). Patterns for reliable agent design, including when to keep a human in the loop and how to bound agent behavior. Practical and provider-grounded.
Adjacent topics
Where this leads inside this track.
- Securing agents. The next lesson. The other half of “agents you can trust and ship”: defending against an attacker who feeds the agent malicious input to hijack it, abuse its tools, or extract data. Different threat, different defenses.
- The tool-use design pattern in depth. Earlier in the track. Good tool descriptions are a trustworthiness guardrail: clear definitions reduce hallucinated and misdirected tool calls at the source.
- Agents that self-check: metacognition. The previous lesson. A reflection step is one of the guardrails against confidently wrong answers, so self-checking and trustworthiness are tightly linked.