References: Securing agents

Source material

Threat-model framework (primary structural source):
• OWASP, "Top 10 for LLM Applications 2025" (v2.0)
  Released: 2024-11-18 by the OWASP GenAI Security project
  Page: https://genai.owasp.org/llm-top-10/
  Categories cited in this lesson: LLM01:2025 Prompt Injection,
  LLM05:2025 Improper Output Handling, LLM06:2025 Excessive Agency,
  LLM07:2025 System Prompt Leakage.

Foundational source on the indirect-injection threat surface:
• Greshake et al., "Not what you've signed up for: Compromising
  Real-World LLM-Integrated Applications with Indirect Prompt Injection"
  Authors: Kai Greshake, Sahar Abdelnabi, Shailesh Mishra,
  Christoph Endres, Thorsten Holz, Mario Fritz (2023)
  arXiv: https://arxiv.org/abs/2302.12173
  License: open-access (arXiv)
  Demonstrates indirect prompt injection against LLM-integrated
  applications via prompts planted in retrieved content; includes
  attacks against real-world systems of the time, such as Bing's
  GPT-4-powered Chat.

Public-discourse + honest-limits anchor:
• Simon Willison, "Prompt injection" (ongoing series of posts on
  the term he coined and the defenses people propose for it)
  Series page: https://simonwillison.net/series/prompt-injection/

Source curriculum (parallel-numbered, but covers a narrower topic):
• Microsoft, "Securing AI Agents" (AI Agents for Beginners, Lesson 18)
  Author: Microsoft Cloud Advocates
  Lesson page: https://github.com/microsoft/ai-agents-for-beginners/tree/main/18-securing-ai-agents
  License: MIT
Microsoft's Lesson 18 covers cryptographic receipts for tamper-evident
audit logs specifically (a real and useful defense pattern), but does
not address prompt injection, the three attack categories this lesson
(Lesson 11) names, or the broader threat model. We cite it for the
audit-logs pattern only, not as a structural mirror; this lesson's
threat-model framework is OWASP-anchored. Full attribution preserved.

Berkeley CS294 LLM Agents (Fall 2024) cohesion citation:
• Dawn Song, "Towards Building Safe & Trustworthy AI Agents and
  A Path for Science- and Evidence-based AI Policy"
  Course: UC Berkeley CS294/194-196, December 2, 2024
  Syllabus: https://rdi.berkeley.edu/llm-agents/f24
Cited for the upstream-design framing of safe and trustworthy
deployment. Track 20 cites four lectures from this course (Yao L1,
Wang+Liu L3, Zhou L7, Song L11) as a depth-reference thread.

Read this next

Top 10 for LLM Applications 2025 (OWASP). The industry-canonical threat catalog for LLM-integrated applications. Released 2024-11-18 (v2.0). Strong on the agent-specific risks (LLM01 prompt injection, LLM05 improper output handling, LLM06 excessive agency, LLM07 system prompt leakage) and on what each looks like in practice. Read it as the operational checklist next to this lesson’s design-pattern framing.
Not what you’ve signed up for (Greshake et al., 2023). The foundational paper on indirect prompt injection. Demonstrates the threat surface against real-world LLM-integrated systems of the time and traces the cascade from “instruction planted in retrieved content” to data theft and tool abuse. Read it for the worked attack examples this lesson summarizes.
Simon Willison: Prompt injection series. Willison coined the term and has been writing about defenses (and why most proposed ones do not work) for years. The best public-discourse anchor for the no-perfect-defense framing, and for coverage of recent work such as the Agents Rule of Two.

Going deeper

A short, durable list.

Securing AI Agents (Microsoft) by Microsoft Cloud Advocates. Narrowly focused on cryptographic receipts and tamper-evident audit logs as an integrity mechanism for agent action records. One concrete instantiation of the audit-logs layer of the defense-in-depth toolkit; the broader threat-model work happens elsewhere. MIT-licensed.
A Practical Guide to Building Agents (OpenAI). Practitioner guide that covers guardrails, human oversight, and safe action-taking for agents. Strong on the same blast-radius reasoning this lesson and Lesson 10 use for high-stakes actions; useful as the deployment-side companion to the threat-model framing here.
Building Effective Agents (Anthropic). Patterns for reliable agent design, including when to keep a human in the loop and how to bound agent behavior. Provider-grounded and practical.
CS294/194-196 LLM Agents, Fall 2024 (UC Berkeley). Course syllabus and lecture recordings. Dawn Song’s December 2 lecture on safe and trustworthy AI agents is the closest sibling to this lesson’s framing in the depth-reference course.

Adjacent topics

Where this connects inside the track, and where the track itself ends.

Building trustworthy agents. The previous lesson and the other half of “agents you can trust and ship.” Together they cover the deploying team’s two questions: does the agent fail safely on its own, and can it survive an attacker. Different threat, different defenses, same agent.
The tool-use design pattern in depth. Earlier in the track. Tool definitions are the surface attackers manipulate when they abuse the agent’s tools; good tool descriptions matter to security as well as trustworthiness.
Agents that retrieve their own information: agentic RAG. Earlier in the track. Anything the agent retrieves can function, to the model, as instructions; agentic RAG is therefore a security surface, not just an information-retrieval pattern.

This is the closing lesson of Track 20. The first nine lessons made an agent capable; Lessons 10 and 11 made it ready to put in front of real users. Where you go from here depends on what you want to build: an agent of your own (the OpenAI and Anthropic practitioner guides above are the next read), more depth on the threat side (the Greshake paper and Willison’s series), or a different track in the curriculum.