Direct injection: the user types the attack into the input field → the operator can at least see the input.
Indirect injection: the attacker plants the attack inside a document that gets indexed → the operator may not have written the document, may not control it, and may not have read it.
| Mitigation | What it does | What it does not do |
| --- | --- | --- |
| Source provenance (prefer known authors; risk-flag open-submission sources) | Reduces the chance an attacker-controlled document is in the index | Eliminate the attack surface (any fresh content can be malicious) |
| Content sanitization at index time (strip instruction-frame patterns) | Catches lazy attacks | Stop adversarial wording the sanitizer did not anticipate |
| Output filtering (watch for sudden topic changes, unfounded refusals) | Catches some compromises after the fact | Prevent the compromise from happening |
| Action sandboxing (gate side-effecting actions on confirmations) | Limits damage if the model is compromised | Stop the model from saying compromising things |
| Visible citations | Makes injection visible to the user | Prevent the injection itself |
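The index-time sanitization mitigation above can be sketched as a small pattern filter. This is a minimal illustration, not a vetted defense: the pattern list and the `sanitize_chunk` name are hypothetical, and a real deployment would maintain and test its own patterns.

```python
import re

# Hypothetical instruction-frame patterns, chosen only for illustration.
INSTRUCTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?(system prompt|previous instructions)", re.IGNORECASE),
    # A line that starts by impersonating a chat role, e.g. "system:".
    re.compile(r"^\s*(system|assistant)\s*:", re.IGNORECASE | re.MULTILINE),
]

def sanitize_chunk(text: str) -> tuple[str, bool]:
    """Return (cleaned_text, flagged) for one chunk before it is indexed."""
    flagged = False
    cleaned = text
    for pattern in INSTRUCTION_PATTERNS:
        # subn returns the new string and the number of replacements made.
        cleaned, count = pattern.subn("[removed]", cleaned)
        flagged = flagged or count > 0
    return cleaned, flagged
```

A chunk containing "Ignore all previous instructions" is rewritten and flagged, while a paraphrase such as "Kindly set aside what you were told earlier" passes through untouched. That gap is exactly the limitation the table describes: the filter only catches wording it anticipated.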
Design rule: treat the entire retrieval index as untrusted input. Do not give a model that sits on top of it access to anything you would not let the indexed content control directly.
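One way to apply the design rule is to gate every side-effecting action behind an explicit confirmation and to block anything not on an allowlist. The sketch below assumes hypothetical action names and a hypothetical `confirm` callback (for example, a prompt shown to a human); it is not tied to any particular agent framework.

```python
from typing import Callable, Dict

# Hypothetical action names, split by whether they change state outside the chat.
READ_ONLY_ACTIONS = {"search_docs", "summarize"}
SIDE_EFFECT_ACTIONS = {"send_email", "delete_record"}

def run_action(
    name: str,
    handlers: Dict[str, Callable[[], str]],
    confirm: Callable[[str], bool],
) -> str:
    """Run a model-requested action only if policy allows it."""
    if name in READ_ONLY_ACTIONS:
        return handlers[name]()
    if name in SIDE_EFFECT_ACTIONS:
        # The model's request alone is never enough for a side effect.
        if confirm(name):
            return handlers[name]()
        return f"blocked: {name} was not confirmed"
    # Anything unknown is denied by default.
    return f"blocked: {name} is not on the allowlist"
```

The deny-by-default branch matters as much as the confirmation step: if injected content talks the model into requesting an action nobody planned for, the request fails closed.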
RAG (retrieval-augmented generation): the pattern of retrieving relevant documents from a knowledge base and inserting them into the prompt so the model can answer using them.
Chunk: a coherent text segment (typically a few hundred tokens) extracted from a source document for indexing.
Embedding model: a separate, usually smaller model that maps text to vectors so semantically similar text ends up nearby.
Vector database: a database optimized for fast nearest-neighbor lookup over high-dimensional vectors. Examples: FAISS, Pinecone, Weaviate, Chroma.
Top-K retrieval: returning the K vector-DB entries closest to a query vector. K is usually 5 or 10 in production RAG.
Cosine similarity: the most common similarity metric for text embeddings; it is the cosine of the angle between two vectors, so it compares direction and ignores magnitude.
Hybrid search: combining vector similarity with lexical (keyword) search and merging the rankings.
BM25: the most common lexical-search ranking function; used as the lexical half of hybrid search.
Reranking: an optional second pass that uses a more expensive model to rescore the top-K before they reach the language model.
Bi-encoder: retrieval architecture where the query and each chunk pass through the embedding model separately; comparison happens afterward, via cosine similarity. Fast, recall-heavy. Sentence-BERT is the canonical example.
Cross-encoder: reranking architecture where the query and chunk pass through the encoder together with self-attention across both. Slow, precision-heavy. Used for the second stage of two-stage retrieval.
HyDE (Hypothetical Document Embeddings): retrieval technique that addresses query-vs-document shape mismatch by generating a hypothetical answer document with an LLM call and embedding that document instead of the user’s query.
Grounding instruction: the part of the prompt template that tells the model to use only the provided context and refuse otherwise.
Ungrounded generation: the model producing an answer using its pretraining knowledge instead of, or in addition to, the retrieved context.
Indirect prompt injection: prompt-injection attacks delivered via documents in the retrieval index rather than via the user’s direct input.
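Several of the terms above (embedding, cosine similarity, top-K retrieval, grounding instruction) fit together in a few lines. The sketch below is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the linear scan stands in for a vector database, and the prompt wording is illustrative only.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    # Linear scan in place of a vector database's nearest-neighbor lookup.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context_chunks: list[str]) -> str:
    # Numbered chunks make it possible to cite sources in the answer.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Note that whatever `top_k` returns is pasted into the prompt verbatim, which is why an injected document in the index reaches the model with the same standing as legitimate context.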
Retrieval finds it. The prompt frames it. The model writes it.