Indirect Prompt Injection

Indirect prompt injection is a specific class of prompt injection attacks where the malicious instructions are embedded in third-party content the LLM processes — web pages it browses, documents in a RAG corpus, emails it summarizes, code it reviews — rather than in user input directly. The attacker doesn't need to interact with the application directly; they only need to plant content somewhere the application will eventually consume. Greshake et al.'s 2023 paper "Not What You've Signed Up For" formalized the threat and demonstrated practical attacks against several deployed systems. The threat scales with LLM integration: browser agents that visit untrusted sites, email assistants that process incoming messages, and RAG systems indexing external content are all attack surfaces. Defenses include content provenance tracking, boundary markers that the model is trained to respect, separate models for trusted reasoning versus untrusted content processing, and human-in-the-loop confirmation for high-stakes actions. The OWASP LLM Top 10 and the EU AI Act both flag indirect prompt injection as a primary risk category. AI governance teams must consider every untrusted content source a potential attack vector.

Indirect-injection defenses with Centralpoint: Centralpoint enforces content provenance, boundary markers, and trusted-vs-untrusted separation across RAG pipelines and agent workflows. Tokens are metered per skill, prompts stay local, and hardened chatbots deploy through one line of JavaScript with full audit trails.

Related Keywords:
Indirect Prompt Injection,,

Back