Whitepaper 03
Purple Firefish
A governed AI security gateway for high-consequence LLM systems.
Executive Summary
LLM systems are moving from passive assistants to active operators. They summarize documents, retrieve private knowledge, inspect code, reason over vulnerability data, call tools, write reports, route tickets, and increasingly act on behalf of users.
That is useful. It is also dangerous. The deeper risk is that an LLM may treat untrusted text as an instruction, act outside the user's intent, expose sensitive information, or allow an attacker to influence a workflow through prompts, documents, webpages, tool outputs, or agent memory.
The Problem
Most AI tools are still built like chat interfaces. They answer, summarize, generate, and explain. That is helpful, but it is not enough for serious environments. In a real LLM application, the better security question is: what can this input cause the system to do?
A malicious instruction inside a PDF is just text until a RAG pipeline retrieves it. A webpage is just content until an agent treats it as a command. A tool output is just data until the model uses it to justify an action. A generated response is just language until it leaks a credential, reveals internal instructions, or sends private information somewhere it should not go.
What Purple Firefish Does
Purple Firefish can inspect user prompts, RAG chunks, uploaded files, webpages, emails, tool outputs, agent memory, proposed tool calls, and final model responses. It applies layered analysis to determine whether content is benign, suspicious, malicious, sensitive, or unsafe to act on.
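The verdict taxonomy above can be sketched as a small data model. This is an illustrative sketch only: the names (`Verdict`, `ScanResult`, `scan`) and the single marker rule are assumptions for this paper, not Purple Firefish's actual API.

```python
# Hypothetical sketch of a gateway verdict model. Names and the
# single-rule check are illustrative, not a documented interface.
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    BENIGN = "benign"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"
    SENSITIVE = "sensitive"
    UNSAFE_TO_ACT = "unsafe_to_act"


@dataclass
class ScanResult:
    verdict: Verdict
    source: str          # "user", "document", "web", "email", "tool_output", "memory"
    reasons: list[str]


INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")


def scan(text: str, source: str) -> ScanResult:
    """Toy single-rule scan: flag known injection phrasing."""
    lowered = text.lower()
    hits = [m for m in INJECTION_MARKERS if m in lowered]
    if hits:
        # The same phrase is riskier when it arrives from content
        # the user did not write (a webpage, a PDF, a tool output).
        verdict = Verdict.MALICIOUS if source != "user" else Verdict.SUSPICIOUS
        return ScanResult(verdict, source, [f"marker: {m}" for m in hits])
    return ScanResult(Verdict.BENIGN, source, [])
```

Note how provenance changes the verdict: the identical phrase is merely suspicious when typed by a user but malicious when it surfaces from a retrieved webpage.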
Architecture
Purple Firefish scans not only the first user message but every untrusted ingress point. Source matters: the same phrase carries different risk depending on whether it came from a user, a PDF, a webpage, an email, a tool response, or agent memory.
| Stage | Purpose |
|---|---|
| Canonicalization and provenance | Normalize obfuscation and preserve source: user, document, web, email, tool output, or memory. |
| Layer 1: rules | Fast deterministic checks for prompt injection, jailbreaks, system prompt extraction, exfiltration, and encoding tricks. |
| Layer 2: semantic similarity | Embedding-based comparison against known attack patterns to catch paraphrases and novel wording. |
| Layer 3: LLM-as-judge | Structured classification for ambiguous or high-risk content, with confidence, attack family, and reason codes. |
| Policy engine | Maps risk into allow, flag, redact, sandbox, require approval, or block. |
| Tool-call validator | Checks proposed actions against user intent, data sensitivity, destination trust, reversibility, and required approval. |
| Output scanner | Scans final responses for prompt leakage, secrets, sensitive data, unsafe tool-derived content, and policy violations. |
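The staged pipeline in the table can be sketched as a chain of functions. Everything here is an assumption for illustration: the function names, the stubbed layers, and the risk thresholds are invented, and the real gateway's interfaces are not specified in this paper.

```python
# Minimal sketch of the staged pipeline: canonicalization, three
# detection layers, and a policy engine mapping risk to an action.
# All names and thresholds are illustrative assumptions.
import unicodedata


def canonicalize(text: str) -> str:
    # Stage 0: normalize obfuscation (compatibility forms,
    # zero-width characters used to split attack phrases).
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in "\u200b\u200c\u200d")


def layer1_rules(text: str) -> float:
    # Layer 1: fast deterministic marker checks.
    markers = ("ignore previous instructions", "reveal your system prompt")
    return 1.0 if any(m in text.lower() for m in markers) else 0.0


def layer2_similarity(text: str) -> float:
    # Layer 2 would embed `text` and compare it against known
    # attack patterns; stubbed here as a constant low score.
    return 0.0


def layer3_judge(text: str) -> float:
    # Layer 3 would invoke an LLM-as-judge for ambiguous or
    # high-risk content; stubbed here.
    return 0.0


def policy(risk: float) -> str:
    # Policy engine: map the highest layer score onto an action.
    if risk >= 0.9:
        return "block"
    if risk >= 0.5:
        return "require_approval"
    if risk >= 0.2:
        return "flag"
    return "allow"


def gateway(raw: str) -> str:
    text = canonicalize(raw)
    risk = max(layer1_rules(text), layer2_similarity(text), layer3_judge(text))
    return policy(risk)
```

Canonicalization is doing real work here: a marker phrase split with zero-width characters still matches after normalization, which a naive substring check on the raw input would miss.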
The Real Breakthrough
Prompt injection becomes much more serious when an AI system has tools. A chatbot can be wrong. An agent can act. Purple Firefish validates proposed tool calls against the user's original intent and the current trust context.
- Did the user actually request this action?
- Is this tool necessary for the task?
- Is the destination trusted?
- Is private data being transmitted?
- Is the action reversible?
- Is the instruction coming from the user or from untrusted content?
- Should the operator approve this first?
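The checklist above can be read directly as a validation routine. The sketch below is hedged: the `ToolCall` fields, the trusted-destination set, and the ordering of checks are assumptions invented for illustration, not Purple Firefish's validator.

```python
# Hedged sketch of the tool-call validation checklist. Field names
# and the trust set are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    destination: str
    sends_private_data: bool
    reversible: bool
    instruction_source: str   # "user" or "untrusted_content"


TRUSTED_DESTINATIONS = {"internal.example.com"}


def validate(call: ToolCall, user_requested_tools: set[str]) -> str:
    # Each check mirrors one question from the checklist above.
    if call.instruction_source != "user":
        return "block"                      # instruction came from untrusted content
    if call.tool not in user_requested_tools:
        return "require_approval"           # not clearly within user intent
    if call.sends_private_data and call.destination not in TRUSTED_DESTINATIONS:
        return "block"                      # private data to an untrusted destination
    if not call.reversible:
        return "require_approval"           # irreversible actions need an operator
    return "allow"
```

The ordering reflects severity: instructions originating from untrusted content are blocked outright, while irreversible but user-initiated actions fall back to operator approval.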
Why This Matters
Security teams already contend with too many competing signals: alerts, tickets, vulnerability feeds, asset inventories, incident notes, threat intelligence, compliance requirements, and leadership questions. Purple Firefish is built to reduce that burden, not add to it.
For builders, it provides a drop-in gateway that can be placed in front of an LLM app. For security teams, it provides visibility into adversarial input, policy decisions, and unsafe AI actions. For leaders, it creates a more governable AI system: one where risky actions are logged, explainable, reviewable, and bounded.
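"Logged, explainable, reviewable" implies that every policy decision leaves a structured record. A minimal sketch of such a record follows; the field names are an assumed schema, not one documented by this paper.

```python
# Illustrative decision record for operator review; the schema is
# an assumption, not a documented Purple Firefish format.
import json
from datetime import datetime, timezone


def decision_record(source: str, verdict: str, action: str,
                    reasons: list[str]) -> str:
    # Capture everything an operator or auditor needs to review
    # the decision later: when, from where, what, and why.
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "verdict": verdict,
        "action": action,
        "reasons": reasons,
    })
```

A record like this is what makes a blocked tool call explainable after the fact rather than a silent failure.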
What Makes It Different
| Operating idea | How Purple Firefish applies it |
|---|---|
| Treat untrusted text as possible instruction | Prompts, documents, webpages, emails, and tool outputs all become part of model context. |
| Make uncertainty visible | Expose confidence, reason codes, triggered layers, source type, and recommended action. |
| Connect detection to action | Map risk to allow, flag, redact, sandbox, approval, or block. |
| Govern agents, not just prompts | Validate actions before execution and scan outputs before release. |
| Preserve the operator | Keep humans in control of high-consequence decisions while reducing noise and manual process. |
Conclusion
Purple Firefish is the AI security layer that becomes necessary when LLM systems stop answering questions and start participating in workflows. It detects direct and indirect prompt injection, jailbreaks, data exfiltration attempts, adversarial obfuscation, risky tool calls, and unsafe outputs.
Its deeper purpose is not detection alone. Its purpose is governed action: a system that helps secure the path between untrusted input and trusted action.