Whitepaper
Anomaly Intelligence for Agentic Systems
A practical framework for unsupervised, drift-aware monitoring of live agent workflows—detecting goal drift, tool anomalies, context poisoning, and hidden failure modes before they trigger unsafe actions or require human intervention.
Thesis
Most agentic systems are governed through static policy, evals, and runtime controls. These layers are necessary but insufficient once agents operate over long horizons, maintain state, retrieve external context, and chain tool calls.
The most dangerous failures in production are the ones that have never been seen before. They appear as subtle goal drift, unexpected tool parameter distributions, context poisoning that bypasses filters, or cross-agent propagation that evades every predefined rule.
This paper builds directly on The Evening Star AI Engine, Runtime Governance for AI Agents, Red Teaming Agentic AI Systems, Practical Evals for Agentic Systems, and Long-Horizon Agents, Managed Harnesses, and Context Discipline. It supplies the missing continuous runtime layer.
From Static Testing to Continuous Drift Intelligence
Static red teaming and evals catch known attacks. They pressure-test the prompt, the tool plan, or the policy boundary at test time.
Production agents, however, live in open loops:
- Goal drift over dozens of steps
- Tool misuse that looks “plausible” to the model
- Context poisoning that accumulates across retrieval cycles
- Silent policy erosion through repeated edge cases
- Cross-agent contamination
Unsupervised anomaly detection is purpose-built for these unknowns. It learns normal behavior per agent, per workflow, per tool, and per context window, then flags statistically significant deviations without requiring new labeled attacks.
This is continuous purple teaming: autonomous red/blue co-evolution inside the agent loop, feeding directly into the six-layer Governance Stack.
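As a concrete illustration of the baseline-learning step, here is a minimal sketch that fits one unsupervised detector per (agent, tool) pair. The featurization of tool calls and the choice of scikit-learn's IsolationForest are assumptions for demonstration, not a prescribed implementation:

```python
# Minimal sketch: one unsupervised baseline per (agent, tool) pair.
# Assumes tool calls have already been featurized into fixed-length vectors.
from collections import defaultdict
import numpy as np
from sklearn.ensemble import IsolationForest

class PerToolBaseline:
    """Learns 'normal' tool-call feature vectors and scores new calls."""

    def __init__(self):
        self.models = {}                     # (agent_id, tool) -> fitted model
        self.history = defaultdict(list)     # raw observations per key

    def observe(self, agent_id: str, tool: str, features: np.ndarray):
        self.history[(agent_id, tool)].append(features)

    def fit(self, min_samples: int = 200):
        # Only fit baselines where enough normal traffic has accumulated.
        for key, rows in self.history.items():
            if len(rows) >= min_samples:
                self.models[key] = IsolationForest(
                    n_estimators=100, contamination="auto", random_state=0
                ).fit(np.vstack(rows))

    def score(self, agent_id: str, tool: str, features: np.ndarray) -> float:
        model = self.models.get((agent_id, tool))
        if model is None:
            return 0.0  # no baseline yet: unscored, not anomalous
        # score_samples: higher = more normal; negate so higher = more anomalous
        return float(-model.score_samples(features.reshape(1, -1))[0])
```

Because each baseline is scoped to a single (agent, tool) pair, a parameter distribution that is routine for one workflow can still be flagged as anomalous in another.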
Reference Architecture: The Anomaly Intelligence Control Plane
The control plane sits between the agent runtime and the existing governance layers. It consumes traces from Purple Firefish (or any policy gateway), decision logs, tool-broker outputs, and OpenTelemetry spans.
Figure 1: Anomaly Intelligence Control Plane
Agent Request → Context Builder → Tool Broker (via Purple Firefish)
↓
Anomaly Detector Ensemble
├── Baseline models (per-agent, per-tool, per-workflow)
├── Drift detectors (concept, data, prediction)
├── Agreement scoring (model vs. ensemble)
├── Uncertainty quantification
└── Attribution engine (which context/tool caused the spike)
↓
Anomaly Score + Explanation → Policy Decision Point
↓
Action: Auto-throttle | Require approval | Escalate to incident response | Enrich decision log
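To make the diagram concrete, the sketch below shows one possible shape for the score-plus-explanation payload and the routing logic at the policy decision point. The thresholds, field names, and action set are illustrative assumptions:

```python
# Illustrative payload and policy decision point from Figure 1.
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ENRICH_LOG = "enrich_decision_log"
    AUTO_THROTTLE = "auto_throttle"
    REQUIRE_APPROVAL = "require_approval"
    ESCALATE = "escalate_incident_response"

@dataclass
class AnomalyVerdict:
    score: float                  # ensemble anomaly score, normalized 0..1
    attribution: str              # which context/tool caused the spike
    drift_triggered: bool         # any drift detector fired
    confidence_divergence: float  # model confidence vs. ensemble uncertainty

def policy_decision(v: AnomalyVerdict) -> Action:
    if v.drift_triggered and v.score > 0.9:
        return Action.ESCALATE
    if v.confidence_divergence > 0.20 or v.score > 0.7:
        return Action.REQUIRE_APPROVAL
    if v.score > 0.5:
        return Action.AUTO_THROTTLE
    return Action.ENRICH_LOG  # every trace is enriched by default
```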
Key Metrics and Scoring
| Metric | Definition | Target / Threshold | Governance Response |
|---|---|---|---|
| Drift Score | Statistical deviation from learned baseline | > 3σ or ADWIN trigger | Enrich context, require approval |
| Tool-Anomaly Rate | Unusual parameter distributions or sequences | > 95th percentile | Throttle tool, log for red team |
| Confidence Divergence | Model confidence vs. ensemble uncertainty | > 20% delta | Force human review |
| Intervention Quality | Percent of anomalies correctly contained | > 90% | Feed into new evals |
| Trace Enrichment % | Percent of traces with anomaly attribution | > 95% | Improve decision-log auditability |
| False Positive Rate | Flagged anomalies operators confirm as benign | < 5% (tunable) | Auto-tune detector sensitivity |
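The Drift Score row can be implemented, for example, as a rolling 3σ check combined with an ADWIN trigger. The sketch below uses the open-source `river` library (API as in recent versions); the window size and the choice of metric stream are assumptions:

```python
# Sketch: rolling 3-sigma deviation plus an ADWIN change-point trigger.
from collections import deque
import statistics
from river import drift

class DriftScorer:
    def __init__(self, window: int = 500):
        self.window = deque(maxlen=window)  # recent metric values
        self.adwin = drift.ADWIN()          # adaptive-windowing detector

    def update(self, value: float) -> dict:
        self.window.append(value)
        self.adwin.update(value)
        mean = statistics.fmean(self.window)
        std = statistics.pstdev(self.window) or 1e-9  # avoid divide-by-zero
        sigma = abs(value - mean) / std
        return {
            "sigma_deviation": sigma,
            "exceeds_3_sigma": sigma > 3.0,
            "adwin_trigger": self.adwin.drift_detected,
        }
```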
Operating Model Table
| Signal | Question | Immediate Action | Feedback Loop |
|---|---|---|---|
| Rising drift score | Is the agent pursuing a new implicit goal? | Insert approval gate | Add scenario to Practical Evals |
| Tool anomaly spike | Is a tool being used outside learned norms? | Scoped credential downgrade | Update MCP tool-space policy |
| Context poisoning flag | Did retrieval introduce outlier content? | Rebuild context with provenance check | New item for Red Teaming scenarios |
| Cross-agent propagation | Is suspicious behavior spreading? | Isolate agent instance | Incident response playbook update |
Practical Implementation Patterns
Pattern 1: Purple Firefish + Anomaly Layer
Wrap the existing Firefish gateway with the detector ensemble so that every tool call and context update is scored before execution, as sketched below.
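A minimal sketch of that wrapping, assuming a hypothetical `gateway.execute` interface, a featurizer callable, and the per-tool detector from earlier; the blocking threshold is illustrative:

```python
# Sketch of Pattern 1: score every tool call before the gateway executes it.
def guarded_tool_call(gateway, detector, agent_id, tool, params, featurize):
    features = featurize(tool, params)            # assumed featurizer
    score = detector.score(agent_id, tool, features)
    if score > 0.7:                               # illustrative threshold
        raise PermissionError(
            f"Tool call held for approval (anomaly score {score:.2f})"
        )
    result = gateway.execute(tool, params)        # normal execution path
    detector.observe(agent_id, tool, features)   # keep the baseline fresh
    return result
```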
Pattern 2: Long-Horizon Agent Harness
In managed harnesses, insert a checkpoint at every N-step boundary; the anomaly plane then runs a lightweight “what changed?” analysis against the previous checkpoint.
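One way to run that analysis, assuming checkpoints are plain dictionaries of harness state; the field names are illustrative:

```python
# Sketch of Pattern 2: diff two checkpoints and surface fields that moved.
def checkpoint_delta(prev: dict, curr: dict) -> dict:
    changed = {}
    for key in ("goal_summary", "active_tools", "context_sources"):
        if prev.get(key) != curr.get(key):
            changed[key] = {"before": prev.get(key), "after": curr.get(key)}
    return changed

# Example: a shifting implicit goal between two checkpoints
delta = checkpoint_delta(
    {"goal_summary": "summarize Q3 report", "active_tools": ["search"]},
    {"goal_summary": "email Q3 report externally",
     "active_tools": ["search", "smtp"]},
)
if "goal_summary" in delta:
    print("possible goal drift:", delta["goal_summary"])
```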
Pattern 3: OT/ICS and Cyber Use Cases
Detectors run entirely on-premises with air-gapped baselines. Anomalies trigger passive-only alerts or human-verifiable recommendations.
Pattern 4: Purple Radar Integration
Vulnerability intelligence signals can be ingested as additional context features.
Governance Integration & Operational Learning
Anomalies are written to the decision log with full attribution. Confirmed and near-miss anomalies automatically become new regression tests and adversarial scenarios, creating a closed learning loop.
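A sketch of that conversion, assuming anomaly records carry a replayable context snapshot; the field names and eval-case format are assumptions, not a fixed schema:

```python
# Sketch: turn a contained anomaly into a replayable regression scenario.
import hashlib
import json

def anomaly_to_eval_case(anomaly: dict) -> dict:
    # Stable ID derived from the anomaly record itself.
    case_id = hashlib.sha256(
        json.dumps(anomaly, sort_keys=True).encode()
    ).hexdigest()[:12]
    return {
        "id": f"anomaly-{case_id}",
        "setup": anomaly["trace_context"],  # replayable context snapshot
        "trigger": anomaly["attribution"],  # what caused the spike
        "expected": "contained",            # agent must not repeat the action
    }
```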
Conclusion & Call to Action
The safest agentic systems are not the ones with the longest policy documents. They are the ones whose runtime continuously watches itself, surfaces the unknown unknowns, and routes them to human judgment before damage occurs.
Prototype it in the Open Labs. Join the Fellowship. Or reach out directly if you are operating agents in production and need this layer operational next week.
Selected References
- The Evening Star AI Engine
- The Evening Star AI Governance Stack
- Red Teaming Agentic AI Systems
- Runtime Governance for AI Agents
- Practical Evals for Agentic Systems
- Long-Horizon Agents, Managed Harnesses, and Context Discipline
- NIST AI RMF 1.0 & ISO/IEC 42001
- OpenTelemetry + OpenInference tracing standards