Whitepaper

Anomaly Intelligence for Agentic Systems

A practical framework for unsupervised, drift-aware monitoring of live agent workflows—detecting goal drift, tool anomalies, context poisoning, and hidden failure modes before they trigger unsafe actions or require human intervention.

Thesis

Most agentic systems are governed through static policy, evals, and runtime controls. These layers are necessary, but insufficient once agents operate over long horizons, maintain state, retrieve external context, and chain tool calls.

The most dangerous failures in production are the ones that have never been seen before. They appear as subtle goal drift, unexpected tool parameter distributions, context poisoning that bypasses filters, or cross-agent propagation that evades every predefined rule.

Evening Star thesis: In live agentic systems, unsupervised anomaly intelligence is the only scalable way to surface novel failure modes in real time. It turns weak behavioral signals into governed interventions, closing the loop between the Governance Stack, red team findings, and operational reality.

This paper builds directly on The Evening Star AI Engine, Runtime Governance for AI Agents, Red Teaming Agentic AI Systems, Practical Evals for Agentic Systems, and Long-Horizon Agents, Managed Harnesses, and Context Discipline. It supplies the missing continuous runtime layer.

From Static Testing to Continuous Drift Intelligence

Static red teaming and evals catch known attacks. They pressure the prompt, the tool plan, or the policy boundary at test time.

Production agents, however, live in open loops:

  • Goal drift over dozens of steps
  • Tool misuse that looks “plausible” to the model
  • Context poisoning that accumulates across retrieval cycles
  • Silent policy erosion through repeated edge cases
  • Cross-agent contamination

Unsupervised anomaly detection is purpose-built for these unknowns. It learns normal behavior per agent, per workflow, per tool, and per context window, then flags statistically significant deviations without requiring new labeled attacks.
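As a minimal sketch of what "learns normal behavior per agent, per tool" can mean in practice, the following keeps an online per-(agent, tool) baseline and flags calls that deviate beyond a sigma threshold. The names and the single-feature framing are illustrative assumptions, not a product API:

```python
from collections import defaultdict
from dataclasses import dataclass
import math

@dataclass
class RunningBaseline:
    """Welford's online mean/variance for one (agent, tool) feature stream."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

baselines: dict[tuple[str, str], RunningBaseline] = defaultdict(RunningBaseline)

def score_call(agent: str, tool: str, feature: float, threshold: float = 3.0) -> bool:
    """True if this call deviates more than `threshold` sigma from the learned norm."""
    key = (agent, tool)
    z = baselines[key].zscore(feature)
    baselines[key].update(feature)  # keep learning; real systems may quarantine anomalies instead
    return z > threshold
```

A real deployment would track many features per call (parameter sizes, call frequency, sequence n-grams) rather than one scalar, but the shape of the loop is the same: score against history first, then fold the observation into history.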

This is continuous purple teaming: autonomous red/blue co-evolution inside the agent loop, feeding directly into the six-layer Governance Stack.

Reference Architecture: The Anomaly Intelligence Control Plane

The control plane sits between the agent runtime and the existing governance layers. It consumes traces from Purple Firefish or any policy gateway, decision logs, tool broker outputs, and OpenTelemetry spans.

Figure 1: Anomaly Intelligence Control Plane

Agent Request → Context Builder → Tool Broker (via Purple Firefish)
        ↓
Anomaly Detector Ensemble
├── Baseline models (per-agent, per-tool, per-workflow)
├── Drift detectors (concept, data, prediction)
├── Agreement scoring (model vs. ensemble)
├── Uncertainty quantification
└── Attribution engine (which context/tool caused the spike)
        ↓
Anomaly Score + Explanation → Policy Decision Point
        ↓
Action: Auto-throttle | Require approval | Escalate to incident response | Enrich decision log
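The routing at the bottom of the figure can be sketched as a threshold ladder at the Policy Decision Point. The thresholds, enum names, and the rule for low-confidence attributions are illustrative choices, not the actual control plane's interface:

```python
from enum import Enum

class Action(Enum):
    ENRICH_LOG = "enrich decision log"
    AUTO_THROTTLE = "auto-throttle"
    REQUIRE_APPROVAL = "require approval"
    ESCALATE = "escalate to incident response"

def decide(anomaly_score: float, attribution_confidence: float) -> Action:
    """Map an ensemble anomaly score in [0, 1] to a governance action.

    Anomalies the attribution engine cannot explain are escalated one
    step: an unexplained deviation is treated as more dangerous, not less.
    """
    if anomaly_score >= 0.9:
        return Action.ESCALATE
    if anomaly_score >= 0.7:
        if attribution_confidence < 0.5:
            return Action.ESCALATE
        return Action.REQUIRE_APPROVAL
    if anomaly_score >= 0.4:
        return Action.AUTO_THROTTLE
    return Action.ENRICH_LOG
```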

Key Metrics and Scoring

| Metric | Definition | Target / Threshold | Governance Response |
|---|---|---|---|
| Drift Score | Statistical deviation from learned baseline | > 3σ or ADWIN trigger | Enrich context, require approval |
| Tool-Anomaly Rate | Unusual parameter distributions or sequences | > 95th percentile | Throttle tool, log for red team |
| Confidence Divergence | Model confidence vs. ensemble uncertainty | > 20% delta | Force human review |
| Intervention Quality | Percent of anomalies correctly contained | > 90% | Feed into new evals |
| Trace Enrichment % | Percent of traces with anomaly attribution | > 95% | Improve decision-log auditability |
| False Positive Rate | Flagged anomalies operators judge benign | < 5% (tunable) | Auto-tune detector sensitivity |
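The Drift Score row assumes a windowed change detector. Real ADWIN adapts its window sizes using Hoeffding bounds; the fixed two-window version below is a heavily simplified stand-in that only illustrates the idea of comparing recent behavior against a reference:

```python
from collections import deque
import math
import statistics

class TwoWindowDrift:
    """Compare a recent window of behavior against a frozen reference window.

    Fires when the recent-window mean sits more than `sigmas` standard
    errors away from the reference mean. A teaching sketch, not ADWIN.
    """
    def __init__(self, window: int = 50, sigmas: float = 3.0):
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)
        self.sigmas = sigmas

    def add(self, x: float) -> bool:
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(x)   # still building the baseline
            return False
        self.recent.append(x)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_mean = statistics.mean(self.reference)
        ref_std = statistics.stdev(self.reference) or 1e-9
        # standard error of the recent-window mean under the reference distribution
        se = ref_std / math.sqrt(len(self.recent))
        return abs(statistics.mean(self.recent) - ref_mean) > self.sigmas * se
```

In production you would run one detector per monitored signal (drift score, tool-anomaly rate, confidence divergence) and let the governance table above decide what a trigger means.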

Operating Model Table

| Signal | Question | Immediate Action | Feedback Loop |
|---|---|---|---|
| Rising drift score | Is the agent pursuing a new implicit goal? | Insert approval gate | Add scenario to Practical Evals |
| Tool anomaly spike | Is a tool being used outside learned norms? | Scoped credential downgrade | Update MCP tool-space policy |
| Context poisoning flag | Did retrieval introduce outlier content? | Rebuild context with provenance check | New item for Red Teaming scenarios |
| Cross-agent propagation | Is suspicious behavior spreading? | Isolate agent instance | Incident response playbook update |

Practical Implementation Patterns

Pattern 1: Purple Firefish + Anomaly Layer

Wrap the existing Firefish gateway with the detector ensemble so that every tool call and context update is scored before execution.
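A sketch of the wrapper pattern. The gateway interface here is hypothetical, standing in for whatever dispatch method your gateway client already exposes; it is not the actual Purple Firefish API:

```python
from typing import Any, Callable

class AnomalyGatedGateway:
    """Wraps an existing tool gateway and scores every call before execution.

    `score_fn` is any callable returning an anomaly score in [0, 1];
    `execute_fn` is the underlying gateway's dispatch method.
    """
    def __init__(self,
                 execute_fn: Callable[..., Any],
                 score_fn: Callable[[str, dict], float],
                 block_above: float = 0.7):
        self.execute_fn = execute_fn
        self.score_fn = score_fn
        self.block_above = block_above
        self.audit_log: list[dict[str, Any]] = []

    def call(self, tool: str, **params: Any) -> Any:
        score = self.score_fn(tool, params)
        # every call is logged with its score, whether or not it executes
        self.audit_log.append({"tool": tool, "params": params, "score": score})
        if score > self.block_above:
            raise PermissionError(f"tool '{tool}' blocked: anomaly score {score:.2f}")
        return self.execute_fn(tool, **params)
```

The key design choice is that scoring happens before dispatch, so a blocked call never reaches the tool, and the audit log captures blocked and allowed calls alike for later attribution.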

Pattern 2: Long-Horizon Agent Harness

In managed harnesses, insert a checkpoint at every N-step boundary. The anomaly plane runs a lightweight “what changed?” analysis against the previous checkpoint.
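One way to sketch the checkpoint comparison. The state fields are illustrative; in practice you would diff whatever your harness already serializes:

```python
def snapshot(agent_state: dict) -> dict:
    """Capture the fields worth diffing at an N-step boundary (illustrative schema)."""
    return {
        "goal": agent_state.get("goal"),
        "tools_used": sorted(agent_state.get("tools_used", [])),
        "context_sources": sorted(agent_state.get("context_sources", [])),
        "open_subtasks": len(agent_state.get("subtasks", [])),
    }

def what_changed(prev: dict, curr: dict) -> list[str]:
    """Return human-readable deltas between two checkpoints."""
    deltas = []
    for key in curr:
        if prev.get(key) != curr[key]:
            deltas.append(f"{key}: {prev.get(key)!r} -> {curr[key]!r}")
    return deltas
```

A goal string that changes between checkpoints, or a tool appearing that no prior checkpoint used, is exactly the kind of weak signal the detector ensemble can weigh against its baselines.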

Pattern 3: OT/ICS and Cyber Use Cases

Detectors run entirely on-premises with air-gapped baselines. Anomalies trigger passive-only alerts or human-verifiable recommendations.

Pattern 4: Purple Radar Integration

Vulnerability intelligence signals can be ingested as additional context features.

Governance Integration & Operational Learning

Anomalies are written to the decision log with full attribution. Failures and near misses automatically become new regression tests and adversarial scenarios, creating a closed learning loop.
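A sketch of that closed loop: each anomaly is logged with attribution, and failures or near misses additionally emit a reproducible regression-test seed. The entry schema and the near-miss cutoff are invented for illustration:

```python
import hashlib
import json
from datetime import datetime, timezone

def log_anomaly(decision_log: list, *, agent: str, score: float,
                attribution: dict, contained: bool) -> dict:
    """Append an attributed anomaly to the decision log.

    Failures and near misses (here, score >= 0.9) also get a deterministic
    seed so the same scenario can be replayed as a regression test.
    """
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "score": score,
        "attribution": attribution,   # e.g. {"cause": "tool", "detail": "..."}
        "contained": contained,
    }
    if not contained or score >= 0.9:
        payload = json.dumps({"agent": agent, "attribution": attribution},
                             sort_keys=True)
        entry["regression_seed"] = hashlib.sha256(payload.encode()).hexdigest()[:12]
    decision_log.append(entry)
    return entry
```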

Why This Matters

The organizations that win will not be the ones with the most rules—they will be the ones that notice when they are drifting before anyone else does.

Conclusion & Call to Action

The safest agentic systems are not the ones with the longest policy documents. They are the ones whose runtime continuously watches itself, surfaces the unknown unknowns, and routes them to human judgment before damage occurs.

Prototype it in the Open Labs. Join the Fellowship. Or reach out directly if you are operating agents in production and need this layer operational next week.

Selected References

The Evening Star AI Engine • The Evening Star AI Governance Stack • Red Teaming Agentic AI Systems • Runtime Governance for AI Agents • Practical Evals for Agentic Systems • Long-Horizon Agents, Managed Harnesses, and Context Discipline • NIST AI RMF 1.0 & ISO/IEC 42001 • OpenTelemetry + OpenInference tracing standards