Whitepaper 01

Evening Star AI Engine

Technical architecture for application-agnostic operational intelligence.


Most AI systems are optimized to answer questions. Operational environments require a harder capability: systems that observe change, evaluate abnormal behavior, expose uncertainty, explain evidence, and help humans move safely from signal to action.

Evening Star AI Engine is designed as an application-agnostic anomaly intelligence core for that work. It accepts normalized observations from host applications, evaluates them against validated baselines, combines multiple unsupervised detectors, and returns human-verifiable decision support: score, severity, confidence, attribution, drift, detector health, and recommended next action.

The engine is not a chatbot, dashboard, or closed application. It is a reusable judgment layer. Purple Firefish can submit prompt-security features. Purple Radar can submit vulnerability-risk features. Candles Edge can submit market-regime features. Future systems can provide their own adapters without changing the core.

Core boundary: The engine does not own the domain. It owns the judgment layer.

Technical Thesis

Evening Star uses a strict separation of responsibility. Host applications collect raw data, maintain user workflows, define policy, display results, and execute remediation. The engine receives feature-ready observations or adapter-transformed records and performs reusable intelligence work: baseline comparison, anomaly scoring, drift evaluation, feature attribution, detector-health reporting, confidence estimation, and generic decision support.

This separation keeps the engine application-agnostic. The same core scoring pipeline can evaluate prompts, vulnerability findings, market observations, telemetry, or operational records as long as the host application provides stable numeric features and metadata.

Core Architecture

The engine is structured as a set of reusable technical layers. Each layer has a narrow responsibility so the intelligence core can stay portable across domains.

| Layer | Technical responsibility |
| --- | --- |
| Host app | Owns raw data, UI, policy, remediation, users, and business rules. |
| Adapter | Converts domain records into normalized numeric features while preserving metadata such as entity, timestamp, tenant, or request ID. |
| Schema validation | Verifies feature names, order, types, required columns, and schema hash against the fitted baseline before scoring. |
| Registry | Stores model metadata, baseline summaries, feature schema hashes, detector configuration, and tags for reproducible scoring. |
| Ensemble | Combines Isolation Forest, robust z-score/MAD, and PCA residual detectors using configurable profiles and weights. |
| Judgment layer | Normalizes scores, computes votes, confidence, severity, feature attribution, drift, and action guidance. |
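As a concrete illustration of the schema-validation layer, a minimal sketch of hashing and checking a feature contract before scoring. The function names and hash scheme here are assumptions for illustration, not the engine's actual API:

```python
import hashlib

def schema_hash(feature_names, dtypes):
    """Stable hash of the feature contract: names, order, and types."""
    payload = "|".join(f"{n}:{t}" for n, t in zip(feature_names, dtypes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def validate_schema(observed_names, observed_dtypes, fitted_hash):
    """Refuse to score when the incoming frame does not match the fitted baseline."""
    incoming = schema_hash(observed_names, observed_dtypes)
    if incoming != fitted_hash:
        raise ValueError("schema mismatch: refusing to score against this baseline")
    return incoming
```

Because names, order, and types all feed the hash, a reordered or retyped column fails fast instead of silently producing a misaligned score.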

Detection Pipeline

A production scoring path should be deterministic, inspectable, and safe to repeat. Evening Star converts records into an application-neutral observation stream, validates the feature contract, scores the observation against the correct baseline, and emits structured output that a human or governed automation can inspect.

  1. Raw event or feature row enters from a host application.
  2. An optional adapter produces a feature-ready observation frame.
  3. Schema validation checks the feature contract against the fitted baseline.
  4. The registry selects the correct model and baseline metadata.
  5. The detector ensemble scores, normalizes, votes, and exposes detector health.
  6. The judgment layer adds attribution, reason codes, drift checks, severity, confidence, decision, and action guidance.
  7. The host application maps the result into its workflow.
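The seven steps above can be sketched as one orchestration pass, with each stage injected by the host. Stage names and signatures here are illustrative assumptions, not the engine's actual API:

```python
def score_record(record, adapter, validator, registry, ensemble, judge):
    """One pass through the scoring path; each stage is injected by the host."""
    obs = adapter(record)                                 # 2. feature-ready observation
    validator(obs)                                        # 3. feature contract check
    model, baseline = registry.select(obs)                # 4. model + baseline metadata
    detector_out = ensemble.score(obs, model, baseline)   # 5. score, vote, detector health
    judgment = judge(detector_out, baseline)              # 6. attribution, drift, decision
    return judgment                                       # 7. host maps result to workflow
```

Keeping each stage a separate injectable keeps the path deterministic and inspectable: any stage can be stubbed, logged, or replayed without touching the others.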

Strict no-lookahead mode prevents observations under evaluation from becoming part of the baseline used to judge them. In fitted production use, the preferred lifecycle is fit, validate, predict, with baseline updates made explicitly and only under policy control.

Model Layer

Evening Star uses an ensemble because anomaly detection is context-dependent. No single detector reliably dominates across all distributions and anomaly types. The engine combines complementary methods and exposes detector agreement rather than hiding it behind one opaque score.

| Detector | Purpose | Control |
| --- | --- | --- |
| Isolation Forest | General unsupervised outlier detection in tabular feature space. | Calibrated and combined with votes rather than trusted alone. |
| Robust z-score | Median/MAD deviation for interpretable feature-level abnormality. | Outlier-resistant; zero-MAD and nonfinite cases are guarded. |
| PCA residual | Structure-level deviation from the baseline feature manifold. | Full-dimensional reconstruction is prevented; degenerate residuals are reported. |

Detector profiles and weights let the ensemble tune sensitivity without hardcoding domain logic. Balanced, conservative, spike-sensitive, and structure-sensitive profiles can change the scoring emphasis while preserving the same output contract.
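A minimal sketch of the robust z-score detector with the zero-MAD guard described above. The 1.4826 factor scales MAD to be comparable to a standard deviation under normality; the exact guard behavior (degenerate features contribute zero) is an assumption for illustration:

```python
import numpy as np

def robust_zscores(X, baseline):
    """Robust per-feature deviation: |x - median| / (1.4826 * MAD).
    Zero-MAD and nonfinite cases are guarded rather than propagated."""
    X = np.asarray(X, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    median = np.median(baseline, axis=0)
    mad = np.median(np.abs(baseline - median), axis=0)
    scale = 1.4826 * mad
    scale = np.where(scale > 0, scale, np.nan)        # mark degenerate features
    z = np.abs(X - median) / scale
    return np.nan_to_num(z, nan=0.0, posinf=0.0)      # guarded: degenerate -> 0
```

Because the center and scale come from median and MAD rather than mean and standard deviation, a handful of extreme baseline rows cannot inflate the scale and mask later anomalies.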

Output Contract

Evening Star output should be structured enough for machines and clear enough for human operators. The core contract is an inspectable judgment package, not a single sentence of advice.

| Output group | Representative fields |
| --- | --- |
| Score/state | score, flag, threshold, severity, decision, confidence |
| Detector health | detector_scores, votes, count, used, failed, ensemble_health |
| Explanation | reason_codes, feature_attributions, top_features, summary |
| Baseline/drift | baseline_rows, scored_rows, feature_count, drift_status, drift_score, recalibration_recommendation |
| Action/trace | action, calibration_mode, model_id, baseline_id, schema_hash |
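A hypothetical judgment payload assembled from these field groups might look like the following. All values and identifiers are invented for illustration:

```python
judgment = {
    # Score/state
    "score": 0.91, "flag": True, "threshold": 0.80,
    "severity": "high", "decision": "review", "confidence": 0.74,
    # Detector health
    "detector_scores": {"iforest": 0.88, "robust_z": 0.95, "pca_residual": 0.90},
    "votes": 3, "count": 3,
    "used": ["iforest", "robust_z", "pca_residual"], "failed": [],
    "ensemble_health": "ok",
    # Explanation
    "reason_codes": ["FEATURE_SPIKE"], "top_features": ["latency_ms"],
    # Baseline/drift
    "drift_status": "stable", "recalibration_recommendation": False,
    # Action/trace
    "action": "route_to_analyst", "model_id": "m-001",
    "baseline_id": "b-001", "schema_hash": "sha256-demo",
}
```

Every claim in the payload is checkable: the flag follows from score versus threshold, the vote count matches the detectors used, and the trace fields tie the result back to a specific model and baseline.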

A host application can map this output to its own workflow. Purple Firefish may turn high-confidence prompt anomalies into policy review or blocking. Purple Radar may convert vulnerability anomaly clusters into analyst queues. Candles Edge may display abnormal volatility or regime-change evidence.

Feature Attribution and Human Verification

Anomaly scores are not sufficient in high-consequence environments. Operators need to inspect what drove the score. Evening Star should compute structured feature attributions using robust baseline statistics: feature value, baseline median, baseline MAD, robust z-score, direction, severity, contribution, and reason.

A good attribution payload shows which variables were unusual, whether they were high or low, how far they deviated from baseline, and how much they contributed to the result. That supports the operating principle that uncertainty, assumptions, thresholds, evidence, and confidence should be visible rather than hidden.
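One way to compute such an attribution payload from robust baseline statistics, as a sketch. Field names mirror the list above; the contribution normalization (each feature's share of total robust deviation) is an assumption for illustration:

```python
def feature_attributions(values, medians, mads):
    """Per-feature attribution rows from robust baseline statistics."""
    rows = []
    for name in values:
        scale = 1.4826 * mads[name]
        z = (values[name] - medians[name]) / scale if scale > 0 else 0.0
        rows.append({
            "feature": name,
            "value": values[name],
            "baseline_median": medians[name],
            "baseline_mad": mads[name],
            "robust_z": abs(z),
            "direction": "high" if z >= 0 else "low",
        })
    # Contribution: each feature's share of the total robust deviation.
    total = sum(r["robust_z"] for r in rows) or 1.0
    for r in rows:
        r["contribution"] = r["robust_z"] / total
    return sorted(rows, key=lambda r: r["robust_z"], reverse=True)
```

An operator reading the sorted rows sees at a glance which variable drove the flag, in which direction it deviated, and how dominant it was relative to the rest.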

Drift, Recalibration, and Baseline Governance

A baseline is not permanent. Users change behavior, systems evolve, markets shift, vulnerabilities age, and adversarial patterns adapt. Evening Star therefore treats drift checks and recalibration recommendations as first-class output. Drift detection compares the fitted baseline against current observations and reports whether the model's idea of normal is still valid.

The registry should track model ID, baseline ID, schema hash, training rows, detector configuration, created timestamp, source metadata, and tags. In production, artifacts should be signed and versioned so only trusted artifacts can be loaded.
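A median-shift drift check against the fitted baseline can be sketched as follows. The thresholds, status labels, and use of MAD units are illustrative assumptions, not the engine's shipped defaults:

```python
import numpy as np

def drift_check(baseline, current, warn=1.0, alert=2.0):
    """How far has each feature's current median moved from the fitted
    baseline, measured in robust (MAD) units?"""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    med = np.median(baseline, axis=0)
    mad = np.median(np.abs(baseline - med), axis=0) * 1.4826
    mad = np.where(mad > 0, mad, 1.0)                 # guard zero-MAD features
    shift = np.abs(np.median(current, axis=0) - med) / mad
    score = float(shift.max())                        # worst-feature shift
    status = "stable" if score < warn else ("warning" if score < alert else "drifted")
    return {"drift_score": score, "drift_status": status,
            "recalibration_recommendation": status == "drifted"}
```

Comparing medians rather than means keeps the check itself anomaly-resistant: a few flagged outliers in the current window do not, by themselves, trigger a recalibration recommendation.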

Streaming and Observation Contracts

Batch analysis is useful, but operational systems often need low-latency scoring. Evening Star should support detect for one-shot analysis, fit/predict for stable baseline scoring, score_one for streaming events, and score_batch for small operational batches. Baseline updates should be explicit and policy-controlled; the engine should not silently learn from flagged anomalies.

A formal Observation object gives APIs and queues a stable contract: entity_id, timestamp, observation_type, source_app metadata, numeric features, and additional context. DataFrame support should remain primary for analysis, but Observation support makes the engine easier to expose through HTTP APIs, message queues, and streaming services.
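A minimal sketch of such an Observation contract, assuming a plain dataclass with the fields named above rather than the engine's actual class:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class Observation:
    """Stable transport contract for APIs, queues, and streaming services."""
    entity_id: str
    timestamp: datetime
    observation_type: str
    source_app: str
    features: Dict[str, float]
    context: Dict[str, Any] = field(default_factory=dict)

    def feature_vector(self, order: List[str]) -> List[float]:
        """Project the features into the fitted baseline's column order."""
        return [self.features[name] for name in order]
```

Because the object carries its own entity, timestamp, and source metadata, the same record can flow through an HTTP API, a queue, or a batch job while the scoring path stays unchanged.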

Governed Agentic Automation

Evening Star should not hide judgment behind automation. Governed agents should act only after a clear detection, explanation, confidence score, and policy boundary are present. The automation layer should show what it did, why it did it, which evidence it used, and where human approval is required.

Examples: a vulnerability signal can become an enriched priority record; an adversarial prompt can become a policy decision and audit event; a drift event can trigger recalibration review; a market anomaly can become a watchlist flag. The common pattern is governed movement from signal to next step, not replacement of operators.

Required Technical Hardening

| Priority | Upgrade | Reason |
| --- | --- | --- |
| 1 | Signed, versioned artifact bundles | Prevents unsafe loading of untrusted model artifacts and improves reproducibility. |
| 2 | API auth, RBAC, tenant isolation, audit logs | Required before exposing the engine to multi-tenant or enterprise workflows. |
| 3 | Structured observability | Expose latency, detector failures, drift rates, schema failures, and scoring volume. |
| 4 | Domain evaluation packs | Prove that Firefish, Radar, and Candles Edge features improve signal quality over naive baselines. |
| 5 | Deployment assets | Docker, Helm, CI, SBOM, release signing, and repeatable private deployment. |

Conclusion

Evening Star AI Engine should remain an engine first: reusable, inspectable, application-agnostic, and governed. Its value is not that it replaces a domain application. Its value is that many applications can use the same intelligence layer to detect change, explain abnormal behavior, expose uncertainty, and map signal to action.

The technical direction is clear: harden the artifact lifecycle, preserve the adapter boundary, improve observability, strengthen evaluation, and keep every output human-verifiable. In high-consequence environments, useful AI is not merely fluent. It is inspectable, calibrated, and operationally useful.

Operating alignment: Context ingestion, change detection, impact reasoning, action mapping, human-verifiable output, responsible automation, and application-agnostic design.