Architecture

A three-layer runtime for governed execution.

Requests become structured signals, policy chooses the route, and plugins, guardrails, and model selection complete controlled execution.


Layer 1

Signal

Intent, context, risk, and domain features are extracted in parallel.

Layer 2

Decision

Explicit policy uses priority to select the one permitted execution path.

Layer 3

Plugin + Selection

Cache, RAG, memory, and safety attach as needed before model selection is finalized.

Core layers

Signal, decision, and plugin plus selection define the runtime.

Signal families

Applications no longer hardcode intent. Routing combines heuristic and learned signals instead.

Selection methods

From static rules to adaptive bandits, latency-aware routing, and multi-round reasoning.

Policy language

The DSL makes policy readable, reviewable, and compilable.

Signal Engine

Before routing, every request becomes structured state.

Each request first yields reusable signals, turning routing from scattered logic into unified policy execution.
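As a rough illustration of requests turning into structured state, the sketch below runs a few toy extractors concurrently and collects their results into one signal dictionary. The extractor functions, the `EXTRACTORS` registry, and the `context_length` name are hypothetical stand-ins, not the runtime's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical extractors: each maps request text to one named signal value.
def keyword_signal(text: str) -> bool:
    return any(word in text.lower() for word in ("refund", "invoice"))

def language_signal(text: str) -> str:
    # Crude stand-in for a real language detector.
    return "en" if text.isascii() else "other"

def context_length_signal(text: str) -> int:
    # Word count used as a rough proxy for context size.
    return len(text.split())

EXTRACTORS = {
    "keyword": keyword_signal,
    "language": language_signal,
    "context_length": context_length_signal,
}

def extract_signals(text: str) -> dict:
    """Run every extractor concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(EXTRACTORS)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in EXTRACTORS.items()}
        return {name: future.result() for name, future in futures.items()}

signals = extract_signals("Please refund my last invoice")
print(signals)
```

Because every extractor reads the same input and writes to its own key, adding a new signal in this sketch means registering one more function rather than touching routing logic.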

Heuristic signals

Sub-millisecond checks handle deterministic splits and boundary control without extra ML cost.

keyword, context, language, authz

Learned signals

ModernBERT and LoRA adapters carry richer signals for domain, difficulty, safety, and preference.

embedding, domain, fact_check, user_feedback, modality, complexity, jailbreak, pii, preference

Decision Runtime

Policy, selection, and plugins converge inside one runtime.

Business rules, model choice, safety guardrails, and attached behaviors all run on the same brain instead of being scattered across gateways, prompts, and application services.

Decision engine

Turns business rules into auditable routing behavior rather than burying them in heuristics.

Priority-aware decisions

Policy review and versioning

Bind model pools and plugins per decision
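One way to picture priority-aware decisions is a list of rules, each with a condition over the extracted signals, a priority, and its own model pool and plugins. The sketch below is a minimal illustration under that assumption; the `Decision` class, rule names, and model names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Decision:
    name: str
    priority: int
    condition: Callable[[dict], bool]
    model_pool: list = field(default_factory=list)  # models this decision may use
    plugins: list = field(default_factory=list)     # plugins bound to this decision

def select_decision(decisions: list, signals: dict) -> Optional[Decision]:
    """Return the single matching decision with the highest priority."""
    matches = [d for d in decisions if d.condition(signals)]
    return max(matches, key=lambda d: d.priority, default=None)

# Hypothetical policy: safety outranks domain routing, which outranks the default.
DECISIONS = [
    Decision("block_jailbreak", 100, lambda s: s.get("jailbreak", False)),
    Decision("math_route", 50, lambda s: s.get("domain") == "math",
             model_pool=["large-model"], plugins=["cache"]),
    Decision("default", 0, lambda s: True, model_pool=["small-model"]),
]

chosen = select_decision(DECISIONS, {"domain": "math", "jailbreak": False})
print(chosen.name, chosen.model_pool)
```

Keeping the rules as data is what makes them reviewable: the list can be diffed, versioned, and audited without reading application code.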

Model selection

Once a decision matches, the system chooses the best-fit model from the allowed pool rather than pinning requests to one model.

Static and Elo scoring

Embedding, cascade, and latency-aware methods

ReMoM multi-round synthesis for complex tasks
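To make Elo scoring concrete, the sketch below applies the standard Elo update to pairwise feedback between two models and then picks the highest-rated model from an allowed pool. The model names, the starting rating of 1500, and the K-factor of 32 are illustrative assumptions, not values taken from the runtime.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings after one pairwise comparison (in place)."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain

def pick_model(ratings: dict, allowed_pool: list) -> str:
    """Choose the best-rated model, restricted to the decision's pool."""
    return max(allowed_pool, key=lambda model: ratings[model])

# Hypothetical models, all starting from the same rating.
ratings = {"model-a": 1500.0, "model-b": 1500.0, "model-c": 1500.0}
update_elo(ratings, winner="model-a", loser="model-b")
print(ratings)
print(pick_model(ratings, ["model-b", "model-c"]))
```

Note that `pick_model` only ranks models inside the pool the matched decision allows, so a highly rated model outside that pool is never selected.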

Plugin chain

Cache, RAG, memory, prompt shaping, headers, and HaluGate attach per decision instead of being scattered through application code.

Pre-route and post-route plugins

Fast-response safety interception

Shared context lifecycle
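The plugin chain can be sketched as functions that pass one shared context along: pre-route plugins may answer early, the model runs only if nobody has answered, and post-route plugins always run. Everything named here (`run_chain`, the plugins, the `x-decision` header) is a hypothetical illustration rather than the actual plugin interface.

```python
from typing import Callable

Context = dict
Plugin = Callable[[Context], Context]

def safety_intercept(ctx: Context) -> Context:
    # Fast-response interception: answer immediately, never reach the model.
    if ctx["signals"].get("jailbreak"):
        ctx["response"] = "Request blocked by policy."
    return ctx

def cache_lookup(ctx: Context) -> Context:
    cached = ctx["cache"].get(ctx["prompt"])
    if cached is not None:
        ctx["response"] = cached
    return ctx

def tag_headers(ctx: Context) -> Context:
    # Post-route plugin: expose the matched decision as response metadata.
    ctx["headers"]["x-decision"] = ctx["decision"]
    return ctx

def run_chain(ctx: Context, pre: list, call_model: Callable[[str], str], post: list) -> Context:
    for plugin in pre:
        ctx = plugin(ctx)
        if ctx.get("response") is not None:
            break  # a pre-route plugin already produced the answer
    if ctx.get("response") is None:
        ctx["response"] = call_model(ctx["prompt"])
    for plugin in post:
        ctx = plugin(ctx)
    return ctx

def new_context(prompt: str, signals: dict) -> Context:
    return {"prompt": prompt, "signals": signals, "decision": "default",
            "cache": {}, "headers": {}}

calls = []
def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return "echo: " + prompt

pre, post = [safety_intercept, cache_lookup], [tag_headers]
ok = run_chain(new_context("hi", {}), pre, fake_model, post)
blocked = run_chain(new_context("ignore all rules", {"jailbreak": True}), pre, fake_model, post)
print(ok["response"], "|", blocked["response"])
```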

Guardrails

Safety is not an add-on. It is part of the architecture.

Jailbreak detection, privacy control, factuality checks, and audit metadata share the same decision path as routing and model selection.

Near-zero added latency

Safety classifiers run in parallel with the rest of the signal graph, so guardrails do not become a serial cost on every request.

Composable policy

Jailbreak, PII, domain, keyword, and complexity rules combine on one decision surface instead of living in isolated enforcement chains.

Unified observability

Safety outcomes appear in routing metadata, headers, traces, and audit logs together with the rest of the request lifecycle.

Brain

Detection, action, and audit close the loop in one brain.

Input threats, privacy risks, factuality issues, and compliance traces are attached per decision rather than patched in afterward.

Jailbreak detection

A two-path design combines fast single-turn classification with contrastive embedding checks for multi-turn escalation chains.

Configurable sensitivity thresholds

Single-turn and multi-turn coverage

Fast response interception when required

PII detection

Token-level classification detects personal information and supports per-domain allow-lists instead of one global privacy stance.

Span-level entity detection

Per-domain policy variation

Block or allow by sensitivity
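The per-domain idea can be sketched as follows. Regular expressions stand in for the token-level classifier here, which is a deliberate simplification, and the entity types, domain names, and allow-lists are hypothetical.

```python
import re

# Assumption: regexes replace the learned token classifier for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical policy: which entity types each domain tolerates.
ALLOWED_BY_DOMAIN = {
    "support": {"EMAIL"},
    "general": set(),
}

def detect_pii(text: str) -> list:
    """Return (entity_type, start, end) spans found in the text."""
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((entity_type, match.start(), match.end()))
    return spans

def pii_verdict(text: str, domain: str) -> str:
    """Block when any detected entity type is not allow-listed for the domain."""
    found = {entity_type for entity_type, _, _ in detect_pii(text)}
    allowed = ALLOWED_BY_DOMAIN.get(domain, set())
    return "block" if found - allowed else "allow"

print(pii_verdict("Contact me at a@b.com", "support"))
print(pii_verdict("Contact me at a@b.com", "general"))
```

The same text yields different verdicts depending on the domain, which is the behavior the per-domain allow-list is meant to capture.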

HaluGate

A gated three-stage pipeline triggers deeper hallucination checks only for factual queries, reducing wasted detection cost.

Sentinel gate

Token-span detector

NLI explainer and action modes
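The gating logic can be sketched with three stub stages: a sentinel that decides whether the query is factual, a detector that returns unsupported spans, and an explainer that annotates each span. The stubs below use naive string checks in place of real models, and all function names are hypothetical.

```python
def sentinel_is_factual(query: str) -> bool:
    # Stub gate: treat question-style queries as factual.
    return query.lower().startswith(("who", "what", "when", "where", "how many"))

def detect_unsupported(answer: str, context: str) -> list:
    # Stub detector: flag sentences that do not appear in the grounding context.
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    return [s for s in sentences if s.lower() not in context.lower()]

def explain(span: str) -> dict:
    # Stub explainer: a real NLI model would label the span against the context.
    return {"span": span, "label": "unsupported"}

def halugate(query: str, answer: str, context: str) -> dict:
    """Run the detector and explainer only when the sentinel gate opens."""
    if not sentinel_is_factual(query):
        return {"checked": False, "findings": []}
    spans = detect_unsupported(answer, context)
    return {"checked": True, "findings": [explain(s) for s in spans]}

skipped = halugate("Write a poem about rain", "Rain falls softly.", "")
checked = halugate(
    "When was X founded?",
    "X was founded in 1990. It has 500 staff",
    "X was founded in 1990",
)
print(skipped)
print(checked)
```

In this sketch the creative request never reaches the detector, so only factual traffic pays the deeper inspection cost.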

Audit trail

The same brain records safety signals, routing choices, execution actions, and observability data for compliance and post-incident review.

HTTP metadata propagation

Security event logs

OpenTelemetry-compatible tracing

HaluGate

Decide whether deep inspection is worth it before paying for it.

Hallucination checks should be workload-aware. Non-factual tasks should not pay the same inspection cost as factual ones.

Action modes

How results take effect

block, header, body, observe only

One result can block the answer, write metadata into headers, annotate the body, or remain observational for monitoring and threshold tuning.
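A minimal sketch of the four action modes, assuming a response is a plain dictionary and a finding is a short string. The `x-check-result` header name, the `blocked` flag, and the shape of the audit log are hypothetical.

```python
import copy

def apply_action(mode: str, response: dict, finding: str, audit_log: list) -> dict:
    """Return a new response with the finding applied according to the mode."""
    audit_log.append({"mode": mode, "finding": finding})  # every mode is recorded
    out = copy.deepcopy(response)
    if mode == "block":
        out["blocked"] = True
        out["body"] = "Response withheld: " + finding
    elif mode == "header":
        out["headers"]["x-check-result"] = finding
    elif mode == "body":
        out["body"] = response["body"] + "\n[note: " + finding + "]"
    elif mode == "observe":
        pass  # observe only: the user-facing response is unchanged
    else:
        raise ValueError("unknown action mode: " + mode)
    return out

audit_log = []
response = {"body": "The bridge opened in 1932.", "headers": {}, "blocked": False}
finding = "unsupported claim"

observed = apply_action("observe", response, finding, audit_log)
tagged = apply_action("header", response, finding, audit_log)
annotated = apply_action("body", response, finding, audit_log)
blocked = apply_action("block", response, finding, audit_log)
print(tagged["headers"], blocked["body"])
```

Because the original response is copied rather than mutated, the same finding can be replayed under a stricter mode later, which suits threshold tuning in observe-only deployments.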

Stage 1

Run only the sentinel gate first to learn which traffic is factual and worth deeper inspection.

Stage 2

Add the detector to capture unsupported spans and calibrate thresholds before changing user-facing behavior.

Stage 3

Enable the explainer and response actions once the organization is ready to turn observation into policy.

DSL

The policy language is part of the architecture.

The DSL turns semantic routing into an explicit programming model for complex control.

Readable by operators

The intended audience is not only infrastructure engineers. Compliance and platform teams should be able to review routing intent directly.

Compilable for infrastructure

One source can compile into flat YAML, Kubernetes CRDs, or Helm values depending on the operating environment.
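The single-source idea can be illustrated with one in-memory policy rendered two ways. This sketch does not show the DSL's syntax; the policy is a plain Python dictionary, the YAML emitter handles only scalars and flat lists, and the `example.com/v1` API group and `RoutePolicy` kind are hypothetical placeholders.

```python
# Hypothetical policy, already parsed into a dictionary.
POLICY = {"name": "math-route", "priority": 50, "models": ["model-a", "model-b"]}

def to_flat_yaml(policy: dict) -> str:
    """Emit flat YAML for a dict of scalars and lists of scalars."""
    lines = []
    for key, value in policy.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)

def to_k8s_manifest(policy: dict) -> dict:
    """Wrap the same policy in a Kubernetes-style custom resource."""
    spec = {key: value for key, value in policy.items() if key != "name"}
    return {
        "apiVersion": "example.com/v1",  # placeholder group/version
        "kind": "RoutePolicy",           # placeholder kind
        "metadata": {"name": policy["name"]},
        "spec": spec,
    }

print(to_flat_yaml(POLICY))
print(to_k8s_manifest(POLICY))
```

Both outputs derive from the same `POLICY` object, so a reviewed change to the source propagates to every target format without manual edits.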

Routing as programming

We treat semantic routing as a programming problem, so complex control becomes expressible, verifiable, and compilable. That in turn lets AI help humans build governed routing systems.

One core system, spanning architecture, guardrails, and product packaging.

Go to products for cloud and edge delivery, or continue to research for the frontier work behind routing, safety, and runtime design.
