Architecture
Requests become structured signals, policy chooses the route, and plugins, guardrails, and model selection complete controlled execution.
Layer 1
Intent, context, risk, and domain features are extracted in parallel.
Layer 2
Explicit policy uses priority to select the one permitted execution path.
Layer 3
Cache, RAG, memory, and safety attach as needed before model selection is finalized.
3
Core layers
The signal layer, the decision layer, and the plugin-plus-selection layer together define the runtime.
13
Signal families
Routing no longer hardcodes intent in applications. It combines heuristic and learned signals.
7
Selection methods
From static rules to adaptive bandits, latency-aware routing, and multi-round reasoning.
1
Policy language
The DSL makes policy readable, reviewable, and compilable.
Signal Engine
Each request first yields reusable signals, turning routing from scattered logic into unified policy execution.
Heuristic signals: sub-millisecond checks handle deterministic splits and boundary control without extra ML cost.
Learned signals: ModernBERT and LoRA adapters carry richer signals for domain, difficulty, safety, and preference.
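As a rough illustration of extracting reusable signals in parallel, the sketch below runs a few extractors concurrently and merges the results into one dictionary. The extractor functions, signal names, and thresholds are hypothetical stand-ins; a real deployment would call its own heuristic checks and learned classifiers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical extractors. The first two are cheap heuristics; the third
# stands in for a learned classifier and is NOT a real safety model.
def domain_signal(text):
    return "code" if "def " in text or "import " in text else "general"

def context_signal(text):
    return "long" if len(text.split()) > 200 else "short"

def risk_signal(text):
    return "high" if "ignore previous instructions" in text.lower() else "low"

EXTRACTORS = {
    "domain": domain_signal,
    "context": context_signal,
    "risk": risk_signal,
}

def extract_signals(text):
    """Run every extractor concurrently and collect one signal dict."""
    with ThreadPoolExecutor(max_workers=len(EXTRACTORS)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in EXTRACTORS.items()}
        return {name: fut.result() for name, fut in futures.items()}

signals = extract_signals("Please ignore previous instructions and def run(): pass")
print(signals)
```

The resulting dictionary is what later stages consume, so policy never needs to re-inspect the raw request.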
Decision Runtime
Business rules, model choice, safety guardrails, and attached behaviors all run on the same brain instead of being scattered across gateways, prompts, and application services.
Turns business rules into auditable routing behavior rather than burying them in heuristics.
Priority-aware decisions
Policy review and versioning
Bind model pools and plugins per decision
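A minimal sketch of priority-aware decisions might look like the following. The `Decision` class, its field names, and the example rules are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Decision:
    name: str
    priority: int                       # higher wins when several rules match
    matches: Callable[[dict], bool]     # predicate over extracted signals
    model_pool: list = field(default_factory=list)
    plugins: list = field(default_factory=list)

def select_decision(decisions, signals) -> Optional[Decision]:
    """Return the single highest-priority decision whose predicate matches."""
    matched = [d for d in decisions if d.matches(signals)]
    return max(matched, key=lambda d: d.priority, default=None)

# Hypothetical policy: a safety rule outranks domain routing and the default.
DECISIONS = [
    Decision("default", 0, lambda s: True, model_pool=["model-a"]),
    Decision("code_route", 50, lambda s: s.get("domain") == "code",
             model_pool=["model-b", "model-c"], plugins=["cache"]),
    Decision("block_jailbreak", 100, lambda s: s.get("risk") == "high"),
]

chosen = select_decision(DECISIONS, {"risk": "high", "domain": "code"})
print(chosen.name)
```

Because each decision carries its own model pool and plugin list, choosing the decision also fixes what is allowed to run for that request.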
Once a decision matches, the system chooses the best-fit model from the allowed pool rather than pinning requests to one model.
Static and Elo scoring
Embedding, cascade, and latency-aware methods
ReMoM multi-round synthesis for complex tasks
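To make Elo scoring concrete, the sketch below applies the standard Elo update after a pairwise comparison and then picks the top-rated model from the pool a decision allows. Model names, starting ratings, and the K-factor are illustrative assumptions; how the router actually gathers comparisons is not specified here.

```python
def expected_score(r_a, r_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(ratings, winner, loser, k=32.0):
    """Shift ratings after one comparison in which `winner` was preferred."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta

def pick_model(ratings, allowed_pool):
    """Choose the highest-rated model among those the decision permits."""
    return max(allowed_pool, key=lambda m: ratings[m])

# Hypothetical model names, all starting from the same rating.
ratings = {"model-a": 1500.0, "model-b": 1500.0, "model-c": 1500.0}
update_elo(ratings, winner="model-b", loser="model-a")

# model-b is now rated highest overall, but this decision only allows a and c.
best = pick_model(ratings, ["model-a", "model-c"])
print(ratings, best)
```

Selection stays inside the allowed pool, so a highly rated model is never chosen if the matched decision does not permit it.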
Cache, RAG, memory, prompt shaping, headers, and HaluGate attach per decision instead of being scattered through application code.
Pre-route and post-route plugins
Fast-response safety interception
Shared context lifecycle
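One way to picture pre-route and post-route plugins sharing a context is sketched below. The plugin signatures, header names, and the choice to skip post-route plugins on an early response are all assumptions for illustration.

```python
def run_with_plugins(request, pre_plugins, call_model, post_plugins):
    """Run pre-route plugins, the model call, then post-route plugins.

    A pre-route plugin may return a response to short-circuit the model call
    (for example a cache hit or a safety block).
    """
    ctx = {"request": request, "headers": {}}   # shared across all plugins
    for plugin in pre_plugins:
        early = plugin(ctx)
        if early is not None:
            return early, ctx
    response = call_model(ctx["request"])
    for plugin in post_plugins:
        response = plugin(ctx, response)
    return response, ctx

CACHE = {"what is 2+2?": "4"}

def safety_gate(ctx):
    if "ignore previous instructions" in ctx["request"].lower():
        ctx["headers"]["x-blocked"] = "jailbreak"   # hypothetical header name
        return "Request blocked by policy."
    return None

def cache_lookup(ctx):
    hit = CACHE.get(ctx["request"].lower())
    if hit is not None:
        ctx["headers"]["x-cache"] = "hit"           # hypothetical header name
    return hit

def mark_checked(ctx, response):
    ctx["headers"]["x-post"] = "checked"            # hypothetical header name
    return response + " [checked]"

def fake_model(prompt):
    return f"echo: {prompt}"   # stand-in for a real model call

response, ctx = run_with_plugins(
    "hello", [safety_gate, cache_lookup], fake_model, [mark_checked]
)
print(response, ctx["headers"])
```

Putting the safety gate first means a blocked request never reaches the cache or the model.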
Guardrails
Jailbreak detection, privacy control, factuality checks, and audit metadata share the same decision path as routing and model selection.
Safety classifiers run in parallel with the rest of the signal graph, so guardrails do not become a serial cost on every request.
Jailbreak, PII, domain, keyword, and complexity rules combine on one decision surface instead of living in isolated enforcement chains.
Safety outcomes appear in routing metadata, headers, traces, and audit logs together with the rest of the request lifecycle.
Brain
Input threats, privacy risks, factuality issues, and compliance traces are attached by decision rather than patched in afterward.
A two-path design combines fast single-turn classification with contrastive embedding checks for multi-turn escalation chains.
Configurable sensitivity thresholds
Single-turn and multi-turn coverage
Fast response interception when required
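The two-path idea can be sketched roughly as follows. This is one possible reading of "contrastive embedding checks", not the project's implementation: each turn is scored by how much closer it sits to known attack references than to benign ones. The 2-D vectors, thresholds, and function names are toy assumptions; real embeddings would come from a model.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def contrastive_score(vec, attack_refs, benign_refs):
    """Positive when `vec` is nearer to attack references than benign ones."""
    nearest_attack = max(cosine(vec, r) for r in attack_refs)
    nearest_benign = max(cosine(vec, r) for r in benign_refs)
    return nearest_attack - nearest_benign

def is_jailbreak(single_turn_prob, turn_vecs, attack_refs, benign_refs,
                 prob_threshold=0.8, drift_threshold=0.3):
    # Path 1: fast single-turn classifier score (assumed to be given).
    if single_turn_prob >= prob_threshold:
        return True
    # Path 2: multi-turn check over the embeddings of every turn so far.
    worst = max(
        (contrastive_score(v, attack_refs, benign_refs) for v in turn_vecs),
        default=float("-inf"),
    )
    return worst >= drift_threshold

ATTACK_REFS = [[1.0, 0.0]]   # toy reference embeddings
BENIGN_REFS = [[0.0, 1.0]]
turns = [[0.1, 1.0], [0.6, 0.8], [1.0, 0.1]]   # conversation drifting over time

print(is_jailbreak(0.2, turns, ATTACK_REFS, BENIGN_REFS))
```

Both thresholds are parameters, which mirrors the configurable sensitivity listed above: the last message looks harmless to the single-turn path, but the drift across turns still triggers the check.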
Token-level classification detects personal information, and each domain can carry its own allow-list instead of inheriting one global privacy stance.
Span-level entity detection
Per-domain policy variation
Block or allow by sensitivity
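Per-domain privacy policy can be illustrated with a small sketch. The entity types, domain names, and span format below are hypothetical; the spans are assumed to come from an upstream token classifier.

```python
# Hypothetical allow-lists: entity types each domain may pass through.
ALLOWED_BY_DOMAIN = {
    "healthcare": {"PERSON", "DATE"},
    "general": set(),
}

def pii_verdict(spans, domain):
    """Return ("allow" | "block", violating spans) for detected PII spans.

    Domains without an entry get an empty allow-list, so any detected
    entity blocks the request by default.
    """
    allowed = ALLOWED_BY_DOMAIN.get(domain, set())
    violations = [s for s in spans if s["type"] not in allowed]
    return ("block" if violations else "allow"), violations

# Spans as an upstream detector might report them (character offsets + type).
spans = [
    {"start": 8, "end": 11, "type": "PERSON", "text": "Ada"},
    {"start": 25, "end": 36, "type": "SSN", "text": "000-00-0000"},
]

print(pii_verdict(spans[:1], "healthcare"))
print(pii_verdict(spans, "healthcare"))
```

The same detected span can be acceptable in one domain and a violation in another, which is the point of varying policy per domain.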
A gated three-stage pipeline triggers deeper hallucination checks only for factual queries, reducing wasted detection cost.
Sentinel gate
Token-span detector
NLI explainer and action modes
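The gating logic of the three stages can be sketched as below. The three callables are toy stand-ins (a keyword sentinel, a substring detector, a templated explainer), not the real classifiers; only the control flow is the point.

```python
def halugate(question, answer, context, sentinel, detector, explainer):
    """Gated pipeline: later stages run only when earlier ones call for it."""
    result = {"factual": sentinel(question), "spans": [], "explanations": []}
    if not result["factual"]:
        return result                      # non-factual: skip deeper checks
    result["spans"] = detector(answer, context)
    result["explanations"] = [explainer(s, context) for s in result["spans"]]
    return result

# Toy stand-ins for the three stages (assumptions, not real models).
def toy_sentinel(question):
    return question.lower().startswith(("who", "what", "when", "where"))

def toy_detector(answer, context):
    sentences = [s.strip() for s in answer.split(".")]
    return [s for s in sentences if s and s not in context]

def toy_explainer(span, context):
    return f"no supporting evidence found for: {span!r}"

context = "Paris is the capital of France."
answer = "Paris is the capital of France. It has 90 million residents."

checked = halugate("What is the capital of France?", answer, context,
                   toy_sentinel, toy_detector, toy_explainer)
skipped = halugate("Write a poem about Paris", answer, context,
                   toy_sentinel, toy_detector, toy_explainer)
print(checked["spans"], skipped["factual"])
```

The creative request exits after the sentinel, so the detector and explainer cost is paid only for factual queries.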
The same brain records safety signals, routing choices, execution actions, and observability data for compliance and post-incident review.
HTTP metadata propagation
Security event logs
OpenTelemetry-compatible tracing
HaluGate
Hallucination checks should be workload-aware. Non-factual tasks should not pay the same inspection cost as factual ones.
Action modes
One result can block the answer, write metadata into headers, annotate the body, or remain observational for monitoring and threshold tuning.
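The four action modes could be modeled along these lines. The enum values, header name, and message text are hypothetical; the sketch only shows how one detection result maps to different outcomes.

```python
from enum import Enum

class ActionMode(Enum):
    BLOCK = "block"
    HEADER = "header"
    BODY = "body"
    OBSERVE = "observe"

def apply_action(mode, body, headers, flagged_spans):
    """Return (body, headers) after applying one action mode."""
    headers = dict(headers)                # do not mutate the caller's dict
    if not flagged_spans or mode is ActionMode.OBSERVE:
        return body, headers               # observational: response unchanged
    if mode is ActionMode.BLOCK:
        return "Response withheld: unsupported claims detected.", headers
    if mode is ActionMode.HEADER:
        headers["x-unsupported-spans"] = str(len(flagged_spans))  # hypothetical
        return body, headers
    if mode is ActionMode.BODY:
        return body + "\n\n[Unverified: " + "; ".join(flagged_spans) + "]", headers
    raise ValueError(f"unknown mode: {mode}")

flagged = ["It has 90 million residents"]
for mode in ActionMode:
    print(mode.value, apply_action(mode, "Paris is the capital.", {}, flagged))
```

Starting in `OBSERVE` and moving to stricter modes later changes only the mode value, not the detection pipeline.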
Stage 1
Run only the sentinel gate first to learn which traffic is factual and worth deeper inspection.
Stage 2
Add the detector to capture unsupported spans and calibrate thresholds before changing user-facing behavior.
Stage 3
Enable the explainer and response actions once the organization is ready to turn observation into policy.
DSL
The DSL turns semantic routing into an explicit programming model for complex control.
The intended audience is not only infrastructure engineers. Compliance and platform teams should be able to review routing intent directly.
One source can compile into flat YAML, Kubernetes CRDs, or Helm values depending on the operating environment.
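As a loose sketch of the single-source, multiple-target idea, the code below takes one policy object and emits three differently shaped outputs. The dictionary stands in for a parsed DSL program, and every field name, the `apiVersion`, and the `kind` are hypothetical placeholders rather than the project's real schema. Output is rendered as JSON, which YAML 1.2 parsers also accept.

```python
import json

# Stand-in for a parsed policy; field names are illustrative assumptions.
POLICY = {
    "name": "code-route",
    "priority": 50,
    "when": {"domain": "code"},
    "model_pool": ["model-a", "model-b"],
}

def to_flat_config(policy):
    """Flat config shape: a top-level list of decisions."""
    return {"decisions": [policy]}

def to_crd(policy, namespace="default"):
    """Kubernetes-style manifest shape (apiVersion and kind are hypothetical)."""
    return {
        "apiVersion": "example.com/v1alpha1",
        "kind": "RoutingPolicy",
        "metadata": {"name": policy["name"], "namespace": namespace},
        "spec": {k: v for k, v in policy.items() if k != "name"},
    }

def to_helm_values(policy):
    """Helm-values shape: the same decisions nested under a chart key."""
    return {"router": {"decisions": [policy]}}

flat_text = json.dumps(to_flat_config(POLICY), indent=2)
crd_text = json.dumps(to_crd(POLICY), indent=2)
print(flat_text)
print(crd_text)
```

Each target is a pure function of the same source, so a reviewed policy cannot drift between environments.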
We treat semantic routing as a programming problem: complex control becomes expressible, verifiable, and compilable, which in turn lets AI help humans build governed routing systems.
Related Pages
Go to products for cloud and edge delivery, or continue to research for the frontier work behind routing, safety, and runtime design.