Orion: Building a Sovereign Multi-Agent AI Orchestrator | g3jerrie.com

Enterprise AI has a trust problem. Most LLM solutions require sending sensitive data to external cloud providers. For financial services companies handling customer telemetry from Splunk, New Relic, and Amplitude, that's a non-starter. We needed an AI platform where zero data leaves the perimeter.

That's how Orion was born.

The Architecture

Hexagonal Architecture (Ports & Adapters)

Orion's core is built on Hexagonal Architecture — strictly separating business logic from external dependencies:

Core Domain — Query routing, context assembly, response generation
Input Ports — Chat API, CLI, scheduled queries
Output Ports — LLM adapters, telemetry connectors, storage
Adapters — Pluggable implementations for each port

This means swapping Ollama for a different LLM provider requires changing one adapter — not touching core logic.

MCP Server in Go

The Model Context Protocol (MCP) server acts as Orion's query router. It is written in Go for performance, and it:

Routes queries to locally hosted small language models (SLMs): Gemma 3 for text, LLaVA for vision
Retrieves context from a local SQLite store that pairs an FTS5 full-text index with embeddings for vector similarity
Manages multi-turn conversation state with encrypted sessions
Handles concurrent agent coordination for complex queries
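The routing and coordination duties above can be sketched in a few lines of Go. This is an illustration under assumptions, not the Orion implementation: the Query and Agent types are hypothetical, and the model tags "gemma3" and "llava" are placeholders for whatever names the local model server uses.

```go
package main

import "sync"

// Query is a hypothetical request shape. HasImage marks vision inputs.
type Query struct {
	Text     string
	HasImage bool
}

// routeModel picks a model by modality, following the text/vision
// split described above. The tags are illustrative placeholders.
func routeModel(q Query) string {
	if q.HasImage {
		return "llava"
	}
	return "gemma3"
}

// Agent is a hypothetical unit of work that answers one part of a query.
type Agent func(q Query) string

// fanOut runs every agent in its own goroutine and waits for all of
// them. Results keep the order of the agents slice, because each
// goroutine writes only to its own index.
func fanOut(q Query, agents []Agent) []string {
	results := make([]string, len(agents))
	var wg sync.WaitGroup
	for i, agent := range agents {
		wg.Add(1)
		go func(i int, agent Agent) {
			defer wg.Done()
			results[i] = agent(q)
		}(i, agent)
	}
	wg.Wait()
	return results
}
```

A production version would likely add context cancellation and per-agent timeouts. Both are left out here to keep the sketch short.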

Zero-Trust Security

Every component operates under zero-trust principles:

Cloudflare Zero Trust secures all inbound/outbound traffic
SQLCipher AES-256 encrypts the local SQLite database at rest
Session tokens are cryptographically signed and time-limited
No data ever leaves the enterprise network boundary

The RAG Pipeline

Orion's Retrieval-Augmented Generation pipeline:

Ingest: Pull logs, metrics, and traces from Splunk, New Relic, and Amplitude
Index: Chunk the data, index the text in SQLite FTS5, and store a semantic embedding for each chunk
Retrieve: On each query, run a hybrid search (keyword match plus vector similarity)
Augment: Inject the retrieved context into the LLM prompt
Generate: The local SLM generates a response with source citations
Cache: Cache frequent query patterns so repeat queries return in under 200 ms

Results

Sub-200ms query latency for cached patterns
Zero external data exposure — all inference runs locally
Gained executive approval and budget for a pilot across three business units
Won the TCS AI Hackathon award for best enterprise-grade AI solution

The Plugin System

One of Orion's key design decisions was the plugin architecture:

Future LLM providers can be hot-swapped without downtime
New telemetry sources (Datadog, Grafana) can be added via adapter plugins
Custom prompt templates are version-controlled alongside the codebase
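Hot-swapping can be approximated with a registry that guards the active provider behind a read-write lock. This is a sketch under assumptions, not Orion's plugin loader: Provider, Registry, and staticProvider are hypothetical names, and the providers are compiled in rather than loaded dynamically.

```go
package main

import (
	"errors"
	"sync"
)

// Provider is a hypothetical plugin interface for LLM backends.
type Provider interface {
	Generate(prompt string) (string, error)
}

// Registry holds named providers and tracks which one is active.
type Registry struct {
	mu        sync.RWMutex
	providers map[string]Provider
	active    string
}

// NewRegistry returns an empty registry with no active provider.
func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]Provider)}
}

// Register adds or replaces a provider under the given name.
func (r *Registry) Register(name string, p Provider) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.providers[name] = p
}

// Activate switches traffic to a registered provider. Requests already
// in flight finish on the provider they started with.
func (r *Registry) Activate(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.providers[name]; !ok {
		return errors.New("unknown provider: " + name)
	}
	r.active = name
	return nil
}

// Generate looks up the active provider under a read lock, releases
// the lock, and then calls the provider, so slow inference never
// blocks a swap.
func (r *Registry) Generate(prompt string) (string, error) {
	r.mu.RLock()
	p, ok := r.providers[r.active]
	r.mu.RUnlock()
	if !ok {
		return "", errors.New("no active provider")
	}
	return p.Generate(prompt)
}

// staticProvider is a stand-in backend that prefixes the prompt with
// its own name.
type staticProvider string

func (s staticProvider) Generate(prompt string) (string, error) {
	return string(s) + ": " + prompt, nil
}
```

The same registry pattern would apply to telemetry source adapters, with a different interface in place of Provider.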

Key Takeaways

Sovereign AI requires architectural discipline: it takes more than wrapping an LLM
Zero-trust isn't optional: financial services demand provable data isolation
Go is a good fit for MCP servers: its concurrency model matches agent orchestration patterns
Local-first doesn't mean slow: with proper caching, repeat queries came back in under 200 ms