Engineering Stories · #AI #Architecture #MCP #Security #Go

Orion: Building a Sovereign Multi-Agent AI Orchestrator

Gerald M
8 min read
2026-02-06

Enterprise AI has a trust problem. Most LLM solutions require sending sensitive data to external cloud providers. For financial services companies handling customer telemetry from Splunk, New Relic, and Amplitude, that's a non-starter. We needed an AI platform where zero data leaves the perimeter.

That's how Orion was born.

The Architecture

Hexagonal Architecture (Ports & Adapters)

Orion's core follows Hexagonal Architecture, which strictly separates business logic from external dependencies:

  • Core Domain — Query routing, context assembly, response generation
  • Input Ports — Chat API, CLI, scheduled queries
  • Output Ports — LLM adapters, telemetry connectors, storage
  • Adapters — Pluggable implementations for each port

In practice, swapping Ollama for a different LLM provider means changing one adapter and leaving the core logic untouched.
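
To make the port/adapter split concrete, here is a minimal Go sketch. The names LLMPort, OllamaAdapter, UpperAdapter, and QueryService are hypothetical and not Orion's actual types, and both adapters are offline stubs rather than real model clients. The only point is that the core depends on an interface, so changing the backend is a one-line change where the service is constructed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// LLMPort is a hypothetical output port: the only view the core has of a model backend.
type LLMPort interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// OllamaAdapter stands in for an adapter that would call a local Ollama instance.
// The network call is omitted; this stub echoes the prompt so the sketch runs offline.
type OllamaAdapter struct{ Model string }

func (a OllamaAdapter) Generate(_ context.Context, prompt string) (string, error) {
	return fmt.Sprintf("[%s] %s", a.Model, prompt), nil
}

// UpperAdapter is a second stub backend, used to show that a swap leaves the core unchanged.
type UpperAdapter struct{}

func (UpperAdapter) Generate(_ context.Context, prompt string) (string, error) {
	return strings.ToUpper(prompt), nil
}

// QueryService is the core domain. It depends on the port, never on a concrete adapter.
type QueryService struct{ llm LLMPort }

func NewQueryService(llm LLMPort) *QueryService { return &QueryService{llm: llm} }

// Answer validates the question and delegates generation to whichever adapter was injected.
func (s *QueryService) Answer(ctx context.Context, question string) (string, error) {
	if strings.TrimSpace(question) == "" {
		return "", errors.New("empty question")
	}
	return s.llm.Generate(ctx, question)
}

// demo runs the same core logic against both adapters and collects the answers.
func demo() []string {
	var out []string
	for _, llm := range []LLMPort{OllamaAdapter{Model: "gemma3"}, UpperAdapter{}} {
		ans, err := NewQueryService(llm).Answer(context.Background(), "why did latency spike?")
		if err != nil {
			ans = "error: " + err.Error()
		}
		out = append(out, ans)
	}
	return out
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```

QueryService never names a concrete backend, which is the property that keeps a provider swap confined to the adapter layer.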

MCP Server in Go

The Model Context Protocol (MCP) server is Orion's intelligence router. It is built in Go for performance and has four responsibilities:

  • Routes queries to locally hosted SLMs (Gemma 3 for text, LLaVA for vision)
  • Retrieves context from a local SQLite store that pairs FTS5 full-text search with embedding-based similarity search
  • Manages multi-turn conversation state with encrypted sessions
  • Handles concurrent agent coordination for complex queries
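
The routing and coordination duties above can be sketched in a few lines of Go. This is an illustration under assumptions, not Orion's code: Query, Agent, Result, routeModel, and fanOut are hypothetical names, the model labels are placeholders, and the two agents are stubs that return canned strings.

```go
package main

import (
	"fmt"
	"sync"
)

// Query is a hypothetical request shape; HasImage marks vision inputs.
type Query struct {
	Text     string
	HasImage bool
}

// routeModel picks a local model by modality, mirroring the text/vision split.
// The returned identifiers are illustrative labels only.
func routeModel(q Query) string {
	if q.HasImage {
		return "llava"
	}
	return "gemma3"
}

// Agent is a hypothetical unit of work that answers one aspect of a query.
type Agent func(q Query) (string, error)

// Result pairs an agent's answer with any error it returned.
type Result struct {
	Answer string
	Err    error
}

// fanOut runs every agent in its own goroutine and returns results in agent order.
// Each goroutine writes to a distinct slice index, so no extra locking is needed.
func fanOut(q Query, agents []Agent) []Result {
	results := make([]Result, len(agents))
	var wg sync.WaitGroup
	for i, agent := range agents {
		wg.Add(1)
		go func(i int, agent Agent) {
			defer wg.Done()
			ans, err := agent(q)
			results[i] = Result{Answer: ans, Err: err}
		}(i, agent)
	}
	wg.Wait()
	return results
}

// Stub agents standing in for real log and metric lookups.
func logsAgent(q Query) (string, error)    { return "logs: " + q.Text, nil }
func metricsAgent(q Query) (string, error) { return "metrics: " + q.Text, nil }

func main() {
	q := Query{Text: "checkout errors"}
	fmt.Println("model:", routeModel(q))
	for _, r := range fanOut(q, []Agent{logsAgent, metricsAgent}) {
		fmt.Println(r.Answer, r.Err)
	}
}
```

A production router would also need cancellation and timeouts (for example via context.Context), which are left out here to keep the sketch short.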

Zero-Trust Security

Every component operates under zero-trust principles:

  • Cloudflare Zero Trust secures all inbound/outbound traffic
  • SQLCipher AES-256 encrypts the local SQLite database at rest
  • Session tokens are cryptographically signed and time-limited
  • No data ever leaves the enterprise network boundary
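
As an illustration of "cryptographically signed and time-limited", here is one common way to build such a token with Go's standard library: an HMAC-SHA256 signature over a session ID and an expiry timestamp. The article does not say which scheme Orion uses, so the token layout, the Sign and Verify functions, and the 15-minute lifetime are all assumptions made for this sketch.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

var (
	ErrMalformed = errors.New("malformed token")
	ErrSignature = errors.New("bad signature")
	ErrExpired   = errors.New("token expired")
)

// mac computes HMAC-SHA256 of payload under secret.
func mac(secret []byte, payload string) []byte {
	h := hmac.New(sha256.New, secret)
	h.Write([]byte(payload))
	return h.Sum(nil)
}

// Sign returns "sessionID.expiryUnix.signature".
// Assumption: sessionID contains no '.' characters.
func Sign(secret []byte, sessionID string, expiresAt int64) string {
	payload := sessionID + "." + strconv.FormatInt(expiresAt, 10)
	sig := base64.RawURLEncoding.EncodeToString(mac(secret, payload))
	return payload + "." + sig
}

// Verify checks the signature first, then the expiry, and returns the session ID.
func Verify(secret []byte, token string, now int64) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", ErrMalformed
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[2])
	if err != nil {
		return "", ErrMalformed
	}
	// hmac.Equal compares in constant time to avoid leaking signature bytes.
	if !hmac.Equal(sig, mac(secret, parts[0]+"."+parts[1])) {
		return "", ErrSignature
	}
	exp, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return "", ErrMalformed
	}
	if now >= exp {
		return "", ErrExpired
	}
	return parts[0], nil
}

func main() {
	secret := []byte("demo-secret-do-not-use")
	now := time.Now().Unix()
	tok := Sign(secret, "sess42", now+900) // 15-minute lifetime, chosen arbitrarily
	id, err := Verify(secret, tok, now)
	fmt.Println(id, err)
}
```

Passing the current time in as a parameter keeps Verify deterministic, which makes expiry behaviour easy to test.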

The RAG Pipeline

Orion's Retrieval-Augmented Generation (RAG) pipeline runs in six steps:

  1. Ingest — Pull logs, metrics, and traces from Splunk/New Relic/Amplitude
  2. Index — Chunk each record, add it to the SQLite FTS5 keyword index, and store a semantic embedding alongside it
  3. Retrieve — On query, perform hybrid search (keyword + vector similarity)
  4. Augment — Inject retrieved context into the LLM prompt
  5. Generate — Local SLM generates a response with source citations
  6. Cache — Cache frequent query patterns so repeat queries return in under 200 ms
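
The Retrieve and Augment steps can be sketched as follows. This is a simplified, in-memory illustration, not Orion's implementation: the Chunk type, the term-overlap keywordScore (a crude stand-in for FTS5's ranking), the alpha weighting, and the prompt wording are all assumptions, and the embeddings are toy two-dimensional vectors.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Chunk is a hypothetical indexed record: its origin, text, and embedding.
type Chunk struct {
	Source    string
	Text      string
	Embedding []float64
}

// cosine returns the cosine similarity of a and b, or 0 for mismatched or zero vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// keywordScore is the fraction of query terms found in the text.
func keywordScore(query, text string) float64 {
	terms := strings.Fields(strings.ToLower(query))
	if len(terms) == 0 {
		return 0
	}
	lower := strings.ToLower(text)
	hits := 0
	for _, t := range terms {
		if strings.Contains(lower, t) {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

// hybridSearch blends keyword and vector scores (alpha weights the keyword side)
// and returns the top k chunks, best first.
func hybridSearch(query string, queryVec []float64, chunks []Chunk, alpha float64, k int) []Chunk {
	type scored struct {
		c Chunk
		s float64
	}
	ranked := make([]scored, 0, len(chunks))
	for _, c := range chunks {
		s := alpha*keywordScore(query, c.Text) + (1-alpha)*cosine(queryVec, c.Embedding)
		ranked = append(ranked, scored{c, s})
	}
	sort.SliceStable(ranked, func(i, j int) bool { return ranked[i].s > ranked[j].s })
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	out := make([]Chunk, k)
	for i := range out {
		out[i] = ranked[i].c
	}
	return out
}

// buildPrompt injects numbered context so the model can cite its sources.
func buildPrompt(question string, ctx []Chunk) string {
	var b strings.Builder
	b.WriteString("Answer using only the context below. Cite sources by number.\n\n")
	for i, c := range ctx {
		fmt.Fprintf(&b, "[%d] (%s) %s\n", i+1, c.Source, c.Text)
	}
	b.WriteString("\nQuestion: " + question)
	return b.String()
}

func main() {
	chunks := []Chunk{
		{"splunk", "checkout service timeout errors", []float64{1, 0}},
		{"newrelic", "cpu usage normal", []float64{0, 1}},
		{"amplitude", "checkout funnel drop", []float64{0.6, 0.8}},
	}
	top := hybridSearch("checkout timeout", []float64{1, 0}, chunks, 0.5, 2)
	fmt.Println(buildPrompt("Why is checkout failing?", top))
}
```

A weighted sum is only one way to fuse the two rankings. Reciprocal rank fusion is a common alternative when the keyword and vector scores are not on comparable scales.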

Results

  • Sub-200ms query latency for cached patterns
  • Zero external data exposure — all inference runs locally
  • Gained executive approval and budget for a pilot across three business units
  • Won the TCS AI Hackathon award for best enterprise-grade AI solution

The Plugin System

One of Orion's key design decisions was the plugin architecture:

  • Future LLM providers can be hot-swapped without downtime
  • New telemetry sources (Datadog, Grafana) can be added via adapter plugins
  • Custom prompt templates are version-controlled alongside the codebase
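
One way to picture hot-swapping is a registry that holds every registered provider and switches the active one under a lock. The sketch below is an assumption about how such a mechanism could look in Go, not a description of Orion's plugin loader: Provider, Registry, and stubProvider are hypothetical names, and the providers are stubs.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Provider is a hypothetical plugin contract for an LLM backend.
type Provider interface {
	Name() string
	Generate(prompt string) string
}

// Registry holds registered providers and tracks which one is active.
type Registry struct {
	mu        sync.RWMutex
	providers map[string]Provider
	active    string
}

func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]Provider)}
}

// Register adds a provider; the first one registered becomes active.
func (r *Registry) Register(p Provider) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.providers[p.Name()] = p
	if r.active == "" {
		r.active = p.Name()
	}
}

// Activate switches the active provider while the process keeps running.
func (r *Registry) Activate(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.providers[name]; !ok {
		return fmt.Errorf("unknown provider %q", name)
	}
	r.active = name
	return nil
}

// Generate uses whichever provider is active at call time. The lock is released
// before generation, so in-flight calls finish on the provider they started with.
func (r *Registry) Generate(prompt string) (string, error) {
	r.mu.RLock()
	p, ok := r.providers[r.active]
	r.mu.RUnlock()
	if !ok {
		return "", errors.New("no active provider")
	}
	return p.Generate(prompt), nil
}

// stubProvider is a placeholder backend that prefixes the prompt with its name.
type stubProvider struct{ name string }

func (s stubProvider) Name() string                  { return s.name }
func (s stubProvider) Generate(prompt string) string { return s.name + ": " + prompt }

func main() {
	r := NewRegistry()
	r.Register(stubProvider{name: "gemma3"})
	r.Register(stubProvider{name: "other-model"})
	fmt.Println(r.Generate("hello"))
	if err := r.Activate("other-model"); err != nil {
		fmt.Println("activate failed:", err)
	}
	fmt.Println(r.Generate("hello"))
}
```

Because the read lock is held only for the lookup, a swap never blocks behind a long generation call.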

Key Takeaways

  1. Sovereign AI requires architectural discipline — Not just a wrapped LLM
  2. Zero-trust isn't optional — Financial services demand provable data isolation
  3. Go is excellent for MCP servers — Concurrency model matches agent orchestration patterns
  4. Local-first doesn't mean slow — With proper caching, frequent queries return in under 200 ms, which keeps local inference competitive with cloud latency