Enterprise AI has a trust problem. Most LLM solutions require sending sensitive data to external cloud providers. For financial services companies handling customer telemetry from Splunk, New Relic, and Amplitude, that's a non-starter. We needed an AI platform where zero data leaves the perimeter.
That's how Orion was born.
The Architecture
Hexagonal Architecture (Ports & Adapters)
Orion's core follows Hexagonal Architecture, which strictly separates business logic from external dependencies:
- Core Domain: Query routing, context assembly, response generation
- Input Ports: Chat API, CLI, scheduled queries
- Output Ports: LLM adapters, telemetry connectors, storage
- Adapters: Pluggable implementations for each port
As a result, swapping Ollama for a different LLM provider means changing one adapter and leaves the core logic untouched.
MCP Server in Go
The Model Context Protocol (MCP) server is the intelligence router. It is written in Go for performance, and it:
- Routes queries to locally hosted small language models (SLMs): Gemma 3 for text, LLaVA for vision
- Retrieves context from a local SQLite store that pairs an FTS5 full-text index with vector embeddings
- Manages multi-turn conversation state with encrypted sessions
- Handles concurrent agent coordination for complex queries
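The routing and fan-out ideas above can be sketched in a few lines of Go. Everything here is an assumption for illustration: the `Query` shape, the `Router` type, and the model tags `gemma3` and `llava` are hypothetical, and the sketch does not implement the MCP wire protocol itself.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Modality describes what kind of input a query carries.
type Modality int

const (
	Text Modality = iota
	Vision
)

// Query is a hypothetical request shape, not an MCP message type.
type Query struct {
	Prompt   string
	ImageRef string // empty for text-only queries
}

// Modality infers which model family the query needs.
func (q Query) Modality() Modality {
	if q.ImageRef != "" {
		return Vision
	}
	return Text
}

// ErrNoModel is returned when no model is registered for a modality.
var ErrNoModel = errors.New("no model registered for modality")

// Router maps a modality to the name of a locally hosted model.
type Router struct {
	models map[Modality]string
}

// NewRouter registers the two assumed model tags.
func NewRouter() *Router {
	return &Router{models: map[Modality]string{
		Text:   "gemma3",
		Vision: "llava",
	}}
}

// Route picks the model name for a single query.
func (r *Router) Route(q Query) (string, error) {
	m, ok := r.models[q.Modality()]
	if !ok {
		return "", ErrNoModel
	}
	return m, nil
}

// RouteAll handles several sub-queries concurrently, one goroutine each,
// and returns results in input order. The model map is only read here,
// so no lock is needed.
func (r *Router) RouteAll(qs []Query) []string {
	out := make([]string, len(qs))
	var wg sync.WaitGroup
	for i, q := range qs {
		wg.Add(1)
		go func(i int, q Query) {
			defer wg.Done()
			m, err := r.Route(q)
			if err != nil {
				m = "unrouted"
			}
			out[i] = m // each goroutine writes its own slot
		}(i, q)
	}
	wg.Wait()
	return out
}

func main() {
	r := NewRouter()
	qs := []Query{
		{Prompt: "summarize errors"},
		{Prompt: "read this chart", ImageRef: "chart.png"},
	}
	fmt.Println(r.RouteAll(qs))
}
```

In a real server each goroutine would call the selected model rather than just return its name, but the coordination pattern (fan out, wait, collect in order) stays the same.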
Zero-Trust Security
Every component operates under zero-trust principles:
- Cloudflare Zero Trust secures all inbound/outbound traffic
- SQLCipher AES-256 encrypts the local SQLite database at rest
- Session tokens are cryptographically signed and time-limited
- No data ever leaves the enterprise network boundary
The RAG Pipeline
Orion's Retrieval-Augmented Generation pipeline:
- Ingest: Pull logs, metrics, and traces from Splunk, New Relic, and Amplitude
- Index: Chunk the records, index the text in SQLite FTS5, and store a semantic embedding for each chunk
- Retrieve: On each query, run a hybrid search that combines keyword matching with vector similarity
- Augment: Inject the retrieved context into the LLM prompt
- Generate: The local SLM generates a response with source citations
- Cache: Cache frequent query patterns so repeat queries return in under 200ms
Results
- Sub-200ms query latency for cached patterns
- Zero external data exposure — all inference runs locally
- Gained executive approval and budget for a pilot across three business units
- Won the TCS AI Hackathon award for best enterprise-grade AI solution
The Plugin System
One of Orion's key design decisions was the plugin architecture:
- Future LLM providers can be hot-swapped without downtime
- New telemetry sources (Datadog, Grafana) can be added via adapter plugins
- Custom prompt templates are version-controlled alongside the codebase
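Hot-swapping a provider without downtime comes down to switching a reference under a lock. Here is a minimal sketch, assuming a hypothetical `Provider` interface and `Registry` type. Orion's real plugin contract is not shown in this post, and dynamic plugin loading is out of scope here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Provider is a hypothetical plugin contract for an LLM backend.
type Provider interface {
	Name() string
	Generate(prompt string) string
}

// Registry holds registered providers and tracks which one is active.
type Registry struct {
	mu      sync.RWMutex
	plugins map[string]Provider
	active  string
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{plugins: make(map[string]Provider)}
}

// Register adds or replaces a provider. The first one registered
// becomes active.
func (r *Registry) Register(p Provider) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.plugins[p.Name()] = p
	if r.active == "" {
		r.active = p.Name()
	}
}

// Activate switches the active provider. Calls already in flight keep
// the provider they resolved, so the swap does not interrupt them.
func (r *Registry) Activate(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.plugins[name]; !ok {
		return fmt.Errorf("unknown provider %q", name)
	}
	r.active = name
	return nil
}

// Generate resolves the active provider under a read lock, then calls
// it outside the lock so slow inference never blocks a swap.
func (r *Registry) Generate(prompt string) (string, error) {
	r.mu.RLock()
	p, ok := r.plugins[r.active]
	r.mu.RUnlock()
	if !ok {
		return "", errors.New("no active provider")
	}
	return p.Generate(prompt), nil
}

// stub is a stand-in provider used only for this demo.
type stub struct{ name string }

func (s stub) Name() string                  { return s.name }
func (s stub) Generate(prompt string) string { return s.name + ": " + prompt }

func main() {
	reg := NewRegistry()
	reg.Register(stub{name: "model-a"})
	reg.Register(stub{name: "model-b"})

	out, _ := reg.Generate("hello")
	fmt.Println(out)

	if err := reg.Activate("model-b"); err != nil {
		panic(err)
	}
	out, _ = reg.Generate("hello")
	fmt.Println(out)
}
```

The same registry shape would work for telemetry source adapters: register under a name, then look up by name at query time.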
Key Takeaways
- Sovereign AI requires architectural discipline: It takes more than wrapping an LLM
- Zero-trust isn't optional: Financial services demand provable data isolation
- Go is excellent for MCP servers: Its concurrency model matches agent orchestration patterns
- Local-first doesn't mean slow: With proper caching, a local stack can match cloud latency on frequent queries