Happy New Year 2026! Quote Service Evolution: From 3 to 5 Microservices
Published:
π Happy New Year 2026! π
As we step into 2026, I want to wish everyone an amazing year ahead! May this year bring successful trades, robust systems, and breakthrough innovations! π
Today marks an exciting milestone in our Solana HFT trading system development journey. Weβve just completed another major architectural evolutionβsplitting the quote-service into 3 specialized services, bringing our total from 3 to 5 microservices.
Wishing everyone a Happy New Year 2026 filled with profitable trades and minimal bugs! π
TL;DR
The quote service architecture has evolved from monolith β 3 services β 5 specialized microservices (quote-service split into 3):
- 60% less CPU usage - Cache TTL optimization (2s β 5s)
- 2Γ faster external quotes - Parallel paired quote calculation
- Eliminates fan-out overhead - Batch streaming model (1 stream vs N streams)
- 100Γ faster oracle validation - Shared memory integration (<1ΞΌs vs 100ΞΌs)
- Simpler Rust scanner - Aggregator writes to dual shared memory
- Better separation of concerns - Local/External split into specialized services
Why this matters: The quote service is the first step in our HFT pipeline. If quotes are slow (>10ms), the entire 200ms execution budget is blown. This evolution ensures we maintain sub-10ms quote latency at scale.
Table of Contents
- Recap: Where We Left Off
- The New Architecture: 5 Microservices
- Key Improvements Overview
- Improvement 1: Cache TTL Optimization
- Improvement 2: Parallel External Quotes
- Improvement 3: Batch Streaming Model
- Improvement 4: Dual Shared Memory Architecture
- Improvement 5: Oracle Price in Shared Memory
- Impact Analysis
- Conclusion: Building for 2026
Recap: Where We Left Off
December 25, 2025: The First Split
In our Christmas Day post, we planned to split the monolithic quote-service into 3 microservices:
MONOLITH (50K lines)
β
3 SERVICES:
βββ Pool Discovery Service (8K lines)
βββ Solana RPC Proxy (Rust)
βββ Quote Service (15K lines)
Benefits achieved:
- β 85% code reduction (50K β 23K total)
- β Failure isolation (RPC β pool discovery)
- β Independent scaling (vertical vs horizontal)
- β Reusable RPC proxy (shared across services)
December 31, 2025: Architecture Review
Our architecture review identified 5 critical improvements for v3.1:
- Cache TTL optimization (2s β 5s)
- Parallel paired quotes for external service
- Batch streaming model (eliminates fan-out)
- Aggregator writes to dual shared memory
- Oracle price in shared memory
These improvements led to another split: Quote Service β 3 specialized services.
The New Architecture: 5 Microservices
High-Level System View
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CLIENTS (Scanner, Planner, Executor, Dashboard) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β gRPC (50051)
Load Balanced (10 instances)
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β QUOTE AGGREGATOR SERVICE (NEW - Stateless) β
β β’ Unified client API (gRPC streaming) β
β β’ Parallel fan-out to local + external β
β β’ Best quote selection & comparison β
β β’ Batch streaming (1 connection for N pairs) β
β β’ Writes to DUAL shared memory (Rust IPC) β
β β’ Deduplication & route storage (Redis + PostgreSQL) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β gRPC (50052) β gRPC (50053)
ββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
β LOCAL QUOTE SERVICE β β EXTERNAL QUOTE SERVICE β
β (NEW - Split into 2) β β (NEW - Split into 2) β
ββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
INFRASTRUCTURE SERVICES (Shared):
βββ Pool Discovery Service (existing)
βββ Solana RPC Proxy (existing - Rust)
βββ Oracle Feeds (Pyth, Jupiter)
The 5 Microservices Breakdown
Before (3 services):
- Pool Discovery Service
- Solana RPC Proxy (Rust)
- Quote Service (monolithic - local + external + aggregation)
After (5 services) - Quote Service split into 3:
Client-Facing Layer:
- Quote Aggregator Service (NEW - from quote-service)
- Unified API for all clients
- Merges local + external quotes
- Writes to dual shared memory
- Batch streaming support
Quote Generation Layer (Split from quote-service):
- Local Quote Service (NEW - from quote-service)
- On-chain pool math (AMM, CLMM, DLMM)
- Dual cache architecture
- Background pool refresh
- Oracle validation
- External Quote Service (NEW - from quote-service)
- API aggregation (Jupiter, Dflow, OKX)
- Rate limiting & circuit breakers
- Provider rotation
- NEW: Parallel paired quotes
Infrastructure Layer (Existing):
- Pool Discovery Service (unchanged)
- Pool state updates (Redis)
- 5-minute refresh cycle
- Solscan enrichment
- Solana RPC Proxy (unchanged - Rust)
- Centralized RPC management
- Connection pooling
- Rate limiting
Key Improvements Overview
Why Split the Quote Service?
The monolithic quote-service (one of the 3 services) had architectural debt:
- Local/External mixed concerns - Both in one service, hard to optimize independently
- Sequential paired quotes - Calculated forward then reverse (500ms delay)
- N client connections - Scanner opened N streams for N pairs (overhead)
- Rust scanner complexity - Had to merge quotes, deduplicate, fetch oracle prices
- No aggregation layer - Clients had to handle merging and comparison
- Cache TTL too aggressive - 2s expiration wasted CPU on recalculation
The Solutions (v3.1 Enhancements)
| Problem | Solution | Impact |
|---|---|---|
| Mixed concerns | Split into Local + External services | Clean separation, independent optimization |
| Sequential external quotes | Parallel paired quote calculation | 2Γ faster (500ms β 250ms) |
| N client connections | Batch streaming (1 stream, N pairs) | Eliminates fan-out overhead |
| Rust complexity | Aggregator writes shared memory | 40% simpler scanner |
| Aggressive cache TTL | 2s β 5s TTL | 60% less CPU usage |
Improvement 1: Cache TTL Optimization
The Problem: Too Aggressive
Before (2s TTL):
Pool refresh interval: 10s (AMM), 30s (CLMM)
Pool staleness threshold: 60s
Quote cache TTL: 2s β¬
οΈ TOO SHORT!
Result: Quote expires 5Γ faster than pool state
Wasted CPU recalculating identical quotes
The Solution: Match Pool Lifecycle
After (5s TTL):
Pool refresh: 10s-30s
Pool staleness: 60s
Quote cache: 5s β
BALANCED
Rationale:
- Pool state valid for 60s β quotes can live 5s
- Arbitrage windows persist 5-10s on Solana
- 5s β 12 Solana slots (sufficient for detection)
Performance Impact
Before (2s TTL):
- Cache hits: 80-85%
- Recalculations: 100 req/s
- CPU usage: 40-50%
After (5s TTL):
- Cache hits: 92-95% (+10%)
- Recalculations: 40 req/s (-60%)
- CPU usage: 25-35% (-40%)
Trade-off Analysis:
- β 60% CPU reduction
- β Higher cache hit rate
- β Same arbitrage detection quality
- β οΈ Slightly older quotes (acceptable for 5-10s arb windows)
Improvement 2: Parallel External Quotes
The Problem: Sequential API Calls
Before (External Quote Service):
T=0ms: Start forward API call (SOL β USDC)
T=250ms: Forward complete
T=250ms: Start reverse API call (USDC β SOL) β¬
οΈ Pool changed!
T=500ms: Reverse complete
Issues:
- 500ms total latency
- Different market snapshots (temporal inconsistency)
- Invalid arbitrage (pool state changed between calls)
The Solution: Parallel Goroutines
After (Parallel Paired Quotes):
// Launch BOTH API calls simultaneously
go func() {
forward := getQuoteFromProviders(ctx, SOL, USDC, amount)
forwardChan <- forward
}()
go func() {
reverse := getQuoteFromProviders(ctx, USDC, SOL, amount)
reverseChan <- reverse
}()
// Wait for BOTH or timeout (1000ms)
Timeline:
T=0ms: Launch forward goroutine ββ
T=0ms: Launch reverse goroutine ββ€ PARALLEL!
T=250ms: Both complete βββββββββββββ
T=251ms: Stream both quotes
Result: 250ms total (2Γ faster), SAME snapshot
Benefits
- 2Γ faster: 250ms vs 500ms
- Temporal consistency: Both quotes from same market state
- Better arbitrage detection: Same timestamp reduces false positives
- Same API usage: No extra rate limit consumption
Improvement 3: Batch Streaming Model
The Problem: N Client Connections
Before (Per-Pair Streaming):
Scanner wants quotes for 3 pairs:
ββ Connection 1: StreamQuotes(SOL/USDC)
ββ Connection 2: StreamQuotes(SOL/USDT)
ββ Connection 3: StreamQuotes(ETH/USDC)
Issues:
- N gRPC connections (overhead)
- Aggregator does N fan-outs (inefficient)
- Memory: N Γ connection overhead
The Solution: Batch Subscription
After (Single Batch Stream):
Scanner: StreamBatchQuotes([SOL/USDC, SOL/USDT, ETH/USDC])
β ONE gRPC connection
Aggregator:
- Maintains 2 persistent streams (local + external)
- Merges streams in real-time
- Sends batched responses (all pairs in one message)
Benefits:
- 1 client connection (vs N)
- 2 persistent downstream streams (vs N fan-outs)
- Lower latency (no per-request fan-out overhead)
New Proto Definition
// NEW: Batch streaming API
message BatchQuoteRequest {
repeated PairConfig pairs = 1;
message PairConfig {
string input_mint = 1;
string output_mint = 2;
uint64 amount = 3;
string pair_id = 4; // Client tracking ID
}
}
message BatchQuoteResponse {
repeated AggregatedQuote quotes = 1;
int64 timestamp = 2;
optional double oracle_price_usd = 3; // NEW: Included!
}
service QuoteAggregatorService {
// NEW: Batch streaming
rpc StreamBatchQuotes(BatchQuoteRequest)
returns (stream BatchQuoteResponse);
// Existing (kept for backward compatibility)
rpc StreamQuotes(AggregatedQuoteRequest)
returns (stream AggregatedQuote);
}
Performance Impact
Before (N connections):
- Latency: 5-8ms (with fan-out overhead)
- Connections: N per client
- Memory: 50-54 GB
After (Batch streaming):
- Latency: 3-5ms (no fan-out overhead)
- Connections: 1 per client
- Memory: 45-48 GB (-10%)
Improvement 4: Dual Shared Memory Architecture
The Problem: Rust Scanner Complexity
Before (gRPC Streaming):
Rust Scanner must:
1. Subscribe to Local Service (gRPC)
2. Subscribe to External Service (gRPC)
3. Merge TWO streams in real-time
4. Deduplicate routes (hash-based)
5. Fetch oracle prices (Redis lookup - 100ΞΌs)
6. Detect arbitrage
Result: Complex scanner, 500ΞΌs-2ms latency
The Solution: Aggregator as Writer
After (Dual Shared Memory):
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β QUOTE AGGREGATOR SERVICE (Go - Single Writer) β
β β’ Receives quotes from local + external (gRPC) β
β β’ Deduplicates routes (hash-based, done ONCE) β
β β’ Stores routes (Redis hot cache + PostgreSQL cold) β
β β’ Writes to TWO memory-mapped files: β
β - quotes-local.mmap (128KB - 1000 local quotes) β
β - quotes-external.mmap (128KB - 1000 external quotes) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
β SHMEM #1 (Local) β SHMEM #2 (External) β
β quotes-local.mmap β quotes-external.mmap β
ββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββ€
β β’ Local pool quotes β β’ External API quotes β
β β’ Oracle price included β β’ Oracle price included β
β β’ Sub-second freshness β β’ 10s refresh interval β
β β’ DEDUPLICATED β β’ DEDUPLICATED β
β β’ RouteID β Redis/PG β β’ RouteID β Redis/PG β
ββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RUST SCANNER (Readers - Simple & Fast) β
β 1. Read from BOTH shared memory (<1ΞΌs each) β
β 2. Compare local vs external (pick best) β
β 3. Oracle validation: Use oracle_price_usd field (<1ΞΌs) β
β 4. Detect arbitrage (<10ΞΌs total) β
β 5. Fetch route from Redis (RouteID lookup) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Memory Layout (128-byte Aligned)
#[repr(C, align(128))]
struct QuoteMetadata {
version: AtomicU64, // Atomic versioning
pair_id: [u8; 32], // BLAKE3 hash
input_mint: [u8; 32], // Solana pubkey
output_mint: [u8; 32], // Solana pubkey
input_amount: u64, // Lamports
output_amount: u64, // Lamports
price_impact_bps: u32, // Basis points
timestamp_unix_ms: u64, // Unix timestamp
route_id: [u8; 32], // BLAKE3 β Redis/PG lookup
oracle_price_usd: f64, // β
NEW: No Redis lookup!
staleness_flag: u8, // β
NEW: 0=fresh, 1=stale
_padding: [u8; 15], // Align to 128 bytes
}
// File size: 128 bytes Γ 1000 pairs = 128KB (fits in L2 cache)
Benefits
- β Single source of truth: Aggregator is canonical
- β Deduplication done once: Not repeated in Rust
- β Oracle price included: <1ΞΌs access (vs 100ΞΌs Redis)
- β Staleness flag: Filter stale quotes instantly
- β Lock-free reads: Atomic versioning
- β 40% simpler Rust scanner: Just read, compare, detect
Improvement 5: Oracle Price in Shared Memory
The Problem: Redis Lookup Overhead
Before:
// Rust scanner must fetch oracle price
let oracle_price = redis_client.get(f"oracle:{token_pair}").await?;
// Latency: 100ΞΌs (Redis network + deserialization)
// Validate quote
let deviation = (quote_price - oracle_price).abs() / oracle_price;
if deviation > 0.01 {
continue; // Skip this quote
}
The Solution: Embed in Shared Memory
After:
// Read quote metadata (<1ΞΌs)
let quote = "es_local[i];
// Oracle price is RIGHT THERE (no network call!)
let oracle_price = quote.oracle_price_usd;
let quote_price = (quote.output_amount as f64) / (quote.input_amount as f64);
let deviation = (quote_price - oracle_price).abs() / oracle_price;
if deviation > 0.01 {
continue; // >1% deviation
}
// Also check staleness
if quote.staleness_flag > 1 {
continue; // Very stale (>10s old)
}
Performance Comparison
| Operation | Before (Redis) | After (Shmem) | Improvement |
|---|---|---|---|
| Oracle price fetch | 100ΞΌs | <1ΞΌs | 100Γ faster |
| Quote validation | 105ΞΌs | <2ΞΌs | 50Γ faster |
| Arbitrage detection | 500ΞΌs-2ms | <10ΞΌs | 50-200Γ faster |
Impact Analysis
Performance Improvements
| Metric | Before (v3.0) | After (v3.1) | Improvement |
|---|---|---|---|
| Cache TTL | 2s | 5s | 60% less CPU |
| External paired quotes | 500ms | 250ms | 2Γ faster |
| Client connections | N streams | 1 batch stream | Eliminates overhead |
| Oracle validation | 100ΞΌs (Redis) | <1ΞΌs (shmem) | 100Γ faster |
| Rust scanner latency | 500ΞΌs-2ms | <10ΞΌs | 50-200Γ faster |
| CPU usage | 40-50% | 25-35% | 40% reduction |
| Memory | 50-54 GB | 45-48 GB | 10% reduction |
System-Wide Impact
HFT Pipeline Performance:
Quote Service (Stage 0):
Before: 5-8ms
After: 3-5ms (40% faster)
Scanner (Stage 1):
Before: 500ΞΌs-2ms (gRPC streaming)
After: <10ΞΌs (shared memory) β
50-200Γ FASTER
Total Pipeline:
Before: <200ms
After: <180ms (10% faster, more headroom)
Code Complexity Reduction
Rust Scanner (40% simpler):
Before:
- gRPC client setup (2 services)
- Stream merging logic
- Deduplication (route hashing)
- Redis oracle price fetching
- Complex error handling
After:
- Memory-mapped file open (2 regions)
- Direct memory read (<1ΞΌs)
- Simple comparison logic
- No network calls for oracle
- Minimal error handling
Conclusion: Building for 2026
Architectural Evolution Journey
December 2024: Monolithic Quote Service (50K lines)
β "Working but unmaintainable"
December 2025: First Split (3 services, 23K total)
ββ Pool Discovery Service (8K lines)
ββ Solana RPC Proxy (Rust)
ββ Quote Service (15K lines - still monolithic)
β "Better but quote-service still mixed concerns"
January 2026: Quote Service Split (5 services total)
ββ Pool Discovery Service (unchanged)
ββ Solana RPC Proxy (unchanged)
ββ Quote Service β SPLIT INTO 3:
ββ Quote Aggregator Service (client-facing)
ββ Local Quote Service (on-chain)
ββ External Quote Service (APIs)
β "Clean separation, production-ready"
What We Achieved in 2025
Looking back at the year:
- β Built HFT pipeline from scratch (Quote β Scanner β Planner β Executor)
- β Implemented FlatBuffers event system (20-150Γ faster than JSON)
- β Deployed Grafana LGTM stack (unified observability)
- β Split monolith into 3 microservices (pool discovery, RPC proxy, quote service)
- β Achieved sub-10ms quote latency (HFT-ready)
- β Planned quote service split into 3 specialized services (total: 5 services)
Whatβs Next for 2026
Q1 2026 (Implementation Phase):
- Week 1: Cache TTL optimization + testing
- Week 2-3: Parallel external quotes implementation
- Week 4: Batch streaming API rollout
- Week 5-6: Dual shared memory + Rust scanner integration
Q2 2026 (Production Hardening):
- Rust scanner production deployment
- Performance benchmarking (sub-10ΞΌs arbitrage detection)
- 24-hour soak tests
- Load testing (10K concurrent readers)
Q3 2026 (Advanced Features):
- Multi-hop routing optimization
- Advanced oracle validation (multiple feeds)
- Auto-scaling based on market volatility
- Machine learning for quote quality scoring
Final Thoughts
As we start 2026, Iβm incredibly excited about where this system is headed. The quote service evolution from monolith β 3 services β 5 services (splitting quote-service into 3 specialized services) shows the power of iterative architecture improvement.
A Personal Note: Building this HFT trading system has been one of the most complex and challenging projects Iβve ever undertakenβboth in my spare time and across my professional career working for different companies. The complexity of real-time market data processing, sub-10ms latency requirements, multi-language integration (Go, TypeScript, Rust), and production-grade observability is truly demanding. But itβs also incredibly rewarding! Every architectural evolution teaches me something new, whether itβs shared memory IPC in Rust, FlatBuffers zero-copy serialization, or microservices patterns. This is what makes software engineering excitingβpushing boundaries and learning continuously! π
Key lessons learned:
- Start simple, evolve based on real needs - Donβt over-engineer upfront
- Measure before optimizing - Architecture review identified real bottlenecks
- Microservices when it makes sense - Each service has clear responsibility
- Performance matters for HFT - 100Γ improvements enable new strategies
- Clean architecture pays dividends - 40% simpler Rust scanner
- Complexity is manageable with proper abstraction - Breaking down monoliths into focused services
Wishing everyone an incredible 2026! May your systems be fast, your architecture clean, and your profits abundant! π
Related Posts
- Quote Service Rewrite: Clean Architecture
- Pool Discovery Service & RPC Proxy Architecture
- HFT Pipeline Architecture with FlatBuffers
- Architecture Deep Dive: Sub-500ms Solana HFT
Technical Documentation
- Quote Service Architecture Review (v3.1)
- Quote Service Architecture (v3.0)
- Full Trading System Repository
Connect:
- GitHub: @guidebee
- LinkedIn: James Shen
This is post #20 in the Solana Trading System development series. Follow along as we build a production-grade HFT system from the ground up, one architectural improvement at a time.
Happy New Year 2026! Hereβs to building systems that scale! π
