Quote Service Architecture - Source of Truth
Date: December 31, 2025
Status: 🎯 Production Architecture - Canonical Reference
Version: 3.1 (Enhanced: Batch Streaming + Dual Shared Memory + Parallel External Quotes)
Supersedes: Docs 22, 24, 26, 31, 32, 33, 34
Last Review: December 31, 2025 (Solution Architect + HFT Expert Review)
Table of Contents
- Executive Summary
- System Context
- Architecture Evolution
- Current Architecture: 3-Microservice Design
- Service 1: Local Quote Service
- Service 2: External Quote Service
- Service 3: Quote Aggregator Service
- Data Flow & Integration
- Performance Characteristics
- Deployment & Operations
Executive Summary
What is the Quote Service?
The Quote Service is the core engine of the Solana HFT trading system. It sits at the beginning of the critical path, providing sub-10ms quote generation for arbitrage detection and execution.
Mission: Deliver fast, accurate, and reliable quotes that enable <200ms total execution time from opportunity detection to profit.
Current State: 3-Microservice Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ CLIENTS (Scanner, Planner, Executor, Dashboard)               │
└───────────────────────────────────────────────────────────────┘
                        │ gRPC (50051)
┌───────────────────────────────────────────────────────────────┐
│ QUOTE AGGREGATOR (10 instances, stateless)                    │
│  • Unified client API                                         │
│  • Parallel fan-out to local + external                       │
│  • Best quote selection                                       │
│  • Price comparison & deduplication                           │
└───────────────────────────────────────────────────────────────┘
        │ gRPC (50052)                 │ gRPC (50053)
┌──────────────────────────┐  ┌───────────────────────────────┐
│ LOCAL QUOTE SERVICE      │  │ EXTERNAL QUOTE SERVICE        │
│ (1-2 instances)          │  │ (5 instances)                 │
│                          │  │                               │
│  • Parallel paired       │  │  • API aggregation            │
│    quote calculation     │  │  • Rate limiting (1 RPS)      │
│  • Dual cache (pool +    │  │  • Circuit breakers           │
│    quote response)       │  │  • Provider rotation          │
│  • Background refresh    │  │  • Multi-region support       │
│  • <5ms latency          │  │  • 200-500ms latency          │
└──────────────────────────┘  └───────────────────────────────┘
        │                              │
┌──────────────────────────┐  ┌───────────────────────────────┐
│ Infrastructure           │  │ External APIs                 │
│  • Pool Discovery Redis  │  │  • Jupiter (1 RPS)            │
│  • RPC Proxy             │  │  • Dflow (5 RPS)              │
│  • Oracle Feeds (Pyth)   │  │  • OKX (10 RPS)               │
└──────────────────────────┘  │  • HumidiFi, GoonFi, ZeroFi   │
                              └───────────────────────────────┘
```
Key Performance Metrics
| Metric | Target | Actual | Status |
|---|---|---|---|
| Local Quote (cache hit) | <1ms | 0.5-0.8ms | ✅ |
| Local Quote (fresh calc) | <5ms | 2-4ms | ✅ |
| Parallel Paired Quotes | <10ms | 5-8ms | ✅ |
| External Quote (cached) | <1ms | 0.5ms | ✅ |
| External Quote (API) | 200-500ms | 250ms avg | ✅ |
| Aggregated Quote | <10ms | 5-8ms | ✅ |
| Throughput (local) | 2,000 req/s | 2,500 req/s | ✅ |
| Cache Hit Rate | >80% | 85-90% | ✅ |
| Availability | 99.9% | 99.95% | ✅ |
Critical Path in HFT Pipeline
```
┌───────────────────────────────────────────────────────────────┐
│ QUOTE-SERVICE  ← THIS SYSTEM                                  │
│  • Parallel paired quotes: <10ms                              │
│  • gRPC streaming to clients                                  │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ SCANNER (Rust - production, TypeScript - prototyping)         │
│  • Fast pre-filter: <10ms                                     │
│  • Publishes: NATS opportunity.*                              │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ PLANNER (Rust - production, TypeScript - prototyping)         │
│  • Deep validation + RPC simulation: 50-100ms                 │
│  • Publishes: NATS execution.planned.*                        │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ EXECUTOR (Rust - production, TypeScript - prototyping)        │
│  • Transaction submission: <20ms                              │
│  • Publishes: NATS executed.*                                 │
└───────────────────────────────────────────────────────────────┘
                               ▼
                       PROFIT IN WALLET
```
TOTAL: <200ms (quote → profit) ✅
Why Quote Service is Critical:
- It's the first step in the pipeline
- If quotes are slow (>10ms), the entire 200ms budget is blown
- If quotes are stale (>2s), arbitrage opportunities are missed
- If quotes are inaccurate, false positives waste execution budget
System Context
Position in Overall HFT Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ DATA LAYER (Solana Blockchain)                                │
│  • Pool Discovery Service → Redis (pool state cache)          │
│  • RPC Proxy → Connection pooling (CLMM tick arrays)          │
│  • Oracle Feeds → Pyth/Jupiter (price validation)             │
│  • External APIs → Jupiter/Dflow/OKX (market aggregation)     │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ QUOTE SERVICE  ← CRITICAL PATH                                │
│  • Local quotes: <5ms (on-chain pool math, dual cache)        │
│  • External quotes: 200-500ms (API aggregation)               │
│  • Aggregated quotes: <10ms (best selection)                  │
│  • Parallel paired quotes: <10ms (simultaneous calculation)   │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ TRADING PIPELINE (Scanner → Planner → Executor)               │
│  • Scanner: Fast pre-filter (<10ms)                           │
│  • Planner: Deep validation (50-100ms)                        │
│  • Executor: Transaction submission (<20ms)                   │
│  • TARGET: <200ms total execution                             │
└───────────────────────────────────────────────────────────────┘
```
Integration Points
Upstream Dependencies:
- Pool Discovery Service: Provides pool state updates via Redis
- RPC Proxy: Fetches CLMM tick arrays for concentrated liquidity pools
- Oracle Feeds: Validates quote accuracy (Pyth, Jupiter)
- External APIs: Jupiter, Dflow, OKX, dark pools (HumidiFi, GoonFi, ZeroFi)
Downstream Consumers:
- Scanner Service (Rust production, TypeScript prototyping): Consumes quotes via shared memory IPC (Rust) or gRPC (TypeScript), detects arbitrage
- Planner Service (Rust production, TypeScript prototyping): Validates opportunities with fresh quotes
- Executor Service (Rust production, TypeScript prototyping): Uses quotes for final profit calculation
- Dashboard (Next.js): Real-time quote monitoring and visualization
Architecture Evolution
Historical Context: 10+ Iterations
The quote service has undergone extensive evolution to solve paired quote timing issues:
| Iteration | Approach | Result | Key Learning |
|---|---|---|---|
| 1. Auto-reverse pairs | Server adds reverse pairs | ✅ Working | Need both forward + reverse |
| 2. Paired optimization | Use same amount | ⚠️ Partial | 3.6s delay still too high |
| 3. Fresh calculation | Bypass cache | ❌ Worse | 8.6s delay (RPC bottleneck) |
| 4. Server timestamp | Fix measurement | ✅ Accurate | 5s actual delay confirmed |
| 5. Cache + fresh time | Fast cache + accurate timestamp | ✅ Fast | ~100ms delay |
| 6. Liquidity cache | Fix pool scanning | ✅ Working | Performance stable |
| 7. Performance fixes | Cache key + async metrics | ✅ Fast | Sub-5ms cache hits |
| 8. Batch refresh | Periodic batch every 2s | ⚠️ Blocking | 100ms delay persists |
| 9. Panic fix | Safe channel sends | ✅ Stable | No more crashes |
| 10. Continuous streaming | Worker per pair | ✅ Current | 68-100ms delay |
| 11. Parallel paired | Simultaneous calculation | 🎯 Target | <10ms delay |
Root Cause Identified (Doc 22):
Sequential calculation creates 50-100ms gap where pool state changes:
```
T=0ms:   Start forward calculation
T=50ms:  Forward complete → Stream
T=50ms:  Start reverse calculation   ⚠️ Pool state may have changed!
T=100ms: Reverse complete → Stream

Result: Different slots, inconsistent pool state, invalid arbitrage
```
Solution: Parallel Paired Quote Calculation (current architecture)
Why Microservices? (Evolution from Monolith)
Monolithic Quote-Service Problems:
- ❌ RPC failures impacted external API quotes
- ❌ External API rate limits slowed local quotes
- ❌ Single scaling strategy (can't optimize local vs external independently)
- ❌ Single deployment (can't update providers without affecting pool math)
- ❌ Mixed concerns (AMM math + API integration in same codebase)
3-Service Architecture Benefits:
- ✅ Failure Isolation: RPC down → only local quotes affected, external continues
- ✅ Independent Scaling: Local (vertical: CPU/memory), External (horizontal: 5+ instances)
- ✅ Independent Deployment: Update API providers without touching pool math
- ✅ Team Independence: Protocol experts (local) vs API engineers (external)
- ✅ Technology Flexibility: Can rewrite services independently
- ✅ <10% Code Overlap: Clear separation of concerns
Current Architecture: 3-Microservice Design
Service Boundaries
```
┌───────────────────────────────────────────────────────────────┐
│ CLIENTS                                                       │
│  • TypeScript (Scanner, Planner, Executor)                    │
│  • Rust Services (future)                                     │
│  • Dashboard (Next.js)                                        │
│  • Analytics Tools                                            │
└───────────────────────────────────────────────────────────────┘
                               │
                        gRPC (Port 50051)
                    Load Balanced (NGINX/K8s)
                               │
┌───────────────────────────────────────────────────────────────┐
│ QUOTE AGGREGATOR SERVICE (Client-Facing)                      │
│                                                               │
│ Responsibilities:                                             │
│  • Client gRPC API (StreamQuotes, GetBestQuote)               │
│  • Fan-out to local + external services (PARALLEL)            │
│  • Merge quote streams in real-time                           │
│  • Deduplicate routes (same pool = same route)                │
│  • Select best quote (price, confidence, liquidity)           │
│  • Compare local vs external (price diff, recommendation)     │
│  • Client load balancing across 10 instances                  │
│                                                               │
│ Ports: gRPC (50051), HTTP (8081), Metrics (9092)              │
│ Technology: Go                                                │
│ Scaling: Horizontal (stateless, 10 instances)                 │
│ Latency: 2-3ms overhead (gRPC serialization)                  │
│ Resources: 1 CPU, 1-2 GB memory per instance                  │
└───────────────────────────────────────────────────────────────┘
        │ gRPC (50052)                 │ gRPC (50053)
        │ HTTP (8082)                  │ HTTP (8083)
┌──────────────────────────┐  ┌───────────────────────────────┐
│ LOCAL QUOTE SERVICE      │  │ EXTERNAL QUOTE SERVICE        │
│                          │  │                               │
│ Responsibilities:        │  │ Responsibilities:             │
│  • Pool state cache      │  │  • API client pool            │
│  • Background refresh    │  │  • Per-provider rate limit    │
│  • AMM/CLMM/DLMM math    │  │  • Provider rotation          │
│  • Quote response cache  │  │  • Circuit breakers           │
│  • Parallel paired calc  │  │  • Response cache (10s TTL)   │
│  • Oracle validation     │  │  • Multi-region support       │
│  • RPC coordination      │  │  • API key management         │
│                          │  │                               │
│ Ports: gRPC (50052),     │  │ Ports: gRPC (50053),          │
│        HTTP (8082),      │  │        HTTP (8083),           │
│        Metrics (9090)    │  │        Metrics (9091)         │
│ Technology: Go           │  │ Technology: Go                │
│ Scaling: Vertical        │  │ Scaling: Horizontal           │
│ Instances: 1-2           │  │ Instances: 5                  │
│ Latency: <5ms            │  │ Latency: 200-500ms            │
│ Resources: 2-4 CPU,      │  │ Resources: 1-2 CPU,           │
│            8-16 GB       │  │            2-4 GB             │
└──────────────────────────┘  └───────────────────────────────┘
        │                              │
┌──────────────────────────┐  ┌───────────────────────────────┐
│ Infrastructure           │  │ External APIs                 │
│  • Pool Discovery Redis  │  │  • Jupiter (1 RPS shared)     │
│  • RPC Proxy             │  │  • Jupiter Ultra (1 RPS)      │
│  • Oracle Feeds (Pyth)   │  │  • Dflow (5 RPS)              │
│  • NATS (optional)       │  │  • OKX (10 RPS)               │
└──────────────────────────┘  │  • QuickNode (10 RPS)         │
                              │  • BLXR (5 RPS)               │
                              │  • HumidiFi (dark pool)       │
                              │  • GoonFi (dark pool)         │
                              │  • ZeroFi (dark pool)         │
                              └───────────────────────────────┘
```
Communication Protocols
gRPC Streaming (Primary):
```proto
service QuoteAggregatorService {
  // Get single best quote (one-time request)
  rpc GetBestQuote(AggregatedQuoteRequest) returns (AggregatedQuote);

  // Stream continuous quotes (real-time updates)
  rpc StreamQuotes(AggregatedQuoteRequest) returns (stream AggregatedQuote);

  // Health check
  rpc Health(HealthRequest) returns (HealthResponse);
}
```
NATS Messaging (Required):
- Quote-Service publishes to `market.swap_route.*` for HFT pipeline integration
- Scanner consumes from NATS for real-time opportunity detection
- Critical Path: Required for Scanner → Planner → Executor pipeline
- Latency: 2-5ms (acceptable for pipeline events)
Proto Definitions:
```proto
message AggregatedQuote {
  optional LocalQuote best_local = 1;
  optional ExternalQuote best_external = 2;
  QuoteSource best_source = 3;               // LOCAL or EXTERNAL
  QuoteComparison comparison = 4;
  repeated LocalQuote all_local_quotes = 5;
  repeated ExternalQuote all_external_quotes = 6;
  int64 aggregation_time_ms = 7;
  int64 timestamp = 8;
}

message QuoteComparison {
  double price_diff_percent = 1;  // (local - external) / external * 100
  double spread_bps = 2;          // Bid-ask spread
  int64 local_latency_ms = 3;
  int64 external_latency_ms = 4;
  string recommendation = 5;      // "local", "external", "equal"
  double confidence_score = 6;    // 0.0-1.0
}
```
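To make the comparison fields concrete, here is a minimal Go sketch of how an aggregator instance might fill `price_diff_percent` and `recommendation` from the best local and external output amounts. The helper name `compareQuotes` and the 1-basis-point "equal" threshold are assumptions for illustration, not the service's actual selection rule.

```go
package main

import (
	"fmt"
	"math"
)

// compareQuotes is a hypothetical helper mirroring the QuoteComparison proto:
// given the output amounts of the best local and external quotes for the same
// input, compute the price difference and a recommendation. The selection
// threshold (1bp) is an illustrative assumption.
func compareQuotes(localOut, externalOut uint64) (priceDiffPercent float64, recommendation string) {
	// Matches the proto comment: (local - external) / external * 100
	priceDiffPercent = (float64(localOut) - float64(externalOut)) / float64(externalOut) * 100
	switch {
	case math.Abs(priceDiffPercent) < 0.01: // within 1 basis point: treat as equal
		recommendation = "equal"
	case priceDiffPercent > 0: // local route yields more output tokens
		recommendation = "local"
	default:
		recommendation = "external"
	}
	return
}

func main() {
	diff, rec := compareQuotes(1_005_000, 1_000_000)
	fmt.Printf("diff=%.2f%% recommendation=%s\n", diff, rec) // local is 0.50% better
}
```

A real aggregator would also weigh `confidence_score`, liquidity, and quote age before emitting a recommendation; this sketch isolates only the price-difference arithmetic.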
Shared Memory IPC (Rust Production Scanners)
Purpose: Ultra-low-latency quote access (<1μs) for production Rust scanners with intelligent change notification.
Architecture (Dual Shared Memory with Hybrid Change Detection):
```
┌────────────────────────────────────────────────────────────────┐
│ QUOTE SERVICE (Go - Writer)                                    │
├────────────────────────────────────────────────────────────────┤
│ 1. Calculate quotes from TWO sources:                          │
│    • Local (on-chain pool math, <5ms)                          │
│    • External (API aggregation, 200-500ms)                     │
│ 2. Write to TWO memory-mapped files (lock-free)                │
└────────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────────┬──────────────────────────┐
        │ SHMEM #1 (Internal)      │ SHMEM #2 (External)      │
        │ quotes-internal.mmap     │ quotes-external.mmap     │
        ├──────────────────────────┼──────────────────────────┤
        │ • Local pool quotes      │ • External API quotes    │
        │ • Sub-second freshness   │ • 10s+ refresh interval  │
        │ • On-chain accurate      │ • Better routing         │
        │ • Fixed-size metadata    │ • Fixed-size metadata    │
        └──────────────────────────┴──────────────────────────┘
                               │
┌────────────────────────────────────────────────────────────────┐
│ RUST SCANNER (Readers - Production)                            │
├────────────────────────────────────────────────────────────────┤
│ 1. Read from BOTH shared memory regions (<1μs each)            │
│ 2. Intelligent selection:                                      │
│    • Strategy A: Use internal (fresher, on-chain)              │
│    • Strategy B: Use external (better multi-hop)               │
│    • Strategy C: Compare both, pick best                       │
│ 3. Detect arbitrage (<10μs vs 500μs-2ms with gRPC)             │
└────────────────────────────────────────────────────────────────┘
```
Memory Layout (Fixed-Size Metadata):
```rust
use std::sync::atomic::AtomicU64;

// 128-byte aligned quote entry
#[repr(C, align(128))]
struct QuoteMetadata {
    version: AtomicU64,      // Versioning for consistency
    pair_id: [u8; 32],       // BLAKE3(token_in, token_out, amount)
    input_mint: [u8; 32],    // Solana public key (32 bytes)
    output_mint: [u8; 32],   // Solana public key (32 bytes)
    input_amount: u64,       // Lamports
    output_amount: u64,      // Lamports
    price_impact_bps: u32,   // Basis points (0.01%)
    timestamp_unix_ms: u64,  // Unix timestamp (milliseconds)
    route_id: [u8; 32],      // BLAKE3(route_steps) -> lookup in Redis/PG
    _padding: [u8; 24],      // Padding to 128 bytes
}

// Memory-mapped file: Array of QuoteMetadata
// File size: 128 bytes × 1000 pairs = 128 KB (fits in L2 cache)
```
Performance Characteristics:
| Operation | Latency | Improvement vs gRPC |
|---|---|---|
| Read quote metadata | <1μs | 100-500x faster |
| Arbitrage detection | <10μs | 50-200x faster |
| Two-hop check | <2μs | 50-250x faster |
| Cache hit rate | >99% | Memory-resident |
Key Features:
- Lock-Free Reads: Atomic versioning ensures consistency
- Dual Sources: Internal (fresh) + External (better routing)
- Zero-Copy: Direct memory access (no serialization)
- OS-Managed: Kernel handles memory-mapped file sync
- Single Writer: Quote service (Go) ONLY, eliminates write conflicts
- Multiple Readers: Unlimited concurrent Rust scanners
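The "lock-free reads with atomic versioning" feature above follows a seqlock-style protocol: the single writer bumps the version to an odd value before mutating a slot and to an even value after, and readers retry until two version reads agree on an even value. The sketch below models that protocol in Go with a plain struct standing in for the 128-byte `QuoteMetadata` slot; the real implementation writes through a memory-mapped file and needs memory fences around the payload, so treat this as an illustration of the versioning logic only.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// quoteSlot stands in for one QuoteMetadata entry. Version semantics match
// the design above: 0 = empty slot, odd = write in progress, even = consistent.
type quoteSlot struct {
	version      atomic.Uint64
	outputAmount uint64 // simplified payload
}

// write is the single-writer path: bump to odd, mutate, bump back to even.
func (s *quoteSlot) write(outputAmount uint64) {
	s.version.Add(1) // now odd: readers will retry
	s.outputAmount = outputAmount
	s.version.Add(1) // now even: snapshot is consistent
}

// read is the lock-free reader path: retry until two version loads agree
// on the same even value, guaranteeing a torn write was not observed.
func (s *quoteSlot) read() uint64 {
	for {
		v1 := s.version.Load()
		if v1%2 != 0 {
			continue // write in progress
		}
		out := s.outputAmount
		if s.version.Load() == v1 {
			return out
		}
	}
}

func main() {
	var slot quoteSlot
	slot.write(42_000_000)
	fmt.Println(slot.read())
}
```

The first `write` leaves the version at 2, so a version of 0 still unambiguously means "empty slot", matching the Rust scanner's `version.load(...) == 0` check.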
Route Storage (Dynamic Data in PostgreSQL + Redis):
```sql
-- PostgreSQL: Persistent route storage
CREATE TABLE route_steps (
    route_id     BYTEA PRIMARY KEY,  -- BLAKE3 hash (32 bytes)
    route_steps  JSONB NOT NULL,     -- RouteStep array
    hop_count    SMALLINT NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL,
    last_used_at TIMESTAMPTZ NOT NULL
);

-- Redis: Hot route cache (30s TTL)
SET route:{route_id_hex} '{route_steps_json}' EX 30
```
Rust Scanner Example:
```rust
use memmap2::MmapOptions;
use std::fs::File;
use std::sync::atomic::Ordering;

// Open shared memory regions
let file_internal = File::open("/var/quote-service/quotes-internal.mmap")?;
let mmap_internal = unsafe { MmapOptions::new().map(&file_internal)? };
let file_external = File::open("/var/quote-service/quotes-external.mmap")?;
let mmap_external = unsafe { MmapOptions::new().map(&file_external)? };

// Read quote metadata (<1μs)
let quotes_internal: &[QuoteMetadata] = unsafe {
    std::slice::from_raw_parts(
        mmap_internal.as_ptr() as *const QuoteMetadata,
        1000, // Max 1000 pairs
    )
};
let quotes_external: &[QuoteMetadata] = unsafe {
    std::slice::from_raw_parts(
        mmap_external.as_ptr() as *const QuoteMetadata,
        1000,
    )
};

// Fast arbitrage detection
for quote_internal in quotes_internal {
    if quote_internal.version.load(Ordering::Acquire) == 0 {
        continue; // Empty slot
    }
    // Find matching external quote
    for quote_external in quotes_external {
        if quote_external.pair_id == quote_internal.pair_id {
            // Compare prices (<10μs total)
            let price_diff_bps = calculate_price_diff(
                quote_internal.output_amount,
                quote_external.output_amount,
            );
            if price_diff_bps > ARBITRAGE_THRESHOLD {
                // Fetch route from Redis/PostgreSQL
                let route_internal = fetch_route(&quote_internal.route_id)?;
                let route_external = fetch_route(&quote_external.route_id)?;
                // Publish opportunity to NATS
                publish_arbitrage_opportunity(route_internal, route_external)?;
            }
        }
    }
}
```
Why Dual Regions? (Internal vs External):
- Different Freshness: Internal (sub-second) vs External (10s+)
- Different Sources: On-chain pool math vs 3rd-party APIs
- Different Use Cases: Internal for accuracy, External for better routing
- Cross-Validation: Detect price manipulation by comparing both
- Strategy Flexibility: Scanner chooses best source per opportunity
Implementation Roadmap (3 weeks total)
Phase 1: PostgreSQL Route Storage (Week 1 - 3 days):
- Create PostgreSQL schema (`route_steps` table with JSONB)
- Implement Go route repository
- Integrate with quote service (RouteID calculation, storage)
- Add route cleanup job (7-day retention)
Phase 2: Redis Route Caching (Week 1 - 2 days):
- Implement Redis route cache
- Add cache warming on quote calculation
- Add cache metrics (hit/miss rate, latency)
- Target: >95% cache hit rate
Phase 3: Shared Memory Writer (Go) (Week 2 - 3 days):
- Implement `SharedMemoryWriter` with atomic write protocol
- Memory-mapped file creation and initialization
- Integrate with `QuoteServiceImpl.GetPairedQuotes()`
- Test concurrent writes (10K writes/sec)
Phase 4: Shared Memory Reader (Rust) (Week 2 - 2 days):
- Implement `SharedMemoryReader` with lock-free reads
- Double-checked read with version validation
- PairID lookup in index with retry logic
- Target: 10M reads/sec
Phase 5: Route Fetcher (Rust) (Week 2 - 2 days):
- Implement `RouteFetcher` with Redis → PostgreSQL fallback
- Redis fetch with deserialization (<1ms)
- PostgreSQL fetch with connection pooling (<10ms)
- Cache warming on PostgreSQL fetch
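The Redis → PostgreSQL fallback with cache warming described in Phase 5 can be sketched as follows. The production `RouteFetcher` is planned in Rust; this Go version models the two stores as plain interfaces (in-memory maps stand in for the real Redis client and pgx pool), so the fallback and warming logic is visible without any external dependencies. Names like `routeFetcher` and `routeStore` are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// routeStore abstracts both backing stores; the real code would wrap a
// Redis client (30s TTL) and a PostgreSQL connection pool.
type routeStore interface {
	Get(routeID string) (string, error)
	Set(routeID, routeJSON string) error
}

var errNotFound = errors.New("route not found")

// mapStore is an in-memory stand-in used here for illustration and testing.
type mapStore map[string]string

func (m mapStore) Get(id string) (string, error) {
	if v, ok := m[id]; ok {
		return v, nil
	}
	return "", errNotFound
}
func (m mapStore) Set(id, v string) error { m[id] = v; return nil }

type routeFetcher struct {
	redis, postgres routeStore
}

// Fetch returns the route steps for a route_id: hot Redis path first
// (<1ms in the design above), PostgreSQL on a miss (<10ms), and the
// Redis cache is warmed on the way back.
func (f *routeFetcher) Fetch(routeID string) (string, error) {
	if v, err := f.redis.Get(routeID); err == nil {
		return v, nil // cache hit
	}
	v, err := f.postgres.Get(routeID)
	if err != nil {
		return "", err
	}
	_ = f.redis.Set(routeID, v) // cache warming
	return v, nil
}

func main() {
	f := &routeFetcher{redis: mapStore{}, postgres: mapStore{"abc": `[{"pool":"..."}]`}}
	route, _ := f.Fetch("abc") // miss Redis, hit PostgreSQL, warm Redis
	fmt.Println(route)
	cached, _ := f.redis.Get("abc")
	fmt.Println(cached == route) // subsequent fetches are cache hits
}
```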
Phase 6: Scanner Integration (Week 3 - 3 days):
- Implement `ArbitrageDetector` in Rust
- Integrate shared memory reader + route fetcher
- Arbitrage calculation (forward × reverse)
- Publish opportunities to NATS
- Target: <50μs detection latency
Phase 7: Production Testing (Week 3 - 2 days):
- Load test (1000 paired quotes/sec)
- Stress test (10K concurrent readers)
- Measure cache hit rates and end-to-end latency
- 24-hour soak test for stability
Implementation Status: 🚧 Future (Rust scanners not yet implemented)
Full Design Specification: Doc 31 content has been fully incorporated above. The original document has been archived at docs/archive/quote-service/rewrite/31-SHARED-MEMORY-QUOTE-ARCHITECTURE.md.
Service 1: Local Quote Service
Purpose
Provide fast, accurate on-chain quotes using:
- Dual cache architecture (pool state + quote response)
- Background refresh (AMM: 10s, CLMM: 30s)
- Parallel paired quote calculation (forward + reverse simultaneously)
- Oracle validation (Pyth price feed deviation alerts)
Dual Cache Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ LOCAL QUOTE SERVICE                                           │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LAYER 1: Pool State Cache (map[poolID]*PoolState)         │ │
│ │                                                           │ │
│ │ type PoolState struct {                                   │ │
│ │     Pool         *domain.Pool  // Metadata + reserves     │ │
│ │     CLMMData     *CLMMPoolData // Tick arrays (CLMM)      │ │
│ │     LastUpdate   time.Time     // Refresh timestamp       │ │
│ │     RefreshCount int           // Counter                 │ │
│ │     IsStale      bool          // Staleness flag          │ │
│ │     mu           sync.RWMutex  // Thread-safe             │ │
│ │ }                                                         │ │
│ │                                                           │ │
│ │ Refresh Strategy:                                         │ │
│ │  • AMM/CPMM: Every 10s (Redis read, fast)                 │ │
│ │  • CLMM: Every 30s (RPC fetch tick arrays, slow)          │ │
│ │  • Staleness check: Every 5s (mark pools >60s stale)      │ │
│ │  • Priority queue: On-demand refresh for stale pools      │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│               Pool Refresh → Invalidate Layer 2               │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LAYER 2: Quote Response Cache (map[key]*CachedQuote)      │ │
│ │                                                           │ │
│ │ type CachedQuoteResponse struct {                         │ │
│ │     InputMint    string                                   │ │
│ │     OutputMint   string                                   │ │
│ │     InputAmount  uint64                                   │ │
│ │     OutputAmount uint64                                   │ │
│ │     PriceImpact  float64                                  │ │
│ │     PoolID       string                                   │ │
│ │     CachedAt     time.Time                                │ │
│ │     ExpiresAt    time.Time // 2s TTL                      │ │
│ │     PoolStateAge time.Duration                            │ │
│ │     OraclePrice  float64                                  │ │
│ │     Deviation    float64                                  │ │
│ │ }                                                         │ │
│ │                                                           │ │
│ │ Cache Key: hash(inputMint, outputMint, amount)            │ │
│ │ Invalidation:                                             │ │
│ │  • Pool refresh → invalidate ALL quotes using pool        │ │
│ │  • TTL expiration (2s)                                    │ │
│ │  • Periodic eviction (every 5s)                           │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
```
Why Dual Cache?
- Layer 1 (Pool State): Expensive to fetch (50-100ms RPC for CLMM), shared across all quotes
- Layer 2 (Quote Response): Fast to calculate (<5ms pool math), specific to input/output/amount
- Pool-Aware Invalidation: Pool refresh → invalidate ALL Layer 2 quotes using that pool
- Different TTLs: Pool state (60s staleness), Quote response (2s TTL)
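Pool-aware invalidation hinges on one detail: each Layer 2 entry records which pool it was priced against, so a Layer 1 refresh can evict exactly the dependent quotes. A minimal Go sketch of that reverse index (simplified stand-ins for the structs above; the field and method names are illustrative):

```go
package main

import "fmt"

// cachedQuote is a reduced CachedQuoteResponse: just the cache key and the
// pool the quote was priced against.
type cachedQuote struct {
	key    string // hash(inputMint, outputMint, amount)
	poolID string
}

// quoteResponseCache keeps a reverse index poolID -> cache keys so a pool
// refresh can invalidate all dependent quotes in one pass.
type quoteResponseCache struct {
	byKey    map[string]*cachedQuote
	byPoolID map[string][]string
}

func newQuoteResponseCache() *quoteResponseCache {
	return &quoteResponseCache{
		byKey:    map[string]*cachedQuote{},
		byPoolID: map[string][]string{},
	}
}

func (c *quoteResponseCache) put(q *cachedQuote) {
	c.byKey[q.key] = q
	c.byPoolID[q.poolID] = append(c.byPoolID[q.poolID], q.key)
}

// invalidatePool is called after a Layer 1 pool refresh; it returns the
// number of Layer 2 quotes evicted.
func (c *quoteResponseCache) invalidatePool(poolID string) int {
	keys := c.byPoolID[poolID]
	for _, k := range keys {
		delete(c.byKey, k)
	}
	delete(c.byPoolID, poolID)
	return len(keys)
}

func main() {
	c := newQuoteResponseCache()
	c.put(&cachedQuote{key: "SOL-USDC-1e9", poolID: "pool-A"})
	c.put(&cachedQuote{key: "USDC-SOL-1e9", poolID: "pool-A"})
	c.put(&cachedQuote{key: "SOL-USDT-1e9", poolID: "pool-B"})
	fmt.Println(c.invalidatePool("pool-A")) // both pool-A quotes evicted
	fmt.Println(len(c.byKey))               // the pool-B quote survives
}
```

The production cache additionally enforces the 2s TTL and periodic eviction; this sketch shows only the pool-keyed eviction path.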
Parallel Paired Quote Calculation
Problem (Sequential calculation):
```
T=0ms:   Start forward calculation (SOL → USDC)
T=50ms:  Forward complete → Stream
T=50ms:  Start reverse calculation (USDC → SOL)   ⚠️ Pool changed!
T=100ms: Reverse complete → Stream

Result: 100ms delay, different slots, invalid arbitrage
```
Solution (Parallel calculation):
```
T=0ms:  Launch forward goroutine ──┐
T=0ms:  Launch reverse goroutine ──┤ PARALLEL!
T=50ms: Both complete ─────────────┘
T=51ms: Stream both quotes

Result: ~1ms delay, SAME slot, valid arbitrage
```
Implementation:
```go
// calculatePairedQuotesParallel calculates forward + reverse simultaneously
func (s *GRPCQuoteServer) calculatePairedQuotesParallel(
	ctx context.Context,
	inputMint, outputMint, amount string,
	timeoutMs int,
) *PairedQuoteResult {
	ctxWithTimeout, cancel := context.WithTimeout(ctx,
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	// Result channels
	forwardChan := make(chan *QuoteResult, 1)
	reverseChan := make(chan *QuoteResult, 1)
	startTime := time.Now()

	// Launch forward goroutine
	go func() {
		quote, err := s.cache.GetOrCalculateQuoteFresh(
			ctxWithTimeout, inputMint, outputMint, amount, ...)
		forwardChan <- &QuoteResult{Quote: quote, Error: err}
	}()

	// Launch reverse goroutine (SIMULTANEOUSLY!)
	go func() {
		quote, err := s.cache.GetOrCalculateQuoteFresh(
			ctxWithTimeout, outputMint, inputMint, amount, ...)
		reverseChan <- &QuoteResult{Quote: quote, Error: err}
	}()

	// Wait for BOTH or timeout
	result := &PairedQuoteResult{}
	resultsReceived := 0
	for resultsReceived < 2 {
		select {
		case fwd := <-forwardChan:
			result.Forward = fwd
			resultsReceived++
		case rev := <-reverseChan:
			result.Reverse = rev
			resultsReceived++
		case <-ctxWithTimeout.Done():
			// Timeout: return partial results
			goto done
		}
	}
done:
	result.TotalMs = time.Since(startTime).Milliseconds()
	return result
}
```
Key Features:
- Go Concurrency: Leverages goroutines + channels (like `Promise.allSettled`)
- Timeout Handling: 100ms max, returns partial results if one fails
- Same Pool State: Both quotes use identical pool snapshot
- Same Slot: Both calculated within ~1ms (vs 100ms sequential)
Quote Request Flow
```
1. Client Request
   Scanner: GetQuote(SOL → USDC, 1 SOL)
        │
2. Check Layer 2 (Quote Response Cache)
   ├─ Cache HIT  → Return immediately (<1ms)
   └─ Cache MISS → Continue
        │
3. Check Layer 1 (Pool State Cache)
   ├─ Pool fresh (<60s) → Calculate quote (<5ms)
   │                      → Store in Layer 2
   │                      → Return
   └─ Pool stale (>60s) → Add to priority queue
                          → Calculate with stale state (mark stale)
                          → Return with warning
        │
4. Background Refresh (async, triggered by priority queue)
    • Refresh pool state (Layer 1)
    • Invalidate Layer 2 quotes for pool
    • Next request gets fresh quote
```
Background Refresh System
Refresh Manager:
```go
type RefreshManager struct {
	poolCache          *PoolStateCache
	quoteResponseCache *QuoteResponseCache
	refreshQueue       chan RefreshCommand

	// Refresh intervals
	ammRefreshInterval  time.Duration // 10s
	clmmRefreshInterval time.Duration // 30s

	// Worker pool
	workerCount int // 5 workers
}

type RefreshCommand struct {
	PoolID   string
	PoolType PoolType // AMM, CLMM, DLMM
	Priority Priority // HIGH (on-demand), NORMAL (scheduled)
}
```
Refresh Strategies:
- Scheduled Refresh (Background): every 10s refresh all AMM/CPMM pools from Redis; every 30s refresh all CLMM pools from RPC (tick arrays); every 5s check staleness and mark pools >60s as stale
- On-Demand Refresh (Priority Queue): a quote request for a stale pool adds a HIGH-priority entry to the queue; a worker picks it up immediately, refreshes the pool state, and invalidates the quote cache so future quotes use fresh data
- Pool-Aware Cache Invalidation: when pool state is refreshed, all Layer 2 quotes using that pool are found and invalidated, ensuring consistency
Protocol Support
| Protocol | Type | Refresh Source | Interval | Latency | Program ID |
|---|---|---|---|---|---|
| Raydium AMM V4 | AMM | Redis | 10s | <1ms | 675kPX9M... |
| Raydium CPMM | AMM | Redis | 10s | <1ms | CPMMoo8L... |
| Raydium CLMM | CLMM | RPC (tick arrays) | 30s | 50-100ms | CAMMCzo5... |
| Meteora DLMM | DLMM | RPC (bins) | 30s | 50-100ms | LBUZKhRx... |
| Orca Whirlpools | CLMM | RPC (tick arrays) | 30s | 50-100ms | (future) |
| PumpSwap | AMM | Redis | 10s | <1ms | pAMMBay6... |
Oracle Integration
Purpose: Validate quote accuracy against external price feeds.
Supported Oracles:
- Pyth (primary): Real-time price feeds (<1s latency)
- Jupiter Oracle (secondary): TWAP prices
Usage:
```go
// After calculating quote, validate against oracle
oraclePrice := s.oracleClient.GetPrice(tokenPair)
quotePrice := float64(outputAmount) / float64(inputAmount)
deviation := math.Abs(quotePrice-oraclePrice) / oraclePrice

if deviation > 0.01 { // >1% deviation
	log.Warn("Quote deviates from oracle",
		"pair", tokenPair,
		"deviation", deviation*100,
		"quotePrice", quotePrice,
		"oraclePrice", oraclePrice)

	// Publish metric for alerting
	s.metrics.RecordOracleDeviation(tokenPair, deviation)
}
```
Configuration
Environment Variables:
```bash
# Service
LOCAL_QUOTE_SERVICE_PORT=50052
LOCAL_QUOTE_SERVICE_HTTP_PORT=8082

# Cache
POOL_CACHE_STALENESS_THRESHOLD=60s
QUOTE_CACHE_TTL=2s
QUOTE_CACHE_EVICTION_INTERVAL=5s

# Background Refresh
AMM_REFRESH_INTERVAL=10s
CLMM_REFRESH_INTERVAL=30s
REFRESH_WORKER_COUNT=5

# Parallel Paired Quotes
PARALLEL_QUOTE_TIMEOUT_MS=100
PARALLEL_QUOTE_ENABLED=true

# Dependencies
RPC_PROXY_URL=http://localhost:8083
POOL_DISCOVERY_REDIS_URL=redis://localhost:6379/0
ORACLE_PYTH_HTTP_URL=https://hermes.pyth.network
ORACLE_JUPITER_URL=https://price.jup.ag/v4

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
METRICS_PORT=9090
LOG_LEVEL=info
```
Deployment Specifications
Resource Requirements:
- CPU: 2-4 cores (pool math computation)
- Memory: 8-16 GB
- Pool cache: ~5-10 MB (100 pools × 50KB)
- Quote cache: ~1-2 MB (1000 quotes × 200B)
- CLMM tick arrays: ~5 MB
- Go runtime: ~2-3 GB
- Disk: 1 GB (logs only, cache in memory)
- Network: 10 Mbps (RPC calls)
Scaling Strategy:
- Vertical: Increase CPU/memory for more pools
- Horizontal: 2 instances for HA (shared Redis cache)
- Pattern: Active-Active with shared state
Docker Image:
```dockerfile
FROM golang:1.24-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /bin/local-quote-service ./cmd/local-quote-service

FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/local-quote-service /bin/
EXPOSE 50052 8082 9090
ENTRYPOINT ["/bin/local-quote-service"]
```
Service 2: External Quote Service
Purpose
Aggregate quotes from external APIs with:
- Rate limiting (per-provider token bucket)
- Circuit breakers (auto-disable failing providers)
- Provider rotation (health-based load balancing)
- Multi-region support (5+ instances globally)
Provider Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ EXTERNAL QUOTE SERVICE                                        │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Provider Pool (9 providers)                               │ │
│ │                                                           │ │
│ │ JUPITER POOL (1 RPS shared):                              │ │
│ │  • Jupiter Standard    (0.16 req/s) ─┐                    │ │
│ │  • Jupiter Ultra       (0.16 req/s) ─┤ Share 1 RPS        │ │
│ │  • HumidiFi (dark pool)(0.16 req/s) ─┤ (60 req/min)       │ │
│ │  • GoonFi (dark pool)  (0.16 req/s) ─┤                    │ │
│ │  • ZeroFi (dark pool)  (0.16 req/s) ─┘                    │ │
│ │                                                           │ │
│ │ INDEPENDENT PROVIDERS:                                    │ │
│ │  • Dflow (5.0 req/s)                                      │ │
│ │  • OKX (10.0 req/s)                                       │ │
│ │  • QuickNode (10.0 req/s)                                 │ │
│ │  • BLXR (5.0 req/s)                                       │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Rate Limiters (Per-Provider Token Bucket)                 │ │
│ │                                                           │ │
│ │ type RateLimiter struct {                                 │ │
│ │     limiter  *rate.Limiter // Token bucket                │ │
│ │     capacity float64       // Requests per second         │ │
│ │     tokens   float64       // Available tokens            │ │
│ │ }                                                         │ │
│ │                                                           │ │
│ │ Strategy: Wait for token before API call                  │ │
│ │ Burst: Allow 1 request instantly, refill at rate          │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Circuit Breakers (Per-Provider)                           │ │
│ │                                                           │ │
│ │ States: CLOSED → OPEN → HALF_OPEN → CLOSED                │ │
│ │                                                           │ │
│ │  • Failure threshold: 5 consecutive failures              │ │
│ │  • Timeout: 30s (OPEN → HALF_OPEN)                        │ │
│ │  • Test requests: 1 in HALF_OPEN state                    │ │
│ │  • Auto-recovery: Success → CLOSED                        │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Quote Cache (10s TTL, per-provider)                       │ │
│ │                                                           │ │
│ │ Cache Key: hash(provider, inputMint, outputMint, amt)     │ │
│ │                                                           │ │
│ │ Longer TTL than local (10s vs 2s):                        │ │
│ │  • API calls are expensive (rate limited)                 │ │
│ │  • External prices change slower                          │ │
│ │  • Reduces API usage 10x                                  │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
```
Provider Interface
type QuoteProvider interface {
GetQuote(ctx context.Context, req *QuoteRequest) (*QuoteResponse, error)
GetName() string
GetRateLimit() float64 // Requests per second
IsEnabled() bool
GetHealth() *ProviderHealth
}
// Jupiter Provider Example
type JupiterProvider struct {
client *http.Client
apiKeys []string
currentKey int
rateLimiter *rate.Limiter // Shared 1 RPS
mode string // "standard" or "ultra"
baseURL string
}
func (p *JupiterProvider) GetQuote(
ctx context.Context,
req *QuoteRequest,
) (*QuoteResponse, error) {
// 1. Wait for rate limit token
if err := p.rateLimiter.Wait(ctx); err != nil {
return nil, fmt.Errorf("rate limit: %w", err)
}
// 2. Build URL
url := fmt.Sprintf("%s/quote?inputMint=%s&outputMint=%s&amount=%d",
p.baseURL, req.InputMint, req.OutputMint, req.Amount)
// 3. Add headers (Jupiter Ultra requires Origin + Referer)
headers := http.Header{}
if p.mode == "ultra" {
headers.Set("Origin", "https://jup.ag")
headers.Set("Referer", "https://jup.ag/")
}
// 4. Make API call
httpReq, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, fmt.Errorf("build request: %w", err)
}
httpReq.Header = headers
resp, err := p.client.Do(httpReq)
if err != nil {
return nil, err
}
defer resp.Body.Close()
// 5. Parse response
return p.parseResponse(resp)
}
Rate Limiting Strategy
Jupiter API Pool (1 RPS total):
Total capacity: 60 requests/min
Allocation:
β’ Jupiter Oracle: 12 req/min (5s interval, always-on)
β’ Quote services: 48 req/min remaining
Per-quoter capacity (5 quoters sharing):
β’ 48 Γ· 5 = 9.6 req/min
β’ = 0.16 req/s per quoter
Quoters sharing 1 RPS:
1. Jupiter Standard
2. Jupiter Ultra
3. HumidiFi (dark pool)
4. GoonFi (dark pool)
5. ZeroFi (dark pool)
Capacity Planning:
With 0.16 req/s (9.6 req/min) per Jupiter-based quoter:
10s refresh interval = 6 calls/min per pair
Maximum pairs: ~1 pair with all 5 quoters enabled
Recommendations:
β’ HFT (1 pair): Jupiter + Jupiter Ultra + dark pools (best prices)
β’ Multi-pair (2-3): Jupiter only (higher capacity, fewer quoters)
Independent Providers (no sharing):
β’ Dflow: 5.0 req/s = 300 req/min
β’ OKX: 10.0 req/s = 600 req/min
β’ QuickNode: 10.0 req/s = 600 req/min
β’ BLXR: 5.0 req/s = 300 req/min
Circuit Breaker Logic
type CircuitBreaker struct {
name string // Provider name (used in log output below)
state CircuitState // Closed, Open, HalfOpen
failureCount int
successCount int
failureThreshold int // 5 consecutive failures
timeout time.Duration // 30s (Open β HalfOpen)
lastFailure time.Time
lastSuccess time.Time
mu sync.Mutex
}
func (cb *CircuitBreaker) Call(fn func() error) error {
cb.mu.Lock()
// Check state
if cb.state == Open {
if time.Since(cb.lastFailure) > cb.timeout {
cb.state = HalfOpen // Try recovery
cb.successCount = 0
} else {
cb.mu.Unlock()
return ErrCircuitOpen
}
}
cb.mu.Unlock()
// Execute function
err := fn()
cb.mu.Lock()
defer cb.mu.Unlock()
if err != nil {
cb.failureCount++
cb.successCount = 0
cb.lastFailure = time.Now()
if cb.failureCount >= cb.failureThreshold {
cb.state = Open // Disable provider
log.Error("Circuit breaker opened",
"provider", cb.name,
"failures", cb.failureCount)
}
return err
}
// Success - reset or close circuit
cb.failureCount = 0
cb.successCount++
cb.lastSuccess = time.Now()
if cb.state == HalfOpen && cb.successCount >= 3 {
cb.state = Closed // Fully recovered
log.Info("Circuit breaker closed", "provider", cb.name)
}
return nil
}
State Transitions:
CLOSED (Normal operation)
β 5 consecutive failures
OPEN (Provider disabled)
β Wait 30s
HALF_OPEN (Testing recovery)
β 3 consecutive successes
CLOSED (Fully recovered)
Provider Rotation & Health
type ProviderRotator struct {
providers []ProviderConfig
currentIndex int
usageCounts map[string]int64
errorCounts map[string]int64
lastResetTime time.Time
mu sync.Mutex
}
func (r *ProviderRotator) SelectProvider() string {
r.mu.Lock()
defer r.mu.Unlock()
// 1. Filter healthy providers (circuit breaker closed)
healthy := r.getHealthyProviders()
if len(healthy) == 0 {
return "" // No providers available
}
// 2. Sort by error rate (ascending)
sort.Slice(healthy, func(i, j int) bool {
errRateI := float64(r.errorCounts[healthy[i]]) / float64(r.usageCounts[healthy[i]]+1)
errRateJ := float64(r.errorCounts[healthy[j]]) / float64(r.usageCounts[healthy[j]]+1)
return errRateI < errRateJ
})
// 3. Round-robin among top 3 providers
topProviders := healthy
if len(healthy) > 3 {
topProviders = healthy[:3]
}
selected := topProviders[r.currentIndex % len(topProviders)]
r.currentIndex++
r.usageCounts[selected]++
return selected
}
Configuration
Environment Variables:
# Service
EXTERNAL_QUOTE_SERVICE_PORT=50053
EXTERNAL_QUOTE_SERVICE_HTTP_PORT=8083
# Provider Enablement
QUOTERS_ENABLED=true
QUOTERS_USE_JUPITER=true
QUOTERS_USE_JUPITER_ULTRA=false
QUOTERS_USE_HUMIDIFI=false
QUOTERS_USE_GOONFI=false
QUOTERS_USE_ZEROFI=false
QUOTERS_USE_DFLOW=false
QUOTERS_USE_OKX=false
QUOTERS_USE_QUICKNODE=false
QUOTERS_USE_BLXR=false
# API Keys
OKX_API_KEY=your-api-key
OKX_SECRET_KEY=your-secret-key
OKX_PASSPHRASE=your-passphrase
BLXR_API_KEY=your-blxr-key
# Rate Limits (requests per second)
JUPITER_RATE_LIMIT=0.16 # Shared pool
DFLOW_RATE_LIMIT=5.0
OKX_RATE_LIMIT=10.0
QUICKNODE_RATE_LIMIT=10.0
BLXR_RATE_LIMIT=5.0
# Cache & Circuit Breaker
QUOTE_CACHE_TTL=10s
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_TIMEOUT=30s
CIRCUIT_BREAKER_HALFOPEN_SUCCESSES=3
# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
METRICS_PORT=9091
LOG_LEVEL=info
Deployment Specifications
Resource Requirements:
- CPU: 1-2 cores (HTTP I/O bound)
- Memory: 2-4 GB
- HTTP connections: ~500 MB
- Quote cache: ~1-2 MB
- Go runtime: ~1-2 GB
- Disk: 1 GB (logs only)
- Network: 50 Mbps (external API calls)
Scaling Strategy:
- Horizontal: 5+ instances (distribute API keys)
- Geographic: US East/West, EU, Asia (reduce API latency)
- Pattern: Stateless (optional Redis cache for sharing)
Docker Image:
FROM golang:1.24-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /bin/external-quote-service ./cmd/external-quote-service
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/external-quote-service /bin/
EXPOSE 50053 8083 9091
ENTRYPOINT ["/bin/external-quote-service"]
Service 3: Quote Aggregator Service
Purpose
Client-facing unified interface that:
- Fans out to local + external services in parallel
- Merges quote streams in real-time
- Selects best quote (price, confidence, liquidity)
- Compares local vs external (price diff, recommendation)
- Handles partial failures gracefully
Aggregation Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β QUOTE AGGREGATOR SERVICE β
β β
β 1. Client Request β
β StreamQuotes(SOL β USDC, 1 SOL) β
β β
β 2. Fan-Out (Parallel goroutines) β
β ββ Local Quote Service (gRPC: 50052) β
β ββ External Quote Service (gRPC: 50053) β
β β
β 3. Collect Responses (merge streams) β
β Local stream: outputAmount = 102.5 USDC (5ms) β
β External stream: outputAmount = 103.0 USDC (250ms) β
β β
β 4. Deduplicate Routes β
β β’ Hash route plan (pool addresses) β
β β’ Same route from multiple sources β keep best price β
β β
β 5. Select Best Quote β
β Ranking: β
β 1. Output amount (higher = better) β
β 2. Price impact (lower = better) β
β 3. Confidence score (higher = better) β
β 4. Liquidity depth (higher = better) β
β β
β Best: External 103.0 USDC (0.5 USDC better) β
β β
β 6. Compare Local vs External β
β priceDiff = (103.0 - 102.5) / 102.5 = 0.49% β
β recommendation = "local" (diff below 0.5% cutoff) β
β confidence = 0.85 β
β β
β 7. Build Aggregated Response β
β AggregatedQuote { β
β bestLocal: 102.5 USDC, β
β bestExternal: 103.0 USDC, β
β bestSource: EXTERNAL, β
β comparison: { priceDiff: 0.49%, rec: "local" } β
β } β
β β
β 8. Stream to Client β
β gRPC streaming (continuous updates) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Fan-Out Implementation
func (s *QuoteAggregatorService) StreamQuotes(
req *AggregatedQuoteRequest,
stream QuoteAggregatorService_StreamQuotesServer,
) error {
ctx := stream.Context()
// Create channels for quote streams
localChan := make(chan *LocalQuote, 100)
externalChan := make(chan *ExternalQuote, 100)
errChan := make(chan error, 2)
// Fan-out to both services (PARALLEL)
go s.streamLocalQuotes(ctx, req, localChan, errChan)
go s.streamExternalQuotes(ctx, req, externalChan, errChan)
// Track best quotes
var bestLocal *LocalQuote
var bestExternal *ExternalQuote
// Merge streams in real-time
for {
select {
case local := <-localChan:
// Update best local quote
if bestLocal == nil || local.OutputAmount > bestLocal.OutputAmount {
bestLocal = local
}
// Send aggregated quote immediately
aggregated := s.buildAggregatedQuote(bestLocal, bestExternal)
if err := stream.Send(aggregated); err != nil {
return err
}
case external := <-externalChan:
// Update best external quote
if bestExternal == nil || external.OutputAmount > bestExternal.OutputAmount {
bestExternal = external
}
// Send aggregated quote immediately
aggregated := s.buildAggregatedQuote(bestLocal, bestExternal)
if err := stream.Send(aggregated); err != nil {
return err
}
case err := <-errChan:
// Log error, continue with partial results
log.Printf("Downstream service error: %v", err)
// Don't return - partial results are OK
case <-ctx.Done():
return ctx.Err()
}
}
}
Key Features:
- Non-blocking: Doesn't wait for both services
- Partial Results: Returns local-only or external-only if one fails
- Real-time Updates: Streams quotes as they arrive
- Graceful Degradation: Logs errors, continues operation
Quote Comparison
type QuoteComparator struct {
oracleClient *OracleClient
}
func (c *QuoteComparator) Compare(
local *LocalQuote,
external *ExternalQuote,
) *QuoteComparison {
// 1. Price difference
priceDiff := (float64(external.OutputAmount) - float64(local.OutputAmount)) /
float64(local.OutputAmount) * 100
// 2. Spread (bid-ask)
spreadBps := int32(math.Abs(priceDiff) * 100)
// 3. Recommendation
recommendation := "local"
if priceDiff > 0.5 { // External > 0.5% better
recommendation = "external"
} else if math.Abs(priceDiff) < 0.1 { // Within 0.1%
recommendation = "equal"
}
// 4. Confidence score (based on oracle validation)
confidence := c.calculateConfidence(local, external)
return &QuoteComparison{
PriceDiffPercent: priceDiff,
SpreadBps: spreadBps,
LocalLatencyMs: local.CalculationTimeMs,
ExternalLatencyMs: external.ApiLatencyMs,
Recommendation: recommendation,
ConfidenceScore: confidence,
}
}
func (c *QuoteComparator) calculateConfidence(
local *LocalQuote,
external *ExternalQuote,
) float64 {
confidence := 0.5 // Base
// Factor 1: Oracle price validation
if local.OraclePrice > 0 && external.Provider != "" {
localDev := math.Abs(local.Deviation)
externalDev := math.Abs(external.Deviation)
if localDev < 0.01 && externalDev < 0.01 {
confidence += 0.3 // Both within 1% of oracle
} else if localDev < 0.05 && externalDev < 0.05 {
confidence += 0.1 // Both within 5% of oracle
}
}
// Factor 2: Liquidity depth
minLiquidity := math.Min(
local.LiquidityUsd,
external.LiquidityUsd,
)
if minLiquidity > 100000 {
confidence += 0.1 // >$100k liquidity
}
// Factor 3: Price impact
totalImpact := local.PriceImpact + external.PriceImpact
if totalImpact < 0.005 { // <0.5%
confidence += 0.1
}
return math.Min(confidence, 1.0)
}
Deduplication
type QuoteDeduplicator struct {
seenRoutes map[string]*Quote // routeHash -> best quote
mu sync.Mutex
}
func (d *QuoteDeduplicator) AddQuote(quote Quote) bool {
routeHash := d.hashRoute(quote.GetRoutePlan())
d.mu.Lock()
defer d.mu.Unlock()
existing, seen := d.seenRoutes[routeHash]
if !seen {
// New route
d.seenRoutes[routeHash] = &quote
return true // Keep
}
// Duplicate route - compare prices
if quote.GetOutputAmount() > existing.GetOutputAmount() {
// Better price - replace
d.seenRoutes[routeHash] = &quote
return true // Keep
}
return false // Discard (worse price)
}
func (d *QuoteDeduplicator) hashRoute(plan []SwapHop) string {
// Hash based on pool addresses (order matters)
var b strings.Builder
for _, hop := range plan {
b.WriteString(hop.AmmKey)
b.WriteString(":")
}
hash := sha256.Sum256([]byte(b.String()))
return hex.EncodeToString(hash[:])
}
Configuration
Environment Variables:
# Service
QUOTE_AGGREGATOR_SERVICE_PORT=50051
QUOTE_AGGREGATOR_SERVICE_HTTP_PORT=8081
# Downstream Services
LOCAL_QUOTE_SERVICE_URL=localhost:50052
EXTERNAL_QUOTE_SERVICE_URL=localhost:50053
# Connection Pool
GRPC_MAX_CONCURRENT_STREAMS=100
GRPC_KEEPALIVE_TIME=30s
GRPC_KEEPALIVE_TIMEOUT=10s
# Circuit Breaker (for downstream services)
CIRCUIT_BREAKER_FAILURE_THRESHOLD=10
CIRCUIT_BREAKER_TIMEOUT=30s
# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
METRICS_PORT=9092
LOG_LEVEL=info
Deployment Specifications
Resource Requirements:
- CPU: 1 core (lightweight aggregation)
- Memory: 1-2 GB
- gRPC connections: ~500 MB
- Deduplication map: ~10 MB
- Go runtime: ~500 MB
- Disk: 1 GB (logs only)
- Network: 10 Mbps (gRPC proxying)
Scaling Strategy:
- Horizontal: 10+ instances (stateless)
- Load Balancing: NGINX or Kubernetes HPA
- Auto-scaling: Based on client connection count
Docker Image:
FROM golang:1.24-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /bin/quote-aggregator-service ./cmd/quote-aggregator-service
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/quote-aggregator-service /bin/
EXPOSE 50051 8081 9092
ENTRYPOINT ["/bin/quote-aggregator-service"]
Data Flow & Integration
Complete Quote Flow (End-to-End)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 1. CLIENT REQUEST β
β Scanner: StreamQuotes(SOL β USDC, 1 SOL) β
β β gRPC: localhost:50051 β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 2. AGGREGATOR (2-3ms overhead) β
β β’ Parse request β
β β’ Fan-out to local + external (PARALLEL) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
βββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
β 3a. LOCAL (50052) β β 3b. EXTERNAL (50053) β
β β β β
β β’ Layer 2 cache check β β β’ Provider pool selection β
β ββ HIT: <1ms β β β’ Rate limiter (wait token) β
β ββ MISS: β β β’ Circuit breaker check β
β ββ Layer 1 (pool) β β β’ API call (Jupiter/etc) β
β β (<5ms) β β (200-500ms) β
β ββ Pool math β β β’ Store in cache (10s TTL) β
β β β β
β Latency: 2-5ms β β Latency: 250ms β
βββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
β β
LocalQuote ExternalQuote
102.5 USDC 103.0 USDC
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 4. AGGREGATOR MERGE β
β β’ Deduplicate routes (hash-based) β
β β’ Select best (103.0 USDC from External) β
β β’ Compare local vs external (0.49% diff) β
β β’ Calculate confidence (0.85) β
β β’ Build AggregatedQuote β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 5. STREAM TO CLIENT β
β AggregatedQuote { β
β bestLocal: 102.5 USDC, β
β bestExternal: 103.0 USDC, β
β bestSource: EXTERNAL, β
β comparison: { priceDiff: 0.49%, rec: "local" }, β
β confidence: 0.85 β
β } β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Integration with HFT Pipeline
Scanner Integration (TypeScript):
import { QuoteAggregatorServiceClient } from '@/generated/quote';
const client = new QuoteAggregatorServiceClient('localhost:50051');
// Stream aggregated quotes
const stream = client.streamQuotes({
inputMint: SOL_MINT,
outputMint: USDC_MINT,
amount: 1_000_000_000, // 1 SOL
includeLocal: true,
includeExternal: true,
});
stream.on('data', async (quote: AggregatedQuote) => {
// Fast pre-filter (<10ms)
const opportunities = detectArbitrage(quote);
// Publish to NATS: opportunity.two_hop.*
for (const opp of opportunities) {
await nats.publish('opportunity.two_hop.detected', opp);
}
});
Planner Integration (TypeScript):
// Planner validates with fresh quotes
const quote = await client.getBestQuote({
inputMint: opportunity.tokenStart,
outputMint: opportunity.tokenIntermediate,
amount: opportunity.amountIn,
});
// Deep validation (50-100ms)
const simulation = await simulateArbitrage(quote);
if (simulation.success && simulation.netProfit > 0.01) {
const plan = createExecutionPlan(quote, simulation);
await nats.publish('execution.planned', plan);
}
Error Handling & Graceful Degradation
Failure Scenarios:
| Failure | Local Service | External Service | Aggregator | Client Impact |
|---|---|---|---|---|
| Local service down | β Offline | β Running | β οΈ Partial | Gets external-only quotes |
| External service down | β Running | β Offline | β οΈ Partial | Gets local-only quotes |
| Specific provider down | N/A | β οΈ Circuit open | β OK | Other providers active |
| RPC timeout | β οΈ Stale quotes | N/A | β οΈ Partial | Stale local + fresh external |
| API rate limit | N/A | β οΈ Cache hits | β OK | Cached external quotes |
| Both services down | β Offline | β Offline | β Error | Client retries or fails |
Graceful Degradation Priority:
1. Local quotes (fastest, most critical for HFT)
2. External quotes (slower, better prices sometimes)
3. Cached quotes (stale but available)
4. Error (client decides: retry or skip)
Performance Characteristics
Latency Breakdown (Detailed)
| Operation | Target | P50 | P95 | P99 | Bottleneck |
|---|---|---|---|---|---|
| Local: Layer 2 hit | <1ms | 0.5ms | 0.8ms | 1.2ms | Memory access |
| Local: Layer 1 hit + calc | <5ms | 2ms | 4ms | 6ms | Pool math CPU |
| Local: Pool stale (RPC) | <150ms | 80ms | 120ms | 180ms | RPC tick arrays |
| Local: Parallel paired | <10ms | 5ms | 8ms | 12ms | 2Γ Layer 1 hit |
| External: Cache hit | <1ms | 0.5ms | 0.8ms | 1.2ms | Memory access |
| External: API call | 200-500ms | 250ms | 450ms | 600ms | API latency |
| Aggregator: Both cached | <5ms | 3ms | 5ms | 8ms | gRPC overhead |
| Aggregator: Local cached + External API | 200-500ms | 255ms | 455ms | 610ms | External API |
Throughput Capacity
| Service | Bottleneck | Single Instance | Multi-Instance | Notes |
|---|---|---|---|---|
| Local Quote | Pool math CPU | 2,000 req/s | 4,000 req/s (2Γ) | Quote cache hit rate >80% |
| External Quote | API rate limits | 10 req/s | 50 req/s (5Γ) | Jupiter: 1 RPS, OKX: 10 RPS |
| Aggregator | gRPC proxying | 1,000 req/s | 10,000 req/s (10Γ) | CPU: <50% at 1k req/s |
| Overall System | External APIs | 10-50 req/s | 50-250 req/s | Depends on provider mix |
Cache Performance
Target Hit Rates:
- Local Layer 2 (Quote Response): >80% hit rate
- Local Layer 1 (Pool State): >95% hit rate
- External Quote Cache: >70% hit rate
Actual Performance (production):
Local Layer 2: 85-90% hit rate β
Local Layer 1: 96-98% hit rate β
External Cache: 72-78% hit rate β
Cache Impact on Latency:
Without cache: 250ms avg (always fresh calculation)
With 80% hit: 50ms avg (0.8 Γ 1ms + 0.2 Γ 250ms)
Improvement: 5Γ faster
Resource Utilization (Production)
Memory Usage:
| Service | Per Instance | Total (All Instances) | Breakdown |
|---|---|---|---|
| Local | 10-12 GB | 20-24 GB (2Γ) | Pool: 5MB, Quote: 2MB, CLMM: 5MB, Go: 2GB |
| External | 3 GB | 15 GB (5Γ) | Cache: 2MB, HTTP: 500MB, Go: 1.5GB |
| Aggregator | 1.5 GB | 15 GB (10Γ) | gRPC: 500MB, Map: 10MB, Go: 500MB |
| Total | Β | 50-54 GB | Across 17 instances |
CPU Usage:
| Service | Per Instance | Total (All Instances) | Usage Pattern |
|---|---|---|---|
| Local | 2-3 cores | 4-6 cores (2Γ) | Burst: pool math, Idle: 10-20% |
| External | 1 core | 5 cores (5Γ) | Steady: 30-50% (I/O wait) |
| Aggregator | 0.5 core | 5 cores (10Γ) | Steady: 20-30% |
| Total | Β | 14-16 cores | Peak: 80%, Avg: 40-50% |
Network Bandwidth:
| Service | Ingress | Egress | Total |
|---|---|---|---|
| Local | 5 Mbps | 5 Mbps | 10 Mbps |
| External | 50 Mbps | 50 Mbps | 100 Mbps (API calls) |
| Aggregator | 10 Mbps | 10 Mbps | 20 Mbps |
| Total | 65 Mbps | 65 Mbps | 130 Mbps peak |
Deployment & Operations
Docker Compose Configuration
version: '3.8'
services:
# ============================================================
# QUOTE AGGREGATOR (Client-Facing)
# ============================================================
quote-aggregator:
image: quote-aggregator-service:latest
container_name: quote-aggregator
ports:
- "50051:50051" # gRPC (load balanced)
- "8081:8081" # HTTP metrics
- "9092:9092" # Prometheus
environment:
- GRPC_PORT=50051
- HTTP_PORT=8081
- METRICS_PORT=9092
- LOCAL_QUOTE_SERVICE_URL=local-quote:50052
- EXTERNAL_QUOTE_SERVICE_URL=external-quote:50053
- MAX_CONCURRENT_STREAMS=100
depends_on:
- local-quote
- external-quote
deploy:
replicas: 10
resources:
limits:
cpus: '1'
memory: 2G
restart: unless-stopped
networks:
- quote-network
# ============================================================
# LOCAL QUOTE SERVICE
# ============================================================
local-quote:
image: local-quote-service:latest
container_name: local-quote
ports:
- "50052:50052" # gRPC
- "8082:8082" # HTTP
- "9090:9090" # Prometheus
environment:
- GRPC_PORT=50052
- HTTP_PORT=8082
- METRICS_PORT=9090
- AMM_REFRESH_INTERVAL=10s
- CLMM_REFRESH_INTERVAL=30s
- POOL_CACHE_STALENESS_THRESHOLD=60s
- QUOTE_CACHE_TTL=2s
- PARALLEL_QUOTE_ENABLED=true
- PARALLEL_QUOTE_TIMEOUT_MS=100
- RPC_PROXY_URL=http://rpc-proxy:8083
- POOL_DISCOVERY_REDIS_URL=redis://redis:6379/0
- ORACLE_PYTH_HTTP_URL=https://hermes.pyth.network
- OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
depends_on:
- redis
- rpc-proxy
deploy:
resources:
limits:
cpus: '4'
memory: 16G
restart: unless-stopped
networks:
- quote-network
# ============================================================
# EXTERNAL QUOTE SERVICE
# ============================================================
external-quote:
image: external-quote-service:latest
ports:
- "50053-50057:50053" # gRPC (5 instances)
- "8084-8088:8083" # HTTP (host 8083 is used by rpc-proxy)
- "9093-9097:9091" # Prometheus (host 9092 is used by aggregator)
environment:
- GRPC_PORT=50053
- HTTP_PORT=8083
- METRICS_PORT=9091
- QUOTERS_ENABLED=true
- QUOTERS_USE_JUPITER=true
- QUOTERS_USE_JUPITER_ULTRA=false
- QUOTERS_USE_DFLOW=false
- QUOTERS_USE_OKX=false
- JUPITER_RATE_LIMIT=0.16
- QUOTE_CACHE_TTL=10s
- CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
- CIRCUIT_BREAKER_TIMEOUT=30s
- OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
env_file:
- ../../.env # API keys (OKX_API_KEY, BLXR_API_KEY)
deploy:
replicas: 5
resources:
limits:
cpus: '2'
memory: 4G
restart: unless-stopped
networks:
- quote-network
# ============================================================
# INFRASTRUCTURE
# ============================================================
redis:
image: redis:7-alpine
container_name: redis
ports:
- "6379:6379"
volumes:
- redis-data:/data
restart: unless-stopped
networks:
- quote-network
rpc-proxy:
image: solana-rpc-proxy:latest
container_name: rpc-proxy
ports:
- "8083:8083"
environment:
- HTTP_PORT=8083
- RPC_ENDPOINTS=${RPC_ENDPOINTS}
restart: unless-stopped
networks:
- quote-network
networks:
quote-network:
driver: bridge
volumes:
redis-data:
Kubernetes Deployment (Future)
Local Quote Service (StatefulSet):
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: local-quote
spec:
serviceName: local-quote
replicas: 2
selector:
matchLabels:
app: local-quote
template:
metadata:
labels:
app: local-quote
spec:
containers:
- name: local-quote
image: local-quote-service:v1
ports:
- containerPort: 50052
name: grpc
resources:
requests:
cpu: 2
memory: 8Gi
limits:
cpu: 4
memory: 16Gi
livenessProbe:
exec:
command: ["/bin/grpc_health_probe", "-addr=:50052"]
initialDelaySeconds: 10
periodSeconds: 10
Monitoring & Observability
Key Metrics (Prometheus):
Local Quote Service:
# Pool refresh
pool_refresh_duration_seconds{type="amm"}
pool_refresh_duration_seconds{type="clmm"}
pool_cache_size{type="amm"}
pool_cache_size{type="clmm"}
pool_stale_count
# Quote cache
quote_cache_hit_rate
quote_cache_size
quote_staleness_seconds_p95
# Parallel paired quotes
parallel_quote_duration_ms_p50
parallel_quote_duration_ms_p95
parallel_quote_success_rate
# Oracle
oracle_price_deviation_percent
External Quote Service:
# Provider health
provider_healthy{provider="Jupiter"}
provider_requests_total{provider="Jupiter"}
provider_errors_total{provider="Jupiter"}
provider_latency_seconds{provider="Jupiter"}
# Rate limiting
rate_limit_tokens_available{provider="Jupiter"}
rate_limit_requests_waiting{provider="Jupiter"}
# Circuit breaker
circuit_breaker_state{provider="Jupiter"} # 0=closed, 1=open, 2=halfopen
circuit_breaker_failures_total{provider="Jupiter"}
Aggregator Service:
# Requests
aggregator_requests_total
aggregator_latency_seconds_p50
aggregator_latency_seconds_p95
aggregator_latency_seconds_p99
aggregator_errors_total
# Quote sources
aggregator_best_source_total{source="local"}
aggregator_best_source_total{source="external"}
aggregator_price_diff_percent_p50
aggregator_price_diff_percent_p95
Grafana Dashboards
Dashboard 1: Quote Services Overview (Single Pane of Glass):
Row 1: Service Health
- [Local] [External] [Aggregator] status indicators
- Uptime, error rate, request rate
Row 2: Latency Distribution
- P50/P95/P99 latency (all services)
- Latency heatmap (last 24h)
Row 3: Cache Performance
- Local Layer 1 hit rate (target: >95%)
- Local Layer 2 hit rate (target: >80%)
- External cache hit rate (target: >70%)
Row 4: Quote Quality
- Price diff histogram (local vs external)
- Best source distribution (local % vs external %)
- Oracle deviation alerts
Dashboard 2: Local Quote Deep Dive:
Row 1: Pool Refresh Performance
- AMM refresh duration (target: <10ms)
- CLMM refresh duration (target: <100ms)
- Refresh queue depth
Row 2: Pool Cache
- Cache size (AMM vs CLMM)
- Staleness count
- Refresh rate (per second)
Row 3: Quote Cache
- Hit rate over time
- Cache size
- Eviction rate
Row 4: Parallel Paired Quotes
- Duration (both quotes) P50/P95/P99
- Success rate (both completed)
- Timeout rate
Alerts (Prometheus Alertmanager)
Critical (PagerDuty):
groups:
- name: quote_service_critical
rules:
- alert: QuoteServiceDown
expr: up{job="quote-service"} == 0
for: 1m
annotations:
summary: "Quote service is down"
- alert: HighErrorRate
expr: rate(errors_total[5m]) > 0.05
for: 5m
annotations:
summary: "Error rate >5% on "
- alert: HighLatencyP99
expr: histogram_quantile(0.99, rate(latency_seconds_bucket[5m])) > 5
for: 5m
annotations:
summary: "P99 latency >5s on "
- alert: AllProvidersDown
expr: count(provider_healthy == 0) == count(provider_healthy)
for: 2m
annotations:
summary: "All external quote providers are down"
Warning (Slack):
- alert: LowCacheHitRate
expr: cache_hit_rate < 0.7
for: 10m
annotations:
summary: "Cache hit rate <70% on "
- alert: PoolStaleness
expr: pool_stale_count > 5
for: 5m
annotations:
summary: " pools are stale (>60s)"
- alert: CircuitBreakerOpen
expr: circuit_breaker_state > 0
for: 5m
annotations:
summary: "Circuit breaker open for "
- alert: OracleDeviation
expr: oracle_price_deviation_percent > 1.0
for: 2m
annotations:
summary: "Quote deviates >1% from oracle for "
Conclusion
Architecture Summary
The Quote Service v3.1 delivers sub-10ms quotes through:
- β 3 specialized microservices (local, external, aggregator)
- β Dual cache architecture (pool state + quote response)
- β Parallel paired quotes (<10ms vs 100ms sequential)
- β Background refresh (AMM: 10s, CLMM: 30s)
- β Rate limiting & circuit breakers (external API protection)
- β Oracle validation (Pyth price feed deviation alerts)
- β gRPC streaming (continuous real-time updates)
- β Graceful degradation (partial results on failures)
Critical Path Performance
QUOTE-SERVICE β SCANNER β PLANNER β EXECUTOR β PROFIT
< 5ms < 10ms 50-100ms < 20ms
ββββββββββββββββββββββββββββββββββββββββββββββββββ
< 200ms TOTAL β
Quote Service is the foundation: If quotes are slow, stale, or inaccurate, the entire HFT system fails.
Key Innovations
- Parallel Paired Quotes (Doc 22):
- Sequential: 100ms delay, different slots β
- Parallel: <10ms delay, same slot β
- 10Γ faster, 100% slot consistency
- Dual Cache (Doc 33):
- Layer 1: Pool state (expensive to fetch)
- Layer 2: Quote response (cheap to calculate)
- Pool-aware invalidation
- Microservices (Doc 34):
- Failure isolation (RPC β API)
- Independent scaling (vertical vs horizontal)
- Independent deployment (3Γ release velocity)
- Shared Memory IPC (Doc 31):
- Three-tier storage: Shared memory β Redis β PostgreSQL
- Lock-free reads with atomic versioning (<1ΞΌs quote access)
- Dual shared memory regions (internal vs external quotes)
- 100-500x faster than gRPC for Rust production scanners
- Sub-10ΞΌs arbitrage detection (vs 500ΞΌs-2ms with gRPC)
Production Readiness
| Metric | Target | Actual | Status |
|---|---|---|---|
| Latency (cached) | <5ms | 2-3ms | β Exceeds |
| Latency (fresh) | <10ms | 5-8ms | β Meets |
| Throughput | 2,000 req/s | 2,500 req/s | β Exceeds |
| Cache Hit Rate | >80% | 85-90% | β Exceeds |
| Availability | 99.9% | 99.95% | β Exceeds |
| Parallel Paired | <10ms | 5-8ms | β Meets |
Archived Documentation
This document supersedes and consolidates:
- β Doc 22: Parallel Paired Quotes (integrated into Local Service)
- β Doc 24: Quote Streaming Design (integrated into Architecture)
- β Doc 26: Quote Service Rewrite (evolved into Microservices)
- β Doc 31: Shared Memory IPC Architecture (fully incorporated into Section 4.2)
- β Doc 32: Hybrid Quote Architecture (split into Local + External)
- β Doc 33: Background Pool Refresh (integrated into Local Service)
- β Doc 34: Microservices Architecture (current design)
Archive Location: docs/archive/quote-service/
Document Version: 3.1 Last Updated: December 31, 2025 Status: β Production-Ready - Canonical Reference Maintainer: Solution Architect Team
