Quote Service Architecture - Source of Truth
Date: December 31, 2025
Status: 🎯 Production Architecture - Canonical Reference
Version: 3.1 (Enhanced: Batch Streaming + Dual Shared Memory + Parallel External Quotes)
Supersedes: Docs 22, 24, 26, 31, 32, 33, 34
Last Review: December 31, 2025 (Solution Architect + HFT Expert Review)
Table of Contents
- Executive Summary
- System Context
- Architecture Evolution
- Current Architecture: 3-Microservice Design
- Service 1: Local Quote Service
- Service 2: External Quote Service
- Service 3: Quote Aggregator Service
- Data Flow & Integration
- Performance Characteristics
- Deployment & Operations
Executive Summary
What is the Quote Service?
The Quote Service is the core engine of the Solana HFT trading system. It sits at the beginning of the critical path, providing sub-10ms quote generation for arbitrage detection and execution.
Mission: Deliver fast, accurate, and reliable quotes that enable <200ms total execution time from opportunity detection to profit.
Current State: 3-Microservice Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ CLIENTS (Scanner, Planner, Executor, Dashboard)               │
└───────────────────────────────────────────────────────────────┘
                        │ gRPC (50051)
┌───────────────────────────────────────────────────────────────┐
│ QUOTE AGGREGATOR (10 instances, stateless)                    │
│  • Unified client API                                         │
│  • Parallel fan-out to local + external                       │
│  • Best quote selection                                       │
│  • Price comparison & deduplication                           │
└───────────────────────────────────────────────────────────────┘
        │ gRPC (50052)                 │ gRPC (50053)
┌──────────────────────────┐  ┌───────────────────────────────┐
│ LOCAL QUOTE SERVICE      │  │ EXTERNAL QUOTE SERVICE        │
│ (1-2 instances)          │  │ (5 instances)                 │
│                          │  │                               │
│  • Parallel paired       │  │  • API aggregation            │
│    quote calculation     │  │  • Rate limiting (1 RPS)      │
│  • Dual cache (pool +    │  │  • Circuit breakers           │
│    quote response)       │  │  • Provider rotation          │
│  • Background refresh    │  │  • Multi-region support       │
│  • <5ms latency          │  │  • 200-500ms latency          │
└──────────────────────────┘  └───────────────────────────────┘
        │                              │
┌──────────────────────────┐  ┌───────────────────────────────┐
│ Infrastructure           │  │ External APIs                 │
│  • Pool Discovery Redis  │  │  • Jupiter (1 RPS)            │
│  • RPC Proxy             │  │  • Dflow (5 RPS)              │
│  • Oracle Feeds (Pyth)   │  │  • OKX (10 RPS)               │
└──────────────────────────┘  │  • HumidiFi, GoonFi, ZeroFi   │
                              └───────────────────────────────┘
```
Key Performance Metrics
| Metric | Target | Actual | Status |
|---|---|---|---|
| Local Quote (cache hit) | <1ms | 0.5-0.8ms | ✅ |
| Local Quote (fresh calc) | <5ms | 2-4ms | ✅ |
| Parallel Paired Quotes | <10ms | 5-8ms | ✅ |
| External Quote (cached) | <1ms | 0.5ms | ✅ |
| External Quote (API) | 200-500ms | 250ms avg | ✅ |
| Aggregated Quote | <10ms | 5-8ms | ✅ |
| Throughput (local) | 2,000 req/s | 2,500 req/s | ✅ |
| Cache Hit Rate | >80% | 85-90% | ✅ |
| Availability | 99.9% | 99.95% | ✅ |
Critical Path in HFT Pipeline
```
┌───────────────────────────────────────────────────────────────┐
│ QUOTE-SERVICE  ← THIS SYSTEM                                  │
│  • Parallel paired quotes: <10ms                              │
│  • gRPC streaming to clients                                  │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ SCANNER (Rust - production, TypeScript - prototyping)         │
│  • Fast pre-filter: <10ms                                     │
│  • Publishes: NATS opportunity.*                              │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ PLANNER (Rust - production, TypeScript - prototyping)         │
│  • Deep validation + RPC simulation: 50-100ms                 │
│  • Publishes: NATS execution.planned.*                        │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ EXECUTOR (Rust - production, TypeScript - prototyping)        │
│  • Transaction submission: <20ms                              │
│  • Publishes: NATS executed.*                                 │
└───────────────────────────────────────────────────────────────┘
                               ▼
                       PROFIT IN WALLET
```
TOTAL: <200ms (quote → profit) ✅
Why Quote Service is Critical:
- It's the first step in the pipeline
- If quotes are slow (>10ms), the entire 200ms budget is blown
- If quotes are stale (>2s), arbitrage opportunities are missed
- If quotes are inaccurate, false positives waste execution budget
System Context
Position in Overall HFT Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ DATA LAYER (Solana Blockchain)                                │
│  • Pool Discovery Service → Redis (pool state cache)          │
│  • RPC Proxy → Connection pooling (CLMM tick arrays)          │
│  • Oracle Feeds → Pyth/Jupiter (price validation)             │
│  • External APIs → Jupiter/Dflow/OKX (market aggregation)     │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ QUOTE SERVICE  ← CRITICAL PATH                                │
│  • Local quotes: <5ms (on-chain pool math, dual cache)        │
│  • External quotes: 200-500ms (API aggregation)               │
│  • Aggregated quotes: <10ms (best selection)                  │
│  • Parallel paired quotes: <10ms (simultaneous calculation)   │
└───────────────────────────────────────────────────────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│ TRADING PIPELINE (Scanner → Planner → Executor)               │
│  • Scanner: Fast pre-filter (<10ms)                           │
│  • Planner: Deep validation (50-100ms)                        │
│  • Executor: Transaction submission (<20ms)                   │
│  • TARGET: <200ms total execution                             │
└───────────────────────────────────────────────────────────────┘
```
Integration Points
Upstream Dependencies:
- Pool Discovery Service: Provides pool state updates via Redis
- RPC Proxy: Fetches CLMM tick arrays for concentrated liquidity pools
- Oracle Feeds: Validates quote accuracy (Pyth, Jupiter)
- External APIs: Jupiter, Dflow, OKX, dark pools (HumidiFi, GoonFi, ZeroFi)
Downstream Consumers:
- Scanner Service (Rust production, TypeScript prototyping): Consumes quotes via shared memory IPC (Rust) or gRPC (TypeScript), detects arbitrage
- Planner Service (Rust production, TypeScript prototyping): Validates opportunities with fresh quotes
- Executor Service (Rust production, TypeScript prototyping): Uses quotes for final profit calculation
- Dashboard (Next.js): Real-time quote monitoring and visualization
Architecture Evolution
Historical Context: 10+ Iterations
The quote service has undergone extensive evolution to solve paired quote timing issues:
| Iteration | Approach | Result | Key Learning |
|---|---|---|---|
| 1. Auto-reverse pairs | Server adds reverse pairs | ✅ Working | Need both forward + reverse |
| 2. Paired optimization | Use same amount | ⚠️ Partial | 3.6s delay still too high |
| 3. Fresh calculation | Bypass cache | ❌ Worse | 8.6s delay (RPC bottleneck) |
| 4. Server timestamp | Fix measurement | ✅ Accurate | 5s actual delay confirmed |
| 5. Cache + fresh time | Fast cache + accurate timestamp | ✅ Fast | ~100ms delay |
| 6. Liquidity cache | Fix pool scanning | ✅ Working | Performance stable |
| 7. Performance fixes | Cache key + async metrics | ✅ Fast | Sub-5ms cache hits |
| 8. Batch refresh | Periodic batch every 2s | ⚠️ Blocking | 100ms delay persists |
| 9. Panic fix | Safe channel sends | ✅ Stable | No more crashes |
| 10. Continuous streaming | Worker per pair | ✅ Current | 68-100ms delay |
| 11. Parallel paired | Simultaneous calculation | 🎯 Target | <10ms delay |
Root Cause Identified (Doc 22):
Sequential calculation creates 50-100ms gap where pool state changes:
```
T=0ms:   Start forward calculation
T=50ms:  Forward complete → Stream
T=50ms:  Start reverse calculation   ⚠️ Pool state may have changed!
T=100ms: Reverse complete → Stream

Result: Different slots, inconsistent pool state, invalid arbitrage
```
Solution: Parallel Paired Quote Calculation (current architecture)
Why Microservices? (Evolution from Monolith)
Monolithic Quote-Service Problems:
- ❌ RPC failures impacted external API quotes
- ❌ External API rate limits slowed local quotes
- ❌ Single scaling strategy (can't optimize local vs external independently)
- ❌ Single deployment (can't update providers without affecting pool math)
- ❌ Mixed concerns (AMM math + API integration in same codebase)
3-Service Architecture Benefits:
- ✅ Failure Isolation: RPC down → only local quotes affected, external continues
- ✅ Independent Scaling: Local (vertical: CPU/memory), External (horizontal: 5+ instances)
- ✅ Independent Deployment: Update API providers without touching pool math
- ✅ Team Independence: Protocol experts (local) vs API engineers (external)
- ✅ Technology Flexibility: Can rewrite services independently
- ✅ <10% Code Overlap: Clear separation of concerns
Current Architecture: 3-Microservice Design
Service Boundaries
```
┌───────────────────────────────────────────────────────────────┐
│ CLIENTS                                                       │
│  • TypeScript (Scanner, Planner, Executor)                    │
│  • Rust Services (future)                                     │
│  • Dashboard (Next.js)                                        │
│  • Analytics Tools                                            │
└───────────────────────────────────────────────────────────────┘
                               │
                        gRPC (Port 50051)
                    Load Balanced (NGINX/K8s)
                               │
┌───────────────────────────────────────────────────────────────┐
│ QUOTE AGGREGATOR SERVICE (Client-Facing)                      │
│                                                               │
│ Responsibilities:                                             │
│  • Client gRPC API (StreamQuotes, GetBestQuote)               │
│  • Fan-out to local + external services (PARALLEL)            │
│  • Merge quote streams in real-time                           │
│  • Deduplicate routes (same pool = same route)                │
│  • Select best quote (price, confidence, liquidity)           │
│  • Compare local vs external (price diff, recommendation)     │
│  • Client load balancing across 10 instances                  │
│                                                               │
│ Ports: gRPC (50051), HTTP (8081), Metrics (9092)              │
│ Technology: Go                                                │
│ Scaling: Horizontal (stateless, 10 instances)                 │
│ Latency: 2-3ms overhead (gRPC serialization)                  │
│ Resources: 1 CPU, 1-2 GB memory per instance                  │
└───────────────────────────────────────────────────────────────┘
        │ gRPC (50052)                 │ gRPC (50053)
        │ HTTP (8082)                  │ HTTP (8083)
┌──────────────────────────┐  ┌───────────────────────────────┐
│ LOCAL QUOTE SERVICE      │  │ EXTERNAL QUOTE SERVICE        │
│                          │  │                               │
│ Responsibilities:        │  │ Responsibilities:             │
│  • Pool state cache      │  │  • API client pool            │
│  • Background refresh    │  │  • Per-provider rate limit    │
│  • AMM/CLMM/DLMM math    │  │  • Provider rotation          │
│  • Quote response cache  │  │  • Circuit breakers           │
│  • Parallel paired calc  │  │  • Response cache (10s TTL)   │
│  • Oracle validation     │  │  • Multi-region support       │
│  • RPC coordination      │  │  • API key management         │
│                          │  │                               │
│ Ports: gRPC (50052),     │  │ Ports: gRPC (50053),          │
│        HTTP (8082),      │  │        HTTP (8083),           │
│        Metrics (9090)    │  │        Metrics (9091)         │
│ Technology: Go           │  │ Technology: Go                │
│ Scaling: Vertical        │  │ Scaling: Horizontal           │
│ Instances: 1-2           │  │ Instances: 5                  │
│ Latency: <5ms            │  │ Latency: 200-500ms            │
│ Resources: 2-4 CPU,      │  │ Resources: 1-2 CPU,           │
│            8-16 GB       │  │            2-4 GB             │
└──────────────────────────┘  └───────────────────────────────┘
        │                              │
┌──────────────────────────┐  ┌───────────────────────────────┐
│ Infrastructure           │  │ External APIs                 │
│  • Pool Discovery Redis  │  │  • Jupiter (1 RPS shared)     │
│  • RPC Proxy             │  │  • Jupiter Ultra (1 RPS)      │
│  • Oracle Feeds (Pyth)   │  │  • Dflow (5 RPS)              │
│  • NATS (optional)       │  │  • OKX (10 RPS)               │
└──────────────────────────┘  │  • QuickNode (10 RPS)         │
                              │  • BLXR (5 RPS)               │
                              │  • HumidiFi (dark pool)       │
                              │  • GoonFi (dark pool)         │
                              │  • ZeroFi (dark pool)         │
                              └───────────────────────────────┘
```
Communication Protocols
gRPC Streaming (Primary):
```proto
service QuoteAggregatorService {
  // Get single best quote (one-time request)
  rpc GetBestQuote(AggregatedQuoteRequest) returns (AggregatedQuote);

  // Stream continuous quotes (real-time updates)
  rpc StreamQuotes(AggregatedQuoteRequest) returns (stream AggregatedQuote);

  // Health check
  rpc Health(HealthRequest) returns (HealthResponse);
}
```
NATS Messaging (Required):
- Quote-Service publishes to `market.swap_route.*` for HFT pipeline integration
- Scanner consumes from NATS for real-time opportunity detection
- Critical Path: Required for Scanner → Planner → Executor pipeline
- Latency: 2-5ms (acceptable for pipeline events)
Proto Definitions:
```proto
message AggregatedQuote {
  optional LocalQuote best_local = 1;
  optional ExternalQuote best_external = 2;
  QuoteSource best_source = 3;               // LOCAL or EXTERNAL
  QuoteComparison comparison = 4;
  repeated LocalQuote all_local_quotes = 5;
  repeated ExternalQuote all_external_quotes = 6;
  int64 aggregation_time_ms = 7;
  int64 timestamp = 8;
}

message QuoteComparison {
  double price_diff_percent = 1;  // (local - external) / external * 100
  double spread_bps = 2;          // Bid-ask spread
  int64 local_latency_ms = 3;
  int64 external_latency_ms = 4;
  string recommendation = 5;      // "local", "external", "equal"
  double confidence_score = 6;    // 0.0-1.0
}
```
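To make the comparison fields concrete, here is a minimal Go sketch of how an aggregator instance might fill `price_diff_percent` and `recommendation` from the best local and external output amounts. The helper name `compareQuotes` and the 1-basis-point "equal" threshold are assumptions for illustration, not the service's actual selection rule.

```go
package main

import (
	"fmt"
	"math"
)

// compareQuotes is a hypothetical helper mirroring the QuoteComparison proto:
// given the output amounts of the best local and external quotes for the same
// input, compute the price difference and a recommendation. The selection
// threshold (1bp) is an illustrative assumption.
func compareQuotes(localOut, externalOut uint64) (priceDiffPercent float64, recommendation string) {
	// Matches the proto comment: (local - external) / external * 100
	priceDiffPercent = (float64(localOut) - float64(externalOut)) / float64(externalOut) * 100
	switch {
	case math.Abs(priceDiffPercent) < 0.01: // within 1 basis point: treat as equal
		recommendation = "equal"
	case priceDiffPercent > 0: // local route yields more output tokens
		recommendation = "local"
	default:
		recommendation = "external"
	}
	return
}

func main() {
	diff, rec := compareQuotes(1_005_000, 1_000_000)
	fmt.Printf("diff=%.2f%% recommendation=%s\n", diff, rec) // local is 0.50% better
}
```

A real aggregator would also weigh `confidence_score`, liquidity, and quote age before emitting a recommendation; this sketch isolates only the price-difference arithmetic.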
Shared Memory IPC (Rust Production Scanners)
Purpose: Ultra-low-latency quote access (<1μs) for production Rust scanners with intelligent change notification.
Architecture (Dual Shared Memory with Hybrid Change Detection):
```
┌────────────────────────────────────────────────────────────────┐
│ QUOTE SERVICE (Go - Writer)                                    │
├────────────────────────────────────────────────────────────────┤
│ 1. Calculate quotes from TWO sources:                          │
│    • Local (on-chain pool math, <5ms)                          │
│    • External (API aggregation, 200-500ms)                     │
│ 2. Write to TWO memory-mapped files (lock-free)                │
└────────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────────┬──────────────────────────┐
        │ SHMEM #1 (Internal)      │ SHMEM #2 (External)      │
        │ quotes-internal.mmap     │ quotes-external.mmap     │
        ├──────────────────────────┼──────────────────────────┤
        │ • Local pool quotes      │ • External API quotes    │
        │ • Sub-second freshness   │ • 10s+ refresh interval  │
        │ • On-chain accurate      │ • Better routing         │
        │ • Fixed-size metadata    │ • Fixed-size metadata    │
        └──────────────────────────┴──────────────────────────┘
                               │
┌────────────────────────────────────────────────────────────────┐
│ RUST SCANNER (Readers - Production)                            │
├────────────────────────────────────────────────────────────────┤
│ 1. Read from BOTH shared memory regions (<1μs each)            │
│ 2. Intelligent selection:                                      │
│    • Strategy A: Use internal (fresher, on-chain)              │
│    • Strategy B: Use external (better multi-hop)               │
│    • Strategy C: Compare both, pick best                       │
│ 3. Detect arbitrage (<10μs vs 500μs-2ms with gRPC)             │
└────────────────────────────────────────────────────────────────┘
```
Memory Layout (Fixed-Size Metadata):
```rust
use std::sync::atomic::AtomicU64;

// 128-byte aligned quote entry
#[repr(C, align(128))]
struct QuoteMetadata {
    version: AtomicU64,      // Versioning for consistency
    pair_id: [u8; 32],       // BLAKE3(token_in, token_out, amount)
    input_mint: [u8; 32],    // Solana public key (32 bytes)
    output_mint: [u8; 32],   // Solana public key (32 bytes)
    input_amount: u64,       // Lamports
    output_amount: u64,      // Lamports
    price_impact_bps: u32,   // Basis points (0.01%)
    timestamp_unix_ms: u64,  // Unix timestamp (milliseconds)
    route_id: [u8; 32],      // BLAKE3(route_steps) -> lookup in Redis/PG
    _padding: [u8; 24],      // Padding to 128 bytes
}

// Memory-mapped file: Array of QuoteMetadata
// File size: 128 bytes × 1000 pairs = 128 KB (fits in L2 cache)
```
Performance Characteristics:
| Operation | Latency | Improvement vs gRPC |
|---|---|---|
| Read quote metadata | <1μs | 100-500x faster |
| Arbitrage detection | <10μs | 50-200x faster |
| Two-hop check | <2μs | 50-250x faster |
| Cache hit rate | >99% | Memory-resident |
Key Features:
- Lock-Free Reads: Atomic versioning ensures consistency
- Dual Sources: Internal (fresh) + External (better routing)
- Zero-Copy: Direct memory access (no serialization)
- OS-Managed: Kernel handles memory-mapped file sync
- Single Writer: Quote service (Go) ONLY, eliminates write conflicts
- Multiple Readers: Unlimited concurrent Rust scanners
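The "lock-free reads with atomic versioning" feature above follows a seqlock-style protocol: the single writer bumps the version to an odd value before mutating a slot and to an even value after, and readers retry until two version reads agree on an even value. The sketch below models that protocol in Go with a plain struct standing in for the 128-byte `QuoteMetadata` slot; the real implementation writes through a memory-mapped file and needs memory fences around the payload, so treat this as an illustration of the versioning logic only.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// quoteSlot stands in for one QuoteMetadata entry. Version semantics match
// the design above: 0 = empty slot, odd = write in progress, even = consistent.
type quoteSlot struct {
	version      atomic.Uint64
	outputAmount uint64 // simplified payload
}

// write is the single-writer path: bump to odd, mutate, bump back to even.
func (s *quoteSlot) write(outputAmount uint64) {
	s.version.Add(1) // now odd: readers will retry
	s.outputAmount = outputAmount
	s.version.Add(1) // now even: snapshot is consistent
}

// read is the lock-free reader path: retry until two version loads agree
// on the same even value, guaranteeing a torn write was not observed.
func (s *quoteSlot) read() uint64 {
	for {
		v1 := s.version.Load()
		if v1%2 != 0 {
			continue // write in progress
		}
		out := s.outputAmount
		if s.version.Load() == v1 {
			return out
		}
	}
}

func main() {
	var slot quoteSlot
	slot.write(42_000_000)
	fmt.Println(slot.read())
}
```

The first `write` leaves the version at 2, so a version of 0 still unambiguously means "empty slot", matching the Rust scanner's `version.load(...) == 0` check.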
Route Storage (Dynamic Data in PostgreSQL + Redis):
```sql
-- PostgreSQL: Persistent route storage
CREATE TABLE route_steps (
    route_id     BYTEA PRIMARY KEY,  -- BLAKE3 hash (32 bytes)
    route_steps  JSONB NOT NULL,     -- RouteStep array
    hop_count    SMALLINT NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL,
    last_used_at TIMESTAMPTZ NOT NULL
);

-- Redis: Hot route cache (30s TTL)
SET route:{route_id_hex} '{route_steps_json}' EX 30
```
Rust Scanner Example:
```rust
use memmap2::MmapOptions;
use std::fs::File;
use std::sync::atomic::Ordering;

// Open shared memory regions
let file_internal = File::open("/var/quote-service/quotes-internal.mmap")?;
let mmap_internal = unsafe { MmapOptions::new().map(&file_internal)? };
let file_external = File::open("/var/quote-service/quotes-external.mmap")?;
let mmap_external = unsafe { MmapOptions::new().map(&file_external)? };

// Read quote metadata (<1μs)
let quotes_internal: &[QuoteMetadata] = unsafe {
    std::slice::from_raw_parts(
        mmap_internal.as_ptr() as *const QuoteMetadata,
        1000, // Max 1000 pairs
    )
};
let quotes_external: &[QuoteMetadata] = unsafe {
    std::slice::from_raw_parts(
        mmap_external.as_ptr() as *const QuoteMetadata,
        1000,
    )
};

// Fast arbitrage detection
for quote_internal in quotes_internal {
    if quote_internal.version.load(Ordering::Acquire) == 0 {
        continue; // Empty slot
    }
    // Find matching external quote
    for quote_external in quotes_external {
        if quote_external.pair_id == quote_internal.pair_id {
            // Compare prices (<10μs total)
            let price_diff_bps = calculate_price_diff(
                quote_internal.output_amount,
                quote_external.output_amount,
            );
            if price_diff_bps > ARBITRAGE_THRESHOLD {
                // Fetch route from Redis/PostgreSQL
                let route_internal = fetch_route(&quote_internal.route_id)?;
                let route_external = fetch_route(&quote_external.route_id)?;
                // Publish opportunity to NATS
                publish_arbitrage_opportunity(route_internal, route_external)?;
            }
        }
    }
}
```
Why Dual Regions? (Internal vs External):
- Different Freshness: Internal (sub-second) vs External (10s+)
- Different Sources: On-chain pool math vs 3rd-party APIs
- Different Use Cases: Internal for accuracy, External for better routing
- Cross-Validation: Detect price manipulation by comparing both
- Strategy Flexibility: Scanner chooses best source per opportunity
Implementation Roadmap (3 weeks total)
Phase 1: PostgreSQL Route Storage (Week 1 - 3 days):
- Create PostgreSQL schema (`route_steps` table with JSONB)
- Implement Go route repository
- Integrate with quote service (RouteID calculation, storage)
- Add route cleanup job (7-day retention)
Phase 2: Redis Route Caching (Week 1 - 2 days):
- Implement Redis route cache
- Add cache warming on quote calculation
- Add cache metrics (hit/miss rate, latency)
- Target: >95% cache hit rate
Phase 3: Shared Memory Writer (Go) (Week 2 - 3 days):
- Implement `SharedMemoryWriter` with atomic write protocol
- Memory-mapped file creation and initialization
- Integrate with `QuoteServiceImpl.GetPairedQuotes()`
- Test concurrent writes (10K writes/sec)
Phase 4: Shared Memory Reader (Rust) (Week 2 - 2 days):
- Implement `SharedMemoryReader` with lock-free reads
- Double-checked read with version validation
- PairID lookup in index with retry logic
- Target: 10M reads/sec
Phase 5: Route Fetcher (Rust) (Week 2 - 2 days):
- Implement `RouteFetcher` with Redis → PostgreSQL fallback
- Redis fetch with deserialization (<1ms)
- PostgreSQL fetch with connection pooling (<10ms)
- Cache warming on PostgreSQL fetch
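The Redis → PostgreSQL fallback with cache warming described in Phase 5 can be sketched as follows. The production `RouteFetcher` is planned in Rust; this Go version models the two stores as plain interfaces (in-memory maps stand in for the real Redis client and pgx pool), so the fallback and warming logic is visible without any external dependencies. Names like `routeFetcher` and `routeStore` are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// routeStore abstracts both backing stores; the real code would wrap a
// Redis client (30s TTL) and a PostgreSQL connection pool.
type routeStore interface {
	Get(routeID string) (string, error)
	Set(routeID, routeJSON string) error
}

var errNotFound = errors.New("route not found")

// mapStore is an in-memory stand-in used here for illustration and testing.
type mapStore map[string]string

func (m mapStore) Get(id string) (string, error) {
	if v, ok := m[id]; ok {
		return v, nil
	}
	return "", errNotFound
}
func (m mapStore) Set(id, v string) error { m[id] = v; return nil }

type routeFetcher struct {
	redis, postgres routeStore
}

// Fetch returns the route steps for a route_id: hot Redis path first
// (<1ms in the design above), PostgreSQL on a miss (<10ms), and the
// Redis cache is warmed on the way back.
func (f *routeFetcher) Fetch(routeID string) (string, error) {
	if v, err := f.redis.Get(routeID); err == nil {
		return v, nil // cache hit
	}
	v, err := f.postgres.Get(routeID)
	if err != nil {
		return "", err
	}
	_ = f.redis.Set(routeID, v) // cache warming
	return v, nil
}

func main() {
	f := &routeFetcher{redis: mapStore{}, postgres: mapStore{"abc": `[{"pool":"..."}]`}}
	route, _ := f.Fetch("abc") // miss Redis, hit PostgreSQL, warm Redis
	fmt.Println(route)
	cached, _ := f.redis.Get("abc")
	fmt.Println(cached == route) // subsequent fetches are cache hits
}
```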
Phase 6: Scanner Integration (Week 3 - 3 days):
- Implement `ArbitrageDetector` in Rust
- Integrate shared memory reader + route fetcher
- Arbitrage calculation (forward × reverse)
- Publish opportunities to NATS
- Target: <50μs detection latency
Phase 7: Production Testing (Week 3 - 2 days):
- Load test (1000 paired quotes/sec)
- Stress test (10K concurrent readers)
- Measure cache hit rates and end-to-end latency
- 24-hour soak test for stability
Implementation Status: 🚧 Future (Rust scanners not yet implemented)
Full Design Specification: Doc 31 content has been fully incorporated above. The original document has been archived at docs/archive/quote-service/rewrite/31-SHARED-MEMORY-QUOTE-ARCHITECTURE.md.
Service 1: Local Quote Service
Purpose
Provide fast, accurate on-chain quotes using:
- Dual cache architecture (pool state + quote response)
- Background refresh (AMM: 10s, CLMM: 30s)
- Parallel paired quote calculation (forward + reverse simultaneously)
- Oracle validation (Pyth price feed deviation alerts)
Dual Cache Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ LOCAL QUOTE SERVICE                                           │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LAYER 1: Pool State Cache (map[poolID]*PoolState)         │ │
│ │                                                           │ │
│ │ type PoolState struct {                                   │ │
│ │     Pool         *domain.Pool  // Metadata + reserves     │ │
│ │     CLMMData     *CLMMPoolData // Tick arrays (CLMM)      │ │
│ │     LastUpdate   time.Time     // Refresh timestamp       │ │
│ │     RefreshCount int           // Counter                 │ │
│ │     IsStale      bool          // Staleness flag          │ │
│ │     mu           sync.RWMutex  // Thread-safe             │ │
│ │ }                                                         │ │
│ │                                                           │ │
│ │ Refresh Strategy:                                         │ │
│ │  • AMM/CPMM: Every 10s (Redis read, fast)                 │ │
│ │  • CLMM: Every 30s (RPC fetch tick arrays, slow)          │ │
│ │  • Staleness check: Every 5s (mark pools >60s stale)      │ │
│ │  • Priority queue: On-demand refresh for stale pools      │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│               Pool Refresh → Invalidate Layer 2               │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LAYER 2: Quote Response Cache (map[key]*CachedQuote)      │ │
│ │                                                           │ │
│ │ type CachedQuoteResponse struct {                         │ │
│ │     InputMint    string                                   │ │
│ │     OutputMint   string                                   │ │
│ │     InputAmount  uint64                                   │ │
│ │     OutputAmount uint64                                   │ │
│ │     PriceImpact  float64                                  │ │
│ │     PoolID       string                                   │ │
│ │     CachedAt     time.Time                                │ │
│ │     ExpiresAt    time.Time // 2s TTL                      │ │
│ │     PoolStateAge time.Duration                            │ │
│ │     OraclePrice  float64                                  │ │
│ │     Deviation    float64                                  │ │
│ │ }                                                         │ │
│ │                                                           │ │
│ │ Cache Key: hash(inputMint, outputMint, amount)            │ │
│ │ Invalidation:                                             │ │
│ │  • Pool refresh → invalidate ALL quotes using pool        │ │
│ │  • TTL expiration (2s)                                    │ │
│ │  • Periodic eviction (every 5s)                           │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
```
Why Dual Cache?
- Layer 1 (Pool State): Expensive to fetch (50-100ms RPC for CLMM), shared across all quotes
- Layer 2 (Quote Response): Fast to calculate (<5ms pool math), specific to input/output/amount
- Pool-Aware Invalidation: Pool refresh → invalidate ALL Layer 2 quotes using that pool
- Different TTLs: Pool state (60s staleness), Quote response (2s TTL)
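Pool-aware invalidation hinges on one detail: each Layer 2 entry records which pool it was priced against, so a Layer 1 refresh can evict exactly the dependent quotes. A minimal Go sketch of that reverse index (simplified stand-ins for the structs above; the field and method names are illustrative):

```go
package main

import "fmt"

// cachedQuote is a reduced CachedQuoteResponse: just the cache key and the
// pool the quote was priced against.
type cachedQuote struct {
	key    string // hash(inputMint, outputMint, amount)
	poolID string
}

// quoteResponseCache keeps a reverse index poolID -> cache keys so a pool
// refresh can invalidate all dependent quotes in one pass.
type quoteResponseCache struct {
	byKey    map[string]*cachedQuote
	byPoolID map[string][]string
}

func newQuoteResponseCache() *quoteResponseCache {
	return &quoteResponseCache{
		byKey:    map[string]*cachedQuote{},
		byPoolID: map[string][]string{},
	}
}

func (c *quoteResponseCache) put(q *cachedQuote) {
	c.byKey[q.key] = q
	c.byPoolID[q.poolID] = append(c.byPoolID[q.poolID], q.key)
}

// invalidatePool is called after a Layer 1 pool refresh; it returns the
// number of Layer 2 quotes evicted.
func (c *quoteResponseCache) invalidatePool(poolID string) int {
	keys := c.byPoolID[poolID]
	for _, k := range keys {
		delete(c.byKey, k)
	}
	delete(c.byPoolID, poolID)
	return len(keys)
}

func main() {
	c := newQuoteResponseCache()
	c.put(&cachedQuote{key: "SOL-USDC-1e9", poolID: "pool-A"})
	c.put(&cachedQuote{key: "USDC-SOL-1e9", poolID: "pool-A"})
	c.put(&cachedQuote{key: "SOL-USDT-1e9", poolID: "pool-B"})
	fmt.Println(c.invalidatePool("pool-A")) // both pool-A quotes evicted
	fmt.Println(len(c.byKey))               // the pool-B quote survives
}
```

The production cache additionally enforces the 2s TTL and periodic eviction; this sketch shows only the pool-keyed eviction path.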
Parallel Paired Quote Calculation
Problem (Sequential calculation):
```
T=0ms:   Start forward calculation (SOL → USDC)
T=50ms:  Forward complete → Stream
T=50ms:  Start reverse calculation (USDC → SOL)   ⚠️ Pool changed!
T=100ms: Reverse complete → Stream

Result: 100ms delay, different slots, invalid arbitrage
```
Solution (Parallel calculation):
```
T=0ms:  Launch forward goroutine ──┐
T=0ms:  Launch reverse goroutine ──┤ PARALLEL!
T=50ms: Both complete ─────────────┘
T=51ms: Stream both quotes

Result: ~1ms delay, SAME slot, valid arbitrage
```
Implementation:
```go
// calculatePairedQuotesParallel calculates forward + reverse simultaneously
func (s *GRPCQuoteServer) calculatePairedQuotesParallel(
	ctx context.Context,
	inputMint, outputMint, amount string,
	timeoutMs int,
) *PairedQuoteResult {
	ctxWithTimeout, cancel := context.WithTimeout(ctx,
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	// Result channels
	forwardChan := make(chan *QuoteResult, 1)
	reverseChan := make(chan *QuoteResult, 1)
	startTime := time.Now()

	// Launch forward goroutine
	go func() {
		quote, err := s.cache.GetOrCalculateQuoteFresh(
			ctxWithTimeout, inputMint, outputMint, amount, ...)
		forwardChan <- &QuoteResult{Quote: quote, Error: err}
	}()

	// Launch reverse goroutine (SIMULTANEOUSLY!)
	go func() {
		quote, err := s.cache.GetOrCalculateQuoteFresh(
			ctxWithTimeout, outputMint, inputMint, amount, ...)
		reverseChan <- &QuoteResult{Quote: quote, Error: err}
	}()

	// Wait for BOTH or timeout
	result := &PairedQuoteResult{}
	resultsReceived := 0
	for resultsReceived < 2 {
		select {
		case fwd := <-forwardChan:
			result.Forward = fwd
			resultsReceived++
		case rev := <-reverseChan:
			result.Reverse = rev
			resultsReceived++
		case <-ctxWithTimeout.Done():
			// Timeout: return partial results
			goto done
		}
	}
done:
	result.TotalMs = time.Since(startTime).Milliseconds()
	return result
}
```
Key Features:
- Go Concurrency: Leverages goroutines + channels (like `Promise.allSettled`)
- Timeout Handling: 100ms max, returns partial results if one fails
- Same Pool State: Both quotes use identical pool snapshot
- Same Slot: Both calculated within ~1ms (vs 100ms sequential)
Quote Request Flow
```
1. Client Request
   Scanner: GetQuote(SOL → USDC, 1 SOL)
        │
2. Check Layer 2 (Quote Response Cache)
   ├─ Cache HIT  → Return immediately (<1ms)
   └─ Cache MISS → Continue
        │
3. Check Layer 1 (Pool State Cache)
   ├─ Pool fresh (<60s) → Calculate quote (<5ms)
   │                      → Store in Layer 2
   │                      → Return
   └─ Pool stale (>60s) → Add to priority queue
                          → Calculate with stale state (mark stale)
                          → Return with warning
        │
4. Background Refresh (async, triggered by priority queue)
    • Refresh pool state (Layer 1)
    • Invalidate Layer 2 quotes for pool
    • Next request gets fresh quote
```
Background Refresh System
Refresh Manager:
```go
type RefreshManager struct {
	poolCache          *PoolStateCache
	quoteResponseCache *QuoteResponseCache
	refreshQueue       chan RefreshCommand

	// Refresh intervals
	ammRefreshInterval  time.Duration // 10s
	clmmRefreshInterval time.Duration // 30s

	// Worker pool
	workerCount int // 5 workers
}

type RefreshCommand struct {
	PoolID   string
	PoolType PoolType // AMM, CLMM, DLMM
	Priority Priority // HIGH (on-demand), NORMAL (scheduled)
}
```
Refresh Strategies:
- Scheduled Refresh (Background): every 10s refresh all AMM/CPMM pools from Redis; every 30s refresh all CLMM pools from RPC (tick arrays); every 5s check staleness and mark pools >60s as stale
- On-Demand Refresh (Priority Queue): a quote request for a stale pool adds a HIGH-priority entry to the queue; a worker picks it up immediately, refreshes the pool state, and invalidates the quote cache so future quotes use fresh data
- Pool-Aware Cache Invalidation: when pool state is refreshed, all Layer 2 quotes using that pool are found and invalidated, ensuring consistency
Protocol Support
| Protocol | Type | Refresh Source | Interval | Latency | Program ID |
|---|---|---|---|---|---|
| Raydium AMM V4 | AMM | Redis | 10s | <1ms | 675kPX9M... |
| Raydium CPMM | AMM | Redis | 10s | <1ms | CPMMoo8L... |
| Raydium CLMM | CLMM | RPC (tick arrays) | 30s | 50-100ms | CAMMCzo5... |
| Meteora DLMM | DLMM | RPC (bins) | 30s | 50-100ms | LBUZKhRx... |
| Orca Whirlpools | CLMM | RPC (tick arrays) | 30s | 50-100ms | (future) |
| PumpSwap | AMM | Redis | 10s | <1ms | pAMMBay6... |
Oracle Integration
Purpose: Validate quote accuracy against external price feeds.
Supported Oracles:
- Pyth (primary): Real-time price feeds (<1s latency)
- Jupiter Oracle (secondary): TWAP prices
Usage:
```go
// After calculating quote, validate against oracle
oraclePrice := s.oracleClient.GetPrice(tokenPair)
quotePrice := float64(outputAmount) / float64(inputAmount)
deviation := math.Abs(quotePrice-oraclePrice) / oraclePrice

if deviation > 0.01 { // >1% deviation
	log.Warn("Quote deviates from oracle",
		"pair", tokenPair,
		"deviation", deviation*100,
		"quotePrice", quotePrice,
		"oraclePrice", oraclePrice)

	// Publish metric for alerting
	s.metrics.RecordOracleDeviation(tokenPair, deviation)
}
```
Configuration
Environment Variables:
```bash
# Service
LOCAL_QUOTE_SERVICE_PORT=50052
LOCAL_QUOTE_SERVICE_HTTP_PORT=8082

# Cache
POOL_CACHE_STALENESS_THRESHOLD=60s
QUOTE_CACHE_TTL=2s
QUOTE_CACHE_EVICTION_INTERVAL=5s

# Background Refresh
AMM_REFRESH_INTERVAL=10s
CLMM_REFRESH_INTERVAL=30s
REFRESH_WORKER_COUNT=5

# Parallel Paired Quotes
PARALLEL_QUOTE_TIMEOUT_MS=100
PARALLEL_QUOTE_ENABLED=true

# Dependencies
RPC_PROXY_URL=http://localhost:8083
POOL_DISCOVERY_REDIS_URL=redis://localhost:6379/0
ORACLE_PYTH_HTTP_URL=https://hermes.pyth.network
ORACLE_JUPITER_URL=https://price.jup.ag/v4

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
METRICS_PORT=9090
LOG_LEVEL=info
```
Deployment Specifications
Resource Requirements:
- CPU: 2-4 cores (pool math computation)
- Memory: 8-16 GB
- Pool cache: ~5-10 MB (100 pools × 50KB)
- Quote cache: ~1-2 MB (1000 quotes × 200B)
- CLMM tick arrays: ~5 MB
- Go runtime: ~2-3 GB
- Disk: 1 GB (logs only, cache in memory)
- Network: 10 Mbps (RPC calls)
Scaling Strategy:
- Vertical: Increase CPU/memory for more pools
- Horizontal: 2 instances for HA (shared Redis cache)
- Pattern: Active-Active with shared state
Docker Image:
```dockerfile
FROM golang:1.24-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /bin/local-quote-service ./cmd/local-quote-service

FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/local-quote-service /bin/
EXPOSE 50052 8082 9090
ENTRYPOINT ["/bin/local-quote-service"]
```
Service 2: External Quote Service
Purpose
Aggregate quotes from external APIs with:
- Rate limiting (per-provider token bucket)
- Circuit breakers (auto-disable failing providers)
- Provider rotation (health-based load balancing)
- Multi-region support (5+ instances globally)
Provider Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ EXTERNAL QUOTE SERVICE                                        │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Provider Pool (9 providers)                               │ │
│ │                                                           │ │
│ │ JUPITER POOL (1 RPS shared):                              │ │
│ │  • Jupiter Standard    (0.16 req/s) ─┐                    │ │
│ │  • Jupiter Ultra       (0.16 req/s) ─┤ Share 1 RPS        │ │
│ │  • HumidiFi (dark pool)(0.16 req/s) ─┤ (60 req/min)       │ │
│ │  • GoonFi (dark pool)  (0.16 req/s) ─┤                    │ │
│ │  • ZeroFi (dark pool)  (0.16 req/s) ─┘                    │ │
│ │                                                           │ │
│ │ INDEPENDENT PROVIDERS:                                    │ │
│ │  • Dflow (5.0 req/s)                                      │ │
│ │  • OKX (10.0 req/s)                                       │ │
│ │  • QuickNode (10.0 req/s)                                 │ │
│ │  • BLXR (5.0 req/s)                                       │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Rate Limiters (Per-Provider Token Bucket)                 │ │
│ │                                                           │ │
│ │ type RateLimiter struct {                                 │ │
│ │     limiter  *rate.Limiter // Token bucket                │ │
│ │     capacity float64       // Requests per second         │ │
│ │     tokens   float64       // Available tokens            │ │
│ │ }                                                         │ │
│ │                                                           │ │
│ │ Strategy: Wait for token before API call                  │ │
│ │ Burst: Allow 1 request instantly, refill at rate          │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Circuit Breakers (Per-Provider)                           │ │
│ │                                                           │ │
│ │ States: CLOSED → OPEN → HALF_OPEN → CLOSED                │ │
│ │                                                           │ │
│ │  • Failure threshold: 5 consecutive failures              │ │
│ │  • Timeout: 30s (OPEN → HALF_OPEN)                        │ │
│ │  • Test requests: 1 in HALF_OPEN state                    │ │
│ │  • Auto-recovery: Success → CLOSED                        │ │
│ └───────────────────────────────────────────────────────────┘ │
│                              │                                │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Quote Cache (10s TTL, per-provider)                       │ │
│ │                                                           │ │
│ │ Cache Key: hash(provider, inputMint, outputMint, amt)     │ │
│ │                                                           │ │
│ │ Longer TTL than local (10s vs 2s):                        │ │
│ │  • API calls are expensive (rate limited)                 │ │
│ │  • External prices change slower                          │ │
│ │  • Reduces API usage 10x                                  │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
```
Provider Interface
type QuoteProvider interface {
GetQuote(ctx context.Context, req *QuoteRequest) (*QuoteResponse, error)
GetName() string
GetRateLimit() float64 // Requests per second
IsEnabled() bool
GetHealth() *ProviderHealth
}
// Jupiter Provider Example
type JupiterProvider struct {
client *http.Client
apiKeys []string
currentKey int
rateLimiter *rate.Limiter // Shared 1 RPS
mode string // "standard" or "ultra"
baseURL string
}
func (p *JupiterProvider) GetQuote(
ctx context.Context,
req *QuoteRequest,
) (*QuoteResponse, error) {
// 1. Wait for rate limit token
if err := p.rateLimiter.Wait(ctx); err != nil {
return nil, fmt.Errorf("rate limit: %w", err)
}
// 2. Build URL
url := fmt.Sprintf("%s/quote?inputMint=%s&outputMint=%s&amount=%d",
p.baseURL, req.InputMint, req.OutputMint, req.Amount)
// 3. Add headers (Jupiter Ultra requires Origin + Referer)
headers := http.Header{}
if p.mode == "ultra" {
headers.Set("Origin", "https://jup.ag")
headers.Set("Referer", "https://jup.ag/")
}
// 4. Make API call
httpReq, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, fmt.Errorf("build request: %w", err)
}
httpReq.Header = headers
resp, err := p.client.Do(httpReq)
if err != nil {
return nil, err
}
defer resp.Body.Close()
// 5. Parse response
return p.parseResponse(resp)
}
Rate Limiting Strategy
Jupiter API Pool (1 RPS total):
Total capacity: 60 requests/min
Allocation:
β’ Jupiter Oracle: 12 req/min (5s interval, always-on)
β’ Quote services: 48 req/min remaining
Per-quoter capacity (5 quoters sharing):
β’ 48 Γ· 5 = 9.6 req/min
β’ = 0.16 req/s per quoter
Quoters sharing 1 RPS:
1. Jupiter Standard
2. Jupiter Ultra
3. HumidiFi (dark pool)
4. GoonFi (dark pool)
5. ZeroFi (dark pool)
Capacity Planning:
With 0.16 req/s (9.6 req/min) per Jupiter-based quoter:
10s refresh interval = 6 calls/min per pair
Maximum pairs: ~1 pair with all 5 quoters enabled
Recommendations:
β’ HFT (1 pair): Jupiter + Jupiter Ultra + dark pools (best prices)
β’ Multi-pair (2-3): Jupiter only (higher capacity, fewer quoters)
Independent Providers (no sharing):
β’ Dflow: 5.0 req/s = 300 req/min
β’ OKX: 10.0 req/s = 600 req/min
β’ QuickNode: 10.0 req/s = 600 req/min
β’ BLXR: 5.0 req/s = 300 req/min
Circuit Breaker Logic
type CircuitBreaker struct {
name string // Provider name (used in log output below)
state CircuitState // Closed, Open, HalfOpen
failureCount int
successCount int
failureThreshold int // 5 consecutive failures
timeout time.Duration // 30s (Open β HalfOpen)
lastFailure time.Time
lastSuccess time.Time
mu sync.Mutex
}
func (cb *CircuitBreaker) Call(fn func() error) error {
cb.mu.Lock()
// Check state
if cb.state == Open {
if time.Since(cb.lastFailure) > cb.timeout {
cb.state = HalfOpen // Try recovery
cb.successCount = 0
} else {
cb.mu.Unlock()
return ErrCircuitOpen
}
}
cb.mu.Unlock()
// Execute function
err := fn()
cb.mu.Lock()
defer cb.mu.Unlock()
if err != nil {
cb.failureCount++
cb.successCount = 0
cb.lastFailure = time.Now()
if cb.failureCount >= cb.failureThreshold {
cb.state = Open // Disable provider
log.Error("Circuit breaker opened",
"provider", cb.name,
"failures", cb.failureCount)
}
return err
}
// Success - reset or close circuit
cb.failureCount = 0
cb.successCount++
cb.lastSuccess = time.Now()
if cb.state == HalfOpen && cb.successCount >= 3 {
cb.state = Closed // Fully recovered
log.Info("Circuit breaker closed", "provider", cb.name)
}
return nil
}
State Transitions:
CLOSED (Normal operation)
β 5 consecutive failures
OPEN (Provider disabled)
β Wait 30s
HALF_OPEN (Testing recovery)
β 3 consecutive successes
CLOSED (Fully recovered)
Provider Rotation & Health
type ProviderRotator struct {
providers []ProviderConfig
currentIndex int
usageCounts map[string]int64
errorCounts map[string]int64
lastResetTime time.Time
mu sync.Mutex
}
func (r *ProviderRotator) SelectProvider() string {
r.mu.Lock()
defer r.mu.Unlock()
// 1. Filter healthy providers (circuit breaker closed)
healthy := r.getHealthyProviders()
if len(healthy) == 0 {
return "" // No providers available
}
// 2. Sort by error rate (ascending)
sort.Slice(healthy, func(i, j int) bool {
errRateI := float64(r.errorCounts[healthy[i]]) / float64(r.usageCounts[healthy[i]]+1)
errRateJ := float64(r.errorCounts[healthy[j]]) / float64(r.usageCounts[healthy[j]]+1)
return errRateI < errRateJ
})
// 3. Round-robin among top 3 providers
topProviders := healthy
if len(healthy) > 3 {
topProviders = healthy[:3]
}
selected := topProviders[r.currentIndex % len(topProviders)]
r.currentIndex++
r.usageCounts[selected]++
return selected
}
Configuration
Environment Variables:
# Service
EXTERNAL_QUOTE_SERVICE_PORT=50053
EXTERNAL_QUOTE_SERVICE_HTTP_PORT=8083
# Provider Enablement
QUOTERS_ENABLED=true
QUOTERS_USE_JUPITER=true
QUOTERS_USE_JUPITER_ULTRA=false
QUOTERS_USE_HUMIDIFI=false
QUOTERS_USE_GOONFI=false
QUOTERS_USE_ZEROFI=false
QUOTERS_USE_DFLOW=false
QUOTERS_USE_OKX=false
QUOTERS_USE_QUICKNODE=false
QUOTERS_USE_BLXR=false
# API Keys
OKX_API_KEY=your-api-key
OKX_SECRET_KEY=your-secret-key
OKX_PASSPHRASE=your-passphrase
BLXR_API_KEY=your-blxr-key
# Rate Limits (requests per second)
JUPITER_RATE_LIMIT=0.16 # Shared pool
DFLOW_RATE_LIMIT=5.0
OKX_RATE_LIMIT=10.0
QUICKNODE_RATE_LIMIT=10.0
BLXR_RATE_LIMIT=5.0
# Cache & Circuit Breaker
QUOTE_CACHE_TTL=10s
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_TIMEOUT=30s
CIRCUIT_BREAKER_HALFOPEN_SUCCESSES=3
# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
METRICS_PORT=9091
LOG_LEVEL=info
Deployment Specifications
Resource Requirements:
- CPU: 1-2 cores (HTTP I/O bound)
- Memory: 2-4 GB
- HTTP connections: ~500 MB
- Quote cache: ~1-2 MB
- Go runtime: ~1-2 GB
- Disk: 1 GB (logs only)
- Network: 50 Mbps (external API calls)
Scaling Strategy:
- Horizontal: 5+ instances (distribute API keys)
- Geographic: US East/West, EU, Asia (reduce API latency)
- Pattern: Stateless (optional Redis cache for sharing)
Docker Image:
FROM golang:1.24-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /bin/external-quote-service ./cmd/external-quote-service
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/external-quote-service /bin/
EXPOSE 50053 8083 9091
ENTRYPOINT ["/bin/external-quote-service"]
Service 3: Quote Aggregator Service
Purpose
Client-facing unified interface that:
- Fans out to local + external services in parallel
- Merges quote streams in real-time
- Selects best quote (price, confidence, liquidity)
- Compares local vs external (price diff, recommendation)
- Handles partial failures gracefully
Aggregation Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β QUOTE AGGREGATOR SERVICE β
β β
β 1. Client Request β
β StreamQuotes(SOL β USDC, 1 SOL) β
β β
β 2. Fan-Out (Parallel goroutines) β
β ββ Local Quote Service (gRPC: 50052) β
β ββ External Quote Service (gRPC: 50053) β
β β
β 3. Collect Responses (merge streams) β
β Local stream: outputAmount = 102.5 USDC (5ms) β
β External stream: outputAmount = 103.0 USDC (250ms) β
β β
β 4. Deduplicate Routes β
β β’ Hash route plan (pool addresses) β
β β’ Same route from multiple sources β keep best price β
β β
β 5. Select Best Quote β
β Ranking: β
β 1. Output amount (higher = better) β
β 2. Price impact (lower = better) β
β 3. Confidence score (higher = better) β
β 4. Liquidity depth (higher = better) β
β β
β Best: External 103.0 USDC (0.5 USDC better) β
β β
β 6. Compare Local vs External β
β priceDiff = (103.0 - 102.5) / 102.5 = 0.49% β
β recommendation = "local" (diff below 0.5% cutoff) β
β confidence = 0.85 β
β β
β 7. Build Aggregated Response β
β AggregatedQuote { β
β bestLocal: 102.5 USDC, β
β bestExternal: 103.0 USDC, β
β bestSource: EXTERNAL, β
β comparison: { priceDiff: 0.49%, rec: "local" } β
β } β
β β
β 8. Stream to Client β
β gRPC streaming (continuous updates) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Fan-Out Implementation
func (s *QuoteAggregatorService) StreamQuotes(
req *AggregatedQuoteRequest,
stream QuoteAggregatorService_StreamQuotesServer,
) error {
ctx := stream.Context()
// Create channels for quote streams
localChan := make(chan *LocalQuote, 100)
externalChan := make(chan *ExternalQuote, 100)
errChan := make(chan error, 2)
// Fan-out to both services (PARALLEL)
go s.streamLocalQuotes(ctx, req, localChan, errChan)
go s.streamExternalQuotes(ctx, req, externalChan, errChan)
// Track best quotes
var bestLocal *LocalQuote
var bestExternal *ExternalQuote
// Merge streams in real-time
for {
select {
case local := <-localChan:
// Update best local quote
if bestLocal == nil || local.OutputAmount > bestLocal.OutputAmount {
bestLocal = local
}
// Send aggregated quote immediately
aggregated := s.buildAggregatedQuote(bestLocal, bestExternal)
if err := stream.Send(aggregated); err != nil {
return err
}
case external := <-externalChan:
// Update best external quote
if bestExternal == nil || external.OutputAmount > bestExternal.OutputAmount {
bestExternal = external
}
// Send aggregated quote immediately
aggregated := s.buildAggregatedQuote(bestLocal, bestExternal)
if err := stream.Send(aggregated); err != nil {
return err
}
case err := <-errChan:
// Log error, continue with partial results
log.Printf("Downstream service error: %v", err)
// Don't return - partial results are OK
case <-ctx.Done():
return ctx.Err()
}
}
}
Key Features:
- Non-blocking: Doesn't wait for both services
- Partial Results: Returns local-only or external-only if one fails
- Real-time Updates: Streams quotes as they arrive
- Graceful Degradation: Logs errors, continues operation
Quote Comparison
type QuoteComparator struct {
oracleClient *OracleClient
}
func (c *QuoteComparator) Compare(
local *LocalQuote,
external *ExternalQuote,
) *QuoteComparison {
// 1. Price difference
priceDiff := (float64(external.OutputAmount) - float64(local.OutputAmount)) /
float64(local.OutputAmount) * 100
// 2. Spread (bid-ask)
spreadBps := int32(math.Abs(priceDiff) * 100)
// 3. Recommendation
recommendation := "local"
if priceDiff > 0.5 { // External > 0.5% better
recommendation = "external"
} else if math.Abs(priceDiff) < 0.1 { // Within 0.1%
recommendation = "equal"
}
// 4. Confidence score (based on oracle validation)
confidence := c.calculateConfidence(local, external)
return &QuoteComparison{
PriceDiffPercent: priceDiff,
SpreadBps: spreadBps,
LocalLatencyMs: local.CalculationTimeMs,
ExternalLatencyMs: external.ApiLatencyMs,
Recommendation: recommendation,
ConfidenceScore: confidence,
}
}
func (c *QuoteComparator) calculateConfidence(
local *LocalQuote,
external *ExternalQuote,
) float64 {
confidence := 0.5 // Base
// Factor 1: Oracle price validation
if local.OraclePrice > 0 && external.Provider != "" {
localDev := math.Abs(local.Deviation)
externalDev := math.Abs(external.Deviation)
if localDev < 0.01 && externalDev < 0.01 {
confidence += 0.3 // Both within 1% of oracle
} else if localDev < 0.05 && externalDev < 0.05 {
confidence += 0.1 // Both within 5% of oracle
}
}
// Factor 2: Liquidity depth
minLiquidity := math.Min(
local.LiquidityUsd,
external.LiquidityUsd,
)
if minLiquidity > 100000 {
confidence += 0.1 // >$100k liquidity
}
// Factor 3: Price impact
totalImpact := local.PriceImpact + external.PriceImpact
if totalImpact < 0.005 { // <0.5%
confidence += 0.1
}
return math.Min(confidence, 1.0)
}
Deduplication
type QuoteDeduplicator struct {
seenRoutes map[string]*Quote // routeHash -> best quote
mu sync.Mutex
}
func (d *QuoteDeduplicator) AddQuote(quote Quote) bool {
routeHash := d.hashRoute(quote.GetRoutePlan())
d.mu.Lock()
defer d.mu.Unlock()
existing, seen := d.seenRoutes[routeHash]
if !seen {
// New route
d.seenRoutes[routeHash] = &quote
return true // Keep
}
// Duplicate route - compare prices
if quote.GetOutputAmount() > existing.GetOutputAmount() {
// Better price - replace
d.seenRoutes[routeHash] = &quote
return true // Keep
}
return false // Discard (worse price)
}
func (d *QuoteDeduplicator) hashRoute(plan []SwapHop) string {
// Hash based on pool addresses (order matters)
var b strings.Builder
for _, hop := range plan {
b.WriteString(hop.AmmKey)
b.WriteString(":")
}
hash := sha256.Sum256([]byte(b.String()))
return hex.EncodeToString(hash[:])
}
Configuration
Environment Variables:
# Service
QUOTE_AGGREGATOR_SERVICE_PORT=50051
QUOTE_AGGREGATOR_SERVICE_HTTP_PORT=8081
# Downstream Services
LOCAL_QUOTE_SERVICE_URL=localhost:50052
EXTERNAL_QUOTE_SERVICE_URL=localhost:50053
# Connection Pool
GRPC_MAX_CONCURRENT_STREAMS=100
GRPC_KEEPALIVE_TIME=30s
GRPC_KEEPALIVE_TIMEOUT=10s
# Circuit Breaker (for downstream services)
CIRCUIT_BREAKER_FAILURE_THRESHOLD=10
CIRCUIT_BREAKER_TIMEOUT=30s
# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
METRICS_PORT=9092
LOG_LEVEL=info
Deployment Specifications
Resource Requirements:
- CPU: 1 core (lightweight aggregation)
- Memory: 1-2 GB
- gRPC connections: ~500 MB
- Deduplication map: ~10 MB
- Go runtime: ~500 MB
- Disk: 1 GB (logs only)
- Network: 10 Mbps (gRPC proxying)
Scaling Strategy:
- Horizontal: 10+ instances (stateless)
- Load Balancing: NGINX or Kubernetes HPA
- Auto-scaling: Based on client connection count
Docker Image:
FROM golang:1.24-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /bin/quote-aggregator-service ./cmd/quote-aggregator-service
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/quote-aggregator-service /bin/
EXPOSE 50051 8081 9092
ENTRYPOINT ["/bin/quote-aggregator-service"]
Data Flow & Integration
Complete Quote Flow (End-to-End)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 1. CLIENT REQUEST β
β Scanner: StreamQuotes(SOL β USDC, 1 SOL) β
β β gRPC: localhost:50051 β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 2. AGGREGATOR (2-3ms overhead) β
β β’ Parse request β
β β’ Fan-out to local + external (PARALLEL) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
βββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
β 3a. LOCAL (50052) β β 3b. EXTERNAL (50053) β
β β β β
β β’ Layer 2 cache check β β β’ Provider pool selection β
β ββ HIT: <1ms β β β’ Rate limiter (wait token) β
β ββ MISS: β β β’ Circuit breaker check β
β ββ Layer 1 (pool) β β β’ API call (Jupiter/etc) β
β β (<5ms) β β (200-500ms) β
β ββ Pool math β β β’ Store in cache (10s TTL) β
β β β β
β Latency: 2-5ms β β Latency: 250ms β
βββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
β β
LocalQuote ExternalQuote
102.5 USDC 103.0 USDC
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 4. AGGREGATOR MERGE β
β β’ Deduplicate routes (hash-based) β
β β’ Select best (103.0 USDC from External) β
β β’ Compare local vs external (0.49% diff) β
β β’ Calculate confidence (0.85) β
β β’ Build AggregatedQuote β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 5. STREAM TO CLIENT β
β AggregatedQuote { β
β bestLocal: 102.5 USDC, β
β bestExternal: 103.0 USDC, β
β bestSource: EXTERNAL, β
β comparison: { priceDiff: 0.49%, rec: "local" }, β
β confidence: 0.85 β
β } β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Integration with HFT Pipeline
Scanner Integration (TypeScript):
import { QuoteAggregatorServiceClient } from '@/generated/quote';
const client = new QuoteAggregatorServiceClient('localhost:50051');
// Stream aggregated quotes
const stream = client.streamQuotes({
inputMint: SOL_MINT,
outputMint: USDC_MINT,
amount: 1_000_000_000, // 1 SOL
includeLocal: true,
includeExternal: true,
});
stream.on('data', async (quote: AggregatedQuote) => {
// Fast pre-filter (<10ms)
const opportunities = detectArbitrage(quote);
// Publish to NATS: opportunity.two_hop.*
for (const opp of opportunities) {
await nats.publish('opportunity.two_hop.detected', opp);
}
});
Planner Integration (TypeScript):
// Planner validates with fresh quotes
const quote = await client.getBestQuote({
inputMint: opportunity.tokenStart,
outputMint: opportunity.tokenIntermediate,
amount: opportunity.amountIn,
});
// Deep validation (50-100ms)
const simulation = await simulateArbitrage(quote);
if (simulation.success && simulation.netProfit > 0.01) {
const plan = createExecutionPlan(quote, simulation);
await nats.publish('execution.planned', plan);
}
Error Handling & Graceful Degradation
Failure Scenarios:
| Failure | Local Service | External Service | Aggregator | Client Impact |
|---|---|---|---|---|
| Local service down | β Offline | β Running | β οΈ Partial | Gets external-only quotes |
| External service down | β Running | β Offline | β οΈ Partial | Gets local-only quotes |
| Specific provider down | N/A | β οΈ Circuit open | β OK | Other providers active |
| RPC timeout | β οΈ Stale quotes | N/A | β οΈ Partial | Stale local + fresh external |
| API rate limit | N/A | β οΈ Cache hits | β OK | Cached external quotes |
| Both services down | β Offline | β Offline | β Error | Client retries or fails |
Graceful Degradation Priority:
1. Local quotes (fastest, most critical for HFT)
2. External quotes (slower, better prices sometimes)
3. Cached quotes (stale but available)
4. Error (client decides: retry or skip)
Performance Characteristics
Latency Breakdown (Detailed)
| Operation | Target | P50 | P95 | P99 | Bottleneck |
|---|---|---|---|---|---|
| Local: Layer 2 hit | <1ms | 0.5ms | 0.8ms | 1.2ms | Memory access |
| Local: Layer 1 hit + calc | <5ms | 2ms | 4ms | 6ms | Pool math CPU |
| Local: Pool stale (RPC) | <150ms | 80ms | 120ms | 180ms | RPC tick arrays |
| Local: Parallel paired | <10ms | 5ms | 8ms | 12ms | 2Γ Layer 1 hit |
| External: Cache hit | <1ms | 0.5ms | 0.8ms | 1.2ms | Memory access |
| External: API call | 200-500ms | 250ms | 450ms | 600ms | API latency |
| Aggregator: Both cached | <5ms | 3ms | 5ms | 8ms | gRPC overhead |
| Aggregator: Local cached + External API | 200-500ms | 255ms | 455ms | 610ms | External API |
Throughput Capacity
| Service | Bottleneck | Single Instance | Multi-Instance | Notes |
|---|---|---|---|---|
| Local Quote | Pool math CPU | 2,000 req/s | 4,000 req/s (2Γ) | Quote cache hit rate >80% |
| External Quote | API rate limits | 10 req/s | 50 req/s (5Γ) | Jupiter: 1 RPS, OKX: 10 RPS |
| Aggregator | gRPC proxying | 1,000 req/s | 10,000 req/s (10Γ) | CPU: <50% at 1k req/s |
| Overall System | External APIs | 10-50 req/s | 50-250 req/s | Depends on provider mix |
Cache Performance
Target Hit Rates:
- Local Layer 2 (Quote Response): >80% hit rate
- Local Layer 1 (Pool State): >95% hit rate
- External Quote Cache: >70% hit rate
Actual Performance (production):
Local Layer 2: 85-90% hit rate β
Local Layer 1: 96-98% hit rate β
External Cache: 72-78% hit rate β
Cache Impact on Latency:
Without cache: 250ms avg (always fresh calculation)
With 80% hit: 50ms avg (0.8 Γ 1ms + 0.2 Γ 250ms)
Improvement: 5Γ faster
Resource Utilization (Production)
Memory Usage:
| Service | Per Instance | Total (All Instances) | Breakdown |
|---|---|---|---|
| Local | 10-12 GB | 20-24 GB (2Γ) | Pool: 5MB, Quote: 2MB, CLMM: 5MB, Go: 2GB |
| External | 3 GB | 15 GB (5Γ) | Cache: 2MB, HTTP: 500MB, Go: 1.5GB |
| Aggregator | 1.5 GB | 15 GB (10Γ) | gRPC: 500MB, Map: 10MB, Go: 500MB |
| Total | Β | 50-54 GB | Across 17 instances |
CPU Usage:
| Service | Per Instance | Total (All Instances) | Usage Pattern |
|---|---|---|---|
| Local | 2-3 cores | 4-6 cores (2Γ) | Burst: pool math, Idle: 10-20% |
| External | 1 core | 5 cores (5Γ) | Steady: 30-50% (I/O wait) |
| Aggregator | 0.5 core | 5 cores (10Γ) | Steady: 20-30% |
| Total | Β | 14-16 cores | Peak: 80%, Avg: 40-50% |
Network Bandwidth:
| Service | Ingress | Egress | Total |
|---|---|---|---|
| Local | 5 Mbps | 5 Mbps | 10 Mbps |
| External | 50 Mbps | 50 Mbps | 100 Mbps (API calls) |
| Aggregator | 10 Mbps | 10 Mbps | 20 Mbps |
| Total | 65 Mbps | 65 Mbps | 130 Mbps peak |
Deployment & Operations
Docker Compose Configuration
version: '3.8'
services:
# ============================================================
# QUOTE AGGREGATOR (Client-Facing)
# ============================================================
quote-aggregator:
image: quote-aggregator-service:latest
container_name: quote-aggregator
ports:
- "50051:50051" # gRPC (load balanced)
- "8081:8081" # HTTP metrics
- "9092:9092" # Prometheus
environment:
- GRPC_PORT=50051
- HTTP_PORT=8081
- METRICS_PORT=9092
- LOCAL_QUOTE_SERVICE_URL=local-quote:50052
- EXTERNAL_QUOTE_SERVICE_URL=external-quote:50053
- MAX_CONCURRENT_STREAMS=100
depends_on:
- local-quote
- external-quote
deploy:
replicas: 10
resources:
limits:
cpus: '1'
memory: 2G
restart: unless-stopped
networks:
- quote-network
# ============================================================
# LOCAL QUOTE SERVICE
# ============================================================
local-quote:
image: local-quote-service:latest
container_name: local-quote
ports:
- "50052:50052" # gRPC
- "8082:8082" # HTTP
- "9090:9090" # Prometheus
environment:
- GRPC_PORT=50052
- HTTP_PORT=8082
- METRICS_PORT=9090
- AMM_REFRESH_INTERVAL=10s
- CLMM_REFRESH_INTERVAL=30s
- POOL_CACHE_STALENESS_THRESHOLD=60s
- QUOTE_CACHE_TTL=2s
- PARALLEL_QUOTE_ENABLED=true
- PARALLEL_QUOTE_TIMEOUT_MS=100
- RPC_PROXY_URL=http://rpc-proxy:8083
- POOL_DISCOVERY_REDIS_URL=redis://redis:6379/0
- ORACLE_PYTH_HTTP_URL=https://hermes.pyth.network
- OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
depends_on:
- redis
- rpc-proxy
deploy:
resources:
limits:
cpus: '4'
memory: 16G
restart: unless-stopped
networks:
- quote-network
# ============================================================
# EXTERNAL QUOTE SERVICE
# ============================================================
external-quote:
image: external-quote-service:latest
ports:
- "50053-50057:50053" # gRPC (5 instances)
- "8084-8088:8083" # HTTP (host 8083 is used by rpc-proxy)
- "9093-9097:9091" # Prometheus (host 9092 is used by aggregator)
environment:
- GRPC_PORT=50053
- HTTP_PORT=8083
- METRICS_PORT=9091
- QUOTERS_ENABLED=true
- QUOTERS_USE_JUPITER=true
- QUOTERS_USE_JUPITER_ULTRA=false
- QUOTERS_USE_DFLOW=false
- QUOTERS_USE_OKX=false
- JUPITER_RATE_LIMIT=0.16
- QUOTE_CACHE_TTL=10s
- CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
- CIRCUIT_BREAKER_TIMEOUT=30s
- OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
env_file:
- ../../.env # API keys (OKX_API_KEY, BLXR_API_KEY)
deploy:
replicas: 5
resources:
limits:
cpus: '2'
memory: 4G
restart: unless-stopped
networks:
- quote-network
# ============================================================
# INFRASTRUCTURE
# ============================================================
redis:
image: redis:7-alpine
container_name: redis
ports:
- "6379:6379"
volumes:
- redis-data:/data
restart: unless-stopped
networks:
- quote-network
rpc-proxy:
image: solana-rpc-proxy:latest
container_name: rpc-proxy
ports:
- "8083:8083"
environment:
- HTTP_PORT=8083
- RPC_ENDPOINTS=${RPC_ENDPOINTS}
restart: unless-stopped
networks:
- quote-network
networks:
quote-network:
driver: bridge
volumes:
redis-data:
Kubernetes Deployment (Future)
Local Quote Service (StatefulSet):
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: local-quote
spec:
serviceName: local-quote
replicas: 2
selector:
matchLabels:
app: local-quote
template:
metadata:
labels:
app: local-quote
spec:
containers:
- name: local-quote
image: local-quote-service:v1
ports:
- containerPort: 50052
name: grpc
resources:
requests:
cpu: 2
memory: 8Gi
limits:
cpu: 4
memory: 16Gi
livenessProbe:
exec:
command: ["/bin/grpc_health_probe", "-addr=:50052"]
initialDelaySeconds: 10
periodSeconds: 10
Monitoring & Observability
Key Metrics (Prometheus):
Local Quote Service:
# Pool refresh
pool_refresh_duration_seconds{type="amm"}
pool_refresh_duration_seconds{type="clmm"}
pool_cache_size{type="amm"}
pool_cache_size{type="clmm"}
pool_stale_count
# Quote cache
quote_cache_hit_rate
quote_cache_size
quote_staleness_seconds_p95
# Parallel paired quotes
parallel_quote_duration_ms_p50
parallel_quote_duration_ms_p95
parallel_quote_success_rate
# Oracle
oracle_price_deviation_percent
External Quote Service:
# Provider health
provider_healthy{provider="Jupiter"}
provider_requests_total{provider="Jupiter"}
provider_errors_total{provider="Jupiter"}
provider_latency_seconds{provider="Jupiter"}
# Rate limiting
rate_limit_tokens_available{provider="Jupiter"}
rate_limit_requests_waiting{provider="Jupiter"}
# Circuit breaker
circuit_breaker_state{provider="Jupiter"} # 0=closed, 1=open, 2=halfopen
circuit_breaker_failures_total{provider="Jupiter"}
Aggregator Service:
# Requests
aggregator_requests_total
aggregator_latency_seconds_p50
aggregator_latency_seconds_p95
aggregator_latency_seconds_p99
aggregator_errors_total
# Quote sources
aggregator_best_source_total{source="local"}
aggregator_best_source_total{source="external"}
aggregator_price_diff_percent_p50
aggregator_price_diff_percent_p95
Grafana Dashboards
Dashboard 1: Quote Services Overview (Single Pane of Glass):
Row 1: Service Health
- [Local] [External] [Aggregator] status indicators
- Uptime, error rate, request rate
Row 2: Latency Distribution
- P50/P95/P99 latency (all services)
- Latency heatmap (last 24h)
Row 3: Cache Performance
- Local Layer 1 hit rate (target: >95%)
- Local Layer 2 hit rate (target: >80%)
- External cache hit rate (target: >70%)
Row 4: Quote Quality
- Price diff histogram (local vs external)
- Best source distribution (local % vs external %)
- Oracle deviation alerts
Dashboard 2: Local Quote Deep Dive:
Row 1: Pool Refresh Performance
- AMM refresh duration (target: <10ms)
- CLMM refresh duration (target: <100ms)
- Refresh queue depth
Row 2: Pool Cache
- Cache size (AMM vs CLMM)
- Staleness count
- Refresh rate (per second)
Row 3: Quote Cache
- Hit rate over time
- Cache size
- Eviction rate
Row 4: Parallel Paired Quotes
- Duration (both quotes) P50/P95/P99
- Success rate (both completed)
- Timeout rate
Alerts (Prometheus Alertmanager)
Critical (PagerDuty):
groups:
- name: quote_service_critical
rules:
- alert: QuoteServiceDown
expr: up{job="quote-service"} == 0
for: 1m
annotations:
summary: "Quote service is down"
- alert: HighErrorRate
expr: rate(errors_total[5m]) > 0.05
for: 5m
annotations:
summary: "Error rate >5% on "
- alert: HighLatencyP99
expr: histogram_quantile(0.99, rate(latency_seconds_bucket[5m])) > 5
for: 5m
annotations:
summary: "P99 latency >5s on "
- alert: AllProvidersDown
expr: count(provider_healthy == 0) == count(provider_healthy)
for: 2m
annotations:
summary: "All external quote providers are down"
Warning (Slack):
- alert: LowCacheHitRate
expr: cache_hit_rate < 0.7
for: 10m
annotations:
summary: "Cache hit rate <70% on "
- alert: PoolStaleness
expr: pool_stale_count > 5
for: 5m
annotations:
summary: " pools are stale (>60s)"
- alert: CircuitBreakerOpen
expr: circuit_breaker_state > 0
for: 5m
annotations:
summary: "Circuit breaker open for "
- alert: OracleDeviation
expr: oracle_price_deviation_percent > 1.0
for: 2m
annotations:
summary: "Quote deviates >1% from oracle for "
Conclusion
Architecture Summary
The Quote Service v3.1 delivers sub-10ms quotes through:
- β 3 specialized microservices (local, external, aggregator)
- β Dual cache architecture (pool state + quote response)
- β Parallel paired quotes (<10ms vs 100ms sequential)
- β Background refresh (AMM: 10s, CLMM: 30s)
- β Rate limiting & circuit breakers (external API protection)
- β Oracle validation (Pyth price feed deviation alerts)
- β gRPC streaming (continuous real-time updates)
- β Graceful degradation (partial results on failures)
Critical Path Performance
QUOTE-SERVICE β SCANNER β PLANNER β EXECUTOR β PROFIT
< 5ms < 10ms 50-100ms < 20ms
ββββββββββββββββββββββββββββββββββββββββββββββββββ
< 200ms TOTAL β
Quote Service is the foundation: If quotes are slow, stale, or inaccurate, the entire HFT system fails.
Key Innovations
- Parallel Paired Quotes (Doc 22):
- Sequential: 100ms delay, different slots β
- Parallel: <10ms delay, same slot β
- 10Γ faster, 100% slot consistency
- Dual Cache (Doc 33):
- Layer 1: Pool state (expensive to fetch)
- Layer 2: Quote response (cheap to calculate)
- Pool-aware invalidation
- Microservices (Doc 34):
- Failure isolation (RPC β API)
- Independent scaling (vertical vs horizontal)
- Independent deployment (3Γ release velocity)
- Shared Memory IPC (Doc 31):
- Three-tier storage: Shared memory β Redis β PostgreSQL
- Lock-free reads with atomic versioning (<1ΞΌs quote access)
- Dual shared memory regions (internal vs external quotes)
- 100-500x faster than gRPC for Rust production scanners
- Sub-10ΞΌs arbitrage detection (vs 500ΞΌs-2ms with gRPC)
Production Readiness
| Metric | Target | Actual | Status |
|---|---|---|---|
| Latency (cached) | <5ms | 2-3ms | β Exceeds |
| Latency (fresh) | <10ms | 5-8ms | β Meets |
| Throughput | 2,000 req/s | 2,500 req/s | β Exceeds |
| Cache Hit Rate | >80% | 85-90% | β Exceeds |
| Availability | 99.9% | 99.95% | β Exceeds |
| Parallel Paired | <10ms | 5-8ms | β Meets |
Archived Documentation
This document supersedes and consolidates:
- β Doc 22: Parallel Paired Quotes (integrated into Local Service)
- β Doc 24: Quote Streaming Design (integrated into Architecture)
- β Doc 26: Quote Service Rewrite (evolved into Microservices)
- β Doc 31: Shared Memory IPC Architecture (fully incorporated into Section 4.2)
- β Doc 32: Hybrid Quote Architecture (split into Local + External)
- β Doc 33: Background Pool Refresh (integrated into Local Service)
- β Doc 34: Microservices Architecture (current design)
Archive Location: docs/archive/quote-service/
Document Version: 3.1 Last Updated: December 31, 2025 Status: β Production-Ready - Canonical Reference Maintainer: Solution Architect Team
