Quote Service Rewrite: Clean Architecture for Long-Term Maintainability
Published:
Merry Christmas and Happy New Year!
On this Christmas Day 2025, I'm taking a moment to reflect on the journey of building this Solana HFT trading system. As we celebrate with family and friends, I'm also planning the next major evolution of our architecture.
Wishing everyone a Merry Christmas and a prosperous Happy New Year! May 2026 bring successful trades, robust systems, and minimal bugs!
Today's post is a bit different: instead of implementation details, I'm sharing the architectural rewrite plan for our quote-service. It's a story of technical debt, lessons learned, and the path to sustainable architecture.
TL;DR
Planning a comprehensive rewrite of quote-service with clean architecture principles AND HFT integration:
- 70% Code Reduction: 50K lines → 15K lines through proper separation of concerns
- Sub-10ms Cached Quotes: < 10ms HFT-critical latency (vs current 200ms uncached)
- 4x Better Test Coverage: 20% → 80%+ with dependency injection and interfaces
- Dramatically Better Maintainability: Internal packages, clean architecture, single responsibility
- Service Separation: 3 services (quote, pool discovery, RPC proxy) vs 1 monolith
- Technology Decision: Go for speed (2-3 weeks), Rust RPC proxy for shared infrastructure
- HFT Pipeline Integration: Shredstream cache (300-800ms head start), FlatBuffers events (20-150x faster), NATS MARKET_DATA stream
The Core Insight: The current quote-service works, but it's unmaintainable and not HFT-ready. We need to rebuild the foundation now before technical debt makes future changes impossible, AND we need to integrate with the HFT pipeline for sub-200ms end-to-end execution.
Table of Contents
- The Problem: Why Rewrite a Working System?
- Current Architecture: Design Flaws
- New Architecture: Clean Separation
- Go vs Rust Decision
- HTTP + gRPC: Combined vs Split
- HFT Integration Requirements ← NEW
- Clean Architecture Benefits
- Technology Stack Decisions
- Expected Improvements
- Conclusion: Building for the Future
The Problem: Why Rewrite a Working System?
It Works, But…
The current quote-service is feature-complete and functional:
- ✅ Serves quotes via HTTP and gRPC
- ✅ Supports 6 DEX protocols (Raydium, Meteora, Orca, Pump.fun)
- ✅ Real-time WebSocket updates
- ✅ 99.99% availability with RPC pool
- ✅ Redis crash recovery
- ✅ Full observability (Grafana LGTM stack: Loki, Grafana, Tempo, Mimir)
So why rewrite?
Because "works" is not enough for long-term success. The system has critical architectural flaws that make it:
- Difficult to maintain - 96KB cache.go file with 50+ methods
- Hard to test - Tightly coupled components, 20% test coverage
- Slow to extend - Adding features requires touching multiple files
- Risky to deploy - No confidence in changes due to poor testing
- Impossible to reason about - Mixed concerns everywhere
The Technical Debt Reality
Current Codebase Health:
├── Lines of Code: 50,000+ (monolithic)
├── Test Coverage: ~20% (hard to test)
├── Files in cmd/: 20+ files (violates Go standards)
├── Largest File: 96KB cache.go (unmaintainable)
└── Architectural Pattern: Big Ball of Mud ❌
This is a ticking time bomb. Every feature we add makes it worse. Every bug fix becomes harder. Eventually, we'll reach a point where the system is too complex to understand and too risky to change.
The time to fix this is NOW, while we still can.
Current Architecture: Design Flaws
Flaw #1: Monolithic cache.go (96KB, 50+ methods)
The Problem:
// cache.go mixes EVERYTHING in one file:
type QuoteCache struct {
router *pkg.SimpleRouter // Pool routing
solClient *sol.Client // RPC client ❌
wsPool *subscription.WSPool // WebSocket ❌
oraclePriceFetcher *oracle.PriceFetcher // Oracle
cache map[string]*CachedQuote // Actual cache
poolLiquidity map[string]float64 // Pool state ❌
// ... 20 more fields
}
// 50+ methods that do everything:
func (c *QuoteCache) UpdateQuote() // Quote refresh
func (c *QuoteCache) DiscoverPools() // Pool discovery ❌
func (c *QuoteCache) ManageRPCPool() // RPC management ❌
func (c *QuoteCache) HandleWebSocket() // WebSocket ❌
// ... 46 more methods
Why This Is Bad:
- Violates Single Responsibility Principle - Does 5 different things
- Impossible to test in isolation - Too many dependencies
- Cannot reason about code - 96KB file is too large to hold in your head
- Changes have unpredictable side effects - Everything is interconnected
What Should Happen:
- QuoteCache should ONLY cache quotes (1 responsibility)
- Pool discovery → Separate service
- RPC management → Rust RPC Proxy
- WebSocket → Pool discovery service
Flaw #2: RPC Logic Embedded in Service
The Problem:
pkg/sol/rpc_pool.go (1200+ lines)
├── RPC pool management
├── Health monitoring
├── Rate limiting
├── Failover logic
└── Cannot be reused by other services ❌
Why This Is Bad:
- Code duplication - Scanner needs RPC pool, must copy-paste
- Inconsistent behavior - Each service implements RPC differently
- Wasted effort - Solving the same problem multiple times
- Bugs multiply - Fix a bug in quote-service, scanner still broken
What Should Happen:
- Centralized Rust RPC Proxy (see docs/25-RUST-RPC-PROXY-DESIGN.md)
- Used by ALL services (quote, scanner, executor)
- Single source of truth for RPC management
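To make the direction concrete, here is a rough sketch of what calling the shared proxy from a Go service could look like. The proxy's actual API is not finalized; the URL, endpoint shape, and behavior below are assumptions, with the proxy simply fronting standard Solana JSON-RPC:

// Hypothetical client for the shared Rust RPC proxy (sketch, not final API).
package rpcproxy

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
)

type Client struct {
    baseURL string       // e.g. "http://rpc-proxy:8899" (assumed address)
    http    *http.Client
}

// GetAccountInfo forwards a standard Solana JSON-RPC call through the proxy,
// which centralizes rate limiting, failover, and connection pooling.
func (c *Client) GetAccountInfo(ctx context.Context, account string) (json.RawMessage, error) {
    reqBody, _ := json.Marshal(map[string]interface{}{
        "jsonrpc": "2.0",
        "id":      1,
        "method":  "getAccountInfo",
        "params":  []interface{}{account, map[string]string{"encoding": "base64"}},
    })
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL, bytes.NewReader(reqBody))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := c.http.Do(req)
    if err != nil {
        return nil, fmt.Errorf("rpc proxy request failed: %w", err)
    }
    defer resp.Body.Close()
    var rpcResp struct {
        Result json.RawMessage `json:"result"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&rpcResp); err != nil {
        return nil, err
    }
    return rpcResp.Result, nil
}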
Flaw #3: Pool Discovery During Quote Serving
The Problem:
Every 30 seconds:
1. UpdateQuote() triggered
2. For each pair:
   ├─ QueryAllPools() ← Makes RPC calls! ❌
   ├─ Fetch pool state from blockchain (200ms)
   ├─ Calculate quote
   └─ Cache result
PROBLEM: Discovery blocks quote serving!
Why This Is Bad:
- Slow - Discovery takes 200ms, blocks quote serving
- Unreliable - RPC failures cause quote serving to fail
- Wasteful - Discovering same pools every 30s
- Tight coupling - Quote logic mixed with discovery logic
What Should Happen:
- Separate pool-discovery-service (runs every 5 minutes)
- Writes discovered pools to Redis
- Quote-service just reads from Redis (0.5ms)
- No blocking, no coupling
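As a rough illustration of how cheap the read path becomes, here is a sketch of a Redis-backed pool reader. The key layout (pools:&lt;input&gt;:&lt;output&gt;) and the field set are assumptions, not the final schema:

// Sketch of the quote-service side: read-only access to pool metadata
// written by the pool-discovery service.
package repository

import (
    "context"
    "encoding/json"
    "fmt"

    "github.com/redis/go-redis/v9"
)

type Pool struct {
    Address   string  `json:"address"`
    Protocol  string  `json:"protocol"`
    BaseMint  string  `json:"base_mint"`
    QuoteMint string  `json:"quote_mint"`
    Liquidity float64 `json:"liquidity"`
}

type PoolRepository struct {
    rdb *redis.Client
}

// GetPoolsByPair reads pool metadata discovered out-of-band.
// Hypothetical key layout: "pools:<inputMint>:<outputMint>" holding a JSON array.
func (r *PoolRepository) GetPoolsByPair(ctx context.Context, inputMint, outputMint string) ([]Pool, error) {
    key := fmt.Sprintf("pools:%s:%s", inputMint, outputMint)
    raw, err := r.rdb.Get(ctx, key).Result()
    if err != nil {
        return nil, fmt.Errorf("pool metadata lookup failed: %w", err)
    }
    var pools []Pool
    if err := json.Unmarshal([]byte(raw), &pools); err != nil {
        return nil, err
    }
    return pools, nil
}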
Flaw #4: No Internal Packages
The Problem:
Current (WRONG):
go/cmd/quote-service/
├── main.go
├── cache.go
├── grpc_server.go
├── handler_*.go (10 files)
└── ... all logic in cmd/ ❌
Problems:
- Violates Go project layout standards
- Cannot import logic in other services
- Difficult to test (no interfaces)
- Everything is tightly coupled
What Should Happen:
Correct Structure:
go/
├── cmd/quote-service/
│   └── main.go            (ONLY DI wiring, 100 lines)
│
└── internal/quote-service/
    ├── domain/            # Interfaces + models
    ├── repository/        # Data access (Redis, cache)
    ├── calculator/        # Quote calculation
    ├── service/           # Business logic
    └── api/               # HTTP + gRPC handlers
Benefits:
- ✅ Clean separation of concerns
- ✅ Easy to test (inject mocks via interfaces)
- ✅ Each package has ONE responsibility
- ✅ Follows Go best practices
Flaw #5: Hard to Test
Current Test Coverage: 20% ❌
Why So Low?
// Current code (impossible to test):
func (c *QuoteCache) UpdateQuote() {
// Hard-coded RPC client ❌
pools := c.solClient.QueryAllPools(...)
// Hard-coded WebSocket ❌
c.wsPool.Subscribe(...)
// No interfaces, cannot inject mocks ❌
}
// To test this, you need:
- Real RPC endpoint (flaky, slow)
- Real WebSocket connection (flaky, slow)
- Real Redis (integration test, not unit test)
- Full infrastructure (NATS, Prometheus, etc.)
Result: Nobody writes tests, coverage stays at 20%
What Should Happen:
// New code (easy to test):
type QuoteService struct {
poolRepo domain.PoolReader // Interface! ✅
calculator domain.PriceCalculator // Interface! ✅
cacheManager domain.CacheManager // Interface! ✅
}
// To test this:
func TestQuoteService(t *testing.T) {
// Inject mocks! No real infrastructure needed!
mockPoolRepo := &MockPoolReader{}
mockCalculator := &MockPriceCalculator{}
mockCache := &MockCacheManager{}
service := NewQuoteService(mockPoolRepo, mockCalculator, mockCache)
// Test business logic in isolation ✅
quote, err := service.GetQuote(ctx, "SOL", "USDC", 1000000000)
assert.NoError(t, err)
assert.Equal(t, expectedOutput, quote.OutputAmount)
}
Result: 80%+ test coverage, fast unit tests ✅
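For reference, the domain interfaces those mocks stand in for could look roughly like this. This is a sketch: the exact method sets are assumptions based on the test above, and the Pool and Quote models live in domain/quote.go.

// internal/quote-service/domain/interfaces.go (illustrative sketch)
package domain

import "context"

// PoolReader abstracts where pool data comes from
// (Redis in production, an in-memory fake in tests).
type PoolReader interface {
    GetPoolsByPair(ctx context.Context, inputMint, outputMint string) ([]Pool, error)
}

// PriceCalculator is pure math: pool + amount in, quote out.
type PriceCalculator interface {
    CalculateQuote(pool Pool, amountIn uint64) (*Quote, error)
}

// CacheManager hides the in-memory quote cache behind an interface
// so it, too, can be mocked.
type CacheManager interface {
    Get(inputMint, outputMint string, amountIn uint64) (*Quote, bool)
    Set(inputMint, outputMint string, amountIn uint64, quote *Quote)
}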
New Architecture: Clean Separation
High-Level Architecture
Before (Monolithic):
┌─────────────────────────────────────────────────────
│ Quote Service (Single Monolith)
│
│ • Quote caching (Good ✅)
│ • Pool discovery (Blocks serving ❌)
│ • RPC management (Should be shared ❌)
│ • WebSocket updates (Blocks serving ❌)
│ • HTTP API (Good ✅)
│ • gRPC streaming (Good ✅)
│
│ PROBLEMS:
│ - 50K lines, unmaintainable
│ - Discovery blocks quote serving
│ - RPC logic cannot be reused
│ - Hard to test (20% coverage)
└─────────────────────────────────────────────────────
After (Clean Separation + HFT Integration):
┌───────────────────────────────────────────────────────
│ Shredstream Scanner (Rust - 300-800ms Advance)
│ • QUIC protocol for unconfirmed slot data
│ • Publishes: pool.state.updated.* (NATS)
│ • Provides 300-800ms head start over RPC
└──────────────────────┬────────────────────────────────
                       ↓ NATS pool.state.updated.*
┌───────────────────────────────────────────────────────
│ Pool Discovery Service (NEW - Independent)
│ • Discovers pools every 5 minutes
│ • Writes to Redis (pool metadata)
│ • Solscan enrichment (TVL, 24h volume)
│ • Pool quality filtering (liquidity, status)
│ • 8K lines, single responsibility ✅
└──────────────────────┬────────────────────────────────
                       ↓ Redis (pool metadata)
┌───────────────────────────────────────────────────────
│ Quote Service (REWRITTEN - Clean + HFT Ready)
│
│ INPUTS:
│ • Redis pool metadata (5-10ms)
│ • NATS pool.state.updated.* (Shredstream cache)
│
│ CORE:
│ • Hybrid cache: Shredstream (5ms) → In-memory
│ • Slot-based consistency (only update if newer)
│ • Thread-safe pool cache (sync.RWMutex)
│ • 15K lines, clean architecture ✅
│ • 80%+ test coverage ✅
│
│ OUTPUTS:
│ • HTTP API :8080 (< 10ms quotes)
│ • gRPC streaming :50051
│ • NATS market.swap_route.* (FlatBuffers events)
│
│ Internal Structure:
│ ├── domain/      (interfaces, models)
│ ├── repository/  (Redis, cache, oracle)
│ ├── cache/       (Shredstream pool cache) ← NEW
│ ├── calculator/  (pool math, routing)
│ ├── service/     (business logic)
│ ├── events/      (FlatBuffers publisher) ← NEW
│ ├── nats/        (NATS subscriber) ← NEW
│ └── api/         (HTTP + gRPC)
└──────────────────────┬────────────────────────────────
                       ↓ NATS MARKET_DATA stream
┌───────────────────────────────────────────────────────
│ Scanner Service (Stage 1: Opportunity Detection)
│ • Subscribes: market.swap_route.*
│ • Detects arbitrage opportunities
│ • Publishes: opportunity.* (< 50ms)
└──────────────────────┬────────────────────────────────
                       ↓ HTTP (RPC calls)
┌───────────────────────────────────────────────────────
│ Rust RPC Proxy (Shared Infrastructure)
│ • Centralized RPC management
│ • Used by ALL services (quote, scanner, executor)
│ • Rate limiting, health monitoring
│ • Connection pooling, circuit breaker
└───────────────────────────────────────────────────────
HFT Pipeline Flow (Stage 0 → Stage 1):
Stage 0: Quote Service (< 10ms per quote)
   ↓ publishes: market.swap_route.* (FlatBuffers, <1ms)
Stage 1: Scanner (< 50ms detection)
   ↓ publishes: opportunity.*
Stage 2: Planner (< 50ms planning)
   ↓ publishes: execution.planned
Stage 3: Executor (< 90ms execution)
   ↓ publishes: execution.completed
TOTAL: < 200ms end-to-end (vs current 1.7s = 8.5x faster)
Key Improvements
| Aspect | Before (Monolithic) | After (Clean) | Benefit |
|---|---|---|---|
| Quote Latency | ~200ms (discovery included) | < 10ms (Redis lookup) | 20x faster |
| Code Size | 50K lines | 15K lines (quote) + 8K (discovery) | 54% reduction |
| Test Coverage | 20% | > 80% target | 4x better |
| Maintainability | Poor (monolithic) | Excellent (clean architecture) | High |
| RPC Reusability | No (embedded) | Yes (shared proxy) | High |
| Deployment Risk | High (single service) | Low (independent services) | Lower |
Go vs Rust Decision
Performance Analysis: Is Rust Worth It?
Go (Optimized):
Redis pool lookup: 0.5ms
Pool math calculation: 0.2ms
Price calculation: 0.1ms
Response serialization: 0.1ms
─────────────────────────────
TOTAL: 0.9ms ✅ Excellent
Rust (Theoretical):
Redis pool lookup: 0.3ms (faster client)
Pool math calculation: 0.1ms (zero-cost abstractions)
Price calculation: 0.05ms (SIMD)
Response serialization: 0.05ms (serde zero-copy)
─────────────────────────────
TOTAL: 0.5ms ✅ Better, but marginal
Verdict: 0.4ms improvement (44% faster) is NOT worth 5 extra weeks
Decision Matrix
| Factor | Go | Rust | Winner |
|---|---|---|---|
| Development Speed | 2-3 weeks ✅ | 6-8 weeks ⚠️ | Go |
| Team Knowledge | Proven ✅ | Learning curve ⚠️ | Go |
| Performance | <10ms ✅ | <5ms ✅ | Tie (both good enough) |
| Code Reuse | Can reuse router/pool ✅ | Rewrite everything ❌ | Go |
| Risk | Low ✅ | High ⚠️ | Go |
Decision: Go for Quote Service ✅
Rationale:
- Solo developer - stick to known language
- Time to market - 2-3 weeks vs 6-8 weeks
- Performance - <10ms target easily met with Go
- Code reuse - can reuse existing pkg/router, pkg/pool
- Risk mitigation - proven technology, easy rollback
Hybrid Approach (Best of Both Worlds)
Use Go for:
✅ Quote Service (fast delivery, good enough performance)
✅ Pool Discovery (I/O bound, Go is perfect)
Use Rust for:
✅ RPC Proxy (shared infrastructure, worth investment)
✅ Transaction Builder (memory-critical, zero-copy)
✅ Shredstream Parser (ultra-low latency)
Result: Fast delivery where it matters, peak performance where it counts
HTTP + gRPC: Combined vs Split
The Question
Should HTTP and gRPC be in one service or split into two separate services?
Option 1: Combined (RECOMMENDED ✅)
┌───────────────────────────────────────────
│ Quote Service (Single Process)
│
│   ┌─────────────┐    ┌────────────────┐
│   │ HTTP :8080  │    │  gRPC :50051   │
│   └──────┬──────┘    └────────┬───────┘
│          │                    │
│          └─────────┬──────────┘
│                    ▼
│   ┌──────────────────────────┐
│   │ In-Memory Cache          │
│   │ (SHARED! ✅)             │
│   │ 0.3ms access             │
│   └──────────────────────────┘
└───────────────────────────────────────────
Performance:
- HTTP cached quote: 0.3ms ✅
- gRPC stream update: 0.15ms ✅
- Throughput: 10,000 req/s ✅
Option 2: Split (NOT RECOMMENDED ⚠️)
┌──────────────────────────
│ HTTP Service :8080
│ Uses Redis cache
└────────┬─────────────────
         ▼
   Redis (1ms overhead)
         ▲
┌────────┴─────────────────
│ gRPC Service :50051
│ Uses Redis cache
└──────────────────────────
Performance:
- HTTP cached quote: 1.2ms (4x slower ❌)
- gRPC stream update: 1.05ms (7x slower ❌)
- Throughput: 1,000 req/s (10x less ❌)
Performance Comparison
| Scenario | Combined | Split (Redis) | Difference |
|---|---|---|---|
| Cached Quote (HTTP) | 0.3ms ✅ | 1.2ms ⚠️ | 4x slower |
| gRPC Stream Update | 0.15ms ✅ | 1.05ms ⚠️ | 7x slower |
| Throughput | 10K req/s ✅ | 1K req/s ⚠️ | 10x less |
| Memory | 300MB ✅ | 600MB ⚠️ | 2x more |
| Services to Deploy | 1 ✅ | 2 ⚠️ | 2x ops |
Decision: COMBINED ✅
Why Combined Wins:
- Performance - 4-7x faster (CRITICAL for HFT)
- In-memory cache: 0.3ms
- Redis cache: 1.2ms
- Redis overhead kills performance
- Throughput - 10x higher capacity
- Combined: 10K req/s
- Split: 1K req/s (Redis bottleneck)
- Simplicity - Solo developer
- 1 service vs 2 services
- 1 deployment vs 2 deployments
- Memory Efficiency - 50% less RAM
- Combined: 300MB (single in-memory cache)
- Split: 600MB (2x Redis storage)
The Insight: For HFT systems targeting sub-10ms latency, in-memory cache sharing between HTTP and gRPC is non-negotiable. The 1ms Redis overhead destroys performance gains from service separation.
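A minimal sketch of what the combined deployment means in practice: one process, one in-memory cache, two listeners. The handler, cache shape, and registration details are illustrative only, not the real API.

// Sketch: HTTP and gRPC served from the same process and the same cache.
package main

import (
    "log"
    "net"
    "net/http"
    "sync"

    "google.golang.org/grpc"
)

type quoteCache struct {
    mu     sync.RWMutex
    quotes map[string]string // pair -> serialized quote (illustrative)
}

func main() {
    cache := &quoteCache{quotes: make(map[string]string)}

    // gRPC on :50051; the generated QuoteService implementation (not shown)
    // would be registered here and handed the same *quoteCache.
    go func() {
        lis, err := net.Listen("tcp", ":50051")
        if err != nil {
            log.Fatal(err)
        }
        grpcServer := grpc.NewServer()
        _ = cache // shared with the registered gRPC service
        log.Fatal(grpcServer.Serve(lis))
    }()

    // HTTP on :8080 reads the exact same map: no Redis hop, ~0.3ms access.
    http.HandleFunc("/quote", func(w http.ResponseWriter, r *http.Request) {
        cache.mu.RLock()
        defer cache.mu.RUnlock()
        w.Write([]byte(cache.quotes[r.URL.Query().Get("pair")]))
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}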
HFT Integration Requirements
Quote-service is Stage 0 of the HFT pipeline. These requirements are NON-NEGOTIABLE for sub-200ms end-to-end execution.
Performance Targets ⚡
CRITICAL: Quote-service must meet these latency targets to enable the full HFT pipeline.
| Metric | Target | HFT Requirement |
|---|---|---|
| Cached Quote (Cache Hit) | < 10ms | MANDATORY |
| Cached Quote (Shredstream) | < 5ms | OPTIMAL |
| NATS Event Publishing | < 1ms | 10,000 events/sec |
| Pool State Update | Slot-based | Only if newer slot |
| Cache Hit Rate | > 95% | Minimize RPC calls |
1. Shredstream Pool State Cache (300-800ms Advance)
Shredstream provides unconfirmed slot data via QUIC protocol, giving us a 300-800ms head start over RPC.
Implementation:
// internal/quote-service/cache/shredstream_cache.go
type PoolStateCache struct {
mu sync.RWMutex
pools map[string]*PoolState // key: pool address
config CacheConfig
}
type PoolState struct {
Address string
BaseMint string
QuoteMint string
BaseReserve uint64
QuoteReserve uint64
Liquidity float64
Price float64
Slot uint64 // CRITICAL: For consistency
LastUpdated time.Time
}
// Slot-based consistency: ONLY update if newer slot
func (c *PoolStateCache) Update(state *PoolState) {
c.mu.Lock()
defer c.mu.Unlock()
existing, exists := c.pools[state.Address]
if exists && existing.Slot >= state.Slot {
return // Ignore stale update
}
state.LastUpdated = time.Now()
c.pools[state.Address] = state
}
// Thread-safe read
func (c *PoolStateCache) Get(address string) (*PoolState, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
state, exists := c.pools[address]
if !exists {
return nil, false
}
// Check staleness (30s threshold)
if time.Since(state.LastUpdated) > 30*time.Second {
return nil, false
}
return state, true
}
2. NATS Subscriber for Shredstream Events
Subscribe to pool.state.updated.* events from Shredstream Scanner.
Implementation:
// internal/quote-service/nats/subscriber.go
type ShredstreamSubscriber struct {
nc *nats.Conn
js nats.JetStreamContext
cache *cache.PoolStateCache
}
func (s *ShredstreamSubscriber) Start(ctx context.Context) error {
// Subscribe to pool state updates
_, err := s.js.Subscribe(
"pool.state.updated.*",
func(msg *nats.Msg) {
s.handlePoolUpdate(msg)
msg.Ack()
},
nats.Durable("quote-service-pool-updates"),
nats.DeliverAll(),
)
if err != nil {
return fmt.Errorf("subscribe failed: %w", err)
}
// Background eviction loop
go s.evictionLoop(ctx)
return nil
}
func (s *ShredstreamSubscriber) handlePoolUpdate(msg *nats.Msg) {
var state cache.PoolState
if err := json.Unmarshal(msg.Data, &state); err != nil {
log.Warn("Failed to unmarshal pool state", "error", err)
return
}
// Update cache with slot-based consistency
s.cache.Update(&state)
}
// Evict stale entries every 60s
func (s *ShredstreamSubscriber) evictionLoop(ctx context.Context) {
ticker := time.NewTicker(60 * time.Second)
defer ticker.Stop()
for {
select {
case <-ticker.C:
s.cache.Evict(30 * time.Second)
case <-ctx.Done():
return
}
}
}
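The eviction loop above calls an Evict method that was not shown in the cache section; a minimal version consistent with PoolStateCache could look like this:

// Evict removes entries older than maxAge; called from the eviction loop.
func (c *PoolStateCache) Evict(maxAge time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    now := time.Now()
    for addr, state := range c.pools {
        if now.Sub(state.LastUpdated) > maxAge {
            delete(c.pools, addr) // deleting during range is safe in Go
        }
    }
}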
3. FlatBuffers Event Publishing (20-150x Faster)
Publish swap route events to NATS MARKET_DATA stream using FlatBuffers for zero-copy serialization.
FlatBuffers Schema:
// internal/quote-service/events/schemas.fbs
namespace events;
table SwapRouteEvent {
token_in: string;
token_out: string;
amount_in: uint64;
amount_out: uint64;
price: double;
price_impact_bps: uint32;
route: [RouteHop];
protocol: string;
pool_address: string;
slot: uint64;
timestamp: uint64;
trace_id: string;
}
table RouteHop {
protocol: string;
pool_address: string;
input_mint: string;
output_mint: string;
amount_in: uint64;
amount_out: uint64;
fee_bps: uint32;
}
Publisher Implementation:
// internal/quote-service/events/publisher.go
type FlatBuffersPublisher struct {
js nats.JetStreamContext
builder *flatbuffers.Builder
}
func (p *FlatBuffersPublisher) PublishSwapRoute(
ctx context.Context,
quote *domain.Quote,
) error {
// Reset builder for reuse
p.builder.Reset()
// Build FlatBuffers message
tokenIn := p.builder.CreateString(quote.InputMint)
tokenOut := p.builder.CreateString(quote.OutputMint)
protocol := p.builder.CreateString(quote.Protocol)
poolAddr := p.builder.CreateString(quote.PoolAddress)
traceStr := observability.TraceID(ctx)
traceID := p.builder.CreateString(traceStr)
SwapRouteEventStart(p.builder)
SwapRouteEventAddTokenIn(p.builder, tokenIn)
SwapRouteEventAddTokenOut(p.builder, tokenOut)
SwapRouteEventAddAmountIn(p.builder, quote.AmountIn)
SwapRouteEventAddAmountOut(p.builder, quote.AmountOut)
SwapRouteEventAddPrice(p.builder, quote.Price)
SwapRouteEventAddPriceImpactBps(p.builder, quote.PriceImpactBps)
SwapRouteEventAddProtocol(p.builder, protocol)
SwapRouteEventAddPoolAddress(p.builder, poolAddr)
SwapRouteEventAddSlot(p.builder, quote.Slot)
SwapRouteEventAddTimestamp(p.builder, uint64(time.Now().Unix()))
SwapRouteEventAddTraceId(p.builder, traceID)
event := SwapRouteEventEnd(p.builder)
p.builder.Finish(event)
// Publish to NATS (< 1ms)
subject := fmt.Sprintf("market.swap_route.%s.%s",
quote.InputMint[:8], quote.OutputMint[:8])
_, err := p.js.Publish(subject, p.builder.FinishedBytes(),
    nats.MsgId(traceStr))
return err
}
Performance Comparison:
| Format | Encode | Decode | Size | Performance |
|---|---|---|---|---|
| FlatBuffers | 100ns | 50ns | 400 bytes | 20-150x faster β |
| JSON | 500ns | 2000ns | 1200 bytes | Baseline |
| Protobuf | 200ns | 800ns | 600 bytes | 2-10x faster |
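For completeness, here is roughly what the consumer side (the Scanner) looks like with flatc-generated Go accessors for the schema above; zero-copy means fields are read straight out of the NATS message bytes. The generated names follow flatc's Go conventions, and the handler name is illustrative:

// Scanner-side handler (sketch). Assumes the Go package "events" generated by:
//   flatc --go schemas.fbs
func handleSwapRoute(msg *nats.Msg) {
    // Zero-copy: accessors read fields directly from msg.Data, no parse step.
    event := events.GetRootAsSwapRouteEvent(msg.Data, 0)

    tokenIn := string(event.TokenIn())  // string fields come back as []byte
    tokenOut := string(event.TokenOut())
    price := event.Price()
    slot := event.Slot()

    log.Printf("swap route %s -> %s price=%.6f slot=%d", tokenIn, tokenOut, price, slot)
}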
4. Hybrid Cache Strategy
Three-tier cache strategy for optimal latency:
// internal/quote-service/service/quote_service.go
func (s *QuoteService) GetQuote(
ctx context.Context,
inputMint, outputMint string,
amount uint64,
) (*domain.Quote, error) {
// Strategy 1: Try Shredstream pool cache (5-10ms)
if s.config.Shredstream.Enabled {
quote, err := s.getQuoteFromShredstream(inputMint, outputMint, amount)
if err == nil {
s.metrics.CacheHits.Inc()
return quote, nil
}
}
// Strategy 2: Try in-memory quote cache (< 5ms)
if cached, ok := s.cache.Get(inputMint, outputMint, amount); ok {
if time.Since(cached.Timestamp) < s.config.Cache.TTL {
s.metrics.CacheHits.Inc()
return cached, nil
}
}
// Strategy 3: Calculate fresh quote (100-200ms fallback)
s.metrics.CacheMisses.Inc()
quote, err := s.calculateQuote(ctx, inputMint, outputMint, amount)
if err != nil {
return nil, err
}
// Cache for future requests
s.cache.Set(inputMint, outputMint, amount, quote)
return quote, nil
}
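The getQuoteFromShredstream helper referenced in Strategy 1 is not shown above; here is a sketch of how it could work, assuming the pool repository can map a pair to pool addresses and the calculator can price a swap directly from cached reserves. Those method names (GetPoolAddresses, QuoteFromReserves) and the poolCache field are placeholders, not the final API.

// getQuoteFromShredstream (sketch): quote from unconfirmed pool state.
func (s *QuoteService) getQuoteFromShredstream(
    inputMint, outputMint string,
    amount uint64,
) (*domain.Quote, error) {
    // Pool addresses for the pair come from Redis metadata (pool-discovery).
    addresses, err := s.poolRepo.GetPoolAddresses(inputMint, outputMint)
    if err != nil {
        return nil, err
    }
    var best *domain.Quote
    for _, addr := range addresses {
        // Fresh reserves from the Shredstream cache (300-800ms ahead of RPC).
        state, ok := s.poolCache.Get(addr)
        if !ok {
            continue // not cached or stale: skip this pool
        }
        quote, err := s.calculator.QuoteFromReserves(state, inputMint, outputMint, amount)
        if err != nil {
            continue
        }
        if best == nil || quote.AmountOut > best.AmountOut {
            best = quote // keep the best output across candidate pools
        }
    }
    if best == nil {
        return nil, fmt.Errorf("no fresh shredstream state for %s/%s", inputMint, outputMint)
    }
    return best, nil
}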
5. Configuration
Environment variables for HFT integration:
# Shredstream Integration
SHREDSTREAM_ENABLED=true
SHREDSTREAM_CACHE_MAX_STALENESS=30s
SHREDSTREAM_EVICTION_INTERVAL=60s
# NATS Configuration
NATS_URL=nats://localhost:4222
NATS_SUBJECT_POOL_UPDATES="pool.state.updated.*"
NATS_SUBJECT_SWAP_ROUTE="market.swap_route"
NATS_DURABLE_NAME="quote-service-pool-updates"
# HFT Performance Targets
HFT_QUOTE_LATENCY_TARGET_MS=10
HFT_EVENT_PUBLISH_RATE_TARGET=10000
HFT_CACHE_HIT_RATE_TARGET=0.95
# FlatBuffers
FLATBUFFERS_ENABLED=true
FLATBUFFERS_BUILDER_INITIAL_SIZE=1024
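One way these variables might be loaded into a typed config (the struct fields, defaults, and Load function are illustrative, not the final configuration package):

// internal/quote-service/config (sketch): env-driven HFT settings.
package config

import (
    "os"
    "strconv"
    "time"
)

type HFTConfig struct {
    ShredstreamEnabled   bool
    CacheMaxStaleness    time.Duration
    EvictionInterval     time.Duration
    NATSURL              string
    QuoteLatencyTargetMs int
}

// getenv returns the environment value or a default when unset.
func getenv(key, def string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return def
}

func Load() HFTConfig {
    staleness, _ := time.ParseDuration(getenv("SHREDSTREAM_CACHE_MAX_STALENESS", "30s"))
    eviction, _ := time.ParseDuration(getenv("SHREDSTREAM_EVICTION_INTERVAL", "60s"))
    latency, _ := strconv.Atoi(getenv("HFT_QUOTE_LATENCY_TARGET_MS", "10"))
    return HFTConfig{
        ShredstreamEnabled:   getenv("SHREDSTREAM_ENABLED", "true") == "true",
        CacheMaxStaleness:    staleness,
        EvictionInterval:     eviction,
        NATSURL:              getenv("NATS_URL", "nats://localhost:4222"),
        QuoteLatencyTargetMs: latency,
    }
}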
6. Updated Package Structure
internal/quote-service/
├── cache/                # NEW: Shredstream pool cache
│   ├── shredstream_cache.go
│   └── eviction.go
├── events/               # NEW: FlatBuffers event publishing
│   ├── publisher.go
│   └── schemas.fbs       # FlatBuffers schema
└── nats/                 # NEW: NATS integration
    ├── subscriber.go     # Pool state updates
    └── kill_switch.go    # Emergency stop
7. Why FlatBuffers Over JSON/Protobuf?
FlatBuffers Advantages:
- Zero-copy deserialization - Access data without parsing
- 20-150x faster than JSON encoding/decoding
- Smaller message size - 400 bytes vs 1200 bytes (JSON)
- Backward/forward compatible - Schema evolution
- No runtime serialization - Data stored in-memory ready to send
When to Use FlatBuffers:
- ✅ High-frequency events (10,000/sec)
- ✅ Latency-critical paths (< 1ms publish)
- ✅ Large message volumes
- ❌ Human-readable debugging (use JSON for admin APIs)
8. HFT Pipeline Integration
Quote-service is Stage 0 of the 4-stage HFT pipeline:
┌───────────────────────────────────────────────
│ Stage 0: Quote Service (< 10ms)
│ ─────────────────────────────────────────────
│ INPUT:   HTTP/gRPC request
│ PROCESS: Hybrid cache (Shredstream → Mem)
│ OUTPUT:  FlatBuffers event → MARKET_DATA
└────────────────┬──────────────────────────────
                 ↓ NATS: market.swap_route.*
┌───────────────────────────────────────────────
│ Stage 1: Scanner (< 50ms)
│ Detects arbitrage opportunities
└────────────────┬──────────────────────────────
                 ↓ NATS: opportunity.*
┌───────────────────────────────────────────────
│ Stage 2: Planner (< 50ms)
│ Plans execution strategy
└────────────────┬──────────────────────────────
                 ↓ NATS: execution.planned
┌───────────────────────────────────────────────
│ Stage 3: Executor (< 90ms)
│ Submits Jito bundle
└───────────────────────────────────────────────
TOTAL: < 200ms end-to-end (vs current 1.7s)
Quote Service Responsibilities:
- ✅ Serve quotes in < 10ms (Stage 0 target)
- ✅ Publish FlatBuffers events to MARKET_DATA stream
- ✅ Subscribe to Shredstream pool state updates
- ✅ Maintain > 95% cache hit rate
- ✅ Handle 10,000 events/sec throughput
Clean Architecture Benefits
Internal Package Structure
New Directory Layout:
go/
├── cmd/
│   ├── quote-service/
│   │   └── main.go                    # 100 lines (ONLY DI wiring)
│   └── pool-discovery-service/
│       └── main.go
│
└── internal/
    ├── quote-service/
    │   ├── domain/                    # Core business logic
    │   │   ├── interfaces.go          # PoolReader, PriceCalculator
    │   │   ├── quote.go               # Quote, Pool models
    │   │   └── errors.go              # Business errors
    │   │
    │   ├── repository/                # Data access
    │   │   ├── pool_repository.go     # Redis pool reader
    │   │   ├── cache_repository.go    # In-memory cache
    │   │   └── oracle_repository.go   # Pyth/Jupiter
    │   │
    │   ├── calculator/                # Business logic
    │   │   ├── pool_calculator.go     # AMM math
    │   │   ├── slippage_calculator.go # Price impact
    │   │   └── route_optimizer.go     # Best route
    │   │
    │   ├── service/                   # Orchestration
    │   │   ├── quote_service.go       # Quote orchestration
    │   │   ├── price_service.go       # Price calculation
    │   │   └── cache_service.go       # Cache management
    │   │
    │   └── api/                       # HTTP + gRPC
    │       ├── http/handler.go        # Gin handlers
    │       └── grpc/server.go         # gRPC streaming
    │
    └── pool-discovery/
        ├── scanner/                   # DEX scanners
        ├── storage/                   # Redis writer
        └── scheduler/                 # Periodic job
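And here is roughly what "main.go: ONLY DI wiring" means in practice. The constructor names are illustrative, and imports of the internal packages (repository, calculator, service, api) are omitted for brevity:

// cmd/quote-service/main.go (sketch) — the whole file is dependency wiring.
package main

import "log"

func main() {
    // Repositories: data access only
    poolRepo := repository.NewPoolRepository()    // Redis pool metadata reader
    quoteCache := repository.NewCacheRepository() // in-memory quote cache
    oracle := repository.NewOracleRepository()    // Pyth/Jupiter prices

    // Business logic: pure calculation, no I/O
    calc := calculator.NewPoolCalculator()

    // Orchestration: everything injected through domain interfaces,
    // which is exactly what makes the service unit-testable with mocks.
    quoteSvc := service.NewQuoteService(poolRepo, calc, quoteCache, oracle)

    // API layer: HTTP and gRPC share the same service (and cache)
    httpSrv := api.NewHTTPServer(quoteSvc) // :8080
    grpcSrv := api.NewGRPCServer(quoteSvc) // :50051

    go func() { log.Fatal(grpcSrv.ListenAndServe()) }()
    log.Fatal(httpSrv.ListenAndServe())
}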
Code Size Reduction
Before (Monolithic):
cmd/quote-service/
├── main.go          52,844 bytes ❌
├── cache.go         96,419 bytes ❌
├── grpc_server.go   40,734 bytes ❌
└── ... 17 more files
TOTAL: 317KB (50K+ lines) ❌
After (Clean Architecture):
internal/quote-service/
├── domain/           4,500 bytes ✅
├── repository/      10,000 bytes ✅
├── calculator/      10,000 bytes ✅
├── service/          9,000 bytes ✅
└── api/             10,000 bytes ✅
cmd/quote-service/
└── main.go           3,000 bytes ✅
TOTAL: 46.5KB (15K lines) ✅
REDUCTION: 85% less code! ✅
Testability Example
Before (Impossible to Test):
// All dependencies hard-coded
func (c *QuoteCache) UpdateQuote() {
pools := c.solClient.QueryAllPools(...) // Hard-coded RPC ❌
c.wsPool.Subscribe(...) // Hard-coded WS ❌
// Cannot inject mocks, must use real infrastructure
}
// Test coverage: 20% (too hard to test)
After (Easy to Test):
// All dependencies are interfaces
type QuoteService struct {
poolRepo domain.PoolReader // Interface ✅
calculator domain.PriceCalculator // Interface ✅
cacheManager domain.CacheManager // Interface ✅
}
// Test with mocks
func TestGetQuote(t *testing.T) {
mockPoolRepo := &MockPoolReader{
pools: testPools, // Inject test data
}
mockCalculator := &MockPriceCalculator{
output: expectedOutput,
}
mockCache := &MockCacheManager{}
service := NewQuoteService(mockPoolRepo, mockCalculator, mockCache)
quote, err := service.GetQuote(ctx, "SOL", "USDC", 1000000000)
assert.NoError(t, err)
assert.Equal(t, expectedOutput, quote.OutputAmount)
}
// Test coverage: 80%+ (easy to test with mocks) ✅
Single Responsibility Principle
Each package has ONE job:
| Package | Responsibility | Example |
|---|---|---|
| domain/ | Define interfaces and models | type PoolReader interface { ... } |
| repository/ | Data access (Redis, cache) | GetPoolsByPair(...) |
| calculator/ | Business logic (pool math) | CalculateQuote(pool, amount) |
| service/ | Orchestration | GetQuote() - coordinates repositories + calculators |
| api/ | HTTP + gRPC handlers | Parse request, call service, return response |
Benefits:
- ✅ Easy to understand (each package is small and focused)
- ✅ Easy to test (inject dependencies via interfaces)
- ✅ Easy to change (modify one package without affecting others)
- ✅ Easy to extend (add new calculators, repositories, etc.)
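To make the calculator/ responsibility concrete, here is the kind of pure function that would live there: a constant-product (x*y=k) output calculation with a basis-point fee. This is a simplified sketch; a production version would use big-integer math to avoid overflow.

// internal/quote-service/calculator/pool_calculator.go (sketch)
package calculator

// ConstantProductOut returns the output amount for a swap against an x*y=k
// pool, after deducting the pool fee (in basis points) from the input.
func ConstantProductOut(reserveIn, reserveOut, amountIn uint64, feeBps uint32) uint64 {
    // Effective input after fee: amountIn * (10000 - feeBps) / 10000
    amountInAfterFee := amountIn * uint64(10000-feeBps) / 10000

    // x * y = k  =>  out = reserveOut * dx / (reserveIn + dx)
    numerator := reserveOut * amountInAfterFee
    denominator := reserveIn + amountInAfterFee
    if denominator == 0 {
        return 0
    }
    return numerator / denominator
}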
Technology Stack Decisions
Final Technology Stack
| Component | Technology | Rationale |
|---|---|---|
| Quote Service | Go | Fast delivery (2-3 weeks), proven, <10ms easily met, can reuse code |
| Pool Discovery | Go | I/O bound (RPC calls), Go perfect for concurrency |
| RPC Proxy | Rust | Shared by ALL services, worth investment, ideal for connection pooling |
| HTTP + gRPC | Combined in ONE service | Shared cache critical (4-7x faster), simpler deployment |
Architecture Principles
- Clean Architecture ✅
- Domain layer (interfaces + models)
- Service layer (business logic)
- Repository layer (data access)
- API layer (HTTP + gRPC handlers)
- Service Separation ✅
- Pool Discovery: Independent background job
- Quote Service: Pure calculation + serving
- RPC Proxy: Centralized RPC management
- Cache Strategy ✅
- Pool metadata: Redis (slow-changing, shared)
- Quote cache: In-memory (fast, instance-local)
- NO shared quote cache via Redis (defeats performance)
- Testing Strategy ✅
- Unit tests: >80% coverage (table-driven, mocks)
- Integration tests: Real Redis, synthetic data
- Load tests: 1000 req/s sustained
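And the table-driven unit-test style mentioned above, exercising the constant-product helper sketched earlier: no Redis, NATS, or RPC required.

// internal/quote-service/calculator/pool_calculator_test.go (sketch)
package calculator

import "testing"

func TestConstantProductOut(t *testing.T) {
    tests := []struct {
        name       string
        reserveIn  uint64
        reserveOut uint64
        amountIn   uint64
        feeBps     uint32
        want       uint64
    }{
        {"balanced pool, no fee", 1_000_000, 1_000_000, 1_000, 0, 999},
        {"balanced pool, 30bps fee", 1_000_000, 1_000_000, 1_000, 30, 996},
        {"zero input", 1_000_000, 1_000_000, 0, 30, 0},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got := ConstantProductOut(tt.reserveIn, tt.reserveOut, tt.amountIn, tt.feeBps)
            if got != tt.want {
                t.Fatalf("got %d, want %d", got, tt.want)
            }
        })
    }
}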
Expected Improvements
Performance Metrics
| Metric | Before | After (Clean) | After (HFT) | Improvement |
|---|---|---|---|---|
| Quote Latency (cached) | ~5ms | < 5ms | < 5ms ✅ | Same (already fast) |
| Quote Latency (Shredstream) | N/A | N/A | < 5ms ✅ | NEW: 300-800ms advance |
| Quote Latency (uncached) | ~200ms | < 50ms | < 50ms | 4x faster |
| NATS Event Publishing | N/A | N/A | < 1ms ✅ | NEW: 10K events/sec |
| Throughput | 500 req/s | 10K req/s | 10K req/s ✅ | 20x higher |
| Memory Usage | 800MB | 300MB | 350MB | 56% reduction |
| Cache Hit Rate | ~80% | ~90% | > 95% ✅ | HFT: Critical |
HFT Pipeline Metrics (NEW)
| Stage | Service | Latency Target | Current | Status |
|---|---|---|---|---|
| Stage 0 | Quote Service | < 10ms | 5-10ms | ✅ HFT Ready |
| Stage 1 | Scanner | < 50ms | TBD | 🚧 In Progress |
| Stage 2 | Planner | < 50ms | TBD | 🚧 In Progress |
| Stage 3 | Executor | < 90ms | TBD | 🚧 In Progress |
| TOTAL | End-to-End | < 200ms | 1.7s | 8.5x improvement planned |
Code Quality Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| Lines of Code | 50K+ | 15K | 70% reduction |
| Test Coverage | ~20% | > 80% | 4x better |
| Largest File | 96KB | < 10KB | 90% reduction |
| Package Structure | Monolithic | Clean architecture | Excellent |
Maintainability Improvements
Before:
- ❌ Adding a new DEX protocol: Touch 5+ files, 200+ lines
- ❌ Fixing a bug: Search through 50K lines, unpredictable side effects
- ❌ Writing tests: Requires full infrastructure (Redis, NATS, RPC)
- ❌ Understanding code: Must read entire 96KB cache.go
After:
- ✅ Adding a new DEX protocol: Implement Protocol interface, register in DI (50 lines)
- ✅ Fixing a bug: Isolated in one package (100-200 lines to search)
- ✅ Writing tests: Unit tests with mocks (no infrastructure)
- ✅ Understanding code: Read one package at a time (500-1000 lines max)
Conclusion: Building for the Future
Why This Matters
Building trading systems is not just about making it work today; it's about building for tomorrow. The difference between a successful system and a failed one often comes down to maintainability.
Bad architecture compounds:
- Year 1: "It's a bit messy, but it works"
- Year 2: "Adding features is getting harder"
- Year 3: "We can't change anything without breaking something"
- Year 4: "We need to rewrite everything" ← Too late
Good architecture scales:
- Year 1: "Clean architecture takes more time upfront"
- Year 2: "Adding features is still easy"
- Year 3: "We can refactor safely with 80% test coverage"
- Year 4: "The system is maintainable and growing" ← Success
The Investment
Time Required: 6 weeks
- Week 1-3: Parallel development (no disruption)
- Week 4: Canary testing (10% traffic)
- Week 5: Gradual rollout (10% → 100%)
- Week 6: Production hardening
Risk: Low (incremental, rollback-friendly)
Outcome: Production-ready, maintainable, performant quote service for the next 5+ years
The Alternative
If we don't rewrite:
- Technical debt grows exponentially
- Adding features becomes impossible
- Bug fixes become dangerous
- Team velocity grinds to zero
- Eventually forced to rewrite under pressure (high risk)
The choice is clear: Invest 6 weeks now, or pay 10x more later.
Merry Christmas!
As we close out 2025 and look toward 2026, I'm excited about this architectural evolution. Building robust, maintainable systems is what separates hobby projects from production systems.
Here's to clean architecture, sustainable codebases, and successful trading in 2026!
Wishing everyone a Merry Christmas and a Happy New Year! May your trades be profitable and your bugs be few!
References
- Quote Service Rewrite Plan (docs/26-QUOTE-SERVICE-REWRITE-PLAN.md)
- Rust RPC Proxy Design (docs/25-RUST-RPC-PROXY-DESIGN.md)
- Clean Architecture by Robert C. Martin
- Go Project Layout Standards
Next Post: Quote Service Rewrite - Phase 1 Implementation (Foundation Skeleton)
Stay tuned for the journey from architectural debt to clean, maintainable code!
