Pool Discovery Service: Real-Time Liquidity Tracking and Intelligent RPC Proxy

20 minute read


TL;DR

Today’s post explores two critical infrastructure components of the Solana trading system, along with the architecture that ties them together:

  1. Pool Discovery Service: A production-ready Go service that discovers and tracks liquidity pools across 6 major Solana DEX protocols with real-time WebSocket updates
  2. Solana RPC Proxy: An intelligent Rust proxy supporting multiple RPC vendors with automatic failover and health-based routing
  3. Three-Tier Hybrid Architecture: A WebSocket-first design with RPC backup that optimizes for real-time updates while minimizing RPC load

The pool-discovery-service implements a sophisticated WebSocket-first architecture that provides real-time pool state updates with sub-100ms latency while gracefully falling back to RPC polling when WebSocket connections are unavailable.

Pool Discovery Service Architecture

Executive Summary

The pool-discovery-service is a standalone Go service that discovers, enriches, and maintains liquidity pool data from 6 major Solana DEX protocols:

  • Raydium (AMM, CLMM, CPMM)
  • Meteora DLMM
  • PumpSwap AMM
  • Orca Whirlpool

Key capabilities include:

  • ✅ WebSocket-first pool updates via accountSubscribe
  • ✅ RPC backup polling (5-minute fallback)
  • ✅ Bidirectional token pair scanning
  • ✅ Slot number tracking for data freshness
  • ✅ NATS event publishing for downstream consumers
  • ✅ Comprehensive Prometheus metrics
  • ✅ Pool enrichment with TVL and volume data

Performance Metrics:

  • Discovery Cycle: ~30-50 seconds for 235+ pools
  • WebSocket Updates: Real-time (<100ms latency)
  • RPC Backup: Every 5 minutes when WebSocket down
  • Pool TTL: 10 minutes (configurable)

Three-Tier Hybrid Architecture

The service implements a three-tier hybrid approach optimizing for real-time updates while minimizing RPC load:

┌─────────────────────────────────────────────────────────────────┐
│  Tier 1: Initial RPC Discovery (One-Time on Startup)           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ MultiProtocolScanner.DiscoverAllPools()                   │  │
│  │  • Bidirectional queries (baseMint↔quoteMint)            │  │
│  │  • 6 protocols scanned in parallel (goroutines)          │  │
│  │  • Fetches current slot once per discovery run           │  │
│  │  • Returns: pools with metadata + slot number            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                              ↓                                   │
│  Storage: Redis cache (pool:{dex}:{poolId}, TTL=10min)         │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Tier 2: WebSocket Subscriptions (PRIMARY - Real-Time)         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ WebSocketManager (subscription/websocket_manager.go)      │  │
│  │  • accountSubscribe for each discovered pool             │  │
│  │  • Receives accountNotification on pool changes          │  │
│  │  • Extracts slot from notification.params.result.context │  │
│  │  • Updates pool.Slot and pool.LastUpdated in Redis       │  │
│  │  • Publishes pool.updated event to NATS                  │  │
│  │  • Automatic reconnection with exponential backoff       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
│  Connection: ws://localhost:3030 (Rust RPC Proxy WebSocket)    │
│  Reconnection: 1s → 2s → 4s → ... → 60s max backoff           │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Tier 3: RPC Backup Polling (FALLBACK - Every 5 Minutes)       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ PoolDiscoveryScheduler.startRPCBackupPolling()            │  │
│  │  • Checks WebSocket health (IsConnected())                │  │
│  │  • IF WebSocket down: run full RPC discovery             │  │
│  │  • IF WebSocket healthy: skip (rely on Tier 2)           │  │
│  │  • Updates pool.Slot from current RPC response            │  │
│  │  • Publishes pool.updated event with source="rpc_backup" │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
│  Trigger: WebSocketManager.IsConnected() == false              │
│  Interval: 5 minutes (configurable via -ws-backup-interval)    │
└─────────────────────────────────────────────────────────────────┘

Component Diagram

┌──────────────────────────────────────────────────────────────────┐
│                    pool-discovery-service                         │
│                                                                   │
│  ┌────────────────┐  ┌──────────────────┐  ┌─────────────────┐  │
│  │   Scanner      │  │   Enricher       │  │   Scheduler     │  │
│  │  (RPC Calls)   │  │   (API)          │  │  (Background)   │  │
│  └────────┬───────┘  └────────┬─────────┘  └────────┬────────┘  │
│           │                   │                      │            │
│           └───────────────────┴──────────────────────┘            │
│                              │                                    │
│                    ┌─────────▼──────────┐                        │
│                    │   Storage Layer    │                        │
│                    │  (RedisPoolWriter) │                        │
│                    └─────────┬──────────┘                        │
│                              │                                    │
│           ┌──────────────────┼──────────────────┐                │
│           │                  │                  │                │
│    ┌──────▼───────┐  ┌──────▼───────┐  ┌──────▼──────┐         │
│    │  WebSocket   │  │    NATS      │  │ Prometheus  │         │
│    │   Manager    │  │  Publisher   │  │   Metrics   │         │
│    └──────────────┘  └──────────────┘  └─────────────┘         │
└──────────────────────────────────────────────────────────────────┘
         │                    │                    │
         │                    │                    │
    ┌────▼────┐         ┌────▼─────┐        ┌────▼─────┐
    │  Rust   │         │  NATS    │        │  Redis   │
    │   RPC   │         │ JetStream│        │  Cache   │
    │  Proxy  │         │          │        │          │
    └─────────┘         └──────────┘        └──────────┘

Core Features

1. Multi-Protocol Discovery

The service supports 6 major Solana DEX protocols with parallel scanning:

| Protocol | Program ID | Type | Status |
|----------|------------|------|--------|
| Raydium AMM | 675kPX9... | Constant Product (x*y=k) | ✅ Active |
| Raydium CLMM | CAMMCz... | Concentrated Liquidity | ✅ Active |
| Raydium CPMM | CPMMo... | Constant Product + Dynamic Fees | ✅ Active |
| Meteora DLMM | LBUZKh... | Dynamic Liquidity Bins | ✅ Active |
| PumpSwap AMM | pAMMBa... | Bonding Curve AMM | ✅ Active |
| Orca Whirlpool | whirL... | Concentrated Liquidity | ✅ Active |

Features:

  • Parallel scanning using goroutines (6 protocols simultaneously)
  • Bidirectional queries (forward + reverse) for complete coverage
  • Deduplication by pool ID, as sketched after this list
  • Error tracking per protocol for monitoring
  • Stats collection (pools discovered by DEX)
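
To make the scan-and-deduplicate pattern concrete, here is a minimal Go sketch of the parallel, bidirectional discovery pass. The interface and type names below are illustrative stand-ins, not the service's actual code:

package scanner

import (
    "context"
    "sync"
)

// Pool and protocolScanner are simplified stand-ins for the service's real types.
type Pool struct {
    ID  string
    DEX string
}

type protocolScanner interface {
    // Scan queries one DEX program for pools of a token pair, in one direction.
    Scan(ctx context.Context, baseMint, quoteMint string) ([]Pool, error)
}

// discoverAllPools scans every protocol in parallel, queries both directions of
// the pair (forward + reverse), and deduplicates results by pool ID.
func discoverAllPools(ctx context.Context, scanners map[string]protocolScanner, baseMint, quoteMint string) []Pool {
    var (
        mu   sync.Mutex
        wg   sync.WaitGroup
        seen = map[string]Pool{}
    )

    for _, s := range scanners {
        wg.Add(1)
        go func(s protocolScanner) {
            defer wg.Done()
            for _, dir := range [][2]string{{baseMint, quoteMint}, {quoteMint, baseMint}} {
                pools, err := s.Scan(ctx, dir[0], dir[1])
                if err != nil {
                    continue // errors are tracked per protocol in the real service
                }
                mu.Lock()
                for _, p := range pools {
                    seen[p.ID] = p // dedup: one entry per pool ID
                }
                mu.Unlock()
            }
        }(s)
    }
    wg.Wait()

    out := make([]Pool, 0, len(seen))
    for _, p := range seen {
        out = append(out, p)
    }
    return out
}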

2. WebSocket-First Pool Updates

The WebSocketManager component provides real-time pool state updates:

Key Components:

  • accountSubscribe JSON-RPC method for pool monitoring
  • Slot-based deduplication (ignores stale updates)
  • Automatic reconnection with exponential backoff (1s → 60s; see the sketch below)
  • Handler registration per pool for custom update logic
  • Connection health monitoring
  • Subscription count tracking for observability
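
A minimal Go sketch of that reconnection policy, assuming a caller-supplied connect function (the names are illustrative, not the WebSocketManager's actual API):

package subscription

import (
    "context"
    "time"
)

// reconnectWithBackoff retries the WebSocket dial with exponential backoff
// capped at 60s (1s → 2s → 4s → ... → 60s), as described above. connect is
// whatever function establishes the connection.
func reconnectWithBackoff(ctx context.Context, connect func(ctx context.Context) error) error {
    backoff := time.Second
    const maxBackoff = 60 * time.Second

    for {
        if err := connect(ctx); err == nil {
            return nil // connected; the caller re-registers its accountSubscribe handlers
        }

        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff):
        }

        backoff *= 2
        if backoff > maxBackoff {
            backoff = maxBackoff
        }
    }
}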

WebSocket Message Flow:

Subscribe to a pool account:

{
  "jsonrpc": "2.0",
  "id": 123,
  "method": "accountSubscribe",
  "params": [
    "ANeTpNwPj4i62axSjUE4jZ7d3zvaj4sc9VFZyY1eYrtb",
    {
      "encoding": "base64",
      "commitment": "confirmed"
    }
  ]
}

Receive real-time notifications:

{
  "jsonrpc": "2.0",
  "method": "accountNotification",
  "params": {
    "result": {
      "context": {
        "slot": 283475123
      },
      "value": {
        "data": "...",
        "executable": false,
        "lamports": 2039280,
        "owner": "LBUZKhRxPF3XUpBCjp4YzTKgLccjZhTSDM9YuVaPwxo",
        "rentEpoch": 0
      }
    },
    "subscription": 456
  }
}

The slot number is extracted from the notification context to track data freshness and enable slot-based deduplication.
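
A minimal Go sketch of that extraction and slot-based deduplication; the struct below mirrors only the fields used and is not the service's actual type:

package subscription

import (
    "encoding/json"
    "sync"
)

// accountNotification mirrors the JSON shown above, trimmed to the fields we use.
type accountNotification struct {
    Params struct {
        Result struct {
            Context struct {
                Slot uint64 `json:"slot"`
            } `json:"context"`
        } `json:"result"`
        Subscription int `json:"subscription"`
    } `json:"params"`
}

// slotDeduper remembers the last slot seen per subscription and drops stale updates.
type slotDeduper struct {
    mu       sync.Mutex
    lastSlot map[int]uint64
}

func newSlotDeduper() *slotDeduper {
    return &slotDeduper{lastSlot: map[int]uint64{}}
}

// handle parses a raw notification and reports whether it is fresh. Fresh
// updates would then refresh pool.Slot and pool.LastUpdated in Redis and be
// published to NATS.
func (d *slotDeduper) handle(raw []byte) (slot uint64, fresh bool) {
    var n accountNotification
    if err := json.Unmarshal(raw, &n); err != nil {
        return 0, false
    }
    slot = n.Params.Result.Context.Slot
    sub := n.Params.Subscription

    d.mu.Lock()
    defer d.mu.Unlock()
    if slot <= d.lastSlot[sub] {
        return slot, false // stale or duplicate slot: skip
    }
    d.lastSlot[sub] = slot
    return slot, true
}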

3. Slot Number Tracking

Problem Solved: Pool data in Redis had slot: 0 (not tracked), making it impossible to determine data freshness.

Solution Implemented:

During RPC Discovery:

  • Get slot once per discovery run via GetSlot() RPC call
  • Pass current slot to all pool conversions
  • Set pool.Slot = currentSlot on each pool

During WebSocket Updates:

  • Extract slot from notification.Params.Result.Context.Slot
  • Update pool.Slot = slot and pool.LastUpdated = time.Now().Unix()
  • Publish updated pool to Redis and NATS

Result: All pools now have accurate slot numbers for tracking data freshness and enabling cache invalidation strategies.

4. NATS Event Publishing

The service publishes events to NATS JetStream for downstream consumers:

| Subject | Frequency | Payload |
|---------|-----------|---------|
| pool.discovery.completed | Every discovery run (~5-10 min) | DiscoveryStats (totalPools, poolsByDEX, errors, duration) |
| pool.enrichment.completed | Every enrichment run (~1 min) | EnrichmentStats (enriched, failed, duration) |
| pool.discovery.error | On discovery errors | Error details and context |
| pool.updated | On every pool update | poolID, slot, timestamp, source (websocket/rpc_backup) |

Event Schema (pool.updated):

{
  "poolId": "ANeTpNwPj4i62axSjUE4jZ7d3zvaj4sc9VFZyY1eYrtb",
  "slot": 283475123,
  "timestamp": 1735334400,
  "source": "websocket"
}

Usage:

  • Quote-service can subscribe to pool.updated for cache invalidation (see the sketch below)
  • Scanner services can monitor pool.discovery.completed for new pools
  • Alerting can track pool.discovery.error for failures
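
As an example of the first use case, a downstream consumer might subscribe roughly like this. This is a sketch using the nats.go client; the durable name and handler logic are assumptions, not the quote-service's actual code:

package main

import (
    "encoding/json"
    "log"

    "github.com/nats-io/nats.go"
)

// PoolUpdated mirrors the pool.updated event schema shown above.
type PoolUpdated struct {
    PoolID    string `json:"poolId"`
    Slot      uint64 `json:"slot"`
    Timestamp int64  `json:"timestamp"`
    Source    string `json:"source"` // "websocket" or "rpc_backup"
}

func main() {
    nc, err := nats.Connect("nats://localhost:4222")
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    // Durable consumer so no events are missed across restarts.
    _, err = js.Subscribe("pool.updated", func(msg *nats.Msg) {
        var evt PoolUpdated
        if err := json.Unmarshal(msg.Data, &evt); err != nil {
            log.Printf("bad pool.updated payload: %v", err)
            return
        }
        // Invalidate or refresh the cached state for this pool.
        log.Printf("pool %s updated at slot %d (source=%s)", evt.PoolID, evt.Slot, evt.Source)
        msg.Ack()
    }, nats.Durable("quote-service-pool-updates"), nats.ManualAck())
    if err != nil {
        log.Fatal(err)
    }

    select {} // keep the consumer running
}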

5. Prometheus Metrics

Comprehensive metrics for monitoring:

WebSocket Metrics:

websocket_connection_status              # Gauge: 1=connected, 0=disconnected
websocket_subscriptions_total            # Gauge: Active subscriptions
websocket_updates_received_total         # Counter: Total updates
websocket_update_latency_seconds         # Histogram: Update processing latency
websocket_reconnections_total            # Counter: Reconnection attempts
websocket_reconnection_duration_seconds  # Histogram: Reconnection duration
websocket_errors_total{error_type}       # Counter: Errors by type
websocket_messages_sent_total            # Counter: Sent messages
websocket_messages_received_total        # Counter: Received messages
rpc_backup_triggered_total               # Counter: RPC backup triggers
pool_update_source_total{source}         # Counter: Updates by source
pool_cache_age_seconds{pool_id}          # Histogram: Pool data age

Discovery Metrics:

pool_discovery_runs_total
pool_discovery_duration_seconds
pool_discovery_pools_total
pool_discovery_pools_by_dex{dex}
pool_discovery_errors_total

Enrichment Metrics:

pool_enrichment_runs_total
pool_enrichment_duration_seconds
pool_enrichment_pools_enriched_total
pool_enrichment_pools_failed_total
pool_enrichment_api_requests_total{status}
pool_enrichment_api_latency_seconds

Endpoint: http://localhost:9094/metrics
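
As a sketch of how a few of these might be registered with the Prometheus Go client (assuming prometheus/client_golang; the service's actual registration code may differ):

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// A handful of the WebSocket metrics listed above, registered on the default registry.
var (
    ConnectionStatus = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "websocket_connection_status",
        Help: "1 when the WebSocket connection is up, 0 when it is down.",
    })

    UpdatesReceived = promauto.NewCounter(prometheus.CounterOpts{
        Name: "websocket_updates_received_total",
        Help: "Total accountNotification updates received.",
    })

    UpdateLatency = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "websocket_update_latency_seconds",
        Help:    "Latency from notification receipt to Redis save and NATS publish.",
        Buckets: prometheus.DefBuckets,
    })

    PoolUpdateSource = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "pool_update_source_total",
        Help: "Pool updates by source (websocket or rpc_backup).",
    }, []string{"source"})
)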

6. Pool Enrichment

The enricher component fetches additional pool metadata from external APIs:

Enriched Data:

  • Total Value Locked (TVL) in USD
  • Base/Quote reserve amounts
  • Token decimals (base/quote)
  • 24h volume (if available)
  • 24h trade count (if available)

Rate Limiting:

  • 2 requests/second (respecting API rate limits; see the sketch below)
  • Configurable via constructor
  • Batch processing (default 10 pools)

Features:

  • Optional HTTP proxy support for rate limit avoidance
  • Graceful degradation (continues if enrichment fails)
  • Separate metrics for enrichment success/failure
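
A rough Go sketch of how the batched, rate-limited enrichment loop might look, using golang.org/x/time/rate as a stand-in for the service's actual limiter; names are illustrative:

package enricher

import (
    "context"

    "golang.org/x/time/rate"
)

// Pool is a simplified stand-in; enrich represents the external API call.
type Pool struct {
    ID  string
    TVL float64
}

// enrichPools walks the pool set in batches while keeping the external API
// below 2 requests/second, and degrades gracefully when a single call fails.
func enrichPools(ctx context.Context, pools []Pool, batchSize int, enrich func(context.Context, *Pool) error) {
    limiter := rate.NewLimiter(rate.Limit(2), 1) // 2 req/s sustained

    for start := 0; start < len(pools); start += batchSize {
        end := start + batchSize
        if end > len(pools) {
            end = len(pools)
        }
        for i := start; i < end; i++ {
            if err := limiter.Wait(ctx); err != nil {
                return // context cancelled or deadline exceeded
            }
            if err := enrich(ctx, &pools[i]); err != nil {
                continue // graceful degradation: leave this pool unenriched
            }
        }
    }
}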

7. Redis Caching

Efficient storage pattern with TTL management:

Storage Pattern:

Key: pool:{dex}:{poolId}
Value: JSON-encoded Pool struct
TTL: 600 seconds (10 minutes, configurable)

Example:
pool:meteora_dlmm:ANeTpNwPj4i62axSjUE4jZ7d3zvaj4sc9VFZyY1eYrtb

Features:

  • Batch writes via Redis PIPELINE for efficiency (sketched below)
  • TTL refresh on WebSocket updates to keep active pools cached
  • Shared with quote-service (same repository)
  • Connection health checks for reliability
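
A sketch of the batched write path with go-redis; the Pool struct is trimmed to a few fields and the helper names are illustrative:

package storage

import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

// Pool holds only the fields relevant to this example.
type Pool struct {
    ID   string `json:"id"`
    DEX  string `json:"dex"`
    Slot uint64 `json:"slot"`
}

// savePools writes a batch of pools in one round trip using a pipeline,
// keyed as pool:{dex}:{poolId} with the configured TTL (10 minutes by default).
func savePools(ctx context.Context, rdb *redis.Client, pools []Pool, ttl time.Duration) error {
    pipe := rdb.Pipeline()
    for _, p := range pools {
        data, err := json.Marshal(p)
        if err != nil {
            return err
        }
        key := fmt.Sprintf("pool:%s:%s", p.DEX, p.ID)
        pipe.Set(ctx, key, data, ttl) // Set also refreshes the TTL on update
    }
    _, err := pipe.Exec(ctx)
    return err
}

func newClient() (*redis.Client, error) {
    opt, err := redis.ParseURL("redis://localhost:6379/0")
    if err != nil {
        return nil, err
    }
    return redis.NewClient(opt), nil
}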

Data Flow

Initial Discovery Flow:

  1. Service starts → Initialize MultiProtocolScanner
  2. Get current slot via GetSlot() RPC call (once)
  3. For each token pair: Query forward and reverse directions
  4. Deduplicate pools by pool ID
  5. Set pool.Slot = currentSlot
  6. Enrich pools with external API (batched)
  7. Save to Redis with TTL=10min
  8. Subscribe each pool to WebSocket (accountSubscribe)
  9. Publish pool.discovery.completed to NATS

WebSocket Update Flow:

  1. WebSocket receives accountNotification
  2. Extract slot from notification.params.result.context.slot
  3. Find pool by subscription ID
  4. Update pool.Slot = slot, pool.LastUpdated = now()
  5. Save pool to Redis (refresh TTL)
  6. Publish pool.updated event to NATS (source="websocket")
  7. Record metrics (WebSocketUpdateLatency, PoolUpdateSource)

RPC Backup Flow (see the sketch after these steps):

  1. Every 5 minutes: Check WebSocket health
  2. IF WebSocket.IsConnected() == false:
    • Run full RPC discovery (same as initial)
    • Update all pool.Slot from current RPC response
    • Publish pool.updated events (source="rpc_backup")
    • Record metrics (RPCBackupTriggeredTotal)
  3. ELSE: Skip (WebSocket handling updates)
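
In Go, the backup loop boils down to something like the sketch below. The interfaces stand in for the real WebSocketManager and MultiProtocolScanner, and the publish callback is an assumption for illustration:

package discovery

import (
    "context"
    "log"
    "time"
)

// Minimal stand-ins for the real components; IsConnected and DiscoverAllPools
// are the methods named in the flow above.
type wsHealth interface {
    IsConnected() bool
}

type Pool struct {
    ID   string
    Slot uint64
}

type poolScanner interface {
    DiscoverAllPools(ctx context.Context) ([]Pool, error)
}

// startRPCBackupPolling runs a full RPC discovery pass only while the
// WebSocket feed is down; otherwise it defers to the real-time updates.
func startRPCBackupPolling(ctx context.Context, ws wsHealth, scanner poolScanner, interval time.Duration, publish func(Pool, string)) {
    ticker := time.NewTicker(interval) // e.g. 5 minutes (-ws-backup-interval)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            if ws.IsConnected() {
                continue // WebSocket healthy: skip this cycle
            }
            pools, err := scanner.DiscoverAllPools(ctx)
            if err != nil {
                log.Printf("rpc backup discovery failed: %v", err)
                continue
            }
            for _, p := range pools {
                publish(p, "rpc_backup") // e.g. save to Redis and emit pool.updated
            }
        }
    }
}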

Observability

Logging (Loki):

  • Structured JSON logging via Zap
  • Automatic log aggregation to Loki
  • Service/environment labels for filtering
  • Async buffering (50 entries per flush)
  • Non-fatal push errors (async flush)

Metrics (Prometheus):

  • 20+ metrics exposed on :9094
  • Discovery success rate tracking
  • WebSocket connection health monitoring
  • Pool update latency histograms
  • RPC backup trigger counting
  • Error rates by type

Tracing (OpenTelemetry):

  • Distributed tracing integration
  • Span context propagation across services
  • OTLP/HTTP endpoint export
  • Grafana Tempo visualization

NATS Events:

  • Event-driven architecture
  • Real-time monitoring capabilities
  • Subscribe to pool.> for all events
  • Durable consumers for reliability

Configuration

Command-Line Flags:

pool-discovery-service \
  -rpc-proxy="http://localhost:3030" \
  -redis="redis://localhost:6379/0" \
  -nats="nats://localhost:4222" \
  -metrics-port="9094" \
  -discovery-interval=300 \
  -enrichment-interval=60 \
  -pool-ttl=600 \
  -batch-size=10 \
  -pairs="SOL/USDC,SOL/USDT,USDC/USDT" \
  -use-websocket=true \
  -ws-backup-interval=300

Key Configuration:

  • rpc-proxy: Rust RPC proxy URL (HTTP & WebSocket)
  • redis: Redis connection URL for pool caching
  • nats: NATS server URL for event publishing
  • metrics-port: Prometheus metrics port
  • discovery-interval: RPC discovery interval (seconds)
  • enrichment-interval: Enrichment interval (seconds)
  • pool-ttl: Pool TTL in Redis (seconds)
  • use-websocket: Enable WebSocket-first mode
  • ws-backup-interval: RPC backup interval when WebSocket enabled

Grafana Dashboard Monitoring

The pool-discovery-service includes a comprehensive Grafana dashboard for real-time monitoring and operational visibility.

Dashboard Overview:

Pool Discovery Service Dashboard

Key Metrics Tracked:

1. Service Health & Status

  • Service health indicator (UP/DOWN)
  • Total pools discovered
  • WebSocket connection status
  • Active subscriptions count
  • Cached pools in Redis

2. Discovery Performance

# Discovery cycle duration (p50, p95, p99)
histogram_quantile(0.95, rate(pool_discovery_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(pool_discovery_duration_seconds_bucket[5m]))
histogram_quantile(0.50, rate(pool_discovery_duration_seconds_bucket[5m]))

# Total discovery runs
pool_discovery_runs_total

# Pools discovered by DEX protocol
pool_discovery_pools_by_dex{dex}

3. WebSocket Real-Time Updates

# Pool update rate
rate(websocket_updates_received_total[1m])

# WebSocket update latency (p95, p99)
histogram_quantile(0.95, rate(websocket_update_latency_seconds_bucket[5m]))
histogram_quantile(0.99, rate(websocket_update_latency_seconds_bucket[5m]))

# Update source distribution (WebSocket vs RPC backup)
rate(pool_update_source_total{source="websocket"}[5m])
rate(pool_update_source_total{source="rpc_backup"}[5m])

# RPC backup polling triggers
rpc_backup_triggered_total

4. Pool Enrichment Metrics

# Enrichment activity
rate(pool_enrichment_pools_enriched_total[5m])
rate(pool_enrichment_pools_failed_total[5m])

# API latency (Solscan)
histogram_quantile(0.95, rate(pool_enrichment_api_latency_seconds_bucket[5m]))

# Enrichment runs
pool_enrichment_runs_total

5. Error Tracking

# Discovery errors
pool_discovery_errors_total

# WebSocket errors by type
websocket_errors_total{error_type}

# Enrichment errors
pool_enrichment_pools_failed_total

6. RPC Scanner Performance

# Scanner latency by protocol
rpc_scanner_latency_seconds{protocol}

# Scanner requests by protocol and status
rpc_scanner_requests_total{protocol, status}

Dashboard Features:

  • Real-time updates: Live data refresh every 5 seconds
  • Time range selector: View historical trends
  • Protocol breakdown: Per-DEX pool distribution
  • Alert integration: Visual indicators for error conditions
  • Drill-down capability: Click panels for detailed views

Access Dashboard:

  • URL: http://localhost:3000/d/pool-discovery-service
  • Location: deployment/monitoring/grafana/provisioning/dashboards/pool-discovery-service-dashboard.json

Key Insights from Dashboard:

  1. WebSocket Efficiency: Monitor the ratio of WebSocket updates vs RPC backup polling to ensure WebSocket-first architecture is working correctly (target: >99% WebSocket updates)

  2. Protocol Health: Track which DEX protocols are contributing pools and identify any protocol-specific issues

  3. Enrichment Rate: Monitor enrichment API latency to ensure we’re staying within rate limits (target: <2 req/s)

  4. Cache Hit Rate: Verify pool TTL is appropriate by monitoring cache size vs total discovered pools

  5. Error Patterns: Identify systematic issues through error rate trends and WebSocket reconnection frequency

Performance Characteristics

Discovery Performance:

  • Protocols: 6 (parallel goroutines)
  • Token Pairs: 3 (SOL/USDC, SOL/USDT, USDC/USDT)
  • Scans: 36 RPC calls (6 × 3 × 2 directions)
  • Total Duration: ~30-50 seconds
  • Pools Discovered: ~235 pools

WebSocket Update Performance:

  • Connection: Persistent WebSocket to RPC proxy
  • Subscriptions: 235 accountSubscribe (one per pool)
  • Update Latency: <100ms (slot detection → Redis save → NATS publish)
  • Deduplication: O(1) map lookup by slot number
  • Memory: ~500KB for 235 subscriptions

RPC Backup Performance:

  • Trigger: Only when WebSocket disconnected
  • Interval: Every 5 minutes
  • Duration: Same as initial discovery (~30-50s)
  • Impact: Minimal (only runs when WebSocket down)

Resource Usage:

  • CPU: 0.5-1% idle, 5-10% during discovery, 1-2% WebSocket mode
  • Memory: 20-30 MB base, 40-50 MB with 235 pools, ~60 MB peak
  • Network: 1-2 MB per run (RPC), 10-50 KB/s (WebSocket), 200-500 KB/min (enrichment)

Solana RPC Proxy: Intelligent Connection Management

Overview

The Solana RPC Proxy is a high-performance Rust service providing:

  • ✅ HTTP and WebSocket support on the same port (port 3030)
  • ✅ Automatic failover and health-based routing
  • ✅ Rate limiting to prevent 429 errors
  • ✅ Load balancing across multiple RPC vendors
  • ✅ Connection pooling for optimal performance
  • ✅ Circuit breaker pattern for reliability
  • ✅ Comprehensive observability

Supported Features:

  • Same-port HTTP + WebSocket operation (client convenience)
  • Multiple RPC vendor support (Helius, Chainstack, dRPC, QuickNode, etc.)
  • Health-based endpoint selection
  • Automatic reconnection with exponential backoff
  • Prometheus metrics and OpenTelemetry tracing

Design Principles

1. Performance First

  • Target: < 50μs proxy overhead (p95)
  • Zero-copy where possible (Rust + tokio)
  • Async I/O throughout (no blocking operations)
  • Connection pooling to minimize handshake overhead

2. Reliability Over Availability

  • Fail fast on bad endpoints (circuit breaker)
  • Automatic failover to healthy endpoints
  • Rate limiting to prevent cascading failures
  • Graceful degradation under load

3. Observability Built-In

  • Structured logging with tracing context
  • Prometheus metrics for all operations
  • Distributed tracing (OpenTelemetry)
  • Health check endpoints

4. Developer Ergonomics

  • Single-port operation (HTTP + WebSocket)
  • URL derivation (http:// and ws:// share the same address)
  • Drop-in replacement for direct RPC clients
  • Zero configuration for basic usage

Architecture Goals

┌─────────────────────────────────────────────────────────────────┐
│                         CLIENT SERVICES                         │
│  (Quote Service, Pool Discovery, Scanners, Executors)          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                    http://proxy:3030 (HTTP)
                     ws://proxy:3030 (WebSocket)
                             │
         ┌───────────────────▼────────────────────┐
         │     SOLANA RPC PROXY (Port 3030)       │
         │  ┌──────────────────────────────────┐  │
         │  │  Protocol Router (Same Port)     │  │
         │  │  - HTTP: JSON-RPC requests       │  │
         │  │  - WebSocket: Upgrade protocol   │  │
         │  └────────────┬─────────────────────┘  │
         │               │                         │
         │  ┌────────────▼─────────────────────┐  │
         │  │  Rate Limiter (Token Bucket)     │  │
         │  │  - Per-endpoint: 20 req/s        │  │
         │  │  - Burst capacity: 5 tokens      │  │
         │  └────────────┬─────────────────────┘  │
         │               │                         │
         │  ┌────────────▼─────────────────────┐  │
         │  │  Load Balancer (Least Conn)      │  │
         │  │  - Health-based routing          │  │
         │  │  - Automatic failover            │  │
         │  └────────────┬─────────────────────┘  │
         │               │                         │
         │  ┌────────────▼─────────────────────┐  │
         │  │  Connection Pool Manager         │  │
         │  │  - HTTP/2 connection pooling     │  │
         │  │  - WebSocket connection pooling  │  │
         │  └────────────┬─────────────────────┘  │
         │               │                         │
         │  ┌────────────▼─────────────────────┐  │
         │  │  Health Monitor & Circuit Breaker│  │
         │  │  - Adaptive backoff              │  │
         │  │  - Automatic re-enable           │  │
         │  └────────────┬─────────────────────┘  │
         └───────────────┼──────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
         ▼               ▼               ▼
    ┌────────┐      ┌────────┐      ┌────────┐
    │Vendor 1│      │Vendor 2│      │Vendor 3│
    │ (40x)  │      │ (40x)  │      │ (10x)  │
    └────────┘      └────────┘      └────────┘
     90+ RPC Endpoints

Key Features

Same-Port HTTP + WebSocket:

  • Single port for both protocols (port 3030)
  • Automatic protocol upgrade detection
  • Client URL derivation (http://proxy:3030 → ws://proxy:3030), as sketched below
  • Standard pattern (AWS ALB, Nginx compatible)
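
From a client's point of view, that derivation is trivial. A small Go sketch using the default proxy address:

package main

import (
    "fmt"
    "net/url"
)

// deriveWSURL turns the proxy's HTTP base URL into its WebSocket URL.
// Because both protocols share port 3030, only the scheme changes.
func deriveWSURL(httpURL string) (string, error) {
    u, err := url.Parse(httpURL)
    if err != nil {
        return "", err
    }
    switch u.Scheme {
    case "http":
        u.Scheme = "ws"
    case "https":
        u.Scheme = "wss"
    default:
        return "", fmt.Errorf("unexpected scheme %q", u.Scheme)
    }
    return u.String(), nil
}

func main() {
    wsURL, err := deriveWSURL("http://localhost:3030")
    if err != nil {
        panic(err)
    }
    fmt.Println(wsURL) // ws://localhost:3030
}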

Rate Limiting & 429 Prevention:

  • Per-endpoint token bucket rate limiting (20 req/s; see the conceptual sketch below)
  • Burst capacity (5 tokens) for traffic spikes
  • Automatic failover to alternate endpoints on rate limit
  • Lock-free implementation using governor crate
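
The proxy's limiter is implemented in Rust with the governor crate, but the policy itself is easy to picture. A conceptual Go sketch using golang.org/x/time/rate, with illustrative names:

package ratelimit

import (
    "golang.org/x/time/rate"
)

// endpointLimiters keeps one token bucket per upstream RPC endpoint.
type endpointLimiters struct {
    limiters map[string]*rate.Limiter
}

func newEndpointLimiters(endpoints []string) *endpointLimiters {
    l := &endpointLimiters{limiters: map[string]*rate.Limiter{}}
    for _, ep := range endpoints {
        // 20 requests/second sustained with a burst capacity of 5 tokens.
        l.limiters[ep] = rate.NewLimiter(rate.Limit(20), 5)
    }
    return l
}

// pick returns the first candidate endpoint with an available token, which is
// the "fail over instead of returning a 429" behaviour described above.
func (l *endpointLimiters) pick(candidates []string) (string, bool) {
    for _, ep := range candidates {
        if lim, ok := l.limiters[ep]; ok && lim.Allow() {
            return ep, true
        }
    }
    return "", false // every candidate is rate-limited right now
}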

Health-Based Load Balancing:

  • Least-connections algorithm for optimal distribution
  • Health-aware routing (skips unhealthy endpoints)
  • Adaptive exponential backoff (1min → 30min)
  • Automatic re-enable after backoff period
  • Round-robin tiebreaker for equal load

Connection Pooling:

  • HTTP/2 connection pooling (max 10 idle per host, 90s timeout)
  • WebSocket connection pooling (2 pre-established per endpoint)
  • RAII-based connection guards for automatic cleanup
  • Health monitoring (60s interval)

Circuit Breaker Pattern:

  • Adaptive backoff levels (1min → 3min → 7min → 15min → 30min)
  • Recovery tracking (10 successful requests reduce level)
  • Automatic re-enable after timeout
  • Error classification (rate limit vs. other errors)

Multiple RPC Vendor Support

The proxy supports multiple commercial RPC vendors for:

  • Redundancy: No single point of failure
  • Geographic distribution: Lower latency from different regions
  • Rate limit spreading: Distribute load across providers
  • Cost optimization: Mix free and paid tiers
  • Performance comparison: Monitor and compare vendor SLAs

Note on Self-Hosted RPC Nodes:

I experimented with running a self-hosted Solana RPC node at home using the Agave validator. While the node ran successfully and was fully functional, I encountered a critical bottleneck: network bandwidth.

Despite having a 500+ Mbps home broadband connection, it was insufficient to keep up with the Solana blockchain’s high throughput. The node fell behind in syncing and couldn’t maintain pace with the latest slots. This revealed an important limitation: running a production-grade Solana RPC node requires enterprise-level bandwidth (1+ Gbps) and dedicated infrastructure.

For future iterations, we may revisit self-hosted RPC nodes in a datacenter environment with adequate bandwidth and co-location near validator infrastructure. For now, the proxy’s multi-vendor approach provides the reliability and performance we need while avoiding infrastructure complexity.

Observability

Prometheus Metrics:

# Request metrics
rpc_proxy_http_requests_total{endpoint, status}
rpc_proxy_http_request_duration_seconds{endpoint}

# WebSocket metrics
rpc_proxy_ws_connections_active{endpoint}
rpc_proxy_ws_messages_total{endpoint, direction}

# Rate limiting metrics
rpc_proxy_rate_limit_checks_total{endpoint, result}
rpc_proxy_rate_limit_available_tokens{endpoint}
rpc_proxy_rate_limit_429_errors_total{endpoint}

# Load balancing metrics
rpc_proxy_endpoint_selected_total{endpoint, algorithm}
rpc_proxy_in_flight_requests{endpoint}

# Health metrics
rpc_proxy_endpoint_health{endpoint, status}
rpc_proxy_circuit_breaker_state{endpoint}

# RPC method tracking
rpc_proxy_rpc_method_calls_total{endpoint, method}

Distributed Tracing:

  • OpenTelemetry integration
  • Span context propagation
  • Parent-child relationships
  • OTLP/HTTP endpoint export
  • Grafana Tempo visualization

Structured Logging:

  • JSON log format
  • Tracing context included
  • Service/environment labels
  • Async buffering for performance

Configuration

# .env
PROXY_PORT=3030                           # Single port for HTTP + WebSocket
METRICS_PORT=9090                         # Prometheus metrics endpoint

# Rate Limiting
RATE_LIMIT_REQUESTS_PER_SECOND=20         # Per endpoint
RATE_LIMIT_BURST_CAPACITY=5               # Burst tokens

# Concurrency Control
MAX_CONCURRENT_PER_ENDPOINT=8             # Max concurrent requests

# Load Balancing
LOAD_BALANCER_ALGORITHM="least_connections"
HEALTH_CHECK_INTERVAL_SECS=60

# Circuit Breaker
CIRCUIT_BREAKER_ENABLED=true
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_TIMEOUT_SECS=60

# Observability
OTLP_ENDPOINT="http://localhost:4318"
LOG_LEVEL="info"

Production Impact

System Integration

The pool-discovery-service and RPC proxy work together to provide:

  1. Reliable Pool Data: Real-time updates with automatic fallback
  2. High Availability: Circuit breaker prevents cascading failures
  3. Optimal Performance: WebSocket-first minimizes latency
  4. Comprehensive Monitoring: Metrics for every component
  5. Event-Driven Architecture: NATS events enable reactive systems

Performance Benefits

Pool Discovery:

  • Real-time updates: <100ms latency (vs 5-minute polling)
  • Reduced RPC load: 99% reduction (WebSocket vs. constant polling)
  • Accurate slot tracking: Sub-second freshness detection
  • Automatic recovery: Seamless failover to RPC backup

RPC Proxy:

  • Zero 429 errors: Rate limiting prevents quota exhaustion
  • Automatic failover: Health-based routing ensures reliability
  • Low latency: <50μs proxy overhead
  • High throughput: 10,000+ req/s capacity

Operational Excellence

Monitoring:

  • 20+ Prometheus metrics per service
  • Real-time health dashboards
  • Distributed tracing for debugging
  • Structured logging for analysis

Reliability:

  • Circuit breaker pattern prevents cascading failures
  • Automatic reconnection with exponential backoff
  • Health-based routing skips failing endpoints
  • RPC backup ensures continuous operation

Scalability:

  • Stateless design enables horizontal scaling
  • Connection pooling optimizes resource usage
  • Event-driven architecture decouples services
  • Redis caching reduces backend load

Next Steps

Pool Discovery Service Improvements

Phase 1: Reliability Enhancements

  • Circuit breaker for enrichment API
  • Advanced pool data parsing (extract reserves from account data)
  • Liquidity-based filtering ($5K minimum)

Phase 2: Protocol Expansion

  • Add Lifinity, Aldrin, Saber protocols
  • Adaptive bidirectional queries (reduce redundant scans)
  • Per-pair liquidity thresholds

Phase 3: Advanced Features

  • Pool health scoring (liquidity + volume + age)
  • Historical pool tracking (TimescaleDB)
  • Anomaly detection (sudden liquidity drain)

RPC Proxy Enhancements

Phase 4: Enhanced Observability (In Progress)

  • RPC method call tracking per endpoint
  • 429 error counter with critical alerts
  • Request counts for quota monitoring
  • Health status tracking per protocol

Phase 5: Testing & Documentation

  • Integration testing (HTTP + WebSocket same port)
  • Rate limit verification (ensure 20 req/s enforced)
  • Failover testing (circuit breaker validation)
  • Configuration documentation

Conclusion

The pool-discovery-service and RPC proxy represent critical infrastructure for the Solana trading system. The three-tier hybrid architecture optimizes for real-time updates while maintaining reliability through intelligent fallback mechanisms.

Key Achievements:

  • ✅ Production-ready pool discovery with 6 DEX protocols
  • ✅ Real-time WebSocket updates (<100ms latency)
  • ✅ Intelligent RPC proxy with multi-vendor support
  • ✅ Comprehensive observability and monitoring
  • ✅ Event-driven architecture for downstream consumers
  • ✅ High reliability through circuit breaker and health tracking

The WebSocket-first approach reduces RPC load by 99% while providing sub-100ms pool state updates. The intelligent RPC proxy ensures zero 429 errors through rate limiting and automatic failover to healthy endpoints.

Together, these services form the foundation for reliable, real-time liquidity tracking essential for HFT trading strategies.





Connect


Building robust, production-grade infrastructure for Solana HFT trading. Join me on this journey of architectural excellence and system design!