Pool Discovery Service: Real-Time Liquidity Tracking and Intelligent RPC Proxy
Published:
TL;DR
Today’s post explores two critical infrastructure components of the Solana trading system:
- Pool Discovery Service: Production-ready service discovering and tracking liquidity pools from 6 major Solana DEXes with real-time WebSocket updates
- Solana RPC Proxy: Intelligent proxy supporting multiple RPC vendors with automatic failover and health-based routing
- Three-Tier Hybrid Architecture: Optimizes for real-time updates while minimizing RPC load through WebSocket-first design with RPC backup
The pool-discovery-service implements a sophisticated WebSocket-first architecture that provides real-time pool state updates with sub-100ms latency while gracefully falling back to RPC polling when WebSocket connections are unavailable.
Pool Discovery Service Architecture
Executive Summary
The pool-discovery-service is a standalone Go service that discovers, enriches, and maintains liquidity pool data from 6 major Solana DEX protocols:
- Raydium (AMM, CLMM, CPMM)
- Meteora DLMM
- PumpSwap AMM
- Orca Whirlpool
Key capabilities include:
- ✅ WebSocket-first pool updates via accountSubscribe
- ✅ RPC backup polling (5-minute fallback)
- ✅ Bidirectional token pair scanning
- ✅ Slot number tracking for data freshness
- ✅ NATS event publishing for downstream consumers
- ✅ Comprehensive Prometheus metrics
- ✅ Pool enrichment with TVL and volume data
Performance Metrics:
- Discovery Cycle: ~30-50 seconds for 235+ pools
- WebSocket Updates: Real-time (<100ms latency)
- RPC Backup: Every 5 minutes when WebSocket down
- Pool TTL: 10 minutes (configurable)
Three-Tier Hybrid Architecture
The service implements a three-tier hybrid approach optimizing for real-time updates while minimizing RPC load:
┌─────────────────────────────────────────────────────────────────┐
│ Tier 1: Initial RPC Discovery (One-Time on Startup) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ MultiProtocolScanner.DiscoverAllPools() │ │
│ │ • Bidirectional queries (baseMint↔quoteMint) │ │
│ │ • 6 protocols scanned in parallel (goroutines) │ │
│ │ • Fetches current slot once per discovery run │ │
│ │ • Returns: pools with metadata + slot number │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ↓ │
│ Storage: Redis cache (pool:{dex}:{poolId}, TTL=10min) │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Tier 2: WebSocket Subscriptions (PRIMARY - Real-Time) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ WebSocketManager (subscription/websocket_manager.go) │ │
│ │ • accountSubscribe for each discovered pool │ │
│ │ • Receives accountNotification on pool changes │ │
│ │ • Extracts slot from notification.params.result.context │ │
│ │ • Updates pool.Slot and pool.LastUpdated in Redis │ │
│ │ • Publishes pool.updated event to NATS │ │
│ │ • Automatic reconnection with exponential backoff │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Connection: ws://localhost:3030 (Rust RPC Proxy WebSocket) │
│ Reconnection: 1s → 2s → 4s → ... → 60s max backoff │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Tier 3: RPC Backup Polling (FALLBACK - Every 5 Minutes) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ PoolDiscoveryScheduler.startRPCBackupPolling() │ │
│ │ • Checks WebSocket health (IsConnected()) │ │
│ │ • IF WebSocket down: run full RPC discovery │ │
│ │ • IF WebSocket healthy: skip (rely on Tier 2) │ │
│ │ • Updates pool.Slot from current RPC response │ │
│ │ • Publishes pool.updated event with source="rpc_backup" │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Trigger: WebSocketManager.IsConnected() == false │
│ Interval: 5 minutes (configurable via -ws-backup-interval) │
└─────────────────────────────────────────────────────────────────┘
Component Diagram
┌──────────────────────────────────────────────────────────────────┐
│ pool-discovery-service │
│ │
│ ┌────────────────┐ ┌──────────────────┐ ┌─────────────────┐ │
│ │ Scanner │ │ Enricher │ │ Scheduler │ │
│ │ (RPC Calls) │ │ (API) │ │ (Background) │ │
│ └────────┬───────┘ └────────┬─────────┘ └────────┬────────┘ │
│ │ │ │ │
│ └───────────────────┴──────────────────────┘ │
│ │ │
│ ┌─────────▼──────────┐ │
│ │ Storage Layer │ │
│ │ (RedisPoolWriter) │ │
│ └─────────┬──────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼──────┐ │
│ │ WebSocket │ │ NATS │ │ Prometheus │ │
│ │ Manager │ │ Publisher │ │ Metrics │ │
│ └──────────────┘ └──────────────┘ └─────────────┘ │
└──────────────────────────────────────────────────────────────────┘
│ │ │
│ │ │
┌────▼────┐ ┌────▼─────┐ ┌────▼─────┐
│ Rust │ │ NATS │ │ Redis │
│ RPC │ │ JetStream│ │ Cache │
│ Proxy │ │ │ │ │
└─────────┘ └──────────┘ └──────────┘
Core Features
1. Multi-Protocol Discovery
The service supports 6 major Solana DEX protocols with parallel scanning:
| Protocol | Program ID | Type | Status |
|---|---|---|---|
| Raydium AMM | 675kPX9... | Constant Product (x*y=k) | ✅ Active |
| Raydium CLMM | CAMMCz... | Concentrated Liquidity | ✅ Active |
| Raydium CPMM | CPMMo... | Constant Product + Dynamic Fees | ✅ Active |
| Meteora DLMM | LBUZKh... | Dynamic Liquidity Bins | ✅ Active |
| PumpSwap AMM | pAMMBa... | Bonding Curve AMM | ✅ Active |
| Orca Whirlpool | whirL... | Concentrated Liquidity | ✅ Active |
Features:
- Parallel scanning using goroutines (6 protocols simultaneously)
- Bidirectional queries (forward + reverse) for complete coverage
- Pool deduplication by ID to avoid duplicates
- Error tracking per protocol for monitoring
- Stats collection (pools discovered by DEX)
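To make the parallel scan concrete, here is a minimal Go sketch of the fan-out/dedup pattern described above. The `ProtocolScanner` interface and `Pool` struct are illustrative stand-ins, not the service's actual types.

```go
// Illustrative sketch: fan out one goroutine per protocol scanner and
// deduplicate the merged results by pool ID.
package scanner

import "sync"

type Pool struct {
	ID  string
	DEX string
}

type ProtocolScanner interface {
	Name() string
	ScanPair(base, quote string) ([]Pool, error)
}

func DiscoverAllPools(scanners []ProtocolScanner, base, quote string) []Pool {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		byID = make(map[string]Pool) // dedup by pool ID
	)
	for _, s := range scanners {
		wg.Add(1)
		go func(s ProtocolScanner) {
			defer wg.Done()
			// Bidirectional queries: forward and reverse mint order.
			for _, dir := range [][2]string{{base, quote}, {quote, base}} {
				pools, err := s.ScanPair(dir[0], dir[1])
				if err != nil {
					continue // per-protocol errors are tracked, not fatal
				}
				mu.Lock()
				for _, p := range pools {
					byID[p.ID] = p
				}
				mu.Unlock()
			}
		}(s)
	}
	wg.Wait()

	out := make([]Pool, 0, len(byID))
	for _, p := range byID {
		out = append(out, p)
	}
	return out
}
```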
2. WebSocket-First Pool Updates
The WebSocketManager component provides real-time pool state updates:
Key Components:
- accountSubscribe JSON-RPC method for pool monitoring
- Slot-based deduplication (ignores stale updates)
- Automatic reconnection with exponential backoff (1s → 60s; see the sketch after this list)
- Handler registration per pool for custom update logic
- Connection health monitoring
- Subscription count tracking for observability
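The reconnection behaviour can be sketched as a simple loop. `WebSocketManager` here is a minimal stand-in and the method names are assumptions, but the backoff progression (1s doubling up to a 60s cap, reset on success) matches the description above.

```go
package subscription

import (
	"context"
	"time"
)

// WebSocketManager is a minimal stand-in for the service's real manager type.
type WebSocketManager struct {
	connect         func(context.Context) error
	resubscribeAll  func(context.Context)
	readUntilClosed func(context.Context)
}

func (m *WebSocketManager) runReconnectLoop(ctx context.Context) {
	backoff := time.Second // 1s → 2s → 4s → ... capped at 60s
	for {
		if err := m.connect(ctx); err != nil {
			select {
			case <-ctx.Done():
				return
			case <-time.After(backoff):
			}
			backoff *= 2
			if backoff > 60*time.Second {
				backoff = 60 * time.Second
			}
			continue
		}
		backoff = time.Second  // connected: reset backoff
		m.resubscribeAll(ctx)  // re-issue accountSubscribe for every tracked pool
		m.readUntilClosed(ctx) // blocks until the connection drops, then the loop retries
	}
}
```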
WebSocket Message Flow:
Subscribe to a pool account:
{
"jsonrpc": "2.0",
"id": 123,
"method": "accountSubscribe",
"params": [
"ANeTpNwPj4i62axSjUE4jZ7d3zvaj4sc9VFZyY1eYrtb",
{
"encoding": "base64",
"commitment": "confirmed"
}
]
}
Receive real-time notifications:
{
"jsonrpc": "2.0",
"method": "accountNotification",
"params": {
"result": {
"context": {
"slot": 283475123
},
"value": {
"data": "...",
"executable": false,
"lamports": 2039280,
"owner": "LBUZKhRxPF3XUpBCjp4YzTKgLccjZhTSDM9YuVaPwxo",
"rentEpoch": 0
}
},
"subscription": 456
}
}
The slot number is extracted from the notification context to track data freshness and enable slot-based deduplication.
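A minimal Go sketch of that extraction and slot-based deduplication might look like the following; the struct fields mirror the JSON above, while the `slotDeduper` type and its map are illustrative assumptions.

```go
package subscription

import (
	"encoding/json"
	"sync"
)

type accountNotification struct {
	Method string `json:"method"`
	Params struct {
		Result struct {
			Context struct {
				Slot uint64 `json:"slot"`
			} `json:"context"`
		} `json:"result"`
		Subscription int `json:"subscription"`
	} `json:"params"`
}

type slotDeduper struct {
	mu       sync.Mutex
	lastSlot map[int]uint64 // subscription ID → last processed slot
}

// shouldProcess reports whether msg carries a newer slot for its subscription.
func (d *slotDeduper) shouldProcess(msg []byte) (subID int, slot uint64, ok bool) {
	var n accountNotification
	if err := json.Unmarshal(msg, &n); err != nil || n.Method != "accountNotification" {
		return 0, 0, false
	}
	subID, slot = n.Params.Subscription, n.Params.Result.Context.Slot

	d.mu.Lock()
	defer d.mu.Unlock()
	if d.lastSlot == nil {
		d.lastSlot = make(map[int]uint64)
	}
	if slot <= d.lastSlot[subID] {
		return subID, slot, false // stale update: ignore
	}
	d.lastSlot[subID] = slot
	return subID, slot, true
}
```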
3. Slot Number Tracking
Problem Solved: Pool data in Redis had slot: 0 (not tracked), making it impossible to determine data freshness.
Solution Implemented:
During RPC Discovery:
- Get slot once per discovery run via GetSlot() RPC call
- Pass current slot to all pool conversions
- Set pool.Slot = currentSlot on each pool
During WebSocket Updates:
- Extract slot from notification.Params.Result.Context.Slot
- Update pool.Slot = slot and pool.LastUpdated = time.Now().Unix()
- Publish updated pool to Redis and NATS
Result: All pools now have accurate slot numbers for tracking data freshness and enabling cache invalidation strategies.
4. NATS Event Publishing
The service publishes events to NATS JetStream for downstream consumers:
| Subject | Frequency | Payload |
|---|---|---|
| pool.discovery.completed | Every discovery run (~5-10 min) | DiscoveryStats (totalPools, poolsByDEX, errors, duration) |
| pool.enrichment.completed | Every enrichment run (~1 min) | EnrichmentStats (enriched, failed, duration) |
| pool.discovery.error | On discovery errors | Error details and context |
| pool.updated | On every pool update | poolID, slot, timestamp, source (websocket/rpc_backup) |
Event Schema (pool.updated):
{
"poolId": "ANeTpNwPj4i62axSjUE4jZ7d3zvaj4sc9VFZyY1eYrtb",
"slot": 283475123,
"timestamp": 1735334400,
"source": "websocket"
}
Usage:
- Quote-service can subscribe to pool.updated for cache invalidation (see the sketch below)
- Scanner services can monitor pool.discovery.completed for new pools
- Alerting can track pool.discovery.error for failures
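As a sketch of the first usage above, a downstream consumer could attach a durable JetStream subscription to pool.updated roughly like this; the durable name and the log-based invalidation are placeholders, not the quote-service's actual code.

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

type poolUpdated struct {
	PoolID    string `json:"poolId"`
	Slot      uint64 `json:"slot"`
	Timestamp int64  `json:"timestamp"`
	Source    string `json:"source"` // "websocket" or "rpc_backup"
}

func main() {
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Durable consumer so missed events are replayed after a restart.
	_, err = js.Subscribe("pool.updated", func(m *nats.Msg) {
		var ev poolUpdated
		if err := json.Unmarshal(m.Data, &ev); err != nil {
			return
		}
		log.Printf("invalidate cache for pool %s (slot %d, source %s)",
			ev.PoolID, ev.Slot, ev.Source)
		m.Ack()
	}, nats.Durable("quote-service-pool-updates"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}

	select {} // block forever
}
```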
5. Prometheus Metrics
Comprehensive metrics for monitoring:
WebSocket Metrics:
websocket_connection_status # Gauge: 1=connected, 0=disconnected
websocket_subscriptions_total # Gauge: Active subscriptions
websocket_updates_received_total # Counter: Total updates
websocket_update_latency_seconds # Histogram: Update processing latency
websocket_reconnections_total # Counter: Reconnection attempts
websocket_reconnection_duration_seconds # Histogram: Reconnection duration
websocket_errors_total{error_type} # Counter: Errors by type
websocket_messages_sent_total # Counter: Sent messages
websocket_messages_received_total # Counter: Received messages
rpc_backup_triggered_total # Counter: RPC backup triggers
pool_update_source_total{source} # Counter: Updates by source
pool_cache_age_seconds{pool_id} # Histogram: Pool data age
Discovery Metrics:
pool_discovery_runs_total
pool_discovery_duration_seconds
pool_discovery_pools_total
pool_discovery_pools_by_dex{dex}
pool_discovery_errors_total
Enrichment Metrics:
pool_enrichment_runs_total
pool_enrichment_duration_seconds
pool_enrichment_pools_enriched_total
pool_enrichment_pools_failed_total
pool_enrichment_api_requests_total{status}
pool_enrichment_api_latency_seconds
Endpoint: http://localhost:9094/metrics
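For reference, here is a hedged sketch of how a few of these metrics could be registered and exposed with client_golang; the metric names come from the list above, while label sets, help text, and sample values are illustrative.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	wsConnectionStatus = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "websocket_connection_status",
		Help: "1 when the WebSocket connection is up, 0 otherwise.",
	})
	poolUpdateSource = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "pool_update_source_total",
		Help: "Pool updates by source.",
	}, []string{"source"})
	wsUpdateLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "websocket_update_latency_seconds",
		Help:    "Latency from notification receipt to Redis/NATS publish.",
		Buckets: prometheus.DefBuckets,
	})
)

func main() {
	prometheus.MustRegister(wsConnectionStatus, poolUpdateSource, wsUpdateLatency)

	// Example updates from the hot path:
	wsConnectionStatus.Set(1)
	poolUpdateSource.WithLabelValues("websocket").Inc()
	wsUpdateLatency.Observe(0.042)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9094", nil))
}
```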
6. Pool Enrichment
The enricher component fetches additional pool metadata from external APIs:
Enriched Data:
- Total Value Locked (TVL) in USD
- Base/Quote reserve amounts
- Token decimals (base/quote)
- 24h volume (if available)
- 24h trade count (if available)
Rate Limiting:
- 2 requests/second (respecting API rate limits)
- Configurable via constructor
- Batch processing (default 10 pools)
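A minimal sketch of that rate-limited, batched enrichment loop using golang.org/x/time/rate; the fetchPoolStats callback and PoolStats struct are placeholders for the real API client.

```go
package enricher

import (
	"context"

	"golang.org/x/time/rate"
)

type PoolStats struct {
	TVLUSD    float64
	Volume24h float64
}

func enrichBatches(ctx context.Context, poolIDs []string,
	fetchPoolStats func(ctx context.Context, id string) (PoolStats, error)) map[string]PoolStats {

	limiter := rate.NewLimiter(rate.Limit(2), 1) // 2 requests/second
	const batchSize = 10
	out := make(map[string]PoolStats, len(poolIDs))

	for start := 0; start < len(poolIDs); start += batchSize {
		end := start + batchSize
		if end > len(poolIDs) {
			end = len(poolIDs)
		}
		for _, id := range poolIDs[start:end] {
			if err := limiter.Wait(ctx); err != nil {
				return out // context cancelled
			}
			stats, err := fetchPoolStats(ctx, id)
			if err != nil {
				continue // graceful degradation: skip failed pools
			}
			out[id] = stats
		}
	}
	return out
}
```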
Features:
- Optional HTTP proxy support for rate limit avoidance
- Graceful degradation (continues if enrichment fails)
- Separate metrics for enrichment success/failure
7. Redis Caching
Efficient storage pattern with TTL management:
Storage Pattern:
Key: pool:{dex}:{poolId}
Value: JSON-encoded Pool struct
TTL: 600 seconds (10 minutes, configurable)
Example:
pool:meteora_dlmm:ANeTpNwPj4i62axSjUE4jZ7d3zvaj4sc9VFZyY1eYrtb
Features:
- Batch writes via Redis PIPELINE for efficiency
- TTL refresh on WebSocket updates to keep active pools cached
- Shared with quote-service (same repository)
- Connection health checks for reliability
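A sketch of the pipelined batch write, assuming go-redis v9 and a trimmed-down Pool struct:

```go
package storage

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type Pool struct {
	ID          string `json:"id"`
	DEX         string `json:"dex"`
	Slot        uint64 `json:"slot"`
	LastUpdated int64  `json:"lastUpdated"`
}

func WritePools(ctx context.Context, rdb *redis.Client, pools []Pool, ttl time.Duration) error {
	pipe := rdb.Pipeline()
	for _, p := range pools {
		data, err := json.Marshal(p)
		if err != nil {
			return err
		}
		key := fmt.Sprintf("pool:%s:%s", p.DEX, p.ID)
		pipe.Set(ctx, key, data, ttl) // SET with TTL; re-writing refreshes the TTL
	}
	_, err := pipe.Exec(ctx)
	return err
}
```

Calling WritePools(ctx, rdb, pools, 10*time.Minute) after each discovery run or WebSocket update also refreshes the TTL for active pools.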
Data Flow
Initial Discovery Flow:
- Service starts → Initialize MultiProtocolScanner
- Get current slot via GetSlot() RPC call (once)
- For each token pair: Query forward and reverse directions
- Deduplicate pools by pool ID
- Set pool.Slot = currentSlot
- Enrich pools with external API (batched)
- Save to Redis with TTL=10min
- Subscribe each pool to WebSocket (accountSubscribe)
- Publish pool.discovery.completed to NATS
WebSocket Update Flow:
- WebSocket receives accountNotification
- Extract slot from notification.params.result.context.slot
- Find pool by subscription ID
- Update pool.Slot = slot, pool.LastUpdated = now()
- Save pool to Redis (refresh TTL)
- Publish pool.updated event to NATS (source="websocket")
- Record metrics (WebSocketUpdateLatency, PoolUpdateSource)
RPC Backup Flow:
- Every 5 minutes: Check WebSocket health (see the sketch below)
- IF WebSocket.IsConnected() == false:
  - Run full RPC discovery (same as initial)
  - Update all pool.Slot from current RPC response
  - Publish pool.updated events (source="rpc_backup")
  - Record metrics (RPCBackupTriggeredTotal)
- ELSE: Skip (WebSocket is handling updates)
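Sketched in Go, with minimal stand-in interfaces for the scheduler's dependencies, the backup loop boils down to:

```go
package scheduler

import (
	"context"
	"time"
)

type wsHealth interface {
	IsConnected() bool
}

type discoverer interface {
	DiscoverAllPools(ctx context.Context) error
}

func startRPCBackupPolling(ctx context.Context, ws wsHealth, d discoverer, interval time.Duration) {
	ticker := time.NewTicker(interval) // e.g. 5 minutes (-ws-backup-interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ws.IsConnected() {
				continue // Tier 2 is healthy: rely on WebSocket updates
			}
			// WebSocket down: refresh all pools via RPC (source="rpc_backup").
			_ = d.DiscoverAllPools(ctx)
		}
	}
}
```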
Observability
Logging (Loki):
- Structured JSON logging via Zap
- Automatic log aggregation to Loki
- Service/environment labels for filtering
- Async buffering (50 entries per flush)
- Non-fatal push errors (async flush)
Metrics (Prometheus):
- 20+ metrics exposed on :9094
- Discovery success rate tracking
- WebSocket connection health monitoring
- Pool update latency histograms
- RPC backup trigger counting
- Error rates by type
Tracing (OpenTelemetry):
- Distributed tracing integration
- Span context propagation across services
- OTLP/HTTP endpoint export
- Grafana Tempo visualization
NATS Events:
- Event-driven architecture
- Real-time monitoring capabilities
- Subscribe to pool.> for all events
- Durable consumers for reliability
Configuration
Command-Line Flags:
pool-discovery-service \
-rpc-proxy="http://localhost:3030" \
-redis="redis://localhost:6379/0" \
-nats="nats://localhost:4222" \
-metrics-port="9094" \
-discovery-interval=300 \
-enrichment-interval=60 \
-pool-ttl=600 \
-batch-size=10 \
-pairs="SOL/USDC,SOL/USDT,USDC/USDT" \
-use-websocket=true \
-ws-backup-interval=300
Key Configuration:
- rpc-proxy: Rust RPC proxy URL (HTTP & WebSocket)
- redis: Redis connection URL for pool caching
- nats: NATS server URL for event publishing
- metrics-port: Prometheus metrics port
- discovery-interval: RPC discovery interval (seconds)
- enrichment-interval: Enrichment interval (seconds)
- pool-ttl: Pool TTL in Redis (seconds)
- use-websocket: Enable WebSocket-first mode
- ws-backup-interval: RPC backup interval when WebSocket enabled
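A sketch of how these flags might be wired with Go's standard flag package; names and defaults mirror the command line above, not necessarily the service's exact parsing code.

```go
package main

import (
	"flag"
	"log"
)

func main() {
	var (
		rpcProxy         = flag.String("rpc-proxy", "http://localhost:3030", "Rust RPC proxy URL (HTTP & WebSocket)")
		redisURL         = flag.String("redis", "redis://localhost:6379/0", "Redis connection URL for pool caching")
		natsURL          = flag.String("nats", "nats://localhost:4222", "NATS server URL for event publishing")
		metricsPort      = flag.String("metrics-port", "9094", "Prometheus metrics port")
		discoveryInt     = flag.Int("discovery-interval", 300, "RPC discovery interval (seconds)")
		enrichmentInt    = flag.Int("enrichment-interval", 60, "Enrichment interval (seconds)")
		poolTTL          = flag.Int("pool-ttl", 600, "Pool TTL in Redis (seconds)")
		batchSize        = flag.Int("batch-size", 10, "Enrichment batch size")
		pairs            = flag.String("pairs", "SOL/USDC,SOL/USDT,USDC/USDT", "Token pairs to scan")
		useWebSocket     = flag.Bool("use-websocket", true, "Enable WebSocket-first mode")
		wsBackupInterval = flag.Int("ws-backup-interval", 300, "RPC backup interval when WebSocket enabled (seconds)")
	)
	flag.Parse()

	log.Printf("proxy=%s redis=%s nats=%s metrics=:%s discovery=%ds enrich=%ds ttl=%ds batch=%d pairs=%s ws=%t backup=%ds",
		*rpcProxy, *redisURL, *natsURL, *metricsPort, *discoveryInt, *enrichmentInt,
		*poolTTL, *batchSize, *pairs, *useWebSocket, *wsBackupInterval)
}
```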
Grafana Dashboard Monitoring
The pool-discovery-service includes a comprehensive Grafana dashboard for real-time monitoring and operational visibility.
Dashboard Overview:

Key Metrics Tracked:
1. Service Health & Status
- Service health indicator (UP/DOWN)
- Total pools discovered
- WebSocket connection status
- Active subscriptions count
- Cached pools in Redis
2. Discovery Performance
# Discovery cycle duration (p50, p95, p99)
histogram_quantile(0.95, rate(pool_discovery_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(pool_discovery_duration_seconds_bucket[5m]))
histogram_quantile(0.50, rate(pool_discovery_duration_seconds_bucket[5m]))
# Total discovery runs
pool_discovery_runs_total
# Pools discovered by DEX protocol
pool_discovery_pools_by_dex{dex}
3. WebSocket Real-Time Updates
# Pool update rate
rate(websocket_updates_received_total[1m])
# WebSocket update latency (p95, p99)
histogram_quantile(0.95, rate(websocket_update_latency_seconds_bucket[5m]))
histogram_quantile(0.99, rate(websocket_update_latency_seconds_bucket[5m]))
# Update source distribution (WebSocket vs RPC backup)
rate(pool_update_source_total{source="websocket"}[5m])
rate(pool_update_source_total{source="rpc_backup"}[5m])
# RPC backup polling triggers
rpc_backup_triggered_total
4. Pool Enrichment Metrics
# Enrichment activity
rate(pool_enrichment_pools_enriched_total[5m])
rate(pool_enrichment_pools_failed_total[5m])
# API latency (Solscan)
histogram_quantile(0.95, rate(pool_enrichment_api_latency_seconds_bucket[5m]))
# Enrichment runs
pool_enrichment_runs_total
5. Error Tracking
# Discovery errors
pool_discovery_errors_total
# WebSocket errors by type
websocket_errors_total{error_type}
# Enrichment errors
pool_enrichment_pools_failed_total
6. RPC Scanner Performance
# Scanner latency by protocol
rpc_scanner_latency_seconds{protocol}
# Scanner requests by protocol and status
rpc_scanner_requests_total{protocol, status}
Dashboard Features:
- Real-time updates: Live data refresh every 5 seconds
- Time range selector: View historical trends
- Protocol breakdown: Per-DEX pool distribution
- Alert integration: Visual indicators for error conditions
- Drill-down capability: Click panels for detailed views
Access Dashboard:
- URL: http://localhost:3000/d/pool-discovery-service
- Location: deployment/monitoring/grafana/provisioning/dashboards/pool-discovery-service-dashboard.json
Key Insights from Dashboard:
WebSocket Efficiency: Monitor the ratio of WebSocket updates to RPC backup polling to confirm the WebSocket-first architecture is working correctly (target: >99% of updates via WebSocket)
Protocol Health: Track which DEX protocols are contributing pools and identify any protocol-specific issues
Enrichment Rate: Monitor enrichment API latency to ensure we’re staying within rate limits (target: <2 req/s)
Cache Hit Rate: Verify pool TTL is appropriate by monitoring cache size vs total discovered pools
Error Patterns: Identify systematic issues through error rate trends and WebSocket reconnection frequency
Performance Characteristics
Discovery Performance:
- Protocols: 6 (parallel goroutines)
- Token Pairs: 3 (SOL/USDC, SOL/USDT, USDC/USDT)
- Scans: 36 RPC calls (6 × 3 × 2 directions)
- Total Duration: ~30-50 seconds
- Pools Discovered: ~235 pools
WebSocket Update Performance:
- Connection: Persistent WebSocket to RPC proxy
- Subscriptions: 235 accountSubscribe (one per pool)
- Update Latency: <100ms (slot detection → Redis save → NATS publish)
- Deduplication: O(1) map lookup by slot number
- Memory: ~500KB for 235 subscriptions
RPC Backup Performance:
- Trigger: Only when WebSocket disconnected
- Interval: Every 5 minutes
- Duration: Same as initial discovery (~30-50s)
- Impact: Minimal (only runs when WebSocket down)
Resource Usage:
- CPU: 0.5-1% idle, 5-10% during discovery, 1-2% WebSocket mode
- Memory: 20-30 MB base, 40-50 MB with 235 pools, ~60 MB peak
- Network: 1-2 MB per run (RPC), 10-50 KB/s (WebSocket), 200-500 KB/min (enrichment)
Solana RPC Proxy: Intelligent Connection Management
Overview
The Solana RPC Proxy is a high-performance Rust service providing:
- ✅ HTTP and WebSocket support on the same port (port 3030)
- ✅ Automatic failover and health-based routing
- ✅ Rate limiting to prevent 429 errors
- ✅ Load balancing across multiple RPC vendors
- ✅ Connection pooling for optimal performance
- ✅ Circuit breaker pattern for reliability
- ✅ Comprehensive observability
Supported Features:
- Same-port HTTP + WebSocket operation (client convenience)
- Multiple RPC vendor support (Helius, Chainstack, dRPC, QuickNode, etc.)
- Health-based endpoint selection
- Automatic reconnection with exponential backoff
- Prometheus metrics and OpenTelemetry tracing
Design Principles
1. Performance First
- Target: < 50μs proxy overhead (p95)
- Zero-copy where possible (Rust + tokio)
- Async I/O throughout (no blocking operations)
- Connection pooling to minimize handshake overhead
2. Reliability Over Availability
- Fail fast on bad endpoints (circuit breaker)
- Automatic failover to healthy endpoints
- Rate limiting to prevent cascading failures
- Graceful degradation under load
3. Observability Built-In
- Structured logging with tracing context
- Prometheus metrics for all operations
- Distributed tracing (OpenTelemetry)
- Health check endpoints
4. Developer Ergonomics
- Single-port operation (HTTP + WebSocket)
- URL derivation (http:// → ws://, same address)
- Drop-in replacement for direct RPC clients
- Zero configuration for basic usage
Architecture Goals
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT SERVICES │
│ (Quote Service, Pool Discovery, Scanners, Executors) │
└────────────────────────────┬────────────────────────────────────┘
│
http://proxy:3030 (HTTP)
ws://proxy:3030 (WebSocket)
│
┌───────────────────▼────────────────────┐
│ SOLANA RPC PROXY (Port 3030) │
│ ┌──────────────────────────────────┐ │
│ │ Protocol Router (Same Port) │ │
│ │ - HTTP: JSON-RPC requests │ │
│ │ - WebSocket: Upgrade protocol │ │
│ └────────────┬─────────────────────┘ │
│ │ │
│ ┌────────────▼─────────────────────┐ │
│ │ Rate Limiter (Token Bucket) │ │
│ │ - Per-endpoint: 20 req/s │ │
│ │ - Burst capacity: 5 tokens │ │
│ └────────────┬─────────────────────┘ │
│ │ │
│ ┌────────────▼─────────────────────┐ │
│ │ Load Balancer (Least Conn) │ │
│ │ - Health-based routing │ │
│ │ - Automatic failover │ │
│ └────────────┬─────────────────────┘ │
│ │ │
│ ┌────────────▼─────────────────────┐ │
│ │ Connection Pool Manager │ │
│ │ - HTTP/2 connection pooling │ │
│ │ - WebSocket connection pooling │ │
│ └────────────┬─────────────────────┘ │
│ │ │
│ ┌────────────▼─────────────────────┐ │
│ │ Health Monitor & Circuit Breaker│ │
│ │ - Adaptive backoff │ │
│ │ - Automatic re-enable │ │
│ └────────────┬─────────────────────┘ │
└───────────────┼──────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Vendor 1│ │Vendor 2│ │Vendor 3│
│ (40x) │ │ (40x) │ │ (10x) │
└────────┘ └────────┘ └────────┘
90+ RPC Endpoints
Key Features
Same-Port HTTP + WebSocket:
- Single port for both protocols (port 3030)
- Automatic protocol upgrade detection
- Client URL derivation (http://proxy:3030 → ws://proxy:3030; see the sketch below)
- Standard pattern (AWS ALB, Nginx compatible)
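From a Go client's point of view, the URL derivation is a one-liner. This sketch assumes gorilla/websocket, which is not necessarily what the services use.

```go
package main

import (
	"log"
	"strings"

	"github.com/gorilla/websocket"
)

func deriveWSURL(httpURL string) string {
	// http://proxy:3030 → ws://proxy:3030, https:// → wss://
	if strings.HasPrefix(httpURL, "https://") {
		return "wss://" + strings.TrimPrefix(httpURL, "https://")
	}
	return "ws://" + strings.TrimPrefix(httpURL, "http://")
}

func main() {
	proxy := "http://localhost:3030" // same value used for JSON-RPC over HTTP
	wsURL := deriveWSURL(proxy)

	conn, _, err := websocket.DefaultDialer.Dial(wsURL, nil)
	if err != nil {
		log.Fatalf("websocket dial %s: %v", wsURL, err)
	}
	defer conn.Close()
	log.Printf("connected to %s (same port as HTTP)", wsURL)
}
```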
Rate Limiting & 429 Prevention:
- Per-endpoint token bucket rate limiting (20 req/s)
- Burst capacity (5 tokens) for traffic spikes
- Automatic failover to alternate endpoints on rate limit
- Lock-free implementation using the governor crate
Health-Based Load Balancing:
- Least-connections algorithm for optimal distribution
- Health-aware routing (skips unhealthy endpoints)
- Adaptive exponential backoff (1min → 30min)
- Automatic re-enable after backoff period
- Round-robin tiebreaker for equal load
Connection Pooling:
- HTTP/2 connection pooling (max 10 idle per host, 90s timeout)
- WebSocket connection pooling (2 pre-established per endpoint)
- RAII-based connection guards for automatic cleanup
- Health monitoring (60s interval)
Circuit Breaker Pattern:
- Adaptive backoff levels (1min → 3min → 7min → 15min → 30min; sketched below)
- Recovery tracking (10 successful requests reduce level)
- Automatic re-enable after timeout
- Error classification (rate limit vs. other errors)
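The proxy itself is written in Rust; the following Go sketch only illustrates the adaptive backoff scheme described above (escalating levels, 10 successes to step back down), with illustrative type and method names.

```go
package breaker

import "time"

var backoffLevels = []time.Duration{
	1 * time.Minute, 3 * time.Minute, 7 * time.Minute, 15 * time.Minute, 30 * time.Minute,
}

type endpointBreaker struct {
	level       int       // index into backoffLevels
	successes   int       // consecutive successes at the current level
	disabledTil time.Time // endpoint is skipped until this time
}

// Allow reports whether the endpoint may receive traffic right now.
func (b *endpointBreaker) Allow(now time.Time) bool {
	return now.After(b.disabledTil)
}

// RecordFailure disables the endpoint for the current backoff window
// and escalates the level for the next failure.
func (b *endpointBreaker) RecordFailure(now time.Time) {
	b.successes = 0
	b.disabledTil = now.Add(backoffLevels[b.level])
	if b.level < len(backoffLevels)-1 {
		b.level++
	}
}

// RecordSuccess steps the backoff level back down after 10 successful requests.
func (b *endpointBreaker) RecordSuccess() {
	b.successes++
	if b.successes >= 10 && b.level > 0 {
		b.level--
		b.successes = 0
	}
}
```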
Multiple RPC Vendor Support
The proxy supports multiple commercial RPC vendors for:
- Redundancy: No single point of failure
- Geographic distribution: Lower latency from different regions
- Rate limit spreading: Distribute load across providers
- Cost optimization: Mix free and paid tiers
- Performance comparison: Monitor and compare vendor SLAs
Note on Self-Hosted RPC Nodes:
I experimented with running a self-hosted Solana RPC node at home using the Agave validator. While the node ran successfully and was fully functional, I encountered a critical bottleneck: network bandwidth.
Despite having a 500+ Mbps home broadband connection, it was insufficient to keep up with the Solana blockchain’s high throughput. The node fell behind in syncing and couldn’t maintain pace with the latest slots. This revealed an important limitation: running a production-grade Solana RPC node requires enterprise-level bandwidth (1+ Gbps) and dedicated infrastructure.
For future iterations, we may revisit self-hosted RPC nodes in a datacenter environment with adequate bandwidth and co-location near validator infrastructure. For now, the proxy’s multi-vendor approach provides the reliability and performance we need while avoiding infrastructure complexity.
Observability
Prometheus Metrics:
# Request metrics
rpc_proxy_http_requests_total{endpoint, status}
rpc_proxy_http_request_duration_seconds{endpoint}
# WebSocket metrics
rpc_proxy_ws_connections_active{endpoint}
rpc_proxy_ws_messages_total{endpoint, direction}
# Rate limiting metrics
rpc_proxy_rate_limit_checks_total{endpoint, result}
rpc_proxy_rate_limit_available_tokens{endpoint}
rpc_proxy_rate_limit_429_errors_total{endpoint}
# Load balancing metrics
rpc_proxy_endpoint_selected_total{endpoint, algorithm}
rpc_proxy_in_flight_requests{endpoint}
# Health metrics
rpc_proxy_endpoint_health{endpoint, status}
rpc_proxy_circuit_breaker_state{endpoint}
# RPC method tracking
rpc_proxy_rpc_method_calls_total{endpoint, method}
Distributed Tracing:
- OpenTelemetry integration
- Span context propagation
- Parent-child relationships
- OTLP/HTTP endpoint export
- Grafana Tempo visualization
Structured Logging:
- JSON log format
- Tracing context included
- Service/environment labels
- Async buffering for performance
Configuration
# .env
PROXY_PORT=3030 # Single port for HTTP + WebSocket
METRICS_PORT=9090 # Prometheus metrics endpoint
# Rate Limiting
RATE_LIMIT_REQUESTS_PER_SECOND=20 # Per endpoint
RATE_LIMIT_BURST_CAPACITY=5 # Burst tokens
# Concurrency Control
MAX_CONCURRENT_PER_ENDPOINT=8 # Max concurrent requests
# Load Balancing
LOAD_BALANCER_ALGORITHM="least_connections"
HEALTH_CHECK_INTERVAL_SECS=60
# Circuit Breaker
CIRCUIT_BREAKER_ENABLED=true
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_TIMEOUT_SECS=60
# Observability
OTLP_ENDPOINT="http://localhost:4318"
LOG_LEVEL="info"
Production Impact
System Integration
The pool-discovery-service and RPC proxy work together to provide:
- Reliable Pool Data: Real-time updates with automatic fallback
- High Availability: Circuit breaker prevents cascading failures
- Optimal Performance: WebSocket-first minimizes latency
- Comprehensive Monitoring: Metrics for every component
- Event-Driven Architecture: NATS events enable reactive systems
Performance Benefits
Pool Discovery:
- Real-time updates: <100ms latency (vs 5-minute polling)
- Reduced RPC load: 99% reduction (WebSocket vs. constant polling)
- Accurate slot tracking: Sub-second freshness detection
- Automatic recovery: Seamless failover to RPC backup
RPC Proxy:
- Zero 429 errors: Rate limiting prevents quota exhaustion
- Automatic failover: Health-based routing ensures reliability
- Low latency: <50μs proxy overhead
- High throughput: 10,000+ req/s capacity
Operational Excellence
Monitoring:
- 20+ Prometheus metrics per service
- Real-time health dashboards
- Distributed tracing for debugging
- Structured logging for analysis
Reliability:
- Circuit breaker pattern prevents cascading failures
- Automatic reconnection with exponential backoff
- Health-based routing skips failing endpoints
- RPC backup ensures continuous operation
Scalability:
- Stateless design enables horizontal scaling
- Connection pooling optimizes resource usage
- Event-driven architecture decouples services
- Redis caching reduces backend load
Next Steps
Pool Discovery Service Improvements
Phase 1: Reliability Enhancements
- Circuit breaker for enrichment API
- Advanced pool data parsing (extract reserves from account data)
- Liquidity-based filtering ($5K minimum)
Phase 2: Protocol Expansion
- Add Lifinity, Aldrin, Saber protocols
- Adaptive bidirectional queries (reduce redundant scans)
- Per-pair liquidity thresholds
Phase 3: Advanced Features
- Pool health scoring (liquidity + volume + age)
- Historical pool tracking (TimescaleDB)
- Anomaly detection (sudden liquidity drain)
RPC Proxy Enhancements
Phase 4: Enhanced Observability (In Progress)
- RPC method call tracking per endpoint
- 429 error counter with critical alerts
- Request counts for quota monitoring
- Health status tracking per protocol
Phase 5: Testing & Documentation
- Integration testing (HTTP + WebSocket same port)
- Rate limit verification (ensure 20 req/s enforced)
- Failover testing (circuit breaker validation)
- Configuration documentation
Conclusion
The pool-discovery-service and RPC proxy represent critical infrastructure for the Solana trading system. The three-tier hybrid architecture optimizes for real-time updates while maintaining reliability through intelligent fallback mechanisms.
Key Achievements:
- ✅ Production-ready pool discovery with 6 DEX protocols
- ✅ Real-time WebSocket updates (<100ms latency)
- ✅ Intelligent RPC proxy with multi-vendor support
- ✅ Comprehensive observability and monitoring
- ✅ Event-driven architecture for downstream consumers
- ✅ High reliability through circuit breaker and health tracking
The WebSocket-first approach reduces RPC load by 99% while providing sub-100ms pool state updates. The intelligent RPC proxy ensures zero 429 errors through rate limiting and automatic failover to healthy endpoints.
Together, these services form the foundation for reliable, real-time liquidity tracking essential for HFT trading strategies.
Related Posts
- Quote Service Architecture: The HFT Engine Core
- Quote Service Rewrite: Clean Architecture for Maintainability
- Architecture Deep Dive: Building Sub-500ms Solana HFT System
Technical Documentation
- Pool Discovery Service Design (docs/29-POOL-DISCOVERY-DESIGN.md)
- Solana RPC Proxy Design (docs/28-SOLANA-RPC-PROXY-DESIGN.md)
- Go Workspace Documentation (go/WORKSPACE.md)
Connect
- GitHub: @guidebee
- LinkedIn: James Shen
Building robust, production-grade infrastructure for Solana HFT trading. Join me on this journey of architectural excellence and system design!
