Production System Architecture Plan

Overview

This document outlines the architecture for a production-grade Solana trading system based on the scanner → planner → executor pattern, incorporating lessons learned from both prototype systems.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         MONITORS                                 │
│  (Prometheus, Grafana, Jaeger, Loki, Custom Metrics)            │
└─────────────────────────────────────────────────────────────────┘
                              ↓ observability
┌─────────────────────────────────────────────────────────────────┐
│                        PREPARERS                                 │
│  (Wallet Management, Balance Sync, Market Data Cache)           │
└─────────────────────────────────────────────────────────────────┘
                              ↓ initialization
┌──────────────┐      ┌──────────────┐      ┌──────────────────┐
│   SCANNERS   │ ───→ │   PLANNERS   │ ───→ │    EXECUTORS     │
│              │      │              │      │                  │
│ - Market     │      │ - Arbitrage  │      │ - Jito Bundles   │
│   Scanners   │      │ - Grid Trade │      │ - TPU Direct     │
│ - Account    │      │ - DCA        │      │ - Transaction    │
│   Watchers   │      │ - AI Analysis│      │   Confirmation   │
│ - Price Feed │      │ - Quote      │      │ - Error Handler  │
│ - Volume     │      │   Optimizer  │      │                  │
│   Monitor    │      │              │      │                  │
└──────────────┘      └──────────────┘      └──────────────────┘
       ↓                     ↓                       ↓
┌─────────────────────────────────────────────────────────────────┐
│                    EVENT BUS (NATS JetStream)                    │
│  Topics: market.events, trade.opportunities, execution.orders   │
└─────────────────────────────────────────────────────────────────┘
       ↓                     ↓                       ↓
┌─────────────────────────────────────────────────────────────────┐
│                         DATA LAYER                               │
│  Redis (hot data) | PostgreSQL (persistent) | TimescaleDB (time)│
└─────────────────────────────────────────────────────────────────┘

System Components

1. SCANNERS (Data Acquisition Layer)

Purpose: Monitor blockchain state and market conditions to detect trading opportunities.

1.1 Market Event Scanner (TypeScript)

Technology: TypeScript + Solana Web3.js
Input: Blockchain events via Shredstream or WebSocket
Output: NATS events to market.events.*

Responsibilities:

Subscribe to account changes (filtered by relevant addresses)
Monitor DEX pool state changes (Raydium, Meteora, etc.)
Detect large transactions and unusual volume
Track token price movements
Emit normalized events to event bus

Key Features:

Multiple subscription modes (Shredstream primary, WebSocket backup)
Client-side filtering to reduce processing overhead (about 99% fewer events reach downstream processing)
Event deduplication via Redis
Latency tracking (target: 100-200ms)

Configuration:

{
  subscriptionMode: "shredstream" | "websocket" | "rpc-polling",
  accountsToWatch: Address[],
  eventBufferSize: 1000,
  dedupWindow: 5000 // ms
}
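
The dedupWindow setting above can be read as a sliding time window: an event key seen again within 5000 ms is dropped. The sketch below is an in-memory stand-in for the Redis-backed deduplication, under the assumption that each event carries a stable key such as signature plus account; the class and method names are hypothetical. With Redis, the same check is commonly a single `SET <key> 1 NX PX 5000`, which only succeeds for the first writer inside the window.

```typescript
// In-memory stand-in for the Redis-backed dedup; class and method names are hypothetical.
class DedupWindow {
  private readonly seen = new Map<string, number>(); // event key -> first-seen time (ms)
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true for a new event, false for a duplicate seen within the window.
  accept(eventKey: string, nowMs: number): boolean {
    this.evictExpired(nowMs);
    if (this.seen.has(eventKey)) return false;
    this.seen.set(eventKey, nowMs);
    return true;
  }

  // Drops every key whose window has fully elapsed.
  private evictExpired(nowMs: number): void {
    this.seen.forEach((firstSeenMs, key) => {
      if (nowMs - firstSeenMs >= this.windowMs) this.seen.delete(key);
    });
  }
}

const dedup = new DedupWindow(5000); // matches dedupWindow: 5000 above
const firstSeen = dedup.accept("sig-1:pool-A", 0);
const duplicate = dedup.accept("sig-1:pool-A", 1200);
const afterWindow = dedup.accept("sig-1:pool-A", 6000);
```

Passing the clock in as nowMs keeps the window logic deterministic and easy to test; a real scanner would pass Date.now().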

1.2 Price Feed Scanner (Go)

Technology: Go + concurrent workers
Input: DEX pool state, oracle feeds
Output: NATS events to market.prices.*

Responsibilities:

Real-time price aggregation across multiple DEXes
Oracle price fetching (Pyth, Switchboard)
Spread calculation between venues
Price impact estimation for various amounts

Key Features:

Concurrent pool queries (goroutines per protocol)
Sub-10ms response time for cached prices
5-minute TTL for price cache
Automatic failover to backup data sources
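
The spread calculation between venues can be illustrated with a small function that finds the cheapest and most expensive venue for a pair and expresses the gap in basis points. This is a sketch only, written in TypeScript for consistency with the other examples even though the scanner itself is a Go service; the VenuePrice shape and the spreadBps name are assumptions, and it ignores fees and price impact.

```typescript
// Hypothetical shape: price is quoted as quote-token units per one base token.
interface VenuePrice {
  venue: string;
  price: number;
}

interface Spread {
  low: VenuePrice;
  high: VenuePrice;
  bps: number; // (high - low) / low, in basis points
}

// Returns the widest cross-venue spread, or null when fewer than two venues are quoted.
function spreadBps(prices: VenuePrice[]): Spread | null {
  let low: VenuePrice | undefined;
  let high: VenuePrice | undefined;
  for (const p of prices) {
    if (!low || p.price < low.price) low = p;
    if (!high || p.price > high.price) high = p;
  }
  if (!low || !high || prices.length < 2) return null;
  const bps = ((high.price - low.price) / low.price) * 10_000;
  return { low, high, bps };
}

const spread = spreadBps([
  { venue: "raydium", price: 100 },
  { venue: "meteora", price: 101 },
]);
```

A spread in basis points is only a first filter; the planner still needs executable quotes for the intended trade size before treating it as an opportunity.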

1.3 Volume Monitor (TypeScript)

Technology: TypeScript + database queries
Input: Historical transaction data
Output: NATS events to market.volume.*

Responsibilities:

Track 24h trading volume per token
Identify volume spikes (potential opportunities)
Calculate average trade size
Monitor liquidity changes

Key Features:

12-hour cache in Redis
Background refresh every hour
Anomaly detection for volume spikes
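
One simple way to approach anomaly detection for volume spikes is a z-score against recent history. The document does not say which method is used, so the sketch below is an assumption: it flags a spike when the current volume sits more than zThreshold sample standard deviations above the mean of the supplied window.

```typescript
// Flags a volume spike using a z-score over recent history.
// The z-score approach and the default threshold of 3 are illustrative assumptions.
function isVolumeSpike(history: number[], current: number, zThreshold = 3): boolean {
  if (history.length < 2) return false; // not enough data to estimate variance
  const mean = history.reduce((sum, v) => sum + v, 0) / history.length;
  const variance =
    history.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / (history.length - 1);
  const std = Math.sqrt(variance);
  if (std === 0) return current > mean; // flat history: any increase stands out
  return (current - mean) / std > zThreshold;
}

const hourlyVolume = [100, 110, 90, 105, 95];
const calmHour = isVolumeSpike(hourlyVolume, 112);
const spikeHour = isVolumeSpike(hourlyVolume, 400);
```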

1.4 Wallet Balance Scanner (TypeScript)

Technology: TypeScript + RPC batch queries
Input: Configured wallet addresses
Output: Internal state + alerts

Responsibilities:

Monitor all managed wallet balances
Compare expected vs. actual balances
Trigger rebalancing when thresholds exceeded
Alert on unexpected balance changes

Key Features:

Batch RPC calls to minimize requests
Expected balance validation
Automatic rebalancing triggers
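
The expected-versus-actual comparison can be sketched as a drift check in basis points with two thresholds, one that triggers rebalancing and a larger one that raises an alert. The threshold values and names here are assumptions for illustration; balances are plain numbers in lamports, which stay exact for any realistic wallet size.

```typescript
type BalanceStatus = "ok" | "rebalance" | "alert";

// Compares actual to expected balance (both in lamports) and classifies the drift.
// The 5% rebalance and 20% alert defaults are hypothetical.
function checkBalance(
  expectedLamports: number,
  actualLamports: number,
  rebalanceBps = 500,
  alertBps = 2000,
): BalanceStatus {
  if (expectedLamports === 0) return actualLamports === 0 ? "ok" : "alert";
  const driftBps = (Math.abs(actualLamports - expectedLamports) / expectedLamports) * 10_000;
  if (driftBps >= alertBps) return "alert";
  if (driftBps >= rebalanceBps) return "rebalance";
  return "ok";
}

const smallDrift = checkBalance(1_000_000_000, 990_000_000); // 1% below expected
const mediumDrift = checkBalance(1_000_000_000, 900_000_000); // 10% below expected
const largeDrift = checkBalance(1_000_000_000, 500_000_000); // 50% below expected
```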

2. PLANNERS (Strategy & Decision Layer)

Purpose: Analyze scanner data to identify profitable trades and create execution plans.

2.1 Arbitrage Planner (TypeScript)

Technology: TypeScript + business logic
Input: NATS market.prices.*, market.events.*
Output: NATS trade.opportunities.arbitrage

Responsibilities:

Receive price update events from scanners
Calculate profit potential: outAmount - inAmount - fees - tips
Verify profitability threshold (configurable per wallet tier)
Generate swap route plan with flash loan wrapping
Emit trade opportunity if profitable

Strategy Logic:

for each (tokenA, tokenB) pair:
  quoteSell = getQuote(tokenA → tokenB, amount)
  quoteBuy = getQuote(tokenB → tokenA, quoteSell.outAmount)

  profit = quoteBuy.outAmount - amount - flashLoanFee - jitoTip

  if profit > threshold:
    emit TradeOpportunity {
      strategy: "arbitrage",
      route: [quoteSell.route, quoteBuy.route],
      expectedProfit: profit,
      priority: calculatePriority(profit, latency)
    }
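
The profit formula in the strategy logic above can be made concrete with integer lamport arithmetic. This is a minimal sketch under stated assumptions: the flash loan fee is modelled as a basis-point fee on the borrowed amount, a flat transaction fee is included alongside the Jito tip, and all names and example numbers are hypothetical.

```typescript
// All amounts are in lamports of the starting token.
interface ArbCosts {
  flashLoanFeeBps: number; // assumed fee model: basis points of the borrowed amount
  jitoTipLamports: number;
  txFeeLamports: number;
}

// profit = round-trip output - input amount - flash loan fee - tip - transaction fee
function netProfitLamports(amount: number, roundTripOut: number, costs: ArbCosts): number {
  const flashLoanFee = Math.ceil((amount * costs.flashLoanFeeBps) / 10_000);
  return roundTripOut - amount - flashLoanFee - costs.jitoTipLamports - costs.txFeeLamports;
}

const exampleCosts: ArbCosts = { flashLoanFeeBps: 9, jitoTipLamports: 1_000_000, txFeeLamports: 5_000 };

// Borrow 10 SOL, receive 10.06 SOL back from the second swap.
const arbProfit = netProfitLamports(10_000_000_000, 10_060_000_000, exampleCosts);
const profitThresholdLamports = 10_000_000; // hypothetical per-tier threshold
const isProfitable = arbProfit > profitThresholdLamports;
```

Keeping every term in the same integer unit avoids rounding drift when the result is later compared with the amounts parsed from transaction logs.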

Key Features:

Hybrid quoting (SolRoute primary, Jupiter fallback)
Rate limiting (60 Jupiter API calls/min)
Route template caching with hash-based deduplication
Dynamic profit threshold adjustment

2.2 Grid Trading Planner (TypeScript)

Technology: TypeScript + order book management
Input: NATS market.prices.*
Output: NATS trade.opportunities.grid

Responsibilities:

Maintain grid order book (buy/sell levels)
Monitor price movements against grid
Trigger buy orders when price drops to grid level
Trigger sell orders when price rises to grid level
Calculate P&L and rebalance grid

Strategy Logic:

gridLevels = generateGridLevels(basePrice, gridCount, spacing)

for each priceUpdate:
  for each gridLevel:
    if currentPrice <= gridLevel.buyPrice && !gridLevel.buyFilled:
      emit BuyOrder(gridLevel)

    if currentPrice >= gridLevel.sellPrice && !gridLevel.sellFilled:
      emit SellOrder(gridLevel)
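
A possible shape for generateGridLevels and the per-update loop is sketched below. It assumes percentage spacing below the base price, with each level selling one spacing step above its own buy price, and it adds one guard not present in the pseudocode: a level only sells after its buy has filled. These choices are illustrative, not the documented behaviour.

```typescript
interface GridLevel {
  buyPrice: number;
  sellPrice: number;
  buyFilled: boolean;
  sellFilled: boolean;
}

// Level i buys at base * (1 - i * spacing) and sells one step higher.
function generateGridLevels(basePrice: number, gridCount: number, spacingPct: number): GridLevel[] {
  const levels: GridLevel[] = [];
  for (let i = 1; i <= gridCount; i++) {
    levels.push({
      buyPrice: (basePrice * (100 - spacingPct * i)) / 100,
      sellPrice: (basePrice * (100 - spacingPct * (i - 1))) / 100,
      buyFilled: false,
      sellFilled: false,
    });
  }
  return levels;
}

// Applies one price update and returns the orders it would emit.
function onPriceUpdate(levels: GridLevel[], currentPrice: number): string[] {
  const orders: string[] = [];
  levels.forEach((level, index) => {
    if (currentPrice <= level.buyPrice && !level.buyFilled) {
      level.buyFilled = true;
      orders.push(`buy@level${index + 1}`);
    } else if (currentPrice >= level.sellPrice && level.buyFilled && !level.sellFilled) {
      level.sellFilled = true;
      orders.push(`sell@level${index + 1}`);
    }
  });
  return orders;
}

const grid = generateGridLevels(100, 3, 1); // buy levels near 99, 98, 97
const ordersOnDip = onPriceUpdate(grid, 98.5);
const ordersOnRecovery = onPriceUpdate(grid, 100.5);
```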

Key Features:

Configurable grid spacing (percentage or fixed)
Order TTL management (default 12 hours)
Automatic grid rebalancing on price moves
Split orders for large sizes

2.3 DCA Planner (TypeScript)

Technology: TypeScript + time-based triggers
Input: Time intervals + market.prices.*
Output: NATS trade.opportunities.dca

Responsibilities:

Schedule recurring buy/sell orders
Average entry price calculation
Position size management
Stop-loss/take-profit monitoring

Strategy Logic:

every interval (e.g., 1 hour):
  if shouldExecute(token, currentPrice, constraints):
    emit BuyOrder({
      amount: calculateDCAAmount(position, budget),
      maxPriceImpact: 0.5%,
      urgency: "low"
    })
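
The helpers named in the logic above, shouldExecute and calculateDCAAmount, are not defined in this document. One plausible reading is sketched below: spread the remaining budget evenly over the remaining intervals, skip an interval when the price is above a configured limit, and track the average entry price as total cost divided by total quantity. The DcaState fields are assumptions.

```typescript
// Hypothetical DCA state; field names are illustrative.
interface DcaState {
  totalBudget: number; // in quote token units
  spentSoFar: number;
  remainingIntervals: number;
  maxPrice: number; // buy only at or below this price
}

interface Fill {
  price: number;
  quantity: number;
}

function shouldExecute(currentPrice: number, state: DcaState): boolean {
  return (
    state.remainingIntervals > 0 &&
    state.spentSoFar < state.totalBudget &&
    currentPrice <= state.maxPrice
  );
}

// Splits what is left of the budget evenly across the remaining intervals.
function calculateDCAAmount(state: DcaState): number {
  if (state.remainingIntervals <= 0) return 0;
  return (state.totalBudget - state.spentSoFar) / state.remainingIntervals;
}

// Average entry price = total cost / total quantity.
function averageEntryPrice(fills: Fill[]): number {
  const quantity = fills.reduce((sum, f) => sum + f.quantity, 0);
  if (quantity === 0) return 0;
  return fills.reduce((sum, f) => sum + f.price * f.quantity, 0) / quantity;
}

const dcaState: DcaState = { totalBudget: 1000, spentSoFar: 400, remainingIntervals: 6, maxPrice: 25 };
const nextAmount = calculateDCAAmount(dcaState);
const avgEntry = averageEntryPrice([
  { price: 10, quantity: 5 },
  { price: 20, quantity: 5 },
]);
```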

Key Features:

Configurable intervals (minutes to days)
Price limit orders (buy only below X)
Position size limits
Automatic position tracking

2.4 AI Analysis Planner (TypeScript)

Technology: TypeScript + OpenAI API
Input: Chart data, technical indicators
Output: NATS trade.opportunities.ai

Responsibilities:

Generate TradingView chart screenshots
Send to ChatGPT for analysis
Parse AI recommendations
Convert to actionable trade signals
Weight recommendations with other signals

Strategy Logic:

periodic or on-demand:
  chartUrl = generateTradingViewChart(token, timeframe)
  analysis = await chatGPT.analyzeChart(chartUrl, prompt)

  if analysis.recommendation == "BUY":
    emit TradeSignal({
      direction: "long",
      confidence: analysis.confidence,
      reasoning: analysis.explanation
    })
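
The step from model reply to trade signal deserves defensive parsing, since the reply is free-form text. The sketch below assumes the prompt asks for a JSON object with recommendation, confidence (0 to 1) and explanation fields; that schema, the SELL mapping and the minimum-confidence cutoff are assumptions, not part of the design above.

```typescript
interface TradeSignal {
  direction: "long" | "short";
  confidence: number;
  reasoning: string;
}

// Parses a model reply into a signal, or returns null when it is unusable.
function parseAnalysis(raw: string, minConfidence = 0.6): TradeSignal | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const fields = parsed as Record<string, unknown>;
  const recommendation = fields["recommendation"];
  const confidence = fields["confidence"];
  const explanation = fields["explanation"];
  if (typeof confidence !== "number" || confidence < minConfidence || confidence > 1) return null;
  if (typeof explanation !== "string") return null;
  if (recommendation === "BUY") return { direction: "long", confidence, reasoning: explanation };
  if (recommendation === "SELL") return { direction: "short", confidence, reasoning: explanation };
  return null; // HOLD or anything unrecognized produces no signal
}

const buySignal = parseAnalysis('{"recommendation":"BUY","confidence":0.8,"explanation":"breakout above resistance"}');
const weakSignal = parseAnalysis('{"recommendation":"BUY","confidence":0.3,"explanation":"unclear"}');
const garbage = parseAnalysis("The chart looks bullish.");
```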

Key Features:

Queue-based async processing
Multi-language support
Follow-up analysis
Confidence scoring

2.5 Quote Optimizer (Go)

Technology: Go service (high-performance)
Input: RPC requests from planners
Output: Optimized quotes with route details

Responsibilities:

Interface between planners and quoting services
Try SolRoute service first (2-10ms)
Fallback to Jupiter API (100-300ms)
Cache quotes in Redis (5-minute TTL)
Health monitoring and automatic failover

Key Features:

Concurrent quote requests across multiple DEXes
Best route selection by output amount
Route template generation for caching
Performance metrics tracking
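
Best route selection by output amount reduces to picking the maximum over whichever quote sources answered. The sketch below is in TypeScript to match the other examples, although the service itself is described as Go; the RouteQuote shape and the latency tie-break are assumptions.

```typescript
interface RouteQuote {
  source: string; // e.g. "solroute" or "jupiter"
  outAmount: number; // output amount in the destination token's smallest unit
  latencyMs: number;
}

// Picks the quote with the highest output; ties go to the faster source.
// Sources that failed or timed out are represented as null.
function selectBestQuote(quotes: Array<RouteQuote | null>): RouteQuote | null {
  let best: RouteQuote | null = null;
  for (const q of quotes) {
    if (q === null) continue;
    if (
      best === null ||
      q.outAmount > best.outAmount ||
      (q.outAmount === best.outAmount && q.latencyMs < best.latencyMs)
    ) {
      best = q;
    }
  }
  return best;
}

const bestQuote = selectBestQuote([
  { source: "solroute", outAmount: 1_005_000, latencyMs: 4 },
  null, // a source that timed out
  { source: "jupiter", outAmount: 1_007_500, latencyMs: 180 },
]);
```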

3. EXECUTORS (Transaction Execution Layer)

Purpose: Execute planned trades efficiently and reliably.

3.1 Jito Bundle Executor (TypeScript)

Technology: TypeScript + Jito SDK
Input: NATS trade.opportunities.* (high priority)
Output: Transaction signatures + confirmation status

Responsibilities:

Subscribe to high-priority trade opportunities
Build transaction with instructions
Add compute budget and priority fees
Get Jito tip account
Submit bundle with tip
Monitor bundle status
Emit execution results

Transaction Assembly:

flashLoanBorrow (if needed)
setComputeUnitPrice (priority fee)
setComputeUnitLimit (compute budget)
swapInstruction(s) (from route plan)
flashLoanRepay (if needed)
Compress with Address Lookup Tables
Sign with appropriate wallet
Submit to Jito with tip
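
The ordering in the assembly steps above can be sketched without any Solana dependency by treating instructions as labels. The sketch also shows how the two compute budget settings relate to cost: the priority fee is the compute unit limit multiplied by the compute unit price in micro-lamports, rounded up. The PlanInput shape and function names are hypothetical, and real code would build instruction objects rather than strings.

```typescript
interface PlanInput {
  swaps: string[]; // swap instruction labels from the route plan
  useFlashLoan: boolean;
  computeUnitLimit: number;
  microLamportsPerCu: number; // compute unit price
}

// Produces the instruction order described in the assembly steps.
function assembleInstructionOrder(plan: PlanInput): string[] {
  const instructions: string[] = [];
  if (plan.useFlashLoan) instructions.push("flashLoanBorrow");
  instructions.push(`setComputeUnitPrice(${plan.microLamportsPerCu})`);
  instructions.push(`setComputeUnitLimit(${plan.computeUnitLimit})`);
  instructions.push(...plan.swaps);
  if (plan.useFlashLoan) instructions.push("flashLoanRepay");
  return instructions;
}

// Priority fee in lamports = limit * price (micro-lamports) / 1e6, rounded up.
function priorityFeeLamports(plan: PlanInput): number {
  return Math.ceil((plan.computeUnitLimit * plan.microLamportsPerCu) / 1_000_000);
}

const plan: PlanInput = {
  swaps: ["swap(SOL->USDC)", "swap(USDC->SOL)"],
  useFlashLoan: true,
  computeUnitLimit: 400_000,
  microLamportsPerCu: 25_000,
};
const instructionOrder = assembleInstructionOrder(plan);
const priorityFee = priorityFeeLamports(plan);
```

Because the fee is charged on the requested limit rather than on the units actually consumed, setting the limit close to the simulated usage keeps the priority fee from being overpaid.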

Key Features:

Bundle composition with multiple transactions
Dynamic tip calculation based on competition
UUID-based bundle tracking
Retry logic with exponential backoff
Confirmation polling (max 30s)

3.2 TPU Direct Executor (TypeScript)

Technology: TypeScript + Solana Web3.js
Input: NATS trade.opportunities.* (medium priority)
Output: Transaction signatures + confirmation status

Responsibilities:

Alternative to Jito for non-MEV-critical trades
Direct transaction submission to TPU
Leader schedule awareness
Confirmation monitoring

Key Features:

Faster submission (no bundle overhead)
Lower cost (no Jito tips)
Suitable for non-competitive trades
RPC failover on errors

3.3 Transaction Coordinator (TypeScript)

Technology: TypeScript + state management
Input: All execution requests
Output: Routing decisions + execution tracking

Responsibilities:

Select appropriate executor (Jito vs TPU vs Solayer)
Manage execution queue
Handle concurrent execution limits
Track pending transactions
Coordinate retries on failures
Emit metrics and logs

Routing Logic:

function selectExecutor(opportunity: TradeOpportunity): Executor {
  if (opportunity.expectedProfit > HIGH_PROFIT_THRESHOLD) {
    return jitoExecutor; // MEV protection
  }

  if (opportunity.strategy === "arbitrage") {
    return jitoExecutor; // Time-sensitive
  }

  if (opportunity.urgency === "low") {
    return tpuExecutor; // Save on tips
  }

  return jitoExecutor; // Default
}

Key Features:

Priority-based routing
Concurrent execution limits (configurable)
Dead letter queue for failed transactions
Execution metrics (success rate, latency)

3.4 Confirmation Monitor (TypeScript)

Technology: TypeScript + RPC polling
Input: Pending transaction signatures
Output: NATS execution.confirmed or execution.failed

Responsibilities:

Poll transaction status (getSignatureStatuses)
Parse transaction logs for actual amounts
Verify expected vs. actual profit
Emit confirmation events
Handle timeouts and resubmissions

Key Features:

Batch status polling (up to 100 sigs)
Exponential backoff on polling
30-second timeout (configurable)
Event decoding for profit verification
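
Batch polling with exponential backoff and a hard timeout comes down to two small pieces: splitting pending signatures into batches, and precomputing the delays that fit inside the timeout. The sketch below leaves the RPC call out entirely; the starting delay, growth factor and delay cap are assumptions, while the batch size of 100 and the 30-second timeout come from the features listed above.

```typescript
// Splits pending signatures into batches for getSignatureStatuses-style calls.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Returns the delays (ms) between polls: exponential growth, capped per delay,
// and stopping before the cumulative wait would exceed the timeout.
function pollSchedule(initialMs: number, factor: number, maxDelayMs: number, timeoutMs: number): number[] {
  if (initialMs <= 0) throw new Error("initialMs must be positive");
  const delays: number[] = [];
  let elapsed = 0;
  let delay = initialMs;
  while (elapsed + delay <= timeoutMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * factor, maxDelayMs);
  }
  return delays;
}

const pendingSignatures = Array.from({ length: 250 }, (_, i) => `sig-${i}`);
const batches = chunk(pendingSignatures, 100); // batch sizes: 100, 100, 50
const schedule = pollSchedule(500, 2, 4000, 30_000);
```

With these example parameters the schedule polls quickly at first (500 ms, 1 s, 2 s) and then settles at one poll every 4 s until the 30-second budget is used up.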

4. PREPARERS (Initialization & Management)

Purpose: Setup and maintain system state before trading begins.

4.1 Wallet Manager (TypeScript)

Technology: TypeScript + keypair management
Input: Configuration, treasure wallet
Output: Initialized wallets with balances

Responsibilities:

Initialize wallet tiers (Proxy, Worker, Controller)
Load private keys securely from secrets manager
Create associated token accounts (ATAs)
Initial balance distribution from treasure wallet
Mask transfers for anonymity

Wallet Tiers:

Treasure Wallet: Centralized funding source (hot wallet)
Controller Wallets: Management operations (3-5 wallets)
Proxy Wallets: External-facing for anonymity (10-20 wallets)
Worker Wallets: Actual trading execution (20-50 wallets)

Key Features:

Expected balance tracking in Redis
Automatic rebalancing triggers
Multi-hop transfers for masking
ATA creation for all required tokens

4.2 Market Data Initializer (Go)

Technology: Go service
Input: RPC endpoint, protocol configs
Output: Cached market data in Redis

Responsibilities:

Fetch all relevant DEX pools (Raydium, Meteora, etc.)
Load current pool reserves and prices
Cache in Redis with 5-minute TTL
Initialize Kamino lending markets
Load Jupiter route templates from history

Key Features:

Concurrent pool fetching (goroutines)
Batch RPC calls for efficiency
Warm cache before trading starts
Health check on completion

4.3 Config Validator (TypeScript)

Technology: TypeScript + Zod schemas
Input: Environment variables, config files
Output: Validated configuration or errors

Responsibilities:

Validate all environment variables
Check RPC endpoint connectivity
Verify wallet keypairs are valid
Test Redis/PostgreSQL connections
Validate strategy parameters
Generate default configs if missing

Key Features:

Schema-based validation (Zod)
Detailed error messages
Connection testing
Config file generation

4.4 Historical Data Loader (TypeScript)

Technology: TypeScript + PostgreSQL
Input: Database connection
Output: Loaded historical data in memory/cache

Responsibilities:

Load past trade history for analysis
Cache profitable route templates
Initialize strategy state from last run
Load wallet balance history

Key Features:

Efficient batch loading
Selective caching (hot data only)
State recovery on restart

5. MONITORS (Observability & Alerting)

Purpose: Monitor all subsystems, track performance, and alert on issues.

5.1 Metrics Collector (Prometheus)

Technology: Prometheus + exporters
Input: Metrics from all services
Output: Time-series metrics database

Responsibilities:

Scrape metrics endpoints from all services
Store time-series data
Provide query interface for Grafana

Key Metrics:

Scanner: event rate, latency, error rate
Planner: opportunities detected, profit potential, strategy distribution
Executor: transaction success rate, confirmation time, profit realized
System: CPU, memory, network, RPC calls

5.2 Distributed Tracing (Jaeger)

Technology: Jaeger + OpenTelemetry
Input: Traces from all services
Output: Distributed trace visualization

Responsibilities:

Collect traces from all components
Visualize request flows across services
Identify bottlenecks and errors
Track latency breakdown

Key Traces:

Market event → opportunity detection → execution → confirmation
Quote request → SolRoute/Jupiter → response
Transaction building → signing → submission → confirmation

5.3 Log Aggregation (Loki)

Technology: Loki + Promtail
Input: Logs from all services
Output: Centralized log storage + queries

Responsibilities:

Collect logs from all services
Index and store efficiently
Provide query interface
Integrate with Grafana

Log Levels:

ERROR: Critical failures requiring immediate attention
WARN: Recoverable issues (RPC errors, quote failures)
INFO: Normal operations (trades executed, balances updated)
DEBUG: Detailed troubleshooting (quote details, route plans)

5.4 Dashboard (Grafana)

Technology: Grafana
Input: Prometheus, Loki, Jaeger
Output: Unified dashboards

Responsibilities:

Real-time system health dashboard
Trading performance dashboard (P&L, success rate, volume)
Strategy-specific dashboards (arbitrage, grid, DCA)
Alert visualization
Historical analysis

Dashboards:

System Overview (all services health)
Trading Performance (P&L, ROI, success rate)
Strategy Analytics (per-strategy metrics)
Wallet Management (balances, rebalancing events)
RPC Health (latency, error rates, endpoint status)

5.5 Alert Manager (Prometheus Alertmanager)

Technology: Alertmanager + notification channels
Input: Alert rules from Prometheus
Output: Notifications (Slack, PagerDuty, email)

Responsibilities:

Evaluate alert rules
Route alerts to appropriate channels
Group and deduplicate alerts
Escalation policies

Alert Categories:

Critical: System down, wallet balance low, executor failures
Warning: High error rates, slow confirmations, cache misses
Info: Strategy completed, rebalancing triggered

Data Flow Example: Arbitrage Trade

1. Market Event Scanner
   - Detects SOL/USDC pool update on Raydium
   - Emits: NATS market.events.raydium.pool_update

2. Arbitrage Planner
   - Receives pool update event
   - Gets quote: SOL → USDC (SolRoute: 2ms)
   - Gets quote: USDC → SOL (SolRoute: 2ms)
   - Calculates profit: 0.05 SOL (profitable!)
   - Emits: NATS trade.opportunities.arbitrage

3. Transaction Coordinator
   - Receives arbitrage opportunity
   - Selects Worker Wallet #5 (available)
   - Selects Jito Executor (high priority)
   - Routes to executor

4. Jito Bundle Executor
   - Builds transaction:
     [flashBorrow, setComputeBudget, swap1, swap2, flashRepay]
   - Signs with Worker Wallet #5
   - Submits bundle to Jito (UUID: abc-123)
   - Emits: NATS execution.submitted

5. Confirmation Monitor
   - Polls bundle status every 2s
   - Bundle confirmed in slot 246382819 (12s)
   - Parses logs: actual profit = 0.048 SOL
   - Emits: NATS execution.confirmed

6. Monitors
   - Metrics: arbitrage_profit_realized=0.048 SOL
   - Logs: "Arbitrage trade confirmed, profit 0.048 SOL"
   - Dashboard: Update P&L chart

7. Wallet Manager
   - Updates Worker Wallet #5 expected balance
   - Checks if rebalancing needed (no)

Technology Stack

Core Services

Component	Technology	Rationale
Scanners	TypeScript	Rich ecosystem, Web3.js integration
Planners	TypeScript	Business logic flexibility, fast iteration
Executors	TypeScript	Transaction signing, Solana SDK
Quote Service	Go	High performance, concurrency, 2-10ms latency
RPC Proxy	Rust	Maximum performance, connection pooling
Transaction Builder	Rust	Zero-copy serialization, speed

Infrastructure

Component	Technology	Purpose
Event Bus	NATS JetStream	Pub/sub, persistence, replay
Hot Cache	Redis	Sub-ms access, pub/sub
Persistent DB	PostgreSQL	ACID transactions, relational data
Time-Series	TimescaleDB	Historical metrics, optimized queries
Secrets	Vault / AWS Secrets	Secure key management
Container Runtime	Docker + Kubernetes	Orchestration, scaling
Observability	Prometheus, Grafana, Jaeger, Loki	Metrics, traces, logs

Communication Patterns

Service-to-Service: NATS pub/sub (async, decoupled)
Client-to-Service: REST API / gRPC (sync, request-response)
Cache: Redis (read-through pattern)
State: PostgreSQL (source of truth)
Events: NATS JetStream (persistent streams)
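
Subject naming matters for the pub/sub pattern because NATS wildcards are token-based: `*` matches exactly one dot-separated token, while `>` matches one or more trailing tokens. A subscription written as market.events.* would therefore not receive market.events.raydium.pool_update from the data flow example, whereas market.events.> would. The function below is a simplified matcher for reasoning about subject layouts; it is an illustration, not the NATS server's implementation.

```typescript
// Simplified NATS-style subject matching:
//   "*" matches exactly one token, ">" matches one or more trailing tokens.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === ">") return i === p.length - 1 && s.length > i;
    if (i >= s.length) return false;
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

const singleToken = subjectMatches("market.events.*", "market.events.pool_update");
const twoTokensStar = subjectMatches("market.events.*", "market.events.raydium.pool_update");
const twoTokensTail = subjectMatches("market.events.>", "market.events.raydium.pool_update");
```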

Deployment Architecture

┌─────────────────────────────────────────────────────────┐
│                    Load Balancer (Nginx)                 │
└─────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│                  Kubernetes Cluster                       │
│                                                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │  Scanners   │  │  Planners   │  │  Executors  │     │
│  │  (3 pods)   │  │  (5 pods)   │  │  (3 pods)   │     │
│  └─────────────┘  └─────────────┘  └─────────────┘     │
│                                                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │ Quote Svc   │  │  RPC Proxy  │  │  Preparers  │     │
│  │  (Go-2pods) │  │  (Rust-3)   │  │  (2 pods)   │     │
│  └─────────────┘  └─────────────┘  └─────────────┘     │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│                   Managed Services                        │
│                                                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌─────────┐ │
│  │   NATS   │  │  Redis   │  │PostgreSQL│  │ Grafana │ │
│  │JetStream │  │ Cluster  │  │ Primary  │  │  Cloud  │ │
│  └──────────┘  └──────────┘  └──────────┘  └─────────┘ │
└──────────────────────────────────────────────────────────┘

Scaling Strategy

Scanners: Scale horizontally, partition by account ranges
Planners: Scale horizontally, subscribe to all events
Executors: Scale horizontally, coordinate via NATS
Quote Service: Scale horizontally, cache in Redis
RPC Proxy: Scale horizontally, round-robin load balancing

Next Steps

See 03-implementation-roadmap.md for the phased implementation plan.


James Shen
