Data Sources
The pipeline aggregates signals across six major categories, each with dedicated ingestion and normalization logic:
On-Chain Transactions (Solana RPC + Geyser)
The primary data source is the Solana blockchain itself. SURCHI runs dedicated RPC nodes with the Geyser plugin enabled, which provides a real-time streaming interface for account updates, transaction confirmations, and program events. Unlike polling-based RPC approaches, Geyser delivers events as they are processed by the validator, enabling sub-100ms event ingestion. All significant program interactions — swaps, liquidity deposits and withdrawals, governance actions, large transfers — are captured and tagged at this layer.
DEX Order Books (Jupiter, Raydium, Orca)
SURCHI aggregates order book depth, quote routing data, and swap execution records from the three major Solana DEX venues. Jupiter’s aggregation layer provides cross-venue routing intelligence. Raydium’s concentrated liquidity pools and Orca’s Whirlpools provide granular tick-level liquidity data. Aggregated DEX data enables the Alpha and Liquidity Sentinels to assess true executable liquidity rather than theoretical pool depth.
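The difference between theoretical pool depth and executable liquidity can be illustrated with a small sketch. It assumes liquidity has already been flattened into ask-side price levels; the `Level` type, the function names, and the slippage bound are hypothetical and not taken from SURCHI's codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    price: float  # quote units per base unit, e.g. USDC per SOL
    size: float   # base units offered at this price


def theoretical_depth(asks: list[Level]) -> float:
    """Total base units resting on the ask side, regardless of price."""
    return sum(level.size for level in asks)


def executable_depth(asks: list[Level], max_slippage: float) -> float:
    """Base units purchasable at prices within max_slippage of the best ask."""
    if not asks:
        return 0.0
    best = min(level.price for level in asks)
    limit = best * (1.0 + max_slippage)
    return sum(level.size for level in asks if level.price <= limit)


# Hypothetical book: most of the size sits far from the best price.
asks = [Level(100.0, 10.0), Level(101.0, 10.0), Level(110.0, 100.0)]
print(theoretical_depth(asks))       # all resting size
print(executable_depth(asks, 0.02))  # size within 2% of the best ask
```

In this hypothetical book, most of the resting size sits well above the best ask, so only a small part of the total depth is reachable within a 2% price bound.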
Liquidity Pool Depths and Health Metrics
Pool health is tracked continuously across all major Solana liquidity pools. Health metrics include: current liquidity depth by price range, utilization rate, fee accumulation rate, impermanent loss projections at current price levels, and pool age. These metrics feed directly into the Liquidity Sentinel’s risk models and are surfaced as alerts when health indicators cross user-defined thresholds.
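As a rough illustration of one of these metrics, the sketch below computes impermanent loss with the standard closed form for a full-range constant-product pool and compares it to a user-defined threshold. This is an assumption made for illustration: concentrated liquidity positions lose value faster than this formula suggests, and the function names and threshold are hypothetical.

```python
import math


def impermanent_loss(price_ratio: float) -> float:
    """Loss relative to holding, for a full-range constant-product pool.

    price_ratio is current price divided by the price at deposit.
    Returns a value <= 0, e.g. -0.2 means 20% worse than holding.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 2.0 * math.sqrt(price_ratio) / (1.0 + price_ratio) - 1.0


def crosses_threshold(price_ratio: float, max_loss: float) -> bool:
    """True when projected loss is worse than the user-defined limit."""
    return impermanent_loss(price_ratio) < -abs(max_loss)


# A 4x price move against a hypothetical 10% alert threshold.
print(impermanent_loss(4.0))
print(crosses_threshold(4.0, 0.10))
```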
Whale Wallet Tracking
Users configure watchlists of wallet addresses to monitor — their own, known market maker wallets, protocol treasuries, or any other addresses of strategic interest. The pipeline tracks all transactions from watchlisted wallets in real time, tagging movements with asset type, value in USDC terms, counterparty classification, and historical context. Configurable alert thresholds trigger Sentinel Core notifications for movements exceeding defined values.
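A minimal sketch of threshold-based alerting on watchlisted wallets might look like the following. The `Movement` fields mirror the tags described above, but the type, the function name, and the per-wallet threshold mapping are illustrative assumptions rather than the actual Sentinel Core interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    wallet: str
    asset: str
    value_usdc: float
    counterparty_class: str  # e.g. "exchange", "protocol", "unknown"


def movements_to_alert(
    movements: list[Movement], watchlist: dict[str, float]
) -> list[Movement]:
    """Keep movements from watchlisted wallets that exceed that wallet's threshold.

    watchlist maps wallet address -> alert threshold in USDC.
    """
    return [
        m
        for m in movements
        if m.wallet in watchlist and m.value_usdc > watchlist[m.wallet]
    ]


# Hypothetical addresses and thresholds.
watchlist = {"WalletA": 250_000.0, "WalletB": 1_000_000.0}
movements = [
    Movement("WalletA", "SOL", 300_000.0, "exchange"),
    Movement("WalletB", "USDC", 500_000.0, "protocol"),
    Movement("WalletC", "SOL", 9_000_000.0, "unknown"),  # not watchlisted
]
alerts = movements_to_alert(movements, watchlist)
```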
Social Sentiment (X/Twitter, Telegram)
Funding Rates and Derivatives Data
Perpetual futures funding rates from Solana-native derivatives venues provide a forward-looking sentiment indicator. Elevated positive funding rates signal leveraged long positioning; elevated negative rates signal short pressure. This data feeds Alpha Sentinel’s market regime classification models, helping distinguish trending from mean-reverting conditions.
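The sign-based reading of funding rates can be sketched as a simple classifier. The threshold value and label names are hypothetical; the regime classification models themselves are not described in this document and are not reproduced here.

```python
def classify_funding(rate: float, threshold: float = 0.0005) -> str:
    """Label a funding rate by the positioning it suggests.

    rate is the funding rate per interval as a fraction (0.0005 = 0.05%).
    threshold is an assumed cutoff for what counts as "elevated".
    """
    if rate > threshold:
        return "leveraged_long"
    if rate < -threshold:
        return "short_pressure"
    return "neutral"


print(classify_funding(0.0012))
print(classify_funding(-0.0009))
print(classify_funding(0.0001))
```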
Processing Architecture
Raw data from all sources enters a four-stage processing pipeline before reaching the AI inference layer:
Ingestion and Deduplication
Raw events arrive from multiple sources simultaneously. The ingestion layer deduplicates cross-source events (e.g., the same swap appearing in both Geyser streaming data and DEX API feeds), assigns canonical event IDs, and timestamps events with both chain-native slot timestamps and pipeline ingestion timestamps. Latency between the two timestamps is monitored as a pipeline health metric.
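One way to sketch this stage is to derive the canonical ID from fields that every source reports for the same on-chain event. The example assumes a transaction signature plus an instruction index uniquely identify an event; that choice, along with all type and function names, is illustrative. A production deduplicator would also expire old entries, which is omitted here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    source: str             # e.g. "geyser" or "dex_api"
    signature: str          # transaction signature
    instruction_index: int  # position of the instruction in the transaction
    slot_time_ms: int       # chain-native timestamp
    ingested_at_ms: int     # pipeline ingestion timestamp


def canonical_id(ev: RawEvent) -> str:
    """Source-independent ID: identical for the same event seen via two feeds."""
    key = f"{ev.signature}:{ev.instruction_index}".encode()
    return hashlib.sha256(key).hexdigest()


def ingestion_lag_ms(ev: RawEvent) -> int:
    """Gap between the two timestamps, usable as a pipeline health metric."""
    return ev.ingested_at_ms - ev.slot_time_ms


class Deduplicator:
    def __init__(self) -> None:
        self._seen: set[str] = set()

    def ingest(self, ev: RawEvent) -> tuple[str, bool]:
        """Return (canonical ID, True if this is the first sighting)."""
        cid = canonical_id(ev)
        is_new = cid not in self._seen
        self._seen.add(cid)
        return cid, is_new


dedup = Deduplicator()
a = RawEvent("geyser", "sigABC", 0, 1_000, 1_040)
b = RawEvent("dex_api", "sigABC", 0, 1_000, 1_350)
first = dedup.ingest(a)
second = dedup.ingest(b)
```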
Normalization
Each event type is mapped to a canonical schema. Token amounts are normalized to USDC terms using real-time price feeds. Wallet addresses are resolved against classification databases (known exchanges, protocols, whale accounts). Pool addresses are resolved to their constituent assets and protocols. The output of normalization is a structured event record with a consistent schema regardless of source.
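The mapping to a canonical schema could be sketched as follows. The field names, the raw event layout, and the lookup tables are assumptions made for illustration; the actual schema is not specified in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedEvent:
    event_type: str
    asset: str
    amount: float
    value_usdc: float
    wallet: str
    wallet_class: str


def normalize(
    raw: dict,
    prices_usdc: dict[str, float],
    wallet_classes: dict[str, str],
) -> NormalizedEvent:
    """Map a source-specific record onto one schema with USDC valuation."""
    asset = raw["asset"]
    amount = float(raw["amount"])
    wallet = raw["wallet"]
    return NormalizedEvent(
        event_type=raw["type"],
        asset=asset,
        amount=amount,
        value_usdc=amount * prices_usdc[asset],  # real-time price lookup
        wallet=wallet,
        wallet_class=wallet_classes.get(wallet, "unknown"),
    )


# Hypothetical inputs.
prices = {"SOL": 150.0}
classes = {"WalletA": "exchange"}
event = normalize(
    {"type": "swap", "asset": "SOL", "amount": "2", "wallet": "WalletA"},
    prices,
    classes,
)
```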
Feature Extraction
Normalized events are aggregated into feature vectors over rolling time windows (1 minute, 5 minutes, 15 minutes, 1 hour). Features include: net DEX flow by asset, liquidity depth changes by pool, whale wallet activity volume, sentiment score deltas, and funding rate changes. Feature extraction produces the dense numerical representations that AI models consume.
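Rolling-window aggregation over the four window lengths named above can be sketched with a deque per window. The example covers a single feature (net flow in USDC) and assumes events arrive in timestamp order; the class and function names are hypothetical.

```python
from collections import deque

WINDOWS_S = (60, 300, 900, 3600)  # 1 min, 5 min, 15 min, 1 hour


class RollingSum:
    """Sum of values whose timestamp lies within the last window_s seconds."""

    def __init__(self, window_s: int) -> None:
        self.window_s = window_s
        self._events: deque[tuple[float, float]] = deque()

    def add(self, ts: float, value: float) -> None:
        self._events.append((ts, value))
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events at or before the window's left edge.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def value(self, now: float) -> float:
        self._evict(now)
        return sum(v for _, v in self._events)


def net_flow_features(
    events: list[tuple[float, float]], now: float
) -> dict[int, float]:
    """events: (timestamp_s, signed USDC flow), sorted by timestamp."""
    trackers = {w: RollingSum(w) for w in WINDOWS_S}
    for ts, flow in events:
        for tracker in trackers.values():
            tracker.add(ts, flow)
    return {w: t.value(now) for w, t in trackers.items()}


events = [(10, 100.0), (3000, -40.0), (3500, 10.0), (3590, 5.0)]
features = net_flow_features(events, now=3600)
```

Each window yields one component of the feature vector; other features such as liquidity depth changes or sentiment deltas would follow the same pattern with a different aggregate.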
Streaming to AI Layer
Completed feature vectors are streamed to the Sentinel Core via an internal message bus. The Core routes vectors to the appropriate sentinel models based on feature type. Vectors that are stale beyond their freshness SLA are discarded rather than processed — the pipeline prioritizes accuracy over throughput when the two conflict.
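The discard rule can be sketched as a freshness check applied before routing. The sketch assumes a vector's freshness SLA equals the target latency for its data category as listed under Data Freshness Guarantees; the category keys, the `FeatureVector` type, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed freshness limits in milliseconds, keyed by hypothetical category names.
FRESHNESS_SLA_MS = {
    "onchain": 50,
    "dex_depth": 100,
    "pool_health": 200,
    "whale": 100,
    "sentiment": 60_000,
    "funding": 30_000,
}


@dataclass(frozen=True)
class FeatureVector:
    category: str
    produced_at_ms: int
    values: tuple[float, ...]


def is_fresh(vec: FeatureVector, now_ms: int) -> bool:
    """True when the vector's age is within its category's limit."""
    return now_ms - vec.produced_at_ms <= FRESHNESS_SLA_MS[vec.category]


def filter_fresh(vectors: list[FeatureVector], now_ms: int) -> list[FeatureVector]:
    """Drop stale vectors instead of forwarding them to the sentinel models."""
    return [v for v in vectors if is_fresh(v, now_ms)]


now_ms = 10_000
vectors = [
    FeatureVector("onchain", 9_970, (1.0,)),    # 30 ms old: kept
    FeatureVector("onchain", 9_900, (2.0,)),    # 100 ms old: discarded
    FeatureVector("sentiment", 5_000, (0.3,)),  # 5 s old: kept
]
kept = filter_fresh(vectors, now_ms)
```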
Data Freshness Guarantees
Different data categories have different freshness requirements and different achievable latencies. The pipeline’s SLA targets by category:

| Data Category | Target Latency | Update Frequency |
|---|---|---|
| On-chain transactions (Geyser) | < 50ms | Per block (~400ms slots) |
| DEX order book depth | < 100ms | Continuous streaming |
| Liquidity pool health metrics | < 200ms | Per block |
| Whale wallet movements | < 100ms | Per transaction |
| Social sentiment scores | < 60 seconds | Rolling 1-minute aggregation |
| Funding rates | < 30 seconds | Per funding interval |
