Reinforcement Learning for Quantitative US Equity Trading
From raw market data to autonomous live execution — in four phases.
Architecture
Each phase builds on the previous. Start from Phase 1, go as far as you need.
**Phase 1: Data Pipeline.** Multi-source market data ingestion with disk caching. 88 features per stock, including technical indicators, macro regime signals, and FinBERT sentiment. Rolling z-score normalization with zero look-ahead bias.
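The zero look-ahead property comes from normalizing each value against a window of strictly earlier observations, so the current value never contributes to its own statistics. A minimal sketch, not the repo's actual implementation:

```python
from collections import deque
from math import sqrt

def rolling_zscore(values, window=20):
    """Rolling z-score with no look-ahead: the value at time t is
    normalized using only observations t-window .. t-1."""
    history = deque(maxlen=window)
    scores = []
    for x in values:
        if len(history) < window:
            scores.append(None)  # warm-up: not enough history yet
        else:
            mean = sum(history) / window
            var = sum((h - mean) ** 2 for h in history) / window
            std = sqrt(var) or 1e-8  # guard against zero variance
            scores.append((x - mean) / std)
        history.append(x)  # appended AFTER scoring, so x never leaks into its own stats
    return scores
```

Appending to the history only after scoring is what guarantees the no-leakage property; the warm-up `None`s should be dropped before training.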
**Phase 2: RL Training.** PPO with an MLP baseline or a Transformer policy. Per-stock temporal encoder with CLS-token aggregation, plus cross-asset attention for correlation modeling. 4-stage curriculum from calm bull markets to full crisis.
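A staged curriculum can be expressed as a schedule that maps training progress to the active stage. The stage names, regime sets, and volatility scales below are illustrative assumptions, not the repo's actual configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CurriculumStage:
    name: str
    regimes: tuple      # hypothetical: market regimes sampled at this stage
    vol_scale: float    # hypothetical: volatility stress multiplier

# Illustrative 4-stage schedule from calm bull markets to full crisis.
CURRICULUM = (
    CurriculumStage("calm_bull",   ("bull",),                  vol_scale=0.5),
    CurriculumStage("mixed",       ("bull", "sideways"),       vol_scale=1.0),
    CurriculumStage("stressed",    ("sideways", "bear"),       vol_scale=1.5),
    CurriculumStage("full_crisis", ("bull", "bear", "crisis"), vol_scale=2.0),
)

def stage_for_progress(progress: float) -> CurriculumStage:
    """Map training progress in [0, 1] to the active curriculum stage."""
    idx = min(int(progress * len(CURRICULUM)), len(CURRICULUM) - 1)
    return CURRICULUM[idx]
```

The trainer would call `stage_for_progress(steps_done / total_steps)` each rollout and sample environments from the returned stage's regimes.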
**Phase 3: Live Execution.** Alpaca paper and live trading via a unified broker interface. 7-layer hard risk constraints enforced independently of the agent. FastAPI inference server with 15 endpoints. Prometheus + Grafana monitoring stack.
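The key design point is that hard risk checks run after the agent's decision and can override it unconditionally, so a misbehaving policy can never bypass them. A minimal sketch with a few placeholder constraints; the actual seven layers and their limits will differ:

```python
def apply_risk_layer(order_qty, price, state):
    """Clamp or veto an agent-proposed order. The specific limits
    below are illustrative placeholders, not the repo's real constraints."""
    notional = abs(order_qty) * price
    # Layer 1 (illustrative): kill switch past a max daily drawdown.
    if state["daily_drawdown"] <= -0.05:
        return 0
    # Layer 2 (illustrative): per-order notional cap.
    if notional > state["max_order_notional"]:
        order_qty = int(state["max_order_notional"] / price) * (1 if order_qty > 0 else -1)
    # Layer 3 (illustrative): single-name concentration cap at 10% of equity.
    max_shares = int(0.10 * state["equity"] / price)
    if abs(state["position"] + order_qty) > max_shares:
        order_qty = (max_shares - state["position"]) if order_qty > 0 \
                    else (-max_shares - state["position"])
    return order_qty
```

Because the layer is a pure function of broker state, it can be unit-tested in isolation from the policy network.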
**Phase 4: Autonomous Operations.** CUSUM + KS-test drift detection triggers Champion/Challenger retraining, with Welch's t-test gating promotion. UCB1 ensemble voting with disagreement-aware position scaling. 7 scheduled jobs; Slack and email alerts.
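CUSUM drift detection accumulates deviations of a monitored statistic (e.g. standardized prediction residuals) and fires once the cumulative sum crosses a threshold. A minimal one-sided-pair sketch, with illustrative default parameters rather than tuned ones:

```python
class CusumDetector:
    """Two one-sided CUSUM statistics on a stream of standardized values.
    `threshold` and `drift` are illustrative defaults, not tuned settings."""

    def __init__(self, threshold=5.0, drift=0.5):
        self.threshold = threshold  # fire when a cumulative sum exceeds this
        self.drift = drift          # slack that absorbs normal fluctuation
        self.pos = 0.0
        self.neg = 0.0

    def update(self, x: float) -> bool:
        """Feed one observation; return True when drift is detected."""
        self.pos = max(0.0, self.pos + x - self.drift)
        self.neg = max(0.0, self.neg - x - self.drift)
        if self.pos > self.threshold or self.neg > self.threshold:
            self.pos = self.neg = 0.0  # reset after firing
            return True
        return False
```

In an autonomy loop, a `True` from `update()` would be the event that kicks off Challenger retraining; the KS-test would serve as a complementary distribution-level check.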
Demo
Illustrative results using synthetic data. Replace with your real backtest output.
| Metric | RL Agent | SPY B&H |
|---|---|---|
| Annual Return | +23.4% | +11.8% |
| Sharpe Ratio | 1.34 | 0.72 |
| Max Drawdown | -12.3% | -33.9% |
| Calmar Ratio | 1.90 | 0.35 |
| Alpha vs SPY | +11.6% | — |
| Win Rate | 57.2% | 53.1% |
* SIMULATED RESULTS USING SYNTHETIC DATA — NOT REAL TRADING PERFORMANCE
Quickstart
Free API keys are available for all data sources. No paid subscriptions are required for Phases 1–2.
1. Clone the repo, create a virtual environment, and install dependencies (Python 3.10+ required).
2. Register free accounts at Polygon.io, Finnhub, and Alpaca, then copy `.env.example` and fill in your keys.
3. Start with Phase 1 validation, train a PPO agent in Phase 2, then launch the full autonomous system.
Tech Stack
Built on production-grade libraries; no reinventing the wheel.