Open Source · MIT License · Production Ready

OpenRLQuant

Reinforcement Learning for Quantitative US Equity Trading

From raw market data to autonomous live execution — in four phases.

20%+
Annual Return
1.2+
Sharpe Ratio
<15%
Max Drawdown
40
Python Files

Architecture

Four-Phase System

Each phase builds on the previous. Start from Phase 1, go as far as you need.

PHASE 01 — run_phase1.py
Data & Features
Data Pipeline · Feature Engineering · Backtest Environment

Multi-source market data ingestion with disk caching. 88 features per stock including technical indicators, macro regime signals, and FinBERT sentiment. Rolling z-score normalization with zero look-ahead bias.

Polygon.io · Yahoo Finance · Finnhub · FinBERT · Gymnasium
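The look-ahead-safe rolling z-score mentioned above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the function name and default window are assumptions.

```python
import numpy as np
import pandas as pd

def rolling_zscore(series: pd.Series, window: int = 60) -> pd.Series:
    """Normalize each value using only past data (zero look-ahead bias).

    The rolling mean/std are shifted by one bar, so the statistic applied
    at time t is computed exclusively from bars t-window .. t-1.
    """
    mean = series.rolling(window).mean().shift(1)
    std = series.rolling(window).std().shift(1)
    return (series - mean) / std.replace(0, np.nan)
```

The `shift(1)` is the whole trick: without it, the value at bar `t` would be normalized by a window that includes bar `t` itself, leaking same-bar information into the feature.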
PHASE 02 — run_phase2.py
RL Training
PPO Training · Transformer Policy · Curriculum Learning

PPO with MLP baseline or Transformer policy. Per-stock temporal encoder with CLS-token aggregation, cross-asset attention for correlation modeling. 4-stage curriculum from calm bull to full crisis.

PPO · Transformer · Curriculum · Optuna · MLflow
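The two-stage attention design described above (per-stock temporal encoder with a CLS token, then cross-asset attention over the per-stock summaries) can be sketched in PyTorch. All class names, layer sizes, and tensor shapes here are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

class CrossAssetPolicy(nn.Module):
    """Sketch: temporal encoding per stock, then attention across stocks.

    Stage 1: a Transformer encodes each stock's feature window; a learned
             CLS token aggregates the window into one embedding per stock.
    Stage 2: self-attention over the per-stock embeddings models
             cross-asset correlation before the action head.
    """

    def __init__(self, n_features: int = 88, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2)
        self.cross = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=1)
        self.head = nn.Linear(d_model, 1)  # one target weight per stock

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_stocks, window, n_features)
        b, s, w, f = x.shape
        h = self.proj(x.reshape(b * s, w, f))           # (b*s, window, d_model)
        cls = self.cls.expand(b * s, -1, -1)
        h = self.temporal(torch.cat([cls, h], dim=1))   # prepend CLS token
        stock_emb = h[:, 0].reshape(b, s, -1)           # CLS summary per stock
        mixed = self.cross(stock_emb)                   # cross-asset attention
        return self.head(mixed).squeeze(-1)             # (batch, n_stocks)
```

A policy like this would plug into PPO as a custom feature extractor or actor network; only the shapes matter for the sketch.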
PHASE 03 — run_phase3.py
Live Execution
Live Brokerage · Risk Controls · Monitoring Dashboard

Alpaca paper and live trading through a unified broker interface. Seven layers of hard risk constraints enforced independently of the agent. FastAPI inference server with 15 endpoints. Prometheus + Grafana monitoring stack.

Alpaca API · FastAPI · Prometheus · Grafana · Docker
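The idea of hard risk constraints that sit outside the agent can be sketched as a gate the agent's orders must pass through. The limit names and values below are assumptions for illustration; the repository's actual seven layers and thresholds will differ.

```python
from dataclasses import dataclass

@dataclass
class RiskLimits:
    """Illustrative hard limits (names and values are assumptions)."""
    max_position_pct: float = 0.10    # per-stock cap, fraction of equity
    max_gross_exposure: float = 1.0   # no leverage
    max_daily_loss_pct: float = 0.03  # kill-switch threshold

def gate_order(target_pct: float, gross_pct: float,
               daily_pnl_pct: float, limits: RiskLimits) -> float:
    """Clamp the agent's requested position; the agent cannot override this."""
    if daily_pnl_pct <= -limits.max_daily_loss_pct:
        return 0.0  # kill switch: flatten, ignore the agent entirely
    capped = max(-limits.max_position_pct,
                 min(limits.max_position_pct, target_pct))
    headroom = max(0.0, limits.max_gross_exposure - gross_pct)
    return max(-headroom, min(headroom, capped))
```

The key design point is that the gate is pure bookkeeping with no learned parameters, so a misbehaving policy can request any position it likes and still never exceed the hard limits.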
PHASE 04 — run_phase4.py
Autonomous Ops
Automated Retraining · Ensemble Voting · Scheduling & Alerts

CUSUM + KS-test drift detection triggers Champion/Challenger retraining with Welch's t-test for promotion. UCB1 ensemble voting with disagreement-aware position scaling. 7 scheduled jobs, Slack + email alerts.

CUSUM · UCB1 Ensemble · APScheduler · Slack · Welch's t-test
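The CUSUM drift detector named above can be sketched as a standard two-sided cumulative-sum test over a metric stream such as daily returns. Parameter names and defaults are illustrative assumptions.

```python
def cusum_drift(values, target: float, k: float = 0.5, h: float = 5.0) -> bool:
    """Two-sided CUSUM over a metric stream (e.g. daily returns).

    k is the slack (roughly half the shift you want to detect, in the
    metric's units); h is the decision threshold. Returns True as soon as
    either cumulative sum crosses h, signalling distribution drift.
    """
    s_hi = s_lo = 0.0
    for x in values:
        s_hi = max(0.0, s_hi + (x - target) - k)  # accumulates upward shifts
        s_lo = max(0.0, s_lo - (x - target) - k)  # accumulates downward shifts
        if s_hi > h or s_lo > h:
            return True
    return False
```

In the pipeline described above, such a signal would trigger Challenger retraining; promotion is then a two-sample comparison of Champion vs Challenger returns, e.g. Welch's t-test via `scipy.stats.ttest_ind(a, b, equal_var=False)`.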

Demo

Simulated Backtest Results

Illustrative results using synthetic data. Replace with your real backtest output.

[Chart] Annual Returns by Year — RL Strategy vs SPY B&H
Performance Metrics vs Benchmark
Metric          RL Agent    SPY B&H
Annual Return   +23.4%      +11.8%
Sharpe Ratio    1.34        0.72
Max Drawdown    -12.3%      -33.9%
Calmar Ratio    1.90        0.35
Alpha vs SPY    +11.6%      n/a
Win Rate        57.2%       53.1%

* SIMULATED RESULTS USING SYNTHETIC DATA — NOT REAL TRADING PERFORMANCE

Quickstart

Get Running in Minutes

Free API keys available for all data sources. No paid subscriptions required for Phase 1–2.

STEP 01
Install
Clone · Virtual Environment · Install Dependencies

Clone the repo, create a virtual environment, and install dependencies. Python 3.10+ required.

STEP 02
Configure
Register APIs · Fill in .env · Verify

Register free accounts at Polygon.io, Finnhub, and Alpaca. Copy .env.example and fill in your keys.

STEP 03
Run
Phase 1 Validation · Phase 2 Training · Phase 4 Deployment

Start with Phase 1 validation, train a PPO agent in Phase 2, then launch the full autonomous system.

terminal
# 1. Clone & install
git clone https://github.com/Donvink/OpenRLQuant.git
cd OpenRLQuant
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Configure API keys
cp .env.example .env
# Fill in: POLYGON_API_KEY, FINNHUB_KEY, ALPACA_API_KEY

# 3. Phase 1 — validate data pipeline (~5 min)
python run_phase1.py --mode quick

# 4. Phase 2 — train PPO agent (~30 min CPU)
python run_phase2.py --mode mlp --use-synthetic --timesteps 500000

# 5. Phase 4 — launch full autonomous system
python run_phase4.py --mode paper --dry-run
# → API:     http://localhost:8000/docs
# → Grafana: http://localhost:3000

Tech Stack

Built With

Production-grade libraries throughout, with no reinvented wheels.

Data & Features
yfinance / Polygon.io · Market data
Finnhub · News & filings
FinBERT / transformers · NLP sentiment
pandas-ta · Technical indicators
PyArrow / Parquet · Feature store
Training & RL
PyTorch 2.0+ · Deep learning
Stable-Baselines3 · PPO implementation
Gymnasium · RL environment
Optuna · Hyperparameter search
MLflow · Experiment tracking
Execution & Ops
Alpaca API · Broker integration
FastAPI + uvicorn · Inference server
Prometheus + Grafana · Monitoring
APScheduler · Job scheduling
Docker Compose · Deployment

Author

Built by One Person

Contributions welcome — see CONTRIBUTING.md

Leo Zhong
Quant Developer · Open Source

Built OpenRLQuant to bridge the gap between academic RL research and production quantitative trading. The system covers the full lifecycle from data ingestion to live autonomous execution, with real-world constraints like transaction costs, slippage, and multi-layer risk management that most research repositories ignore.

OpenRLQuant is for educational and research purposes only. Backtested performance does not guarantee future results. US equity trading involves substantial risk of capital loss. Always paper trade for at least 3 months before using real funds.