# User Guide Overview
Welcome to the QuantRL-Lab user guide. This section covers core concepts and practical workflows for building and testing trading strategies.
## Core Concepts

### Strategy Injection
QuantRL-Lab uses dependency injection to decouple environment logic from strategies:
```python
env = SingleStockTradingEnv(
    data=df,
    config=config,
    action_strategy=action_strategy,           # (1)!
    reward_strategy=reward_strategy,           # (2)!
    observation_strategy=observation_strategy  # (3)!
)
```
1. How raw actions are mapped to market orders
2. How scalar rewards are calculated each step
3. What the agent sees as its state representation
This architecture enables:
- **Modularity**: Change reward functions without touching environment code
- **Reusability**: Compose complex behaviors from simple components
- **Testability**: Isolate and test strategies independently
- **Experimentation**: Rapidly iterate on different configurations
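The benefit of injection can be shown with a toy example: a minimal environment that depends only on a reward interface, so reward logic swaps without touching environment code. `ToyEnv`, `ProfitReward`, and `LogReturnReward` are illustrative names for this sketch, not part of the QuantRL-Lab API.

```python
import math


# Hypothetical sketch of the injection pattern (class names are
# illustrative, not the real QuantRL-Lab API). The environment depends
# only on a reward interface, so reward logic is swappable.
class ProfitReward:
    def calculate_reward(self, prev_value: float, curr_value: float) -> float:
        # Raw change in portfolio value.
        return curr_value - prev_value


class LogReturnReward:
    def calculate_reward(self, prev_value: float, curr_value: float) -> float:
        # Log return: scale-free, often a more stable RL signal.
        return math.log(curr_value / prev_value)


class ToyEnv:
    def __init__(self, reward_strategy):
        self.reward_strategy = reward_strategy  # injected, not hard-coded

    def step_reward(self, prev_value: float, curr_value: float) -> float:
        return self.reward_strategy.calculate_reward(prev_value, curr_value)


# Swapping the reward function is a one-line change at construction time.
env_a = ToyEnv(reward_strategy=ProfitReward())
env_b = ToyEnv(reward_strategy=LogReturnReward())
print(env_a.step_reward(100.0, 110.0))  # 10.0
print(round(env_b.step_reward(100.0, 110.0), 4))  # 0.0953
```

Because each reward class is a plain object, it can also be unit-tested in isolation, which is the testability benefit listed above.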
### Three Strategy Types

- **Action strategy**: defines how raw agent actions are processed into market orders.
- **Reward strategy**: defines how the scalar reward is calculated at each step.
- **Observation strategy**: constructs the state representation the agent sees.
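As a sketch, the three roles can be expressed as Python protocols. The method names mirror those in the step sequence (`handle_action`, `calculate_reward`, `build_observation`), but the exact signatures here are assumptions for illustration, not the library's real API.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ActionStrategy(Protocol):
    def handle_action(self, env: Any, action: int) -> None:
        """Translate a raw agent action into one or more market orders."""


@runtime_checkable
class RewardStrategy(Protocol):
    def calculate_reward(self, env: Any) -> float:
        """Compute the scalar reward for the current step."""


@runtime_checkable
class ObservationStrategy(Protocol):
    def build_observation(self, env: Any) -> list[float]:
        """Construct the state representation the agent sees."""


class BuyAndHoldAction:
    """A trivial concrete action strategy used only for this sketch."""

    def handle_action(self, env: Any, action: int) -> None:
        pass  # a real strategy would place orders against the portfolio
```

Any class with the matching method satisfies the protocol, which is what lets strategies be composed and swapped freely.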
### Data Flow

```mermaid
graph TD
    A[DataLoader<br/>Alpaca / YFinance / AlphaVantage] -->|get_historical_ohlcv_data| B[DataFrame with OHLCV]
    B -->|DataProcessor.data_processing_pipeline| C[DataFrame with indicators,<br/>sentiment, analyst data]
    C -->|pass to env| D[SingleStockTradingEnv]
    D -->|step delegates to| E[Action / Observation / Reward strategies]
```
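The first two stages of this flow can be approximated with plain pandas. The `load_ohlcv` and `add_indicators` functions below are stand-ins for the loader's `get_historical_ohlcv_data` and `DataProcessor.data_processing_pipeline`, not the library's actual implementations.

```python
import pandas as pd


def load_ohlcv() -> pd.DataFrame:
    # Stage 1 stand-in: a DataLoader returning raw OHLCV bars.
    return pd.DataFrame({
        "open":   [10.0, 10.5, 10.2, 10.8],
        "high":   [10.6, 10.9, 10.9, 11.0],
        "low":    [9.9, 10.1, 10.0, 10.5],
        "close":  [10.5, 10.2, 10.8, 10.9],
        "volume": [1_000, 1_200, 900, 1_100],
    })


def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Stage 2 stand-in: the processing pipeline appends derived columns
    # (here a 2-bar simple moving average of the close).
    out = df.copy()
    out["sma_2"] = out["close"].rolling(window=2).mean()
    return out


df = add_indicators(load_ohlcv())
print(df.columns.tolist())  # raw OHLCV columns plus the indicator column
```

The enriched DataFrame is what gets passed to `SingleStockTradingEnv`, whose observation strategy decides which of these columns the agent actually sees.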
### Step Execution Order

Each call to `env.step(action)` follows this sequence:
```mermaid
sequenceDiagram
    participant Agent
    participant Env as SingleStockTradingEnv
    participant Port as Portfolio
    participant Act as ActionStrategy
    participant Rew as RewardStrategy
    participant Obs as ObservationStrategy
    Agent->>Env: step(action)
    Env->>Env: store prev_portfolio_value
    Env->>Port: process_open_orders()
    Note over Port: process pending limit/stop orders
    Env->>Act: handle_action(action)
    Act-->>Port: execute new order
    Env->>Env: advance current_step, check terminated/truncated
    Env->>Rew: calculate_reward()
    Env->>Env: clip reward to reward_clip_range
    Env->>Rew: on_step_end() (stateful hook)
    Env->>Obs: build_observation()
    Env-->>Agent: observation, reward, terminated, truncated, info
```
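The sequence above can be condensed into a toy `step` implementation. This is a simplified stand-in (market buy orders only, a single price series, no pending-order book), not the real `SingleStockTradingEnv` internals.

```python
# Toy environment tracing the step order: store prev value, process
# pending orders, handle the action, advance, reward + clip, observe.
class ToyStepEnv:
    def __init__(self, prices, reward_clip_range=(-1.0, 1.0)):
        self.prices = prices
        self.current_step = 0
        self.cash = 100.0
        self.shares = 0
        self.reward_clip_range = reward_clip_range

    def portfolio_value(self) -> float:
        return self.cash + self.shares * self.prices[self.current_step]

    def step(self, action):
        # 1. Store prev_portfolio_value before anything mutates state.
        prev_value = self.portfolio_value()
        # 2. Process pending limit/stop orders (no-op in this toy).
        # 3. Handle the action: 1 = buy one share at market, 0 = hold.
        if action == 1:
            self.cash -= self.prices[self.current_step]
            self.shares += 1
        # 4. Advance current_step and check termination.
        self.current_step += 1
        terminated = self.current_step >= len(self.prices) - 1
        # 5. Calculate the reward, then 6. clip it to reward_clip_range.
        reward = self.portfolio_value() - prev_value
        lo, hi = self.reward_clip_range
        reward = max(lo, min(hi, reward))
        # 7. Build the observation (here just the current price).
        obs = [self.prices[self.current_step]]
        return obs, reward, terminated, False, {}


env = ToyStepEnv([10.0, 10.5, 10.2])
obs, reward, terminated, truncated, info = env.step(1)
print(obs, reward, terminated)  # [10.5] 0.5 False
```

Note the ordering detail the diagram encodes: the reward is computed *after* `current_step` advances, so it reflects the portfolio's value at the new price.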
## Configuration & Non-Obvious Behaviors

See Configuration for all `SingleStockEnvConfig` / `SimulationConfig` parameters. For subtle runtime behaviors (step timing, window padding, price auto-detection, order persistence), see Architecture — Non-Obvious Behaviours.
## Backtesting Workflow
For a complete training + evaluation workflow, see Backtesting and Experiments.
## Advanced Topics
- Custom Strategies - Build custom action/observation/reward strategies
- Backtesting - Advanced backtesting workflows and metrics
- Reward Shaping - Techniques for stable, informative reward signals
## Next Steps
- Explore examples for complete workflows
- Check API reference for detailed documentation
- Review notebooks in `notebooks/` for interactive tutorials