Strategies
Base Interfaces
interfaces
BaseActionStrategy
Bases: ABC
An abstract base class for defining action spaces and handling agent actions.
Source code in src/quantrl_lab/environments/core/interfaces.py
define_action_space()
abstractmethod
Defines the action space for the environment.
Returns:

| Type | Description |
|---|---|
| `Space` | `gym.spaces.Space`: The action space for the environment. |
handle_action(env_self, action)
abstractmethod
Handles the action taken by the agent in the environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env_self` | `TradingEnvProtocol` | The environment instance where the action is taken. | *required* |
| `action` | `Any` | The action taken by the agent. | *required* |
Returns:

| Type | Description |
|---|---|
| `Tuple[Any, Dict[str, Any]]` | The outcome of the action taken in the environment. |
Source code in src/quantrl_lab/environments/core/interfaces.py
BaseObservationStrategy
Bases: ABC
Abstract base class for defining how an agent perceives the environment.
Source code in src/quantrl_lab/environments/core/interfaces.py
define_observation_space(env)
abstractmethod
Defines and returns the observation space for the environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment. | *required* |
Returns:

| Type | Description |
|---|---|
| `Space` | `gym.spaces.Space`: The observation space. |
Source code in src/quantrl_lab/environments/core/interfaces.py
build_observation(env)
abstractmethod
Builds the observation vector for the current state.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment. | *required* |
Returns:

| Type | Description |
|---|---|
| `ndarray` | `np.ndarray`: The observation vector. |
Source code in src/quantrl_lab/environments/core/interfaces.py
get_feature_names(env)
abstractmethod
Returns a list of feature names corresponding to the exact order of elements in the flattened observation vector.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment. | *required* |
Returns:

| Type | Description |
|---|---|
| `List[str]` | A list of feature names (e.g., `["Close_t-1", "RSI_t", ...]`). |
Source code in src/quantrl_lab/environments/core/interfaces.py
BaseRewardStrategy
Bases: ABC
Abstract base class for calculating rewards.
Source code in src/quantrl_lab/environments/core/interfaces.py
calculate_reward(env)
abstractmethod
Calculate the reward based on the action taken in the environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment instance. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The calculated reward. |
Source code in src/quantrl_lab/environments/core/interfaces.py
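The three base interfaces above follow the Strategy pattern: a concrete strategy subclasses the ABC, implements its abstract methods, and is plugged into the environment. The sketch below is a standalone mirror of the reward-strategy shape — the class and attribute names (`shares_held`, `FlatHoldPenalty`) are illustrative assumptions, not part of the library.

```python
from abc import ABC, abstractmethod


# Standalone mirror of the BaseRewardStrategy pattern: subclasses implement
# calculate_reward(env) and are swapped into the environment at construction.
class RewardStrategy(ABC):
    @abstractmethod
    def calculate_reward(self, env) -> float:
        """Return the reward for the current step."""


class FlatHoldPenalty(RewardStrategy):
    """Toy strategy: a small fixed penalty whenever no shares are held."""

    def __init__(self, penalty: float = -0.01):
        self.penalty = penalty

    def calculate_reward(self, env) -> float:
        # `env` only needs to expose the attributes the strategy reads.
        return self.penalty if env.shares_held == 0 else 0.0
```

Because each strategy only touches the attributes it reads, strategies can be unit-tested against a minimal stub object instead of a full environment.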
Action Strategies
standard
StandardActionStrategy
Bases: BaseActionStrategy
Implements the full-featured action space with a 3-part Box space.
Action: `[action_type, amount, price_modifier]`
Source code in src/quantrl_lab/environments/stock/strategies/actions/standard.py
define_action_space()
Defines the action space for the trading environment.
Returns:

| Type | Description |
|---|---|
| `Box` | `gym.spaces.Box`: The action space as a Box space. |
Source code in src/quantrl_lab/environments/stock/strategies/actions/standard.py
handle_action(env_self, action)
Handles the action by decoding it and instructing the environment's portfolio.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env_self` | `TradingEnvProtocol` | The environment instance. | *required* |
| `action` | `ndarray` | The raw action from the agent. | *required* |
Returns:

| Type | Description |
|---|---|
| `Tuple[Any, Dict[str, Any]]` | The decoded action type and a dictionary of details. |
Source code in src/quantrl_lab/environments/stock/strategies/actions/standard.py
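The decoding step can be sketched as follows. This is an illustrative assumption of how a 3-part continuous action might map onto discrete order types — the action list, thresholds, and clipping ranges below are hypothetical, not the library's actual values.

```python
# Hypothetical decoding of a 3-part continuous action
# [action_type, amount, price_modifier].
ACTION_TYPES = ["HOLD", "BUY", "SELL", "LIMIT_BUY", "LIMIT_SELL"]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def decode_action(action: list[float]) -> tuple[str, dict]:
    # Map the continuous action_type component in [0, 1) onto the discrete list.
    idx = min(int(action[0] * len(ACTION_TYPES)), len(ACTION_TYPES) - 1)
    details = {
        "amount_pct": clip(action[1], 0.0, 1.0),      # fraction of balance or holdings
        "price_modifier": clip(action[2], 0.9, 1.1),  # limit price relative to market
    }
    return ACTION_TYPES[idx], details
```

Bucketing the first component rather than using a `Discrete` space keeps the whole action inside one `Box`, which many continuous-control algorithms require.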
time_in_force
TimeInForceActionStrategy
Bases: BaseActionStrategy
Implements an advanced action space with Time-In-Force (TIF) control.
Action: `[action_type, amount, price_modifier, tif_type]`
TIF Types:

- 0: GTC (Good Till Cancelled)
- 1: IOC (Immediate or Cancel)
- 2: TTL (Time To Live - uses `order_expiration_steps`)
Source code in src/quantrl_lab/environments/stock/strategies/actions/time_in_force.py
define_action_space()
Defines the action space for the trading environment.
Returns:

| Type | Description |
|---|---|
| `Box` | `gym.spaces.Box`: The action space as a Box space. |
Source code in src/quantrl_lab/environments/stock/strategies/actions/time_in_force.py
handle_action(env_self, action)
Handles the action by decoding it and instructing the environment's portfolio.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env_self` | `TradingEnvProtocol` | The environment instance. | *required* |
| `action` | `ndarray` | The raw action from the agent. | *required* |
Returns:

| Type | Description |
|---|---|
| `Tuple[Any, Dict[str, Any]]` | The decoded action type and a dictionary of details. |
Source code in src/quantrl_lab/environments/stock/strategies/actions/time_in_force.py
Observation Strategies
feature_aware
FeatureAwareObservationStrategy
Bases: BaseObservationStrategy
Feature-aware observation strategy with smart normalization.
Unlike the standard strategy, which normalizes everything relative to the window start, this strategy discriminates between feature types:

1. Price-like (Open, High, Low, Close, SMA, EMA, BB): normalized relative to the first step in the window.
2. Stationary (RSI, STOCH, MFI, ADX, time features): passed through raw or scaled independently, preserving their absolute values (e.g., overbought/oversold levels).
Source code in src/quantrl_lab/environments/stock/strategies/observations/feature_aware.py
__init__(volatility_lookback=10, trend_lookback=10, normalize_stationary=True)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `volatility_lookback` | `int` | Steps used to calculate recent volatility. | `10` |
| `trend_lookback` | `int` | Steps used to calculate the trend. | `10` |
| `normalize_stationary` | `bool` | If True, attempts to scale known 0-100 indicators to 0-1. | `True` |
Source code in src/quantrl_lab/environments/stock/strategies/observations/feature_aware.py
get_feature_names(env)
Generates the ordered list of feature names corresponding to the observation vector.
The observation vector consists of:

1. The flattened market window (oldest step to newest step)
2. The portfolio and engineered features
Source code in src/quantrl_lab/environments/stock/strategies/observations/feature_aware.py
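The feature-type discrimination described above can be sketched in isolation. The column names and classification sets here are illustrative assumptions; the library may classify features differently.

```python
# Price-like columns are scaled relative to the first value in the window;
# stationary 0-100 indicators are rescaled to [0, 1] so their absolute
# levels (overbought/oversold) survive normalization.
PRICE_LIKE = {"Open", "High", "Low", "Close", "SMA", "EMA"}
STATIONARY_0_100 = {"RSI", "STOCH", "MFI", "ADX"}


def normalize_window(window: dict[str, list[float]]) -> dict[str, list[float]]:
    out = {}
    for name, values in window.items():
        if name in PRICE_LIKE:
            base = values[0]  # first step in the window
            out[name] = [v / base - 1.0 for v in values]
        elif name in STATIONARY_0_100:
            out[name] = [v / 100.0 for v in values]  # preserve absolute levels
        else:
            out[name] = list(values)  # pass through untouched
    return out
```

Normalizing RSI relative to the window start, by contrast, would destroy the meaning of the 30/70 thresholds — the motivation for treating the two groups separately.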
Reward Strategies
portfolio_value
PortfolioValueChangeReward
Bases: BaseRewardStrategy
Calculates reward based on the % change in portfolio value.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/portfolio_value.py
calculate_reward(env)
Calculates the reward based on the percentage change in portfolio value.
This method interacts with the environment's portfolio component to obtain the current portfolio value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The environment instance. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The percentage change in portfolio value since the previous step. |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/portfolio_value.py
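The core computation reduces to a fractional change between consecutive step values. A minimal sketch, with a guard added as an assumption for the degenerate zero-value case:

```python
# Reward = fractional change in total portfolio value since the last step.
def portfolio_change_reward(prev_value: float, current_value: float) -> float:
    if prev_value <= 0:
        return 0.0  # guard against division by zero on a wiped-out account
    return (current_value - prev_value) / prev_value
```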
sortino
DifferentialSortinoReward
Bases: BaseRewardStrategy
Reward strategy based on the Differential Sortino Ratio.
Unlike the standard Sortino Ratio which is calculated over a fixed period, the Differential Sortino Ratio provides a dense reward signal at each step, representing the contribution of the current return to the overall Sortino Ratio.
It penalizes downside volatility (returns below target) while rewarding positive returns.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/sortino.py
__init__(target_return=0.0, decay=0.99)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `target_return` | `float` | Minimum acceptable return (MAR). Returns below this are considered downside risk. | `0.0` |
| `decay` | `float` | Decay factor for the moving averages of returns and downside deviation (0 < decay < 1). Closer to 1 means longer memory. | `0.99` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/sortino.py
calculate_reward(env)
Calculate the differential Sortino reward.
Ref: "Online Learning of the Differential Sharpe Ratio" logic adapted for Sortino.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/sortino.py
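The referenced recursion can be illustrated with a standalone sketch: exponential moving estimates of the mean excess return and the downside second moment are updated each step, and the reward is the first-order sensitivity of the ratio to the newest return. The update constants, fallback behavior, and attribute names below are assumptions; the library's exact formulas may differ.

```python
# Differential Sortino-style update, adapted from the differential
# Sharpe recursion but using only downside (below-target) deviations.
class DifferentialSortino:
    def __init__(self, target_return: float = 0.0, decay: float = 0.99):
        self.target = target_return
        self.eta = 1.0 - decay  # adaptation rate of the moving estimates
        self.A = 0.0            # EWMA of excess returns
        self.DD = 0.0           # EWMA of squared downside returns

    def step(self, r: float) -> float:
        excess = r - self.target
        downside = min(excess, 0.0)
        dA = excess - self.A
        dDD = downside ** 2 - self.DD
        denom = self.DD ** 1.5
        if denom > 1e-12:
            reward = (self.DD * dA - 0.5 * self.A * dDD) / denom
        else:
            reward = excess  # too little downside history for a ratio yet
        self.A += self.eta * dA
        self.DD += self.eta * dDD
        return reward
```

Because only below-target returns feed the deviation estimate, upside volatility is never penalized — the key difference from the Sharpe variant below.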
sharpe
DifferentialSharpeReward
Bases: BaseRewardStrategy
Reward strategy based on the Differential Sharpe Ratio.
Provides a dense reward signal at each step, representing the contribution of the current return to the overall Sharpe Ratio.
It rewards high returns and penalizes total volatility.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/sharpe.py
__init__(risk_free_rate=0.0, decay=0.99)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `risk_free_rate` | `float` | The risk-free rate (per step) to subtract from returns. Defaults to 0, assuming short time steps. | `0.0` |
| `decay` | `float` | Decay factor for the moving averages of returns and variance (0 < decay < 1). | `0.99` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/sharpe.py
calculate_reward(env)
Calculate the differential Sharpe reward.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/sharpe.py
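A standalone sketch of the differential Sharpe ratio recursion (after Moody & Saffell): exponential estimates `A` (mean excess return) and `B` (second moment) are updated with rate `eta = 1 - decay`, and the reward is the sensitivity term `D_t`. Initialization and the early-history fallback here are assumptions; the library's implementation may differ in those details.

```python
# Differential Sharpe ratio: dense per-step reward approximating each
# return's marginal contribution to the long-run Sharpe ratio.
class DifferentialSharpe:
    def __init__(self, risk_free_rate: float = 0.0, decay: float = 0.99):
        self.rf = risk_free_rate
        self.eta = 1.0 - decay
        self.A = 0.0  # EWMA of excess returns
        self.B = 0.0  # EWMA of squared excess returns

    def step(self, r: float) -> float:
        x = r - self.rf
        dA = x - self.A
        dB = x * x - self.B
        variance = self.B - self.A ** 2
        if variance > 1e-12:
            # D_t = (B*dA - 0.5*A*dB) / (B - A^2)^(3/2)
            reward = (self.B * dA - 0.5 * self.A * dB) / variance ** 1.5
        else:
            reward = x  # not enough history for a meaningful ratio yet
        self.A += self.eta * dA
        self.B += self.eta * dB
        return reward
```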
drawdown
DrawdownPenaltyReward
Bases: BaseRewardStrategy
Penalizes the agent proportional to the current drawdown depth.
This provides continuous pressure to recover from losses.

Reward = - (Current_Drawdown_Pct * penalty_factor)
Source code in src/quantrl_lab/environments/stock/strategies/rewards/drawdown.py
__init__(penalty_factor=1.0)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `penalty_factor` | `float` | Scaling factor for the penalty. | `1.0` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/drawdown.py
calculate_reward(env)
Calculate drawdown penalty.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/drawdown.py
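A minimal sketch of the penalty formula, assuming the environment tracks a running peak (high-water mark) of portfolio value:

```python
# Penalize proportionally to the current drawdown depth below the peak.
def drawdown_penalty(peak_value: float, current_value: float,
                     penalty_factor: float = 1.0) -> float:
    if peak_value <= 0 or current_value >= peak_value:
        return 0.0  # at (or above) the high-water mark: no penalty
    drawdown_pct = (peak_value - current_value) / peak_value
    return -drawdown_pct * penalty_factor
```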
turnover
TurnoverPenaltyReward
Bases: BaseRewardStrategy
Penalizes excessive trading by applying a multiple of the fees paid.
While PnL implicitly accounts for fees, an explicit penalty helps the agent learn "efficiency" faster, discouraging noise trading where the profit margin is razor-thin compared to the cost.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/turnover.py
__init__(penalty_factor=1.0)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `penalty_factor` | `float` | Multiplier for fees paid. 1.0 means penalty = fees (doubling the cost impact); 5.0 means an extremely high penalty for churning. | `1.0` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/turnover.py
calculate_reward(env)
Calculate penalty based on transaction costs incurred in this step.
We look at the executed_orders_history for events that happened at the current step.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/turnover.py
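The fee lookup can be sketched as a filter over the step's executed orders. The order-record shape used here (`"step"`, `"fee"` keys) is a hypothetical stand-in for the library's `executed_orders_history` entries:

```python
# Sum fees from orders executed at the current step; return a negative
# multiple of them as the turnover penalty.
def turnover_penalty(executed_orders: list[dict], current_step: int,
                     penalty_factor: float = 1.0) -> float:
    fees = sum(o["fee"] for o in executed_orders if o["step"] == current_step)
    return -fees * penalty_factor
```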
invalid_action
InvalidActionPenalty
Bases: BaseRewardStrategy
Applies a fixed penalty for attempting an invalid action.
For example, a Sell or Limit Sell when no shares are held.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/invalid_action.py
calculate_reward(env)
Calculate the reward based on the action taken in the environment. If an invalid action is attempted, a penalty is applied.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment instance. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The penalty for the invalid action attempt. |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/invalid_action.py
boredom
BoredomPenaltyReward
Bases: BaseRewardStrategy
Penalizes the agent for holding a position too long without significant price movement or profit.
This encourages the agent to:

1. Enter trades only when a move is expected soon.
2. Exit stale positions rather than holding them indefinitely hoping for a turnaround.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/boredom.py
__init__(penalty_per_step=-0.001, grace_period=10, min_profit_pct=0.005)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `penalty_per_step` | `float` | The negative reward applied per step after the grace period. | `-0.001` |
| `grace_period` | `int` | Number of steps a position can be held without penalty. | `10` |
| `min_profit_pct` | `float` | The minimum unrealized profit % required to reset the boredom timer. If the position is profitable enough, holding is not penalized (letting winners run). | `0.005` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/boredom.py
calculate_reward(env)
Calculate the boredom penalty.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/boredom.py
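The decision logic can be sketched as a pure function of holding time and unrealized profit. Parameter names mirror the docstring above; the simplified state handling (passing `steps_held` in directly) is an assumption for this example:

```python
# After the grace period, each step a position is held without reaching
# min_profit_pct incurs a small negative reward.
def boredom_penalty(steps_held: int, unrealized_profit_pct: float,
                    penalty_per_step: float = -0.001,
                    grace_period: int = 10,
                    min_profit_pct: float = 0.005) -> float:
    if steps_held <= grace_period:
        return 0.0  # still within the grace period
    if unrealized_profit_pct >= min_profit_pct:
        return 0.0  # position is working: let winners run
    return penalty_per_step
```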
execution_bonus
LimitExecutionReward
Bases: BaseRewardStrategy
Provides a reward proportional to the price improvement achieved by a Limit Order filling instead of executing immediately at market.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/execution_bonus.py
__init__(improvement_multiplier=10.0)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `improvement_multiplier` | `float` | Scales the % improvement, e.g., a 2% price improvement * 10.0 = +0.20 reward. | `10.0` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/execution_bonus.py
expiration
OrderExpirationPenaltyReward
Bases: BaseRewardStrategy
Penalizes the agent when pending orders expire.
This discourages "order spamming" (placing unrealistic limit orders that never fill and just clog the system until they time out).
Source code in src/quantrl_lab/environments/stock/strategies/rewards/expiration.py
__init__(penalty_per_order=-0.1)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `penalty_per_order` | `float` | Fixed penalty for each expired order in the step. Should be small but non-zero. | `-0.1` |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/expiration.py
calculate_reward(env)
Calculate penalty based on number of expired orders in this step.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/expiration.py
composite
CompositeReward
Bases: BaseRewardStrategy
A composite strategy that combines multiple reward strategies with weights.
This class implements the Composite design pattern.
Features:

- Weight Normalization: ensures weights sum to 1.0.
- Auto-Scaling: optionally normalizes each component strategy to N(0, 1) before weighting, preventing one strategy from dominating the others due to scale.
Source code in src/quantrl_lab/environments/stock/strategies/rewards/composite.py
calculate_reward(env)
Calculate the composite reward based on the child strategies.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment instance. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The composite reward based on the child strategies. |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/composite.py
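The weighting scheme can be sketched in isolation. Here child strategies are modeled as plain callables and the auto-scaling feature is omitted for brevity; the constructor signature is an illustrative assumption, not the library's actual one:

```python
# Composite pattern with weight normalization: child rewards are combined
# as a weighted sum whose weights are rescaled to sum to 1.0.
class CompositeRewardSketch:
    def __init__(self, strategies_with_weights: list[tuple]):
        total = sum(w for _, w in strategies_with_weights)
        # Normalize so the effective weights always sum to 1.0.
        self.children = [(s, w / total) for s, w in strategies_with_weights]

    def calculate_reward(self, env) -> float:
        return sum(w * s(env) for s, w in self.children)
```

Normalizing the weights means callers can supply intuitive relative magnitudes (e.g., 3:1) without worrying about the overall reward scale drifting.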
on_step_end(env)
An optional hook to update any internal state if needed; called at the end of each step in the environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `env` | `TradingEnvProtocol` | The trading environment instance. | *required* |
Source code in src/quantrl_lab/environments/stock/strategies/rewards/composite.py
reset()
Reset child strategies.
Note: We do NOT reset running stats (if `auto_scale=True`) because they represent the global distribution of the environment rewards, which should persist across episodes for stability.