Handling Episodes with No Meaningful Actions
ai-lab-projects edited this page Apr 29, 2025 · 1 revision
During training, the model sometimes falls into a state where:
- It does not perform any buy or sell actions, or
- It always behaves the same way without adapting.
If such behavior continues for several consecutive episodes, we consider the training unproductive and terminate that trial early.
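The check described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the action encoding (`HOLD`, `BUY`, `SELL`), the class name `UnproductiveEpisodeMonitor`, and the `patience` threshold are all hypothetical, and "always behaves the same way" is interpreted here as an episode containing only one distinct action.

```python
# Hypothetical action encoding; the real environment may use different values.
HOLD, BUY, SELL = 0, 1, 2


def is_unproductive(actions):
    """Return True if an episode has no buy/sell action or only one distinct action."""
    if not actions:
        return True
    traded = any(a in (BUY, SELL) for a in actions)
    return (not traded) or len(set(actions)) == 1


class UnproductiveEpisodeMonitor:
    """Counts consecutive unproductive episodes and signals when to stop a trial."""

    def __init__(self, patience=5):
        # patience: number of consecutive unproductive episodes tolerated (assumed value)
        self.patience = patience
        self.streak = 0

    def update(self, actions):
        """Record one episode's actions; return True if the trial should stop."""
        if is_unproductive(actions):
            self.streak += 1
        else:
            self.streak = 0  # any productive episode resets the count
        return self.streak >= self.patience
```

In a training loop, one would call `monitor.update(episode_actions)` at the end of each episode and break out of the trial when it returns `True`.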
This phenomenon is often related to the choice of hyperparameters. The agent may fail to discover meaningful trading behaviors if:
- the exploration rate (epsilon) decays too quickly,
- the reward signals are too sparse or weak, or
- the network architecture or optimizer is poorly suited to the task.
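To get a feel for how quickly exploration can vanish, the sketch below assumes a multiplicative per-episode schedule, `epsilon = max(eps_min, eps_start * decay ** episode)`. The schedule form and the parameter values are assumptions for illustration; the project may decay epsilon per step or use a different rule.

```python
import math


def epsilon_at(episode, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Epsilon after `episode` multiplicative decay steps, clipped at eps_min."""
    return max(eps_min, eps_start * decay ** episode)


def episodes_until_floor(eps_start=1.0, eps_min=0.01, decay=0.995):
    """Smallest episode index at which epsilon has reached eps_min.

    Assumes 0 < decay < 1 and 0 < eps_min < eps_start.
    """
    return math.ceil(math.log(eps_min / eps_start) / math.log(decay))


if __name__ == "__main__":
    for decay in (0.9, 0.99, 0.995):
        print(decay, episodes_until_floor(decay=decay))
```

Under these assumed values, a decay of 0.9 reaches the floor after 44 episodes, while 0.995 takes 919. If the agent has not yet stumbled on a profitable trade by the time epsilon bottoms out, it has little chance of discovering one afterward.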
- We have adjusted some hyperparameters to reduce the frequency of this issue.
- Still, further tuning could improve learning efficiency.
- Further hyperparameter optimization to minimize unproductive episodes.
- Exploring alternative methods that are less sensitive to initial exploration issues, such as:
  - Policy gradient methods
  - Actor-critic architectures
  - Imitation learning based on heuristic strategies
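As one possible illustration of the imitation-learning idea, the sketch below generates (state, action) demonstrations from a simple moving-average crossover rule. The heuristic itself, the window sizes, the state definition (the last `long` prices), and the action encoding are all assumptions chosen for the example; the page does not specify which heuristic strategies would be used.

```python
# Hypothetical action encoding; the real environment may use different values.
HOLD, BUY, SELL = 0, 1, 2


def moving_average(prices, window, end):
    """Mean of the `window` prices ending at index `end` (inclusive)."""
    return sum(prices[end - window + 1 : end + 1]) / window


def heuristic_demonstrations(prices, short=3, long=5):
    """Label each time step with the action of a moving-average crossover rule.

    BUY when the short average crosses above the long average,
    SELL when it crosses below, HOLD otherwise.
    Returns a list of (state, action) pairs.
    """
    demos = []
    for t in range(long, len(prices)):
        prev_diff = moving_average(prices, short, t - 1) - moving_average(prices, long, t - 1)
        curr_diff = moving_average(prices, short, t) - moving_average(prices, long, t)
        if prev_diff <= 0 < curr_diff:
            action = BUY
        elif prev_diff >= 0 > curr_diff:
            action = SELL
        else:
            action = HOLD
        state = tuple(prices[t - long + 1 : t + 1])  # assumed state: last `long` prices
        demos.append((state, action))
    return demos
```

Such pairs could be used to pretrain a policy with supervised learning before reinforcement learning begins, so the agent starts from a policy that already trades occasionally rather than from one that never acts.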