Handling Episodes with No Meaningful Actions

ai-lab-projects edited this page Apr 29, 2025 · 1 revision

During training, the model sometimes falls into a state where it:

- performs no buy or sell actions at all, or
- always behaves the same way, without adapting.

If this behavior persists for several consecutive episodes, the training is considered unproductive and the trial is terminated early.
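The early-termination rule above could be implemented roughly as follows. This is a minimal sketch, not the project's code: the action encoding (`HOLD`, `BUY`, `SELL`), the `patience` threshold, and the names `is_unproductive` and `UnproductiveEpisodeMonitor` are all hypothetical, and "always behaves the same way" is approximated here as an episode that uses only a single action.

```python
from typing import Sequence

# Hypothetical action encoding; the real environment may use different values.
HOLD, BUY, SELL = 0, 1, 2


def is_unproductive(actions: Sequence[int]) -> bool:
    """An episode is unproductive if it makes no trades or uses only one action."""
    no_trades = not any(a in (BUY, SELL) for a in actions)
    single_action = len(set(actions)) <= 1  # e.g. always BUY, or always HOLD
    return no_trades or single_action


class UnproductiveEpisodeMonitor:
    """Signals early termination after `patience` consecutive unproductive episodes."""

    def __init__(self, patience: int = 5) -> None:
        self.patience = patience
        self.streak = 0

    def update(self, actions: Sequence[int]) -> bool:
        """Record one episode's actions; return True if the trial should stop."""
        if is_unproductive(actions):
            self.streak += 1
        else:
            self.streak = 0  # any productive episode resets the count
        return self.streak >= self.patience


# Usage: three unproductive episodes in a row trip a monitor with patience=3.
monitor = UnproductiveEpisodeMonitor(patience=3)
for episode_actions in ([HOLD] * 10, [BUY] * 10, [HOLD] * 10):
    if monitor.update(episode_actions):
        print("terminating trial early")
```

If trials are run by a tuner such as Optuna, the point where `update` returns `True` is where one would raise `optuna.TrialPruned()`, so the trial is recorded as pruned rather than failed.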

Why This Happens

This behavior is often related to the choice of hyperparameters. The agent may fail to discover meaningful trading behaviors if:

- the exploration rate (epsilon) decays too quickly,
- the reward signals are too sparse or weak, or
- the network architecture or optimizer is poorly suited to the task.
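To see why a fast epsilon decay matters, the sketch below computes how many episodes a multiplicative schedule (`eps *= decay`, applied once per episode) needs to reach its floor. The schedule shape and the values of `eps_start`, `eps_min`, and `decay` are assumptions for illustration, not the project's actual settings.

```python
import math


def episodes_until_floor(eps_start: float, eps_min: float, decay: float) -> int:
    """Episodes of multiplicative decay (eps *= decay) before epsilon reaches eps_min."""
    if not (0.0 < decay < 1.0) or not (0.0 < eps_min < eps_start):
        raise ValueError("need 0 < decay < 1 and 0 < eps_min < eps_start")
    # Smallest integer n with eps_start * decay**n <= eps_min.
    return math.ceil(math.log(eps_min / eps_start) / math.log(decay))


for decay in (0.90, 0.99, 0.995):
    n = episodes_until_floor(1.0, 0.01, decay)
    print(f"decay={decay}: epsilon reaches 0.01 after ~{n} episodes")
```

With these assumed values, epsilon reaches 0.01 after roughly 44, 459, and 919 episodes respectively. A decay of 0.90 leaves only a few dozen episodes of substantial exploration, which may be too few for the agent to stumble on a profitable trade. If the decay is applied per step rather than per episode, the floor is reached far sooner.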

Current Countermeasures

We have adjusted some hyperparameters, which has reduced how often this issue occurs. Further tuning could still improve learning efficiency.

Future Directions

- Further hyperparameter optimization to minimize unproductive episodes.
- Exploring alternative methods that are less sensitive to initial exploration issues, such as:
  - Policy gradient methods
  - Actor-critic architectures
  - Imitation learning based on heuristic strategies
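As one illustration of the imitation-learning direction, a simple heuristic could label each time step with the action it would take, and those labels could then serve as targets for supervised pretraining before reinforcement learning. The moving-average crossover rule, the window sizes, and the action encoding below are hypothetical choices, not a strategy the project has adopted.

```python
from typing import List, Sequence

# Hypothetical action encoding, matching no particular environment.
HOLD, BUY, SELL = 0, 1, 2


def crossover_labels(prices: Sequence[float], short: int = 3, long: int = 5) -> List[int]:
    """Label BUY/SELL where the short SMA crosses the long SMA, HOLD elsewhere."""
    if not (0 < short < long):
        raise ValueError("need 0 < short < long")

    def sma(t: int, w: int) -> float:
        # Trailing simple moving average over prices[t - w + 1 .. t].
        return sum(prices[t - w + 1 : t + 1]) / w

    labels = [HOLD] * len(prices)
    # Start at t = long so that both t and t - 1 have a full long window.
    for t in range(long, len(prices)):
        prev_diff = sma(t - 1, short) - sma(t - 1, long)
        diff = sma(t, short) - sma(t, long)
        if prev_diff <= 0 < diff:
            labels[t] = BUY   # short SMA crossed above the long SMA
        elif prev_diff >= 0 > diff:
            labels[t] = SELL  # short SMA crossed below the long SMA
    return labels
```

Pairing each labeled step with the observation the agent would see at that step gives a dataset for behavior cloning. The pretrained policy then starts RL with trading behavior already present, rather than relying on epsilon-greedy exploration to find it.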