Data Preprocessing

ai-lab-projects edited this page Apr 29, 2025 · 1 revision

The preprocessing step prepares the historical ETF data for training Deep Q-Network (DQN) agents.

Steps

1. Download Data

  • Historical daily data for the ETF (1655.T) is downloaded using yfinance.
  • Only dates where the closing price exceeds 50 JPY are kept to ensure data quality.
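The download-and-filter step above can be sketched as follows. The ticker `1655.T` and the 50 JPY threshold come from the text; the `filter_valid_days` helper name is ours, and a tiny synthetic frame stands in for the network call so the filter logic is visible on its own:

```python
import pandas as pd

def filter_valid_days(df: pd.DataFrame, min_close: float = 50.0) -> pd.DataFrame:
    """Keep only the dates where the closing price exceeds min_close."""
    return df[df["Close"] > min_close]

# In the project this would start from a real download, roughly:
#   import yfinance as yf
#   raw = yf.download("1655.T")
# Here a small synthetic frame stands in for the downloaded data:
raw = pd.DataFrame({"Open": [48.0, 51.0, 53.0],
                    "Close": [49.0, 52.0, 55.0]})
clean = filter_valid_days(raw)
print(len(clean))  # the 49.0-close row is dropped
```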

2. Feature Selection

  • Close and Open prices are extracted separately.
  • Close prices are used to calculate moving averages and technical indicators.
  • Open prices are used for evaluating real-world buy/sell execution prices.
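A minimal sketch of this split of responsibilities, using pandas (the 2-day moving-average window is only illustrative, not a parameter from the project):

```python
import pandas as pd

df = pd.DataFrame(
    {"Open": [100.0, 101.5, 102.0], "Close": [101.0, 102.5, 101.8]},
    index=pd.date_range("2024-01-04", periods=3, freq="B"),
)

close = df["Close"]  # basis for moving averages and technical indicators
open_ = df["Open"]   # realistic next-day execution prices for buy/sell

# Example indicator on the close series (illustrative 2-day window):
ma2 = close.rolling(2).mean()
print(ma2.iloc[1])  # 101.75
```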

3. Dataset Splitting

The data is divided into three sets:

  • Training Set (60%)
  • Validation Set (20%)
  • Test Set (20%)

Evaluating on held-out validation and test data checks that the agents generalize beyond the period they were trained on, rather than memorizing it.
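Assuming the split is chronological (consistent with the strictly causal features noted below), the 60/20/20 partition can be expressed as:

```python
import numpy as np

prices = np.arange(100.0)  # stand-in for the daily price series, in time order

n = len(prices)
i_train = int(n * 0.6)  # end of training set
i_val = int(n * 0.8)    # end of validation set

train = prices[:i_train]        # earliest 60%
val = prices[i_train:i_val]     # next 20%
test = prices[i_val:]           # most recent 20%
print(len(train), len(val), len(test))  # 60 20 20
```

Splitting by position rather than shuffling keeps later prices out of the training set, which is what prevents future information from leaking into the past.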

4. Feature Engineering

For the seller agent, the following features are calculated:

  • Return from Buy Price (%)
  • Return from Average Price (%)
  • RSI (Relative Strength Index)
  • Elapsed Time Since Purchase (logarithmic scaling)

These features form a 4-dimensional input to the selling agent's network.
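The 4-dimensional seller state can be assembled as below. This is a sketch under stated assumptions: the source does not give the RSI variant or period, so a simple (non-smoothed) 14-period RSI is used, and `log1p` stands in for the unspecified logarithmic scaling of holding time; the function names are ours:

```python
import numpy as np

def rsi(closes: np.ndarray, period: int = 14) -> float:
    """Simple (non-smoothed) RSI over the last `period` price changes.
    Assumption: the project may use Wilder's smoothed variant instead."""
    deltas = np.diff(closes[-(period + 1):])
    gains = deltas[deltas > 0].sum()
    losses = -deltas[deltas < 0].sum()
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)

def seller_state(close, buy_price, avg_price, days_held, closes_hist):
    """4-dimensional input for the selling agent's network."""
    return np.array([
        (close - buy_price) / buy_price * 100.0,  # return from buy price (%)
        (close - avg_price) / avg_price * 100.0,  # return from average price (%)
        rsi(closes_hist),                         # RSI
        np.log1p(days_held),                      # log-scaled elapsed time
    ])

state = seller_state(110.0, 100.0, 105.0, 0, np.linspace(100.0, 115.0, 16))
print(state.shape)  # (4,)
```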

5. Data Normalization

For the buyer agent:

  • Recent close prices (look-back window) are normalized with one of the following scikit-learn scalers, chosen at random:
    • StandardScaler
    • MinMaxScaler
    • RobustScaler

This improves model convergence during training.
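The random-scaler normalization can be sketched like this (the `normalize_window` helper name and the 5-element window are illustrative, not from the project):

```python
import random

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

def normalize_window(window: np.ndarray) -> np.ndarray:
    """Scale a look-back window of close prices with a randomly chosen scaler."""
    scaler = random.choice([StandardScaler(), MinMaxScaler(), RobustScaler()])
    # Scalers expect a 2-D (samples, features) array; reshape and flatten back.
    return scaler.fit_transform(window.reshape(-1, 1)).ravel()

window = np.array([100.0, 102.0, 101.0, 105.0, 103.0])
scaled = normalize_window(window)
print(scaled.shape)  # (5,)
```

Because the scaler is drawn at random, the buyer agent sees the same window under different normalizations across training, which is the robustness mechanism mentioned in the notes below.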

Notes

  • No future information is leaked into the past (strictly causal features).
  • Random factors such as scaling choice are intended to improve model robustness.