Data Preprocessing

ai-lab-projects edited this page Apr 29, 2025 · 1 revision

The preprocessing step prepares the historical ETF data for training Deep Q-Network (DQN) agents.

Steps

1. Download Data

  • Historical daily data for the ETF (1655.T) is downloaded using yfinance.
  • Only dates where the closing price exceeds 50 JPY are kept to ensure data quality.
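The download-and-filter step above can be sketched as follows. The ticker `1655.T` and the 50 JPY threshold come from the text; the `filter_valid_days` helper name is ours, and a tiny synthetic frame stands in for the network call so the filter logic is visible on its own:

```python
import pandas as pd

def filter_valid_days(df: pd.DataFrame, min_close: float = 50.0) -> pd.DataFrame:
    """Keep only the dates where the closing price exceeds min_close."""
    return df[df["Close"] > min_close]

# In the project this would start from a real download, roughly:
#   import yfinance as yf
#   raw = yf.download("1655.T")
# Here a small synthetic frame stands in for the downloaded data:
raw = pd.DataFrame({"Open": [48.0, 51.0, 53.0],
                    "Close": [49.0, 52.0, 55.0]})
clean = filter_valid_days(raw)
print(len(clean))  # the 49.0-close row is dropped
```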

2. Feature Selection

  • Close and Open prices are extracted separately.
  • Close prices are used to calculate moving averages and technical indicators.
  • Open prices are used for evaluating real-world buy/sell execution prices.
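A minimal sketch of this split of responsibilities, using pandas (the 2-day moving-average window is only illustrative, not a parameter from the project):

```python
import pandas as pd

df = pd.DataFrame(
    {"Open": [100.0, 101.5, 102.0], "Close": [101.0, 102.5, 101.8]},
    index=pd.date_range("2024-01-04", periods=3, freq="B"),
)

close = df["Close"]  # basis for moving averages and technical indicators
open_ = df["Open"]   # realistic next-day execution prices for buy/sell

# Example indicator on the close series (illustrative 2-day window):
ma2 = close.rolling(2).mean()
print(ma2.iloc[1])  # 101.75
```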

3. Dataset Splitting

The data is divided into three sets:

  • Training Set (60%)
  • Validation Set (20%)
  • Test Set (20%)

Evaluating on held-out validation and test data checks that the agents generalize beyond the period they were trained on, rather than memorizing it.
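Assuming the split is chronological (consistent with the strictly causal features noted below), the 60/20/20 partition can be expressed as:

```python
import numpy as np

prices = np.arange(100.0)  # stand-in for the daily price series, in time order

n = len(prices)
i_train = int(n * 0.6)  # end of training set
i_val = int(n * 0.8)    # end of validation set

train = prices[:i_train]        # earliest 60%
val = prices[i_train:i_val]     # next 20%
test = prices[i_val:]           # most recent 20%
print(len(train), len(val), len(test))  # 60 20 20
```

Splitting by position rather than shuffling keeps later prices out of the training set, which is what prevents future information from leaking into the past.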

4. Feature Engineering

For the seller agent, the following features are calculated:

  • Return from Buy Price (%)
  • Return from Average Price (%)
  • RSI (Relative Strength Index)
  • Elapsed Time Since Purchase (logarithmic scaling)

These features form a 4-dimensional input to the selling agent's network.
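The 4-dimensional seller state can be assembled as below. This is a sketch under stated assumptions: the source does not give the RSI variant or period, so a simple (non-smoothed) 14-period RSI is used, and `log1p` stands in for the unspecified logarithmic scaling of holding time; the function names are ours:

```python
import numpy as np

def rsi(closes: np.ndarray, period: int = 14) -> float:
    """Simple (non-smoothed) RSI over the last `period` price changes.
    Assumption: the project may use Wilder's smoothed variant instead."""
    deltas = np.diff(closes[-(period + 1):])
    gains = deltas[deltas > 0].sum()
    losses = -deltas[deltas < 0].sum()
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)

def seller_state(close, buy_price, avg_price, days_held, closes_hist):
    """4-dimensional input for the selling agent's network."""
    return np.array([
        (close - buy_price) / buy_price * 100.0,  # return from buy price (%)
        (close - avg_price) / avg_price * 100.0,  # return from average price (%)
        rsi(closes_hist),                         # RSI
        np.log1p(days_held),                      # log-scaled elapsed time
    ])

state = seller_state(110.0, 100.0, 105.0, 0, np.linspace(100.0, 115.0, 16))
print(state.shape)  # (4,)
```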

5. Data Normalization

For the buyer agent:

  • Recent close prices (look-back window) are normalized with one of the following scikit-learn scalers, chosen at random:
    • StandardScaler
    • MinMaxScaler
    • RobustScaler

This improves model convergence during training.
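The random-scaler normalization can be sketched like this (the `normalize_window` helper name and the 5-element window are illustrative, not from the project):

```python
import random

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

def normalize_window(window: np.ndarray) -> np.ndarray:
    """Scale a look-back window of close prices with a randomly chosen scaler."""
    scaler = random.choice([StandardScaler(), MinMaxScaler(), RobustScaler()])
    # Scalers expect a 2-D (samples, features) array; reshape and flatten back.
    return scaler.fit_transform(window.reshape(-1, 1)).ravel()

window = np.array([100.0, 102.0, 101.0, 105.0, 103.0])
scaled = normalize_window(window)
print(scaled.shape)  # (5,)
```

Because the scaler is drawn at random, the buyer agent sees the same window under different normalizations across training, which is the robustness mechanism mentioned in the notes below.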

Notes

  • No future information is leaked into the past (strictly causal features).
  • Random factors such as scaling choice are intended to improve model robustness.