- Compare the accuracy of various time series forecasting algorithms such as Prophet, DeepAR, VAR, DeepVAR, and LightGBM
- (Optional) Use `tsfresh` for automated feature engineering of time series data.
- The dataset can be downloaded from this Kaggle competition.
- In addition to the Anaconda libraries, you need to install `altair`, `vega_datasets`, `category_encoders`, `mxnet`, `gluonts`, `kats`, `lightgbm`, `hyperopt`, and `pandarallel`. `kats` requires Python 3.7 or higher.
- The M5 Competition aims to forecast daily sales for the next 28 days based on sales over the last 1,941 days for 30,490 item-store IDs across Walmart stores.
- Data includes (i) time series of daily sales quantity by ID, (ii) sales prices, and (iii) holiday and event information.
- Evaluation is done through the Weighted Root Mean Squared Scaled Error (WRMSSE). A detailed explanation is given in the M5 Participants Guide, and the implementation is at this link.
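The scaled-error idea behind this metric can be sketched as follows. This is a simplified illustration, not the official M5 implementation (which also aggregates over hierarchical levels); the function names are my own.

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    # Scale the forecast MSE by the mean squared one-step naive
    # error on the training period, then take the square root.
    scale = np.mean(np.diff(y_train) ** 2)
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2) / scale))

def wrmsse(trains, actuals, preds, weights):
    # Weighted sum of per-series RMSSE; in M5 the weights come from
    # each series' dollar sales over the final 28 training days.
    scores = [rmsse(tr, a, p) for tr, a, p in zip(trains, actuals, preds)]
    return float(np.dot(weights, scores))

# An alternating 0/1 training series gives a naive-error scale of 1,
# so a constant forecast error of 1 yields an RMSSE of exactly 1.
score = wrmsse([np.array([0., 1., 0., 1.])], [np.array([1., 1.])],
               [np.array([0., 0.])], [1.0])  # → 1.0
```

A score below 1 means the model beats the naive one-step forecast on the training scale; the weights make high-revenue series count more.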
- For hyperparameter tuning, 0.1% of IDs were randomly selected and used, and 1% were used to measure test set performance.
- Prophet can incorporate forward-looking related time series into the model, so additional features were created with holiday and event information.
- Since a separate Prophet model must be fitted for each ID, I used `pandarallel` in place of the plain `apply` function of the `pandas` DataFrame to maximize parallelization.
- Prophet hyperparameters were tuned through 3-fold CV using the Bayesian optimization module built into the `Kats` library, with Tweedie applied as the loss function. Below is the hyperparameter tuning result.
| seasonality_prior_scale | changepoint_prior_scale | changepoint_range | n_changepoints | holidays_prior_scale | seasonality_mode |
|---|---|---|---|---|---|
| 0.01 | 0.046 | 0.93 | 5 | 100.00 | multiplicative |
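The per-ID fitting pattern has the shape sketched below. A trailing-mean stand-in replaces the actual Prophet fit so the structure stays clear, and the column names and toy data are assumptions; with `pandarallel`, one would call `pandarallel.initialize()` and swap `apply` for `parallel_apply`.

```python
import pandas as pd

def fit_forecast(group, horizon=28):
    # Stand-in for fitting one Prophet model per ID: forecast the
    # trailing 28-day mean for every day of the horizon.
    level = group["sales"].tail(28).mean()
    return pd.Series([level] * horizon)

# Toy frame with one row per (id, day); the real data has 30,490 IDs.
df = pd.DataFrame({
    "id": ["A"] * 30 + ["B"] * 30,
    "sales": [1.0] * 30 + [2.0] * 30,
})

# With pandarallel: pandarallel.initialize(); df.groupby("id")[["sales"]].parallel_apply(...)
forecasts = df.groupby("id")[["sales"]].apply(fit_forecast)
```

Because each ID's fit is independent, the work parallelizes cleanly across cores, which is what makes the thousands of per-ID Prophet fits tractable.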
- In the figures below, the actual sales (black dots), the point predictions and confidence intervals (blue lines and bands), and the red dotted lines representing the test period are shown.
- Since VAR is a multivariate time series model, fitting more IDs simultaneously improves performance, but the number of parameters, and with it the memory requirement, grows roughly quadratically with the number of series.
- DeepAR can incorporate metadata and forward-looking related time series into the model, so additional features were created from sales prices and holiday/event information. Dynamic categorical variables were quantified through Feature Hashing.
- Setting the probability distribution of the output is a critical hyperparameter; here it was set to the Negative Binomial distribution, which suits non-negative, overdispersed sales counts.
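Feature Hashing maps event names to a fixed number of numeric buckets without storing a vocabulary, which is what lets a variable set of categorical events become a fixed-width dynamic feature. A minimal pure-Python sketch (the bucket count and function names are illustrative choices, not the write-up's exact setup):

```python
import hashlib

def hash_bucket(value, n_buckets=16):
    # Deterministically map a category string to one of n_buckets indices.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def encode_events(events, n_buckets=16):
    # Hashed one-hot encoding of a day's event list, usable as a
    # dynamic real-valued feature vector for a model like DeepAR.
    vec = [0.0] * n_buckets
    for event in events:
        vec[hash_bucket(event, n_buckets)] += 1.0
    return vec
```

Collisions are possible but rare for a small event vocabulary; the payoff is a feature width that stays fixed no matter how many distinct events appear.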
- In the case of DeepVAR, a multivariate model, the available output probability distributions are limited (i.e., the Multivariate Gaussian distribution), which leads to a decrease in performance.
- I used `tsfresh` to convert the time series into structured data features, which consumes substantial computational resources even with minimal settings.
- A LightGBM Tweedie regression model was fitted. Hyperparameters were tuned via 3-fold CV using the Bayesian optimization function of the `hyperopt` library. The following is the hyperparameter tuning result.
| boosting | learning_rate | num_iterations | num_leaves | min_data_in_leaf | min_sum_hessian_in_leaf | bagging_fraction | bagging_freq | feature_fraction | extra_trees | lambda_l1 | lambda_l2 | path_smooth | max_bin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gbdt | 0.01773 | 522 | 11 | 33 | 0.0008 | 0.5297 | 4 | 0.5407 | False | 2.9114 | 0.2127 | 217.3879 | 1023 |
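The Tweedie objective suits zero-inflated, non-negative daily sales: with a power parameter between 1 and 2 it behaves like a compound Poisson-Gamma model, putting mass at exactly zero while staying continuous for positive values. A sketch of the unit deviance being minimized (the power value here is illustrative; LightGBM's default `tweedie_variance_power` is 1.5):

```python
import numpy as np

def tweedie_deviance(y, mu, p=1.5):
    # Unit Tweedie deviance for 1 < p < 2; zero when mu matches y,
    # and it penalizes predicting positive sales on zero-sales days
    # less harshly than squared error would.
    term1 = np.power(y, 2 - p) / ((1 - p) * (2 - p))
    term2 = y * np.power(mu, 1 - p) / (1 - p)
    term3 = np.power(mu, 2 - p) / (2 - p)
    return 2 * (term1 - term2 + term3)
```

At `y = 0` the deviance reduces to `4 * sqrt(mu)` for `p = 1.5`, growing slowly in the prediction, which is why Tweedie handles the many zero-sales days gracefully.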
- The sales forecast for day D+1 was fed back through feature engineering to predict day D+2, and this recursive process was repeated to measure 28-day test set performance.
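The recursive scheme can be sketched as follows; a lag-mean stand-in replaces the trained LightGBM model, and the lag-only feature construction is an illustrative simplification of the real feature engineering.

```python
import numpy as np

def recursive_forecast(history, model_fn, horizon=28, n_lags=7):
    # Predict one day ahead, append the prediction to the series,
    # rebuild the lag features, and repeat for the whole horizon,
    # so later features are built partly from earlier predictions.
    series = list(history)
    preds = []
    for _ in range(horizon):
        lags = np.array(series[-n_lags:])
        yhat = float(model_fn(lags))
        preds.append(yhat)
        series.append(yhat)
    return preds

# Stand-in model: predict the mean of the last 7 observations.
preds = recursive_forecast([3.0] * 30, lambda lags: lags.mean())
```

One consequence of this design is error accumulation: a bias on day D+1 propagates into the features for every later day, which is worth remembering when reading the 28-day scores.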
| Algorithm | WRMSSE | sMAPE | MAE | MASE | RMSE |
|---|---|---|---|---|---|
| DeepAR | 0.7513 | 1.4200 | 0.8795 | 0.9269 | 1.1614 |
| LightGBM | 1.0701 | 1.4429 | 0.8922 | 0.9394 | 1.1978 |
| Prophet | 1.0820 | 1.4174 | 1.1014 | 1.0269 | 1.4410 |
| VAR | 1.2876 | 2.3818 | 1.5545 | 1.6871 | 1.9502 |
| Naive Method | 1.3430 | 1.5074 | 1.3730 | 1.1077 | 1.7440 |
| Mean Method | 1.5984 | 1.4616 | 1.1997 | 1.0708 | 1.5352 |
| DeepVAR | 4.6933 | 4.6847 | 1.9201 | 1.3683 | 2.3195 |
- As a result, DeepAR was selected, and its predictions were submitted to Kaggle, achieving a WRMSSE of 0.8112 on the private leaderboard.
- Taylor SJ, Letham B. 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2
- Prophet: Forecasting at Scale
- Stock, James H., and Mark W. Watson. 2001. Vector Autoregressions. Journal of Economic Perspectives 15 (4): 101-115.
- David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski. 2020. DeepAR: Probabilistic forecasting with autoregressive recurrent networks, International Journal of Forecasting, 36 (3): 1181-1191.
- David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. In Advances in Neural Information Processing Systems. 6827–6837.
- Kats - One Stop Shop for Time Series Analysis in Python
- GluonTS - Probabilistic Time Series Modeling