This project analyzes daily industrial gas consumption data from NaTran and Terega across various regions in France. It involves an Exploratory Data Analysis (EDA) to understand trends, seasonality, and regional patterns, followed by time series forecasting for a specific region (Île-de-France) using Facebook Prophet.
The project is contained within a single Jupyter Notebook (`time_series_forecasting.ipynb`) and is divided into two main parts:
- Exploratory Data Analysis (EDA), sketched in code after this list:
  - Loading and initial inspection of the dataset.
  - Data cleaning: handling missing values, converting data types, and dropping irrelevant columns (hourly data, sector, status).
  - Column renaming for clarity.
  - Analysis of overall consumption trends over time.
  - Exploration of consumption patterns across different regions (daily, monthly, annual).
  - Identification of the top consuming regions.
  - Seasonality analysis (monthly and day-of-week).
  - All generated plots are saved to an `images/` directory.
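For orientation, here is a minimal sketch of these EDA steps. The column names used (`journee_gaziere`, `region`, `consommation_journaliere`) are assumptions for illustration and may not match the actual CSV, so inspect `df.columns` and adapt before running:

```python
import os
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (adjust the path and separator to match your copy of the file).
df = pd.read_csv("../Energy Consumption/data/conso-journa-industriel.csv", sep=";")

# Illustrative column names -- inspect df.columns and adapt as needed.
date_col, region_col, value_col = "journee_gaziere", "region", "consommation_journaliere"

# Basic cleaning: parse dates and drop rows with missing consumption values.
df[date_col] = pd.to_datetime(df[date_col])
df = df.dropna(subset=[value_col])
# df = df.drop(columns=[...])  # the notebook also drops hourly, sector, and status columns

# Rename for clarity, mirroring the notebook's renaming step.
df = df.rename(columns={date_col: "date", region_col: "region", value_col: "consumption_mwh"})

# Overall daily consumption trend, saved to the images/ directory.
os.makedirs("images", exist_ok=True)
daily_total = df.groupby("date")["consumption_mwh"].sum()
ax = daily_total.plot(figsize=(12, 4), title="Total daily industrial gas consumption")
ax.set_ylabel("MWh PCS 0°C")
plt.savefig("images/total_daily_consumption_example.png", dpi=150, bbox_inches="tight")

# Monthly seasonality: average consumption by calendar month.
monthly_avg = df.groupby(df["date"].dt.month)["consumption_mwh"].mean()
print(monthly_avg)
```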
- Time Series Forecasting with Prophet, sketched in code after this list:
  - Focuses on forecasting daily gas consumption for the Île-de-France region (Code_Region 11).
  - Data preparation specific to Prophet (renaming columns to `ds` and `y`).
  - Train-test split of the regional time series.
  - Building and fitting a Prophet model.
  - Generating predictions on the test set.
  - Visualizing the forecast and its components (trend, yearly seasonality, weekly seasonality).
  - Evaluating model performance using metrics like MAE, MSE, RMSE, and MAPE.
  - Includes an optional step for Prophet's built-in cross-validation.
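A hedged sketch of this Prophet workflow, again with assumed column names (`code_region`, `journee_gaziere`, `consommation_journaliere`) and an arbitrary one-year holdout; adapt both to the notebook's actual dataframe:

```python
import numpy as np
import pandas as pd
from prophet import Prophet
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load and filter to Île-de-France (Code_Region 11); column names are assumptions.
df = pd.read_csv("../Energy Consumption/data/conso-journa-industriel.csv", sep=";")
idf = df[df["code_region"] == 11]
ts = idf.rename(columns={"journee_gaziere": "ds",          # Prophet expects a 'ds' date column
                         "consommation_journaliere": "y"})[["ds", "y"]]
ts["ds"] = pd.to_datetime(ts["ds"])
ts = ts.sort_values("ds")

# Chronological train-test split (here, roughly the last year held out).
split = ts["ds"].max() - pd.Timedelta(days=365)
train, test = ts[ts["ds"] <= split], ts[ts["ds"] > split]

# Fit Prophet and predict over the test horizon.
model = Prophet()
model.fit(train)
forecast = model.predict(test[["ds"]])

# Forecast and component plots (trend, yearly and weekly seasonality).
model.plot(forecast)
model.plot_components(forecast)

# Evaluation metrics on the test set.
y_true, y_pred = test["y"].to_numpy(), forecast["yhat"].to_numpy()
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(f"MAE={mae:.1f}  MSE={mse:.1f}  RMSE={rmse:.1f}  MAPE={mape:.1f}%")

# Optional: Prophet's built-in cross-validation.
from prophet.diagnostics import cross_validation, performance_metrics
cv = cross_validation(model, initial="730 days", period="90 days", horizon="90 days")
print(performance_metrics(cv).head())
```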
- Source File: `conso-journa-industriel.csv`
- Description: The dataset presents daily (originally hourly) gas consumption for industrial clients of NaTran and Terega at a regional level in France (in MWh PCS 0°C).
- Expected Location: The notebook expects the dataset to be located at `../Energy Consumption/data/conso-journa-industriel.csv` relative to the notebook's directory. Please adjust the path in the notebook (the `e3011751_load_new` cell) if your file is located elsewhere, as shown in the brief sketch below.
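Only the path handed to `pd.read_csv` in that cell needs to change; for example (the `DATA_PATH` name and the `;` separator are illustrative assumptions, not necessarily what the notebook uses):

```python
import pandas as pd

# Update this to wherever your copy of the CSV is stored.
DATA_PATH = "../Energy Consumption/data/conso-journa-industriel.csv"
df = pd.read_csv(DATA_PATH, sep=";")  # adjust the separator if your file uses a different one
```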
- Python 3.8+
- Jupyter Notebook or JupyterLab
- The following Python libraries (see `requirements.txt` for versions, or install manually; a sample file is shown after this list):
  - `pandas`
  - `numpy`
  - `matplotlib`
  - `seaborn`
  - `prophet` (Facebook Prophet)
  - `scikit-learn` (for metrics)
  - `calendar`, `re`, `os` (standard library)
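A minimal `requirements.txt` along these lines should work (versions are left unpinned here; pin them to whatever your environment resolves). `calendar`, `re`, and `os` ship with Python and do not need to be listed:

```text
pandas
numpy
matplotlib
seaborn
prophet
scikit-learn
```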
- Clone the repository or download the files.
- Create a virtual environment (recommended): `python -m venv venv`
- Install dependencies (using a `requirements.txt` file is highly recommended): `pip install -r requirements.txt`
  Note: Installing Prophet can sometimes be tricky. Refer to the official Prophet installation guide if you encounter issues: https://facebook.github.io/prophet/docs/installation.html
-
Place the dataset: Ensure the
conso-journa-industriel.csv
file is in the correct path as expected by the notebook, or update the path in the notebook. A suggested structure:energy-consumption-forecast/ ├── data/ │ └── conso-journa-industriel.csv ├── time_series_forecasting.ipynb ├── images/ (this directory will be created by the notebook if it doesn't exist) ├── requirements.txt ├── LICENSE ├── Report └── README.md
- Activate your virtual environment:
  - On Windows: `venv\Scripts\activate`
  - On macOS/Linux: `source venv/bin/activate`
- Start Jupyter Lab or Jupyter Notebook: `jupyter lab` (or `jupyter notebook`)
- Open the `time_series_forecasting.ipynb` notebook.
- Run the cells sequentially from top to bottom. The notebook will create an `images/` directory in the same location as the notebook to store the generated plots.
- Console Output: The notebook cells will output data summaries, shapes, information about data cleaning steps, model fitting progress, and performance metrics.
- Plots: Various visualizations will be displayed inline and also saved as PNG files in the `images/` directory (the saving pattern is sketched below). These include:
  - Distribution plots
  - Time series plots (overall, by region, by month)
  - Bar charts for annual consumption and top regions
  - Seasonality boxplots
  - Prophet forecast plots
  - Prophet component plots
  - Actual vs. predicted plots for the test set
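The saving pattern behind these files is plain matplotlib; a small illustration with placeholder data (not the notebook's actual series) might look like:

```python
import os
import matplotlib.pyplot as plt
import pandas as pd

# Ensure the output directory exists, as the notebook does before saving figures.
os.makedirs("images", exist_ok=True)

# Placeholder series standing in for actual vs. predicted test-set values.
idx = pd.date_range("2023-01-01", periods=30, freq="D")
actual = pd.Series(range(30), index=idx)
predicted = actual * 1.05

fig, ax = plt.subplots(figsize=(10, 4))
actual.plot(ax=ax, label="Actual")
predicted.plot(ax=ax, label="Predicted", linestyle="--")
ax.set_ylabel("Consumption (MWh PCS 0°C)")
ax.legend()
fig.savefig("images/actual_vs_predicted_example.png", dpi=150, bbox_inches="tight")
```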
This project is licensed under the MIT License. See the LICENSE file for details.