ETL Pipeline: Kaggle → Python → MS SQL Server
This project demonstrates a complete ETL (Extract, Transform, Load) workflow:
- Extracted data from Kaggle using the Kaggle API (authenticated with an API key)
- Transformed and cleaned the dataset using Python (Pandas)
- Loaded the refined data into Microsoft SQL Server
- Performed structured SQL queries (including CTEs)
- Connected back to Python using SQLAlchemy and pyodbc for analysis
It shows how to use Microsoft SQL Server for database operations and how to connect it to Python for data extraction, manipulation, and analysis.
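As a rough illustration of the workflow above, the extract, transform, and load steps might be sketched as follows. The dataset slug, download directory, table name, and the cleaning rules are hypothetical placeholders rather than the project's actual values, and the download step assumes the `kaggle` package is installed with an API token in place.

```python
import re


def extract(dataset_slug, download_dir="data"):
    """Download and unzip a Kaggle dataset (slug is a placeholder like 'owner/dataset')."""
    # Imported lazily: the kaggle package authenticates on import and
    # expects an API token (kaggle.json) to be configured.
    import kaggle

    kaggle.api.dataset_download_files(dataset_slug, path=download_dir, unzip=True)


def normalize_column(name):
    """Turn a raw header such as ' Order ID ' into 'order_id'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def transform(df):
    """Illustrative cleaning only: tidy the headers and drop exact duplicate rows."""
    df = df.rename(columns=normalize_column)
    return df.drop_duplicates()


def load(df, engine, table_name):
    """Write the cleaned DataFrame to SQL Server through an SQLAlchemy engine."""
    # if_exists="replace" recreates the table on every run (an assumption
    # made for this sketch; "append" may suit incremental loads better).
    df.to_sql(table_name, con=engine, if_exists="replace", index=False)
```

A typical sequence would be `extract(...)`, then `pandas.read_csv(...)` on the downloaded file, then `transform(df)` and `load(df, engine, "orders")`, where `"orders"` is only an example table name.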
You can download the result CSV directly here:
- SQLQuery1_2.sql
  Contains SQL scripts for:
  - Creating database/tables
  - Performing CRUD operations
  - Executing queries
  - Common Table Expressions (CTEs)
- Python+SQL.ipynb
  A Jupyter Notebook showcasing:
  - Connecting Python to MS SQL Server
  - Running SQL queries using Python (via pyodbc or similar)
  - Fetching and analyzing data using Pandas
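To make the CTE and query-from-Python items more concrete, here is a minimal sketch of running a parameterized CTE through a DB-API connection such as the one returned by `pyodbc.connect(...)`. The `orders` table and its `category` and `amount` columns are assumptions for illustration and are not taken from SQLQuery1_2.sql.

```python
# Hypothetical table and column names; adjust them to the real schema.
CTE_QUERY = """
WITH category_totals AS (
    SELECT category, SUM(amount) AS total
    FROM orders
    GROUP BY category
)
SELECT category, total
FROM category_totals
WHERE total > ?
ORDER BY total DESC;
"""


def run_query(conn, sql, params=()):
    """Run a query on a DB-API connection and return (column_names, rows)."""
    cursor = conn.cursor()
    try:
        # The "?" placeholder keeps values out of the SQL text itself.
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows
```

The result can be wrapped for analysis with `pandas.DataFrame.from_records(rows, columns=columns)`. When an SQLAlchemy engine is available, `pandas.read_sql(sql, engine)` is a common one-step alternative.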
- MS SQL Server
- SQL Server Management Studio (SSMS)
- Python 3.x
- Jupyter Notebook
- Python libraries: sqlalchemy, pyodbc, pandas
- Open and run SQLQuery1_2.sql in SSMS to create and populate the database.
- Ensure your SQL Server instance is running and accessible.
- Open Python+SQL.ipynb in Jupyter Notebook.
- Update the connection string with your SQL Server credentials.
- Run the notebook cells to execute and analyze SQL queries from Python.
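For the connection-string step, one possible way to build an SQLAlchemy URL for the `mssql+pyodbc` dialect is sketched below. The server name, database name, and ODBC driver name are assumptions; use whichever driver is actually installed on your machine. Windows authentication is assumed when no username is given.

```python
from urllib.parse import quote_plus


def build_mssql_url(server, database, username=None, password=None,
                    driver="ODBC Driver 17 for SQL Server"):
    """Build an SQLAlchemy URL that passes a raw ODBC connection string to pyodbc."""
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}", f"DATABASE={database}"]
    if username is None:
        # No SQL login supplied: fall back to Windows (trusted) authentication.
        parts.append("Trusted_Connection=yes")
    else:
        parts += [f"UID={username}", f"PWD={password}"]
    odbc_string = ";".join(parts) + ";"
    # The ODBC string must be URL-encoded before it goes into the query part.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_string)


def make_engine(url):
    """Create the engine; sqlalchemy is imported here so the helper above stays dependency-free."""
    from sqlalchemy import create_engine

    return create_engine(url)
```

For example, `make_engine(build_mssql_url("localhost", "etl_demo"))` would target a local instance and a database named `etl_demo`, both of which are placeholder values.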
- Practice SQL querying in MS SQL Server.
- Integrate a SQL Server database with Python for analytics.
- Demonstrate real-time database interactions from Python.
- You may need to install required Python libraries using:
pip install pyodbc pandas sqlalchemy
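Before running the notebook, a quick check like the following can report which of the required libraries still need installing. It is only a convenience sketch built on the standard library, not part of the project files.

```python
from importlib.util import find_spec

REQUIRED = ("pyodbc", "pandas", "sqlalchemy")


def missing_packages(names=REQUIRED):
    """Return the names from the list that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```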