This project explores a dataset of Amazon Books to uncover key patterns and trends in book ratings, reviews, prices, and genres. By combining Python for data analysis and visualization, KNIME Analytics for streamlined workflows, and Power BI for interactive insights, the analysis provides a comprehensive view of what drives book popularity and sales on the Amazon platform.
- Technologies Used: Python, Jupyter Notebook, KNIME Analytics, Power BI
- Dataset: Contains information on book titles, genres, prices, ratings, and reviews
- Skills Demonstrated:
  - Workflow automation with KNIME
  - Data cleaning and preparation
  - Statistical analysis
  - Data visualization using Matplotlib and Seaborn
  - Dashboard design with Power BI
- Interactive Jupyter Notebook: Click here to view the Jupyter Notebook on Google Colab.
- Power BI Dashboard: Explore the insights interactively with a Power BI Service account here.
- Python Script: The complete Python code for this project is available in the Amazon_Books_Project.py file in this repository.
- KNIME Workflow: Access the KNIME workflow file in this repository to see how automation was implemented in data processing.
- Data Cleaning & Preparation
  - Removed missing or duplicate entries
  - Standardized data types and ensured consistency
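
The cleaning steps above could look roughly like the following pandas sketch. The column names (`title`, `price`, `rating`, `reviews`) and the sample rows are hypothetical stand-ins, not the dataset's actual schema, and the project's own script may handle these steps differently.

```python
import pandas as pd


def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: consistent types, no missing or duplicate rows.

    Column names are assumptions made for illustration only.
    """
    cleaned = df.copy()

    # Standardize types: unparseable numbers become NaN so they can be dropped.
    for column in ("price", "rating", "reviews"):
        cleaned[column] = pd.to_numeric(cleaned[column], errors="coerce")

    # Normalize stray whitespace in titles so duplicates compare equal.
    cleaned["title"] = cleaned["title"].str.strip()

    # Remove rows with missing values, then exact duplicate rows.
    cleaned = cleaned.dropna(subset=["title", "price", "rating", "reviews"])
    cleaned = cleaned.drop_duplicates()

    return cleaned.reset_index(drop=True)


# Hypothetical sample rows, used only to exercise the function.
raw = pd.DataFrame(
    {
        "title": ["Book A", "Book A ", "Book B", None, "Book C"],
        "price": ["8", "8", "11", "12", "n/a"],
        "rating": [4.7, 4.7, 4.1, 4.5, 4.9],
        "reviews": [100, 100, 50, 10, 7],
    }
)
cleaned = clean_books(raw)
```

Coercing with `errors="coerce"` before dropping missing values means malformed entries such as `"n/a"` are removed in the same pass as genuinely empty cells.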
- Exploratory Data Analysis (EDA)
  - Analyzed the distribution of book ratings and prices
  - Investigated the correlation between genre and rating, using genre data retrieved from the Google Books API and the Open Library API
  - Investigated genre popularity and pricing patterns
  - Examined relationships between ratings, prices, and genres
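
As an illustration of the genre lookup mentioned above, here is a minimal sketch against the Google Books volumes endpoint, assuming the genre is taken from the first `categories` entry of the first matching volume. The helper names (`build_query_url`, `extract_genre`, `fetch_genre`) are hypothetical, and the project's script may query or fall back to Open Library in a different way.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(title: str, author: str = "") -> str:
    """Build a volumes search URL restricted to the given title (and author)."""
    query = f'intitle:"{title}"'
    if author:
        query += f' inauthor:"{author}"'
    return f"{GOOGLE_BOOKS_URL}?{urlencode({'q': query, 'maxResults': 1})}"


def extract_genre(payload: dict) -> Optional[str]:
    """Return the first category found in a volumes response, or None."""
    for item in payload.get("items", []):
        categories = item.get("volumeInfo", {}).get("categories")
        if categories:
            return categories[0]
    return None


def fetch_genre(title: str, author: str = "", timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a genre, if one is listed."""
    with urlopen(build_query_url(title, author), timeout=timeout) as response:
        return extract_genre(json.load(response))
```

Keeping the parsing in `extract_genre` separate from the network call makes it possible to check the logic against a saved response without hitting the API. Many volumes have no `categories` field at all, so callers should expect `None` and decide how to treat books without a genre.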
- Visualizations in Python
  - Created bar charts, histograms, and scatter plots
  - Used heatmaps to uncover correlations between price and ratings
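
A correlation heatmap of the kind described above might be produced as follows. This is a sketch only: the column names and sample values are hypothetical, and the non-interactive `Agg` backend is selected so the example also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(df: pd.DataFrame, columns: list):
    """Draw a heatmap of pairwise Pearson correlations for the given columns."""
    corr = df[columns].corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation between numeric columns")
    fig.tight_layout()
    return corr, fig


# Hypothetical sample values, used only to exercise the function.
books = pd.DataFrame(
    {
        "price": [8.0, 11.0, 15.0, 22.0, 6.0],
        "rating": [4.7, 4.1, 4.5, 4.3, 4.8],
        "reviews": [17350, 2052, 18979, 7665, 21424],
    }
)
corr, fig = plot_correlation_heatmap(books, ["price", "rating", "reviews"])
fig.savefig("correlation_heatmap.png")
```

Fixing the color scale to the range -1 to 1 keeps the colors comparable between plots, so a weak correlation is not visually exaggerated by automatic scaling.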
- Workflow Automation with KNIME
  - Built an automated workflow to streamline data processing and analysis
  - Ensured repeatability and efficiency in handling the dataset
- Dashboard Insights
  - Key metrics and insights summarized in an interactive Power BI dashboard
  - Users can explore specific genres, price ranges, and ratings interactively
File/Link | Description |
---|---|
EDA on Amazon Best Seller Books.py | Python script containing the code |
Jupyter Notebook | Viewable notebook on Google Colab |
Power BI Dashboard | Interactive Power BI dashboard for insights |
Amazon_best_seller_books1.knwf | KNIME workflow file for automated data processing |
- Clone this repository:
- Install the required libraries: `pip install -r requirements.txt`
- Run the Python script: `EDA on Amazon Best Seller Books.py`