preciousoben/Amazon-Best-Seller-Books-In-Depth-Exploratory-Data-Analysis

Amazon Best Seller Books In Depth Exploratory Data Analysis

This project explores a dataset of Amazon best-selling books to uncover patterns and trends in ratings, reviews, prices, and genres. It uses Python for data analysis and visualization, KNIME Analytics for workflow automation, and Power BI for interactive dashboards, with the aim of showing what drives book popularity and sales on the Amazon platform.


Project Structure & Highlights

  • Technologies Used: Python, Jupyter Notebook, KNIME Analytics, Power BI
  • Dataset: Contains information on book titles, genres, prices, ratings, and reviews
  • Skills Demonstrated:
    • Workflow automation with KNIME
    • Data cleaning and preparation
    • Statistical analysis
    • Data visualization using Matplotlib and Seaborn
    • Dashboard design with Power BI

How to Explore the Project

  1. Interactive Jupyter Notebook:
    Click here to view the Jupyter Notebook on Google Colab.

  2. Power BI Dashboard:
    Explore the insights interactively with a Power BI Service account here.

  3. Python Script:
    The complete Python code for this project is available in the Amazon_Books_Project.py file in this repository.

  4. KNIME Workflow:
    Access the KNIME workflow file in this repository to see how data processing was automated.


Project Workflow

  1. Data Cleaning & Preparation

    • Removed missing or duplicate entries
    • Standardized data types and ensured consistency
  2. Exploratory Data Analysis (EDA)

    • Analyzed the distribution of book ratings and prices.
    • Investigated the correlation between genre and rating by extracting genre data from the Google Books API and the OpenLibrary API.
    • Investigated genre popularity and pricing patterns.
    • Examined relationships between ratings, prices, and genres.
  3. Visualizations in Python

    • Created bar charts, histograms, and scatter plots
    • Used heatmaps to uncover correlations between price and ratings.
  4. Workflow Automation with KNIME

    • Built an automated workflow to streamline data processing and analysis
    • Ensured repeatability and efficiency in handling the dataset
  5. Dashboard Insights

    • Key metrics and insights summarized in an interactive Power BI dashboard
    • Users can explore specific genres, price ranges, and ratings interactively
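
The cleaning and correlation steps above can be sketched in a few lines of pandas. This is a minimal illustration rather than the project's actual code: the sample rows, the column names (`title`, `genre`, `price`, `rating`), and the helper `clean_books` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the project's schema.
raw = pd.DataFrame(
    {
        "title": ["Book A", "Book A", "Book B", "Book C", "Book D"],
        "genre": ["Fiction", "Fiction", "Non Fiction", "Fiction", None],
        "price": ["8", "8", "15", "11", "20"],
        "rating": [4.7, 4.7, 4.3, 4.8, 4.5],
    }
)


def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate and incomplete rows, then coerce numeric columns."""
    out = df.drop_duplicates().dropna().copy()
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    # Values that could not be parsed as numbers become NaN; drop those rows too.
    return out.dropna(subset=["price", "rating"]).reset_index(drop=True)


books = clean_books(raw)

# Pearson correlation matrix between price and rating (the usual input to a heatmap).
corr = books[["price", "rating"]].corr()

# Mean rating and price per genre.
by_genre = books.groupby("genre")[["rating", "price"]].mean()

print(books)
print(corr)
print(by_genre)
```

A matrix like `corr` can be passed to `seaborn.heatmap(corr, annot=True)` to draw the kind of heatmap mentioned in the visualization step.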

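The genre lookup mentioned in the EDA step could look roughly like the sketch below. It assumes the Google Books volumes endpoint, whose responses list genre labels under `items[].volumeInfo.categories`. The function names (`build_query_url`, `extract_genre`, `fetch_genre`) are hypothetical, and the project's own lookup may be structured differently.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(title: str, author: str = "") -> str:
    """Build a Google Books volume search URL for a title and optional author."""
    query = f"intitle:{title}"
    if author:
        query += f" inauthor:{author}"
    return f"{GOOGLE_BOOKS_URL}?{urlencode({'q': query, 'maxResults': 1})}"


def extract_genre(payload: dict) -> Optional[str]:
    """Return the first category of the first volume that lists any, or None."""
    for item in payload.get("items", []):
        categories = item.get("volumeInfo", {}).get("categories")
        if categories:
            return categories[0]
    return None


def fetch_genre(title: str, author: str = "", timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a genre label if one is found."""
    with urlopen(build_query_url(title, author), timeout=timeout) as response:
        return extract_genre(json.load(response))


# Offline check against a trimmed, hypothetical response payload.
sample = {"items": [{"volumeInfo": {"title": "Example", "categories": ["Fiction"]}}]}
print(extract_genre(sample))
```

Keeping the parsing in `extract_genre` separate from the network call makes it possible to test the lookup without hitting the API, and a second source such as OpenLibrary could be added as a fallback when `fetch_genre` returns `None`.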
Repository Content

| File/Link | Description |
|---|---|
| EDA on Amazon Best Seller Books.py | Python script containing the code |
| Jupyter Notebook | Viewable notebook on Google Colab |
| Power BI Dashboard | Interactive Power BI dashboard for insights |
| Amazon_best_seller_books1.knwf | KNIME workflow file for automated data processing |

How to Run the Code Locally

  1. Clone this repository.
  2. Install the required libraries: pip install -r requirements.txt
  3. Run the Python script: python "EDA on Amazon Best Seller Books.py"
