About

Hello! My name is Mrityunjay Pathak.

I'm a data scientist passionate about building real-world, end-to-end data solutions - from data analysis and dashboards to machine learning and deployment. I love creating projects that don't just stay in notebooks, but live on the internet - making them interactive, accessible and valuable for everyone.

Some projects I've worked on :

  • AutoIQ : Car Price Prediction
    • Built a car price prediction system using FastAPI and Docker, trained on 2,800+ scraped car records from Cars24.
    • Deployed an interactive HTML/CSS/JS application on GitHub Pages that connects to the API, allowing users to get real-time price predictions.
  • Pickify : Movie Recommender System
    • Built a content-based movie recommender system using metadata from 5,000+ movies.
    • Integrated the TMDB API to fetch and display movie posters dynamically, delivering a personalized user experience.
  • Dashly : Live Sales Dashboard
    • Built a live Power BI dashboard connected to a Neon PostgreSQL database, containing 50,000+ sales records.
    • Developed an automated ETL pipeline using GitHub Actions to collect and ingest data daily, keeping the dashboard continuously updated with the latest insights.

Tools and Technologies I've worked with :

  • Programming Language : Python
  • Libraries : NumPy, Pandas, Matplotlib, Seaborn, Plotly
  • Machine Learning : Scikit-learn
  • Database : MySQL, PostgreSQL
  • BI Tool : Power BI
  • Web Framework : FastAPI
  • Containerization : Docker
  • Version Control : Git
  • Automation : GitHub Actions

I'm currently looking for opportunities in Data Science/Data Analytics, where I can contribute to building data-driven solutions that create measurable business impact.

If you're looking for someone who's eager to learn, collaborate and deliver results, I'd love to connect and explore how I can add value to your team.

📫 Connect with Me

Kaggle  |  LinkedIn  |  GitHub  |  Medium  |  Portfolio

Skills



Projects

AutoIQ : Car Price Prediction

    

➔ Problem

  • In the used car market, buyers and sellers often struggle to determine a fair price for a vehicle.
  • This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.

➔ Solution

  • Built and deployed an end-to-end machine learning pipeline to predict used car prices from real-world data.
  • Collected and cleaned 2,800+ used car records from Cars24 using Selenium and BeautifulSoup.
  • Optimized memory consumption of the dataset by downcasting data types and converting to Parquet format.
  • Trained models with Scikit-learn Pipelines & ColumnTransformer, so preprocessing is fitted on training data only and data leakage is avoided.
  • Deployed the machine learning model as an API using FastAPI on Render.
  • Built an HTML/CSS/JS frontend hosted on GitHub Pages to interact with the API and display predictions in real time.
  • Containerized the entire application using Docker and pushed to Docker Hub for reproducibility.
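
The leakage-avoidance point above can be illustrated with a minimal sketch. It is not the project's actual training code: the column names, the toy rows and the choice of `Ridge` as the estimator are all assumptions made for illustration. The idea it shows is that wrapping preprocessing and the model in one `Pipeline` means imputation, scaling and encoding statistics are learned from the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real Cars24 schema is not given in this document.
cars = pd.DataFrame({
    "km_driven": [42000, 15000, 88000, 60000, 23000, 71000, 35000, 52000],
    "age_years": [5, 2, 9, 6, 3, 8, 4, 5],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "CNG",
                  "Petrol", "Diesel", "Petrol", "Diesel"],
    "price": [4.2, 7.9, 2.1, 3.0, 6.5, 3.4, 5.1, 4.8],
})
numeric_features = ["km_driven", "age_years"]
categorical_features = ["fuel_type"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column group to its own transformer.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# Ridge is an assumed estimator; the source does not name the model used.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),
])

X = cars[numeric_features + categorical_features]
y = cars["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit() learns medians, scaling factors and categories from X_train only;
# predict() reuses those fitted statistics on the held-out rows.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Because the whole object is a single estimator, it can also be passed to `cross_val_score` or serialized as one artifact for serving, which keeps training and inference preprocessing identical.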

➔ Results

  • Reduced dataset memory usage by 90% using optimized storage techniques.
  • Achieved a 30% lower MAE and a 12% higher R² score compared to the baseline model.
  • Improved model stability by 70%, giving more consistent and reliable predictions.
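
For readers unfamiliar with how figures such as "30% lower MAE" are usually derived, here is a small sketch of the arithmetic. The target values and predictions are made up for illustration and are not the project's results; `relative_change` is a hypothetical helper name.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(new, baseline):
    """Percentage change of `new` relative to `baseline` (hypothetical helper)."""
    return (new - baseline) / abs(baseline) * 100.0


# Made-up numbers, NOT the project's data.
y_true = [4.0, 6.0, 8.0, 10.0]
baseline_pred = [5.0, 5.0, 9.0, 9.0]
tuned_pred = [4.5, 6.5, 8.5, 9.5]

baseline_mae = mean_absolute_error(y_true, baseline_pred)
tuned_mae = mean_absolute_error(y_true, tuned_pred)
baseline_r2 = r2_score(y_true, baseline_pred)
tuned_r2 = r2_score(y_true, tuned_pred)

# A negative MAE change means lower error; a positive R² change means a better fit.
mae_change = relative_change(tuned_mae, baseline_mae)
r2_change = relative_change(tuned_r2, baseline_r2)
print(f"MAE change: {mae_change:.1f}%  R2 change: {r2_change:.1f}%")
```

Both changes are expressed relative to the baseline value, so a "12% higher R²" means the score grew by 12% of the baseline score, not by 12 percentage points.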

➔ Impact

  • Helps car owners quickly find the right selling price for their vehicles based on real-world data.
  • Makes it easier for buyers to know if a car is fairly priced before making a purchase.

Pickify : Movie Recommender System

  

➔ Problem

  • With the rise of streaming services, viewers now have access to thousands of movies across platforms.
  • As a result, many viewers spend more time browsing than actually watching.
  • This problem can lead to frustration, lower satisfaction and less time spent on the platform.
  • Ultimately, this impacts both user experience and business performance.

➔ Solution

  • Built a content-based movie recommender system trained on 5,000+ movie metadata records.
  • Recommends the top 5 similar titles for any selected movie in ~2.5 seconds per recommendation.
  • Integrated the TMDB API to dynamically fetch and display movie posters, enhancing user experience.
  • Deployed as a Streamlit web app, used by 100+ users to discover personalized movie suggestions.
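
A content-based recommender of the kind described above can be sketched in a few lines. The source does not state which features or similarity measure Pickify uses, so this sketch assumes bag-of-words vectors over a metadata string and cosine similarity; the movie titles and tags are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, metadata, top_n=5):
    """Return the top_n titles whose metadata is most similar to `title`."""
    vectors = {name: Counter(tokenize(tags)) for name, tags in metadata.items()}
    query = vectors[title]
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in vectors.items()
        if name != title
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Invented catalogue for illustration only.
movies = {
    "Space Quest": "space adventure aliens science fiction",
    "Galaxy Wars": "space battle aliens science fiction",
    "Love in Paris": "romance paris drama",
    "Robot Dawn": "robots science fiction future",
    "Paris Nights": "romance paris comedy",
}

print(recommend("Space Quest", movies, top_n=2))
```

For a catalogue of a few thousand titles, the vectors would normally be built once and the pairwise similarities cached, so each lookup only needs a sort over precomputed scores.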

➔ Impact

  • If scaled and integrated with a streaming service, this system could :
    • Reduce the time users spend choosing what to watch.
    • Increase user engagement, watch time and customer satisfaction.
    • Help streaming platforms retain users by offering better personalized content.

Netflix Data Analysis

  

➔ Problem Statement

  • To analyze Netflix content data, uncovering valuable insights into how the platform evolves over time.

➔ Some Key Findings

  • Cleaned and analyzed a dataset of 8,000+ Netflix Movies and TV Shows.
  • More than 60% of the content on Netflix is rated for mature audiences.
    • Suggests that Netflix targets adult viewers to boost engagement and retention.
  • More than 25% of the Movies and TV Shows were released on the 1st day of the month.
    • Shows a consistent release schedule, likely aligned with subscription renewal cycles.
  • More than 40% of the content on Netflix is exclusive to the United States.
    • Shows a strong focus on the U.S. market and content availability by location.
  • More than 20% of the content on Netflix falls under the "Drama" genre.
    • Confirms that "Drama" is a key part of Netflix's content library.
  • More than 23% of the content on Netflix was released in 2019 alone.
    • Indicates a major content push that year, possibly tied to growth or user acquisition efforts.
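
Findings phrased as "more than X% of the content" come down to counting how often each value appears in a column. A minimal sketch of that calculation follows; the rows and field names are hypothetical and do not come from the actual dataset.

```python
from collections import Counter


def category_shares(records, column):
    """Percentage of records per distinct value of `column`, largest first."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.most_common()}


# Hypothetical rows for illustration only.
titles = [
    {"title": "Show A", "rating": "TV-MA", "release_year": 2019},
    {"title": "Show B", "rating": "TV-MA", "release_year": 2018},
    {"title": "Film C", "rating": "PG", "release_year": 2019},
    {"title": "Film D", "rating": "TV-MA", "release_year": 2020},
    {"title": "Show E", "rating": "TV-14", "release_year": 2019},
]

rating_shares = category_shares(titles, "rating")
year_shares = category_shares(titles, "release_year")
print(rating_shares)
print(year_shares)
```

The same function covers the rating, genre, country and release-year findings, since each is a share of rows by one column.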

Supermarket Sales Analysis

  

➔ Problem Statement

  • To analyze Supermarket Sales data, identifying key factors for improving profitability and operational efficiency.

➔ Some Key Findings

  • Analyzed purchasing patterns of 9,000+ customers of a Supermarket.
  • More than 15% of the products sold were Snacks.
    • Shows that Snacks are a convenient choice and a major source of revenue.
  • More than 32% of total sales came from the West region of the Supermarket.
    • Suggests that the West region is a strong-performing area compared to the others.
  • Health and Soft drinks were the most profitable sub-categories in Beverages.
    • Shows that both types of drinks perform well among customers.
  • November was the most profitable month, contributing about 15% of the total annual profits.
    • Makes it an ideal time for running promotions and special offers.
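
The regional and monthly figures above are shares of a summed value rather than shares of row counts. A short sketch of that grouping step, using invented orders and assumed field names (`region`, `month`, `sales`, `profit`):

```python
from collections import defaultdict


def share_of_total(rows, group_key, value_key):
    """Percentage of the summed `value_key` contributed by each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    grand_total = sum(totals.values())
    return {group: 100.0 * total / grand_total for group, total in totals.items()}


# Invented orders for illustration only.
orders = [
    {"region": "West", "month": "November", "sales": 400.0, "profit": 90.0},
    {"region": "East", "month": "March", "sales": 300.0, "profit": 40.0},
    {"region": "West", "month": "March", "sales": 100.0, "profit": 20.0},
    {"region": "North", "month": "November", "sales": 200.0, "profit": 50.0},
]

sales_share_by_region = share_of_total(orders, "region", "sales")
profit_share_by_month = share_of_total(orders, "month", "profit")

# The most profitable month is the group with the largest profit share.
best_month = max(profit_share_by_month, key=profit_share_by_month.get)
print(sales_share_by_region, best_month)
```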

Certificates

  

Blogs

  

Pinned

  1. AutoIQ (Public)

    Thinking of buying or selling, Start with AutoIQ

    Jupyter Notebook

  2. Pickify (Public)

    Smart movie picks, based on what you love

    Jupyter Notebook

  3. Netflix-Data-Analysis (Public)

    Netflix Data Analysis

    Jupyter Notebook

  4. Supermarket-Sales-Analysis (Public)

    Supermarket Sales Analysis

    Jupyter Notebook