Hello! My name is Mrityunjay Pathak.
I'm a data scientist passionate about building real-world, end-to-end data solutions - from data analysis and dashboards to machine learning and deployment. I love creating projects that don't just stay in notebooks, but live on the internet - making them interactive, accessible and valuable for everyone.
Some projects I've worked on :
- AutoIQ : Car Price Prediction
  - Built a car price prediction system using FastAPI and Docker, trained on 2,800+ scraped car records from Cars24.
  - Deployed an interactive HTML/CSS/JS application on GitHub Pages that connects to the API, allowing users to get real-time price predictions.
- Pickify : Movie Recommender System
  - Built a content-based movie recommender system using metadata from 5,000+ movies.
  - Integrated the TMDB API to fetch and display movie posters dynamically, delivering a personalized user experience.
- Dashly : Live Sales Dashboard
  - Built a live Power BI dashboard connected to a Neon PostgreSQL database, containing 50,000+ sales records.
  - Developed an automated ETL pipeline using GitHub Actions to collect and ingest data daily, keeping the dashboard continuously updated with the latest insights.
Tools and Technologies I've worked with :
- Programming Language : Python
- Libraries : NumPy, Pandas, Matplotlib, Seaborn, Plotly
- Machine Learning : Scikit-learn
- Database : MySQL, PostgreSQL
- BI Tool : Power BI
- Web Framework : FastAPI
- Containerization : Docker
- Version Control : Git
- Automation : GitHub Actions
I'm currently looking for opportunities in Data Science/Data Analytics, where I can contribute to building data-driven solutions that create measurable business impact.
If you're looking for someone who's eager to learn, collaborate and deliver results, I'd love to connect and explore how I can add value to your team.
📫 Connect with Me
Kaggle | LinkedIn | GitHub | Medium | Portfolio
➔ Problem
- In the used car market, buyers and sellers often struggle to determine a fair price for a vehicle.
- This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.
➔ Solution
- Built and deployed an end-to-end machine learning pipeline to predict used car prices from real-world data.
- Collected and cleaned 2,800+ used car records from Cars24 using Selenium and BeautifulSoup.
- Optimized memory consumption of the dataset by downcasting data types and converting to Parquet format.
- Trained models with Scikit-learn Pipelines & ColumnTransformer to avoid leakage.
- Deployed the machine learning model as an API using FastAPI on Render.
- Built an HTML/CSS/JS frontend hosted on GitHub Pages to interact with the API and display predictions in real-time.
- Containerized the entire application using Docker and pushed to Docker Hub for reproducibility.
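
The leakage point in the Solution list can be illustrated with a minimal sketch. The column names (`km_driven`, `year`, `fuel_type`), the toy rows and the choice of `Ridge` are assumptions for illustration only, not the project's actual features or model. Because the scaler and encoder sit inside the pipeline, they are fit on the training split alone and only applied to the test split.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real dataset and its columns are not shown here.
cars = pd.DataFrame({
    "km_driven": [42000, 15000, 88000, 60000, 23000, 71000, 35000, 52000],
    "year": [2017, 2021, 2014, 2016, 2020, 2015, 2019, 2018],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "CNG",
                  "Petrol", "Diesel", "Petrol", "Diesel"],
    "price": [4.5, 8.2, 2.9, 3.6, 7.1, 3.3, 6.0, 5.2],  # made-up values
})

X = cars.drop(columns="price")
y = cars["price"]

# Split first, so no statistic from the test rows reaches the preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["km_driven", "year"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"]),
])

# Preprocessing and model are fit together, on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),
])

model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Calling `model.predict` on new rows reuses the statistics learned during `fit`, which is what keeps test data out of the preprocessing step.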
➔ Results
- Reduced dataset memory usage by 90% using optimized storage techniques.
- Achieved a 30% lower MAE and a 12% higher R² score compared to the baseline model.
- Improved model stability by 70%, giving more consistent predictions.
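
As a rough sketch of how downcasting reduces memory, the snippet below shrinks numeric columns and converts a repeated string column to a categorical. The data is synthetic and the column names are hypothetical, so the reduction it prints will not match the 90% figure reported above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "year": rng.integers(2005, 2024, size=n),            # 64-bit integers
    "km_driven": rng.integers(1_000, 200_000, size=n),   # 64-bit integers
    "engine_cc": rng.integers(800, 3000, size=n).astype("float64"),
    "fuel_type": rng.choice(["Petrol", "Diesel", "CNG"], size=n),  # strings
})

before = df.memory_usage(deep=True).sum()

optimized = df.copy()
# Pick the smallest integer type that can hold each column's values.
for col in ["year", "km_driven"]:
    optimized[col] = pd.to_numeric(optimized[col], downcast="integer")
# Store floats in 32 bits instead of 64.
optimized["engine_cc"] = pd.to_numeric(optimized["engine_cc"], downcast="float")
# Store each distinct string once and keep small integer codes per row.
optimized["fuel_type"] = optimized["fuel_type"].astype("category")

after = optimized.memory_usage(deep=True).sum()
reduction_pct = 100 * (1 - after / before)
print(f"{before:,} bytes -> {after:,} bytes ({reduction_pct:.0f}% smaller)")
```

Writing the result with `DataFrame.to_parquet` (which needs `pyarrow` or `fastparquet` installed) generally keeps these compact dtypes on disk.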
➔ Impact
- Helps car owners quickly find the right selling price for their vehicles based on real-world data.
- Makes it easier for buyers to know if a car is fairly priced before making a purchase.
➔ Problem
- With the rise of streaming services, viewers now have access to thousands of movies across platforms.
- As a result, many viewers spend more time browsing than actually watching.
- This problem can lead to frustration, lower satisfaction and less time spent on the platform.
- Ultimately, this impacts both user experience and business performance.
➔ Solution
- Built a content-based movie recommender system trained on 5,000+ movie metadata records.
- Recommends the top 5 similar titles for any selected movie in ~2.5 seconds.
- Integrated the TMDB API to dynamically fetch and display movie posters, enhancing user experience.
- Deployed as a Streamlit web app, used by 100+ users to discover personalized movie suggestions.
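
To show the idea behind a content-based recommender, here is a minimal sketch that compares movies by the words in their metadata. The titles and tags are made up, and bag-of-words counts with cosine similarity are an assumption for illustration; the vectorizer and similarity measure used in Pickify are not stated here.

```python
import math
from collections import Counter

# Hypothetical mini catalogue: title -> metadata "tags" string.
movies = {
    "Space Quest": "space adventure alien pilot",
    "Star Voyage": "space alien crew adventure",
    "Love in Paris": "romance paris love drama",
    "Paris Nights": "paris drama love city",
    "Alien Storm": "alien invasion space war",
    "City Chase": "city police chase action",
    "Ocean Deep": "ocean submarine adventure crew",
}


def vectorize(text):
    """Turn a tag string into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = {title: vectorize(tags) for title, tags in movies.items()}


def recommend(title, top_n=5):
    """Return the top_n titles most similar to the given one."""
    query = vectors[title]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


print(recommend("Space Quest"))
```

On a catalogue of thousands of titles, the similarities would typically be precomputed once rather than recalculated for every request.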
➔ Impact
- If scaled and integrated with a streaming service, this system could :
  - Reduce the time users spend choosing what to watch.
  - Increase user engagement, watch time and customer satisfaction.
  - Help streaming platforms retain users by offering better personalized content.
➔ Problem Statement
- To analyze Netflix content data, uncovering valuable insights into how the platform evolves over time.
➔ Some Key Findings
- Cleaned and analyzed a dataset of 8,000+ Netflix Movies and TV Shows.
- More than 60% of the content on Netflix is rated for mature audiences.
  - Suggests that Netflix targets adult viewers to boost engagement and retention.
- More than 25% of the Movies and TV Shows were released on the 1st day of the month.
  - Shows a consistent release schedule, likely aligned with subscription renewal cycles.
- More than 40% of the content on Netflix is exclusive to the United States.
  - Shows a strong focus on the U.S. market and content availability by location.
- More than 20% of the content on Netflix falls under the "Drama" genre.
  - Confirms that "Drama" is a key part of Netflix's content library.
- More than 23% of the content on Netflix was released in 2019 alone.
  - Indicates a major content push that year, possibly tied to growth or user acquisition efforts.
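
Percentage findings like these can be computed with a few lines of pandas. The sketch below uses five made-up rows and hypothetical column names (`rating`, `release_year`), so its numbers are unrelated to the figures above.

```python
import pandas as pd

# Hypothetical rows; the real dataset has 8,000+ titles.
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "rating": ["TV-MA", "TV-14", "TV-MA", "PG", "R"],
    "release_year": [2019, 2018, 2019, 2020, 2017],
})

# Share of each rating as a percentage of all titles.
rating_share = titles["rating"].value_counts(normalize=True).mul(100).round(1)

# Share of titles from a single year: the mean of a boolean mask.
share_2019 = (titles["release_year"] == 2019).mean() * 100

print(rating_share)
print(f"Released in 2019: {share_2019:.1f}%")
```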
➔ Problem Statement
- To analyze Supermarket Sales data, identifying key factors for improving profitability and operational efficiency.
➔ Some Key Findings
- Analyzed the purchasing patterns of 9,000+ customers of a Supermarket.
- More than 15% of the products sold were Snacks.
  - Shows that Snacks are a convenient choice and a major source of revenue.
- More than 32% of total sales came from the West region of the Supermarket.
  - Suggests that the West region is a strong-performing area compared to the others.
- Health and Soft drinks were the most profitable sub-categories in Beverages.
  - Shows that both types of drinks perform well among customers.
- November was the most profitable month, contributing about 15% of the total annual profit.
  - Makes it an ideal time for running promotions and special offers.