A modular NLP pipeline for binary sentiment classification on IMDb movie reviews using Hugging Face Transformers. The project demonstrates scalable model evaluation, multilingual compatibility (tabularisai), and clean engineering practices.

owenwienczkowski/Hugging-Face-IMDb-Fine-Tuned-Sentiment-Classifier

IMDb Sentiment Classifier using Hugging Face Transformers

A modular NLP pipeline for binary sentiment classification on IMDb movie reviews using Hugging Face Transformers. The project demonstrates scalable model evaluation, multilingual compatibility (tabularisai), and clean engineering practices.

Project Overview

Directory Structure

imdb-sentiment-classifier/ 
├── scripts/ 
│ └── run_pipeline.py # Entry point to run the full workflow 
├── src/ 
│ ├── load_data.py # Loads the dataset 
│ ├── preprocess.py # Tokenizes and decodes review texts
│ ├── inference.py # Performs inference with chosen model
│ └── evaluate.py # Includes multiclass and binary evaluation
├── outputs/ # Evaluation results and logs
│ └── metrics_log.md # Pasted metrics from the demonstrated models
├── requirements.txt # Project dependencies 
└── README.md # You are here

How to Run

Make sure you're in the project root directory and have Python 3.8+.

  1. Install dependencies
    pip install -r requirements.txt
  2. Run the pipeline
    python -m scripts.run_pipeline

Results

DistilBERT

| Metric | Positive | Negative | Accuracy |
| --- | --- | --- | --- |
| Precision | 0.89 | 0.95 | 0.92 |
| Recall | 0.93 | 0.90 | |
| F1-Score | 0.91 | 0.92 | |
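
For readers who want to see how per-class numbers like these are derived, here is a minimal sketch in plain Python. It is not the code in `src/evaluate.py`; the function name `per_class_metrics` and the lowercase label strings are assumptions made for illustration. In practice, scikit-learn's `classification_report` produces the same kind of summary.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels=("positive", "negative")):
    """Return overall accuracy plus precision, recall and F1 for each label.

    Hypothetical helper for illustration only.
    """
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    report["accuracy"] = correct / len(y_true) if len(y_true) else 0.0
    return report


# Toy data, not taken from the project's results.
y_true = ["positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "negative", "positive"]
report = per_class_metrics(y_true, y_pred)
```

Accuracy is a single number for the whole sample, which is why it appears only once in the table, while precision, recall and F1 are reported per class.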

The DistilBERT model reached 0.92 accuracy, with precision and recall between 0.89 and 0.95 for both sentiment classes. Evaluation was based on a stratified sample of 500 reviews.
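
A stratified sample keeps the class proportions of the full dataset in the smaller evaluation set. The sketch below shows one way to draw such a sample with the standard library only. It is an illustration, not the project's loader: the helper name `stratified_sample`, the `label` key, and the integer labels 0 and 1 are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(examples, n_total, label_key="label", seed=0):
    """Draw about n_total examples while preserving label proportions.

    Hypothetical helper; per-class counts are rounded, so the result can
    differ from n_total by a few items when classes are unbalanced.
    """
    rng = random.Random(seed)  # fixed seed for reproducible samples
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        # Allocate slots to this class in proportion to its share of the data.
        k = round(n_total * len(group) / len(examples))
        sample.extend(rng.sample(group, min(k, len(group))))
    rng.shuffle(sample)  # avoid returning the classes in blocks
    return sample


# Toy balanced dataset standing in for the reviews (1 = positive, 0 = negative).
reviews = [{"text": f"review {i}", "label": i % 2} for i in range(2000)]
subset = stratified_sample(reviews, n_total=500)
counts = Counter(example["label"] for example in subset)
```

With a balanced input like the toy data here, a 500-review sample contains 250 reviews of each class.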

Tabularisai

When evaluated using label bucketing (e.g., grouping "Very Positive" and "Positive" together), the multilingual model achieved 100% accuracy on a reduced subset after removing neutral entries:

| Classes | Accuracy |
| --- | --- |
| Positive vs Negative | 1.00 |

Neutral predictions (not present in the original dataset) were excluded dynamically to enable fair binary classification.
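
The bucketing and neutral-filtering steps described above can be sketched as follows. This is a minimal illustration, not the code in `src/evaluate.py`: the `BUCKETS` mapping, the helper name `bucket_and_filter`, and the exact label strings other than "Very Positive", "Positive" and the neutral class are assumptions.

```python
# Assumed five-way label names; only "Very Positive", "Positive" and a
# neutral class are mentioned in this README.
BUCKETS = {
    "Very Positive": "positive",
    "Positive": "positive",
    "Negative": "negative",
    "Very Negative": "negative",
}


def bucket_and_filter(y_true, y_pred_raw):
    """Map fine-grained predictions to binary labels and drop the rest.

    Predictions without a bucket (for example "Neutral") are removed
    together with their ground-truth label, so both lists stay aligned.
    """
    kept_true, kept_pred = [], []
    for truth, raw in zip(y_true, y_pred_raw):
        bucket = BUCKETS.get(raw)
        if bucket is None:
            continue  # no binary equivalent, exclude from evaluation
        kept_true.append(truth)
        kept_pred.append(bucket)
    return kept_true, kept_pred


# Toy data, not taken from the project's results.
y_true = ["positive", "negative", "positive", "negative"]
y_pred_raw = ["Very Positive", "Negative", "Neutral", "Very Negative"]
kept_true, kept_pred = bucket_and_filter(y_true, y_pred_raw)
accuracy = sum(t == p for t, p in zip(kept_true, kept_pred)) / len(kept_true)
```

Because filtered rows leave the evaluation set entirely, the resulting accuracy describes only the reviews for which the model committed to a polarity, and the size of that reduced subset is worth reporting alongside the score.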

Skills Demonstrated

Hugging Face Transformers & Pipelines

Tokenization, decoding, and review preprocessing

Label mapping and evaluation customization

Modular pipeline design for structured inference

Label bucketing

Metrics visualization: classification reports
