A modular NLP pipeline for binary sentiment classification on IMDb movie reviews using Hugging Face Transformers. The project demonstrates scalable model evaluation, multilingual compatibility (tabularisai), and clean engineering practices.
- Goal: Predict whether a movie review expresses positive or negative sentiment
- Dataset: IMDb dataset via the Hugging Face `datasets` library
- Models:
- Evaluation:
- Multiclass accuracy using model-defined label IDs
- Custom bucketed evaluation to map diverse outputs into binary classes (positive/negative)
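The bucketing step above can be sketched as a small lookup. This is a minimal illustration, not the code in `src/evaluate.py`: the `BUCKETS` table, the `bucket_label` name, and the five label strings are all assumptions about what a fine-grained sentiment model might emit.

```python
from typing import Optional

# Hypothetical bucket table: these label strings are assumed for illustration
# and are not taken from this project's source.
BUCKETS = {
    "very positive": "positive",
    "positive": "positive",
    "very negative": "negative",
    "negative": "negative",
    "neutral": None,  # IMDb has no neutral class, so there is no binary target
}


def bucket_label(raw_label: str) -> Optional[str]:
    """Map a model-specific label to 'positive', 'negative', or None."""
    key = raw_label.strip().lower()
    if key not in BUCKETS:
        # Fail loudly instead of silently guessing a bucket for unknown labels.
        raise ValueError(f"Unmapped label: {raw_label!r}")
    return BUCKETS[key]
```

Raising on an unmapped label is a deliberate choice in this sketch: a model with a different label set then surfaces as an error rather than as skewed metrics.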
imdb-sentiment-classifier/
├── scripts/
│ └── run_pipeline.py # Entry point to run the full workflow
├── src/
│ ├── load_data.py # Loads the dataset
│ ├── preprocess.py # Tokenizes and decodes review texts
│ ├── inference.py # Performs inference with chosen model
│ └── evaluate.py # Includes multiclass and binary evaluation
├── outputs/ # Evaluation results and logs
│ └── metrics_log.md # Pasted metrics from the demonstrated models
├── requirements.txt # Project dependencies
└── README.md # You are here
Make sure you're in the project root directory and have Python 3.8+.
- Install dependencies
pip install -r requirements.txt
- Run the pipeline
python -m scripts.run_pipeline
Metric | Positive | Negative | Accuracy |
---|---|---|---|
Precision | 0.89 | 0.95 | 0.92 |
Recall | 0.93 | 0.90 | |
F1-Score | 0.91 | 0.92 | |
The DistilBERT model achieved strong performance, with balanced precision and recall across both sentiment classes. Evaluation was based on a stratified sample of 500 reviews.
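For readers who want to see how per-class figures like those in the table are derived, here is a dependency-free sketch. The function name `per_class_report` and its signature are hypothetical; the project may well rely on a library report instead.

```python
from typing import Dict, List


def per_class_report(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each label from paired lists."""
    report: Dict[str, Dict[str, float]] = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        # F1 is the harmonic mean of precision and recall.
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

As a consistency check on the table, the harmonic mean of 0.89 and 0.93 is about 0.91, and that of 0.95 and 0.90 is about 0.92, which matches the reported F1 scores.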
When evaluated using label bucketing (e.g., grouping "Very Positive" and "Positive" together), the multilingual model achieved 100% accuracy on a reduced subset after removing neutral entries:
Classes | Accuracy |
---|---|
Positive vs Negative | 1.00 |
Neutral predictions (not present in the original dataset) were excluded dynamically to enable fair binary classification.
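One way to exclude neutral predictions before scoring is sketched below. It assumes neutral outputs have already been mapped to `None`; the helper name `binary_accuracy_excluding_neutral` is hypothetical and not part of this repository.

```python
from typing import List, Optional, Tuple


def binary_accuracy_excluding_neutral(
    y_true: List[str], y_pred: List[Optional[str]]
) -> Tuple[float, int]:
    """Return (accuracy, number of reviews scored) after dropping neutrals."""
    # Keep only pairs where the model committed to positive or negative.
    kept = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    if not kept:
        raise ValueError("No non-neutral predictions to score")
    correct = sum(1 for t, p in kept if t == p)
    return correct / len(kept), len(kept)
```

Returning the count of scored reviews alongside the accuracy makes it visible how far the subset shrank, which matters when interpreting a score computed on a reduced sample.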
- Hugging Face Transformers & Pipelines
- Tokenization, decoding, and review preprocessing
- Label mapping and evaluation customization
- Modular pipeline design for structured inference
- Label bucketing
- Metrics visualization: classification reports