This project is an NLP-based Persian Emotion Classifier built with ParsBERT.
It predicts emotions from Persian text across 7 categories:
- 😃 HAPPY
- 😨 FEAR
- 😢 SAD
- 😡 HATE
- 😠 ANGRY
- 😲 SURPRISED
- ❓ OTHER
The project has two main parts:
- Model Training → Fine-tuning ParsBERT on a Persian emotion dataset using Kaggle GPU.
- Streamlit App → An interactive interface where users enter text and get predictions in real-time.
We used the dataset hosted on Kaggle:
👉 Emotions in Persian Texts
- 6,000 training samples
- 1,000 test samples
- Each record contains:
  - A Persian text
  - An emotion label (one of 7 categories)
Clone the repository and install dependencies:
git clone https://github.com/your-username/persian-emotion-classifier.git
cd persian-emotion-classifier
pip install -r requirements.txt
To run the application locally:
- ✅ Open a terminal in the project folder
- ✅ Run the following command:
  streamlit run app.py
Before training, the Persian text data went through several preprocessing steps:
- ✅ Text Normalization
  - Remove elongated letters (e.g., خیییییییلی → خیلی)
  - Unify punctuation (convert Arabic to Persian variants, normalize question marks, etc.)
  - Remove extra spaces and unwanted characters (like emojis or Latin words if present)
- ✅ Tokenization
  - Use the ParsBERT tokenizer (HooshvareLab/bert-base-parsbert-uncased)
  - Truncate or pad sequences to a maximum of 128 tokens
- ✅ Label Encoding
  - Convert categorical labels (HAPPY, SAD, etc.) into integer IDs
  - Example: {"HAPPY": 0, "FEAR": 1, "SAD": 2, ...}
- ✅ Dataset Splitting
  - Training set → 6,000 samples
  - Test set → 1,000 samples
📌 After preprocessing, the dataset was ready for input into the ParsBERT model.
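The normalization step can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual preprocessing code: the function name `normalize_persian`, the character map, and the emoji ranges are assumptions chosen for the example, and the real notebook may use a dedicated library instead.

```python
import re

# Assumed mapping of Arabic/Latin code points to their Persian variants.
# Escapes are used so the exact code points are unambiguous.
CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf
    "?": "\u061f",       # Latin question mark -> Persian question mark
})

# Rough emoji coverage (assumption): common pictograph blocks plus the
# variation selector. The zero-width non-joiner (U+200C) is kept on purpose,
# because Persian spelling relies on it.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")
LATIN_RE = re.compile(r"[A-Za-z]+")
ELONGATION_RE = re.compile(r"(.)\1{2,}")  # same character 3+ times in a row


def normalize_persian(text: str) -> str:
    """Apply the normalization steps listed above, in order."""
    text = text.translate(CHAR_MAP)        # unify characters and punctuation
    text = EMOJI_RE.sub("", text)          # drop emojis
    text = LATIN_RE.sub(" ", text)         # drop Latin words
    text = ELONGATION_RE.sub(r"\1", text)  # collapse elongated letters
    return re.sub(r"\s+", " ", text).strip()  # remove extra spaces


if __name__ == "__main__":
    elongated = "\u062e" + "\u06cc" * 7 + "\u0644\u06cc"  # elongated "kheyli"
    print(normalize_persian(elongated))
```

Collapsing every run of three or more identical characters to a single one is a simplification; it also shortens repeated punctuation such as "!!!".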
The fine-tuning of ParsBERT was done on Kaggle with GPU acceleration.
Below are the main steps:
- ✅ Model Selection
  - Base model: HooshvareLab/bert-base-parsbert-uncased
  - Suitable for Persian NLP tasks
- ✅ Training Environment
  - Kaggle GPU
  - Python 3.10, Transformers 4.40+
- ✅ Hyperparameters
  - Epochs: 4
  - Batch size: 16
  - Learning rate: 2e-5
  - Max sequence length: 128
- ✅ Evaluation Metric
  - Macro F1-score (handles class imbalance better than accuracy)
- ✅ Training Process
  - Train dataset: 6,000 samples
  - Test dataset: 1,000 samples
  - Saved the best model checkpoint at the end of training
- ✅ Results
  - Final Macro F1-score: ~0.72
  - Good overall performance
  - Some overlap between ANGRY, HATE, and SAD classes (expected due to semantic similarity)
📌 After training, the best-performing model was exported to the folder parsbert-emotion/
for use in the Streamlit app.
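To show why Macro F1 is a better fit than accuracy for imbalanced classes, here is a small plain-Python sketch of the metric. It is illustrative only: the function name `macro_f1` and the toy labels are assumptions, and the training notebook may well compute the score with a library instead.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Average the per-class F1 scores, giving every class equal weight."""
    f1_scores = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


if __name__ == "__main__":
    # Toy data (hypothetical): 8 samples of class 0, 2 of class 1,
    # and a model that always predicts class 0.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("macro F1:", macro_f1(y_true, y_pred, num_classes=2))
```

In the toy data the always-majority model reaches 0.8 accuracy, while its Macro F1 is pulled down by the class it never predicts.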
Here’s how the model performs on a sample Persian input:
- ✅ User Input
  من خیلی ناراحت هستم ("I am very sad")
- ✅ Model Processing
  - Text is tokenized with the ParsBERT tokenizer
  - Input is passed through the fine-tuned parsbert-emotion model
- ✅ Prediction Output
  Example probabilities:
  - 😢 SAD → 0.82
  - 😠 ANGRY → 0.10
  - ❓ OTHER → 0.05
- ✅ Visualization
  - Streamlit displays a bar chart of all 7 emotion probabilities
  - The highest probability is shown as the predicted emotion
📌 In this case, the model correctly classifies the sentence as expressing Sadness.
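The step from raw model outputs to the probabilities above can be sketched as a softmax followed by picking the largest value. This is a hedged illustration in plain Python: the label order in `LABELS` is an assumption (only the first three IDs are given in this README), the logits are made-up numbers, and `predict_label` is a hypothetical helper, not a function from app.py.

```python
import math

# Assumed label order; only HAPPY=0, FEAR=1, SAD=2 are stated in the README.
LABELS = ["HAPPY", "FEAR", "SAD", "HATE", "ANGRY", "SURPRISED", "OTHER"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return the most likely label and the full probability table."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


if __name__ == "__main__":
    # Hypothetical logits for one sentence, one value per class.
    label, probs = predict_label([0.1, 0.0, 3.0, 0.2, 1.0, -0.5, 0.4])
    print(label)
    for name, p in probs.items():
        print(f"{name}: {p:.2f}")
```

The probability table is the kind of data a bar chart of all 7 emotions would be drawn from.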
Common issues and their solutions when running the project:
- ❌ Error: meta tensor or corrupted model
  - Cause: Model files were incomplete or corrupted
  - ✅ Solution:
    - Re-extract the parsbert-emotion/ folder
    - Ensure it contains all required files (pytorch_model.bin, config.json, tokenizer.json, vocab.txt, etc.)
- ❌ Blank Streamlit Page
  - Cause: Running with python app.py instead of Streamlit
  - ✅ Solution:
    - Always run with: streamlit run app.py
- ❌ NumPy Conversion Error
  - Cause: Attempting .numpy() directly on a GPU tensor
  - ✅ Solution:
    - Use .cpu().numpy() instead to safely move tensors to CPU
- ❌ Model ID Not Found
  - Cause: parsbert-emotion is not available on Hugging Face Hub
  - ✅ Solution:
    - Use the local folder name when loading: AutoModelForSequenceClassification.from_pretrained("parsbert-emotion")
📌 Following these steps should resolve most issues encountered when running the model or app.
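For the corrupted-model case, a quick check of the model folder can save a debugging round. The sketch below is an assumption-laden helper, not part of the project: `missing_model_files` is a hypothetical name, and the file list covers only the names mentioned above (the "etc." is left open). Recent Transformers versions may save weights as model.safetensors instead of pytorch_model.bin, so adjust the list to match your export.

```python
from pathlib import Path

# Only the files named in this README; extend as needed for your export.
REQUIRED_FILES = ("pytorch_model.bin", "config.json", "tokenizer.json", "vocab.txt")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required files that are absent or empty in model_dir."""
    folder = Path(model_dir)
    missing = []
    for name in required:
        path = folder / name
        # An empty file is treated as a sign of an incomplete extraction.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_model_files("parsbert-emotion")
    if problems:
        print("Missing or empty files:", ", ".join(problems))
    else:
        print("Model folder looks complete.")
```

A non-empty result suggests re-extracting the folder before trying to load the model again.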
You can run the Persian Emotion Classifier both locally and online:
- ✅ Local Deployment
  - Run the app with: streamlit run app.py
  - Opens on http://localhost:8501
- ✅ Online Deployment Options
  - Streamlit Cloud → Quick, free deployment for sharing apps
  - Hugging Face Spaces → Free hosting with Streamlit or Gradio
The project includes the following components:
- ✅ Fine-tuned model → parsbert-emotion/
- ✅ Training notebook → emotions-classification-nlp.ipynb
- ✅ Streamlit app → app.py
- ✅ Project report → report.pdf
- ✅ Dataset → Emotions in Persian Texts
- ✅ Objective Achieved
  - Fine-tuned ParsBERT for Persian emotion classification
- ✅ Key Features
  - Supports 7 emotion classes
  - Achieved Macro F1 ~0.72
  - Integrated with an interactive Streamlit app
- ✅ Impact
  - Demonstrates the power of transformer models in low-resource languages like Persian
  - Provides a practical tool for text emotion analysis
- ✅ Future Improvements
  - Expand dataset with more labeled examples
  - Explore advanced transformer models (e.g., RoBERTa, mBERT)
  - Deploy on cloud for public access
📌 With this project, we showed that transformer-based NLP models can effectively handle Persian emotion classification, bridging the gap for practical AI applications in Persian language processing.
This work is licensed under a Creative Commons Attribution–NonCommercial–NoDerivatives 4.0 International License.
Use in CVs, portfolios, or derivative works is not permitted without explicit permission from the author.