📘 Emotion Classification in Natural Language Text

🔰 Overview

This project is an NLP-based Persian Emotion Classifier built with ParsBERT.
It predicts emotions from Persian text across 7 categories:

😃 HAPPY
😨 FEAR
😢 SAD
😡 HATE
😠 ANGRY
😲 SURPRISED
❓ OTHER

The project has two main parts:

- Model Training → Fine-tuning ParsBERT on a Persian emotion dataset on a Kaggle GPU.
- Streamlit App → An interactive interface where users enter text and get predictions in real time.

📦 Dataset

We used the dataset hosted on Kaggle:
👉 Emotions in Persian Texts

- 6,000 training samples
- 1,000 test samples
Each record contains:
- A Persian text
- An emotion label (one of 7 categories)

⚙️ Installation

Clone the repository and install dependencies:

git clone https://github.com/your-username/persian-emotion-classifier.git
cd persian-emotion-classifier
pip install -r requirements.txt

🚀 Usage

To run the application locally:

✅ Open a terminal in the project folder
✅ Run the following command:
```
streamlit run app.py
```

🧹 Preprocessing Steps

Before training, the Persian text data went through several preprocessing steps:

✅ Text Normalization
- Remove elongated letters (e.g., خیییییییلی → خیلی)
- Unify punctuation (convert Arabic to Persian variants, normalize question marks, etc.)
- Remove extra spaces and unwanted characters (like emojis or Latin words if present)
✅ Tokenization
- Use the ParsBERT tokenizer (HooshvareLab/bert-base-parsbert-uncased)
- Truncate or pad sequences to a maximum of 128 tokens
✅ Label Encoding
- Convert categorical labels (HAPPY, SAD, etc.) into integer IDs
- Example: {"HAPPY": 0, "FEAR": 1, "SAD": 2, ...}
✅ Dataset Splitting
- Training set → 6,000 samples
- Test set → 1,000 samples
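
The text normalization step above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `normalize_text`, the character map, the "three or more repeats" rule for elongated letters, and the emoji ranges are all assumptions chosen for illustration.

```python
import re

# Arabic code points and the ASCII question mark mapped to Persian forms.
# Illustrative subset only (assumption), not the project's real mapping.
CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf
    "?": "\u061f",       # ASCII question mark -> Persian question mark
})

ELONGATION = re.compile(r"([^\W\d_])\1{2,}")  # a letter repeated 3+ times
PUNCT_RUN = re.compile(r"([\u061f!])\1+")     # repeated question/exclamation marks
LATIN = re.compile(r"[A-Za-z]+")              # Latin words
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")  # common emoji ranges
SPACES = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Apply the normalization steps in a fixed order and return clean text."""
    text = text.translate(CHAR_MAP)       # unify characters and question marks
    text = LATIN.sub(" ", text)           # drop Latin words
    text = EMOJI.sub(" ", text)           # drop emojis
    text = ELONGATION.sub(r"\1", text)    # collapse elongated letters to one
    text = PUNCT_RUN.sub(r"\1", text)     # collapse repeated punctuation
    return SPACES.sub(" ", text).strip()  # remove extra spaces
```

Collapsing only runs of three or more letters is a deliberate choice in this sketch: it leaves legitimate double letters untouched while still reducing the elongated example above to its base form.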

📌 After preprocessing, the dataset was ready for input into the ParsBERT model.
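
The tokenization and label encoding steps can be sketched together as below. The helper names (`load_tokenizer`, `encode_batch`) are hypothetical, and only the first three label IDs (HAPPY, FEAR, SAD) come from the example above; the order of the remaining labels is an assumption. The `truncation`, `padding`, and `max_length` arguments are the standard Hugging Face tokenizer call options.

```python
MODEL_NAME = "HooshvareLab/bert-base-parsbert-uncased"
MAX_LENGTH = 128

# Only HAPPY=0, FEAR=1, SAD=2 are stated in this README; the order of the
# remaining four labels is an assumption made for this sketch.
LABELS = ["HAPPY", "FEAR", "SAD", "HATE", "ANGRY", "SURPRISED", "OTHER"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def load_tokenizer(name: str = MODEL_NAME):
    """Load the ParsBERT tokenizer from the Hugging Face Hub."""
    # Imported lazily so the other helpers work without transformers installed.
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained(name)


def encode_batch(tokenizer, texts, labels):
    """Tokenize texts to fixed-length inputs and map label names to integer IDs."""
    encoded = tokenizer(
        list(texts),
        truncation=True,        # cut sequences longer than MAX_LENGTH
        padding="max_length",   # pad shorter ones up to MAX_LENGTH
        max_length=MAX_LENGTH,
    )
    return {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],
        "labels": [LABEL2ID[label] for label in labels],
    }
```

A typical call would be `encode_batch(load_tokenizer(), texts, labels)`, which needs the `transformers` package and network access for the first download.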

🧠 Model Training Steps

The fine-tuning of ParsBERT was done on Kaggle with GPU acceleration.
Below are the main steps:

✅ Model Selection
- Base model: HooshvareLab/bert-base-parsbert-uncased
- Suitable for Persian NLP tasks
✅ Training Environment
- Kaggle GPU
- Python 3.10, Transformers 4.40+
✅ Hyperparameters
- Epochs: 4
- Batch size: 16
- Learning rate: 2e-5
- Max sequence length: 128
✅ Evaluation Metric
- Macro F1-score (the unweighted mean of per-class F1, so rare classes count as much as frequent ones; less misleading than accuracy under class imbalance)
✅ Training Process
- Train dataset: 6,000 samples
- Test dataset: 1,000 samples
- Saved the best model checkpoint at the end of training
✅ Results
- Final Macro F1-score: ~0.72
- Good overall performance
- Some overlap between ANGRY, HATE, and SAD classes (expected due to semantic similarity)

📌 After training, the best-performing model was exported to the folder parsbert-emotion/ for use in the Streamlit app.
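
To make the evaluation metric concrete, here is a small pure-Python sketch of macro F1 next to plain accuracy. It is an illustration of the metric only, not the evaluation code used in the notebook; the function names and the toy labels are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy imbalanced data: 8 samples of class 0, 2 of class 1,
# and a classifier that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, num_classes=2)
```

On this toy data accuracy stays high because the majority class dominates, while macro F1 drops sharply because the ignored minority class contributes an F1 of zero to the average.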

📊 Example Prediction Steps

Here’s how the model performs on a sample Persian input:

✅ User Input
من خیلی ناراحت هستم
✅ Model Processing

- The text is tokenized with the ParsBERT tokenizer
- The input is passed through the fine-tuned parsbert-emotion model

✅ Prediction Output
Example probabilities:

😢 SAD → 0.82
😠 ANGRY → 0.10
❓ OTHER → 0.05

✅ Visualization

- Streamlit displays a bar chart of all 7 emotion probabilities
- The highest probability is shown as the predicted emotion

📌 In this case, the model correctly classifies the sentence as SAD.
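
The step from raw model outputs to the probabilities shown above is a softmax followed by picking the largest value. The sketch below shows that step in plain Python. The logits are hypothetical values invented for the example (in the app they would come from the fine-tuned model's output), and the label order beyond the first three entries is an assumption.

```python
import math

# Label order is assumed; only HAPPY=0, FEAR=1, SAD=2 appear in this README.
LABELS = ["HAPPY", "FEAR", "SAD", "HATE", "ANGRY", "SURPRISED", "OTHER"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return the most likely label and the full label-to-probability map."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Hypothetical logits for one sentence, one score per label.
example_logits = [0.1, 0.2, 3.0, 0.0, 1.0, -0.5, 0.4]
label, probabilities = predict_label(example_logits)
```

The `probabilities` dictionary has one entry per class, which is the shape a bar chart of all 7 emotions needs.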

🧪 Troubleshooting

Common issues and their solutions when running the project:

❌ Error: meta tensor or corrupted model
- Cause: Model files were incomplete or corrupted
- ✅ Solution:
  - Re-extract the parsbert-emotion/ folder
  - Ensure it contains all required files:
    (pytorch_model.bin or model.safetensors, config.json, tokenizer.json, vocab.txt, etc.)
❌ Blank Streamlit Page
- Cause: Running with python app.py instead of Streamlit
- ✅ Solution:
  - Always run with:
```
streamlit run app.py
```
❌ NumPy Conversion Error
- Cause: Calling .numpy() directly on a tensor that lives on the GPU
- ✅ Solution:
  - Use .cpu().numpy() to move the tensor to the CPU first (add .detach() before .cpu() if the tensor still tracks gradients)
❌ Model ID Not Found
- Cause: parsbert-emotion is not available on Hugging Face Hub
- ✅ Solution:
  - Use the local folder name when loading:
```
AutoModelForSequenceClassification.from_pretrained("parsbert-emotion")
```

📌 Following these steps should resolve most issues encountered when running the model or app.
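
For the corrupted-model case, a quick check of the model folder before loading can save time. The helper below is a hypothetical sketch, not part of the project: the file names mirror the list above, and accepting `model.safetensors` as an alternative weights file is an assumption based on the format newer Transformers versions save by default.

```python
from pathlib import Path

# File names taken from the troubleshooting list above.
REQUIRED_FILES = ["config.json", "tokenizer.json", "vocab.txt"]
# Assumption: either weights format is acceptable.
WEIGHT_FILES = ["pytorch_model.bin", "model.safetensors"]


def _present(path: Path) -> bool:
    """A file counts as present only if it exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def missing_model_files(model_dir="parsbert-emotion"):
    """Return the names of expected files that are absent or empty."""
    folder = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not _present(folder / name)]
    if not any(_present(folder / name) for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

An empty list from `missing_model_files()` means the folder at least looks complete; a non-empty list names what to re-extract. This only checks presence and size, so it cannot detect every kind of corruption.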

🚀 Deployment Options

You can run the Persian Emotion Classifier both locally and online:

✅ Local Deployment
- Run the app with:
```
streamlit run app.py
```
- The app opens at http://localhost:8501 (Streamlit's default port)
✅ Online Deployment Options
- Streamlit Cloud → Quick, free deployment for sharing apps
- Hugging Face Spaces → Free hosting with Streamlit or Gradio

📦 Deliverables

The project includes the following components:

✅ Fine-tuned model → parsbert-emotion/
✅ Training notebook → emotions-classification-nlp.ipynb
✅ Streamlit app → app.py
✅ Project report → report.pdf
✅ Dataset → Emotions in Persian Texts

🏁 Conclusion

✅ Objective Achieved
- Fine-tuned ParsBERT for Persian emotion classification
✅ Key Features
- Supports 7 emotion classes
- Achieved Macro F1 ~0.72
- Integrated with an interactive Streamlit app
✅ Impact
- Demonstrates the power of transformer models in low-resource languages like Persian
- Provides a practical tool for text emotion analysis
✅ Future Improvements
- Expand dataset with more labeled examples
- Explore other transformer models (e.g., XLM-RoBERTa, mBERT)
- Deploy to the cloud for public access

📌 This project shows that a fine-tuned transformer model can handle Persian emotion classification (Macro F1 ~0.72) and can be packaged as a practical, interactive tool.

⚖️ License

This work is licensed under a Creative Commons Attribution–NonCommercial–NoDerivatives 4.0 International License.
Use in CVs, portfolios, or derivative works is not permitted without explicit permission from the author.
