Scam Detection Using NLP & Machine Learning

Overview

This project is a Scam Detection System designed to identify fraudulent messages across text-based communication platforms like (e.g., WhatsApp, Instagram, Telegram). It uses a hybrid approach combining Natural Language Processing (NLP) techniques (TF-IDF, N-gram analysis) and Machine Learning (Random Forest) to detect scams with high accuracy and minimal false positives.

Features

Hybrid Detection: Combines pattern-based (N-gram) and ML-based (TF-IDF + Random Forest) analysis.
Multi-Platform: Detects scams from various messaging platforms.
Scam Type Classification: Identifies the type of scam (phishing, impersonation, financial, urgent, etc.).
Continuous Learning: Learns from user feedback to adapt to new scam tactics.
Educational Component: Provides information to help users recognize scams.

Project Structure

run.py — Main entry point; starts the Flask web server and runs evaluation.
scam_detector.py — Core logic for text processing, feature extraction, and ML models.
routes.py — Flask routes for web/API interaction.
index.html — Web interface for user interaction.
__init__.py — Flask app initialization.
Financial.html — Educational page for financial scams.
Phishing.html — Educational page for phishing scams.
Urgency.html — Educational page for urgent scams.
Impersonation.html — Educational page for impersonation scams.

Installation

Clone the repository:

git clone https://github.com/yourusername/Scam-Detection-Using-NLP-Machine-Learning-Text-Based.git
cd Scam-Detection-Using-NLP-Machine-Learning-Text-Based

Create and activate a virtual environment (recommended):
```
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:

pip install flask scikit-learn pandas numpy nltk

Download NLTK resources (automatically handled on first run):
- The system will download required NLTK resources (punkt, stopwords, wordnet, averaged_perceptron_tagger) if not already present.

Usage

Run the application:
```
python run.py
```
Access the web interface:
- Open your browser and go to: http://127.0.0.1:5000
How it works:
- Paste or type a message into the web interface.
- The system will analyze the message and display the risk level, scam type, and important features.
- You can provide feedback to help the system learn and improve.

Evaluation

On startup, the system performs a comprehensive evaluation (cross-validation and holdout test) and prints results in the terminal.
Metrics include accuracy, precision, recall, F1 score, and a confusion matrix.

Notes

This is a Web Application (Not directly implemented into WhatsApp or Instagram.
The system currently supports only English-language messages.
Training data is stored in data/training_data.csv and grows as users provide feedback.
In the event of dataset corruption, the system incorporates a backup mechanism that restores the dataset using the flagged_csv file. (This feature was implemented to address issues encountered during the production phase of the project)

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
app		app
data		data
venv		venv
.nltk_resources_downloaded		.nltk_resources_downloaded
README.md		README.md
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Scam Detection Using NLP & Machine Learning

Overview

Features

Project Structure

Installation

Usage

Evaluation

Notes

License

About

Uh oh!

Releases

Packages

Languages

Sameer0791/Scam-Detection-Using-NLP-Machine-Learning-Text-Based

Folders and files

Latest commit

History

Repository files navigation

Scam Detection Using NLP & Machine Learning

Overview

Features

Project Structure

Installation

Usage

Evaluation

Notes

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages