
A hybrid text summarization system using extractive (spaCy) and abstractive (BERT/GPT) techniques to summarize long-form content from the CNN/DailyMail dataset. Includes model fine-tuning and evaluation on real-world articles.

ahsankhizar5/text-summarization-cnn-dailymail

Text Summarization with CNN/DailyMail Dataset

A text summarization system that condenses long news articles and blog posts into short summaries using both extractive and abstractive methods. It is built with spaCy for the extractive part and with BERT and GPT-2 via HuggingFace Transformers for the abstractive part, and it compares the strengths of traditional and transformer-based NLP techniques.

📄 Includes a detailed Report.pdf and explanatory ProjectVideo.mp4 for academic presentation or documentation.

📦 Repository: https://github.com/ahsankhizar5/text-summarization-cnn-dailymail.git


✨ Features

  • 📚 Preprocessing of large-scale text data
  • ✂️ Extractive summarization using spaCy
  • 🤖 Abstractive summarization using BERT and GPT-2 via HuggingFace Transformers
  • 🎯 Fine-tuning transformer models for improved output
  • 🧪 Evaluation of summaries on real-world content
  • 📄 Includes Report & Presentation Video
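To make the extractive feature concrete, here is a minimal sketch of frequency-based sentence scoring. It is not taken from Code.ipynb: the function names, the tiny stop-word list, and the choice of average word frequency as the score are all assumptions made for illustration. Sentence splitting uses spaCy's rule-based `sentencizer` on a blank English pipeline (spaCy v3 syntax), so no model download is needed.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences with spaCy's rule-based sentencizer.

    spaCy is imported lazily so the scoring code below also works without it.
    """
    import spacy

    nlp = spacy.blank("en")      # blank pipeline: no trained model required
    nlp.add_pipe("sentencizer")  # rule-based sentence boundaries (spaCy v3)
    return [s.text.strip() for s in nlp(text).sents if s.text.strip()]


# Tiny illustrative stop-word list (an assumption); spaCy's token.is_stop
# is a fuller alternative.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on",
              "for", "it", "that", "was"}


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(sentences, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    freqs = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for idx, sent in enumerate(sentences):
        words = tokenize(sent)
        # Average word frequency, so long sentences are not favoured by length.
        score = sum(freqs[w] for w in words) / len(words) if words else 0.0
        scores.append((score, idx))
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scores, key=lambda p: (-p[0], p[1]))[:top_k]
    return [sentences[idx] for _, idx in sorted(best, key=lambda p: p[1])]
```

A typical call would be `extractive_summary(split_sentences(article), top_k=3)`. Returning the selected sentences in document order keeps the summary readable even though selection is by score.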

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/ahsankhizar5/text-summarization-cnn-dailymail.git
cd text-summarization-cnn-dailymail

2. Install Requirements

pip install spacy transformers datasets torch nltk
python -m nltk.downloader punkt

✅ Use Python 3.7 or later. Recent releases of HuggingFace Transformers and PyTorch require newer Python versions, so check their installation notes if the install fails.

3. Run the Notebook

Open Code.ipynb in Jupyter and run the cells in order to execute the extractive and abstractive summarization pipelines.
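Once a pipeline has produced summaries, they can be compared against the reference highlights that ship with CNN/DailyMail. The sketch below is one possible way to do that and is not taken from the notebook: it computes a simplified ROUGE-1 F1 (plain unigram overlap, no stemming) and scores a lead-3 baseline. The dataset id `abisee/cnn_dailymail`, the `3.0.0` config, and the helper names are assumptions; the README does not state which metric or dataset version the project uses.

```python
import re
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lead_n(article, n=3):
    """Baseline summary: the first n sentences, split naively on . ! and ?"""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n])


def evaluate_lead_baseline(num_articles=5):
    """Average ROUGE-1 F1 of the lead-3 baseline on a few test articles."""
    from datasets import load_dataset  # lazy import: downloads data when called

    # Dataset id and config are assumptions; use whatever Code.ipynb loads.
    data = load_dataset("abisee/cnn_dailymail", "3.0.0",
                        split=f"test[:{num_articles}]")
    scores = [rouge1_f1(lead_n(row["article"]), row["highlights"])
              for row in data]
    return sum(scores) / len(scores)
```

To score a model instead of the baseline, replace `lead_n(row["article"])` with that model's output. For reported results, an established ROUGE implementation is a safer choice than this simplified version.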


🛠️ Tech Stack

  • Python
  • spaCy – Extractive summarization
  • HuggingFace Transformers – Abstractive summarization
  • BERT, GPT-2 – Pre-trained language models
  • NLTK, PyTorch – NLP and deep learning backends
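As an illustration of how GPT-2 can be used for abstractive summarization through HuggingFace Transformers, the sketch below prompts the pre-trained `gpt2` checkpoint with the article followed by a `TL;DR:` cue. This is an assumption about one workable approach, not a description of the notebook: the README does not say how the models are prompted or fine-tuned, and the helper names and the 400-word truncation (a rough guess meant to stay inside GPT-2's 1024-token context) are hypothetical.

```python
def build_tldr_prompt(article, max_words=400):
    """Truncate the article to max_words and append a 'TL;DR:' cue."""
    words = article.split()
    return " ".join(words[:max_words]) + "\nTL;DR:"


def summarize_with_gpt2(article, max_new_tokens=60):
    """Generate a continuation of the TL;DR prompt with pre-trained GPT-2."""
    from transformers import pipeline  # lazy import: downloads weights on first use

    generator = pipeline("text-generation", model="gpt2")
    prompt = build_tldr_prompt(article)
    output = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,         # greedy decoding for repeatable output
        return_full_text=False,  # return only the continuation, not the prompt
    )
    return output[0]["generated_text"].strip()
```

Without fine-tuning, GPT-2 output from this kind of prompt tends to be rough, which is one reason the project lists fine-tuning as a separate feature.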

📁 Folder Structure

├── Code.ipynb
├── Report.pdf
└── ProjectVideo.mp4

🤝 Want to Contribute?

  1. Fork the repo

  2. Create a branch

    git checkout -b feature/your-feature

  3. Commit your changes

    git add .
    git commit -m "Add your feature"

  4. Push and submit a PR

    git push origin feature/your-feature

📄 License

MIT License — free to use, modify, and distribute.


🌟 Give a Star

If this project helped you, inspired you, or saved you time — consider giving it a ⭐ on GitHub!


🧠 "In a world full of information, clarity is power."
