Document Summarizer

A lightweight AI app for summarizing and querying PDFs using OpenAI models.

Features

📄 PDF Upload: Support for documents up to 200MB
🤖 AI Summarization: Generate concise summaries using GPT-4o-mini
📝 Key Points: Extract bullet points automatically
❓ Document Q&A: Ask natural-language questions using embeddings-based retrieval
⚙️ Customizable: Adjustable token limits and model selection
🎨 Modern UI: Clean, responsive Streamlit interface

📸 Screenshots

Quick Start

Prerequisites

Python 3.9 or higher
OpenAI API key

Installation

Clone the repository

```
git clone https://github.com/nickcarndt/Document-Summarizer.git
cd Document-Summarizer
```

Create virtual environment

```
python3 -m venv .venv  # On Windows, use python instead of python3
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

Create a .env file in the project root:

```
echo "OPENAI_API_KEY=sk-your-actual-key-here" > .env
```

Or export directly:

```
export OPENAI_API_KEY=sk-your-actual-key-here
```
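
For reference, here is a minimal sketch of how a Python app might pick up the key at startup. It assumes the key comes either from the process environment or from a simple `KEY=VALUE` `.env` file. The helper name `load_api_key` is hypothetical and is not taken from `app.py`, which may well use a library such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_path: str = ".env", name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the key from the environment, falling back to a KEY=VALUE file.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # An exported variable takes precedence over the .env file.
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_path)
    if not path.is_file():
        return None

    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return raw.strip().strip('"').strip("'") or None
    return None
```

Checking the result for `None` before starting the app gives a clearer failure than an authentication error on the first API call.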

Activate virtual environment and run the application

```
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
streamlit run app.py
```

Open your browser

Navigate to http://localhost:8501

Usage

Upload a PDF: Use the file uploader to select your document
View Summary: The app automatically extracts text and generates a summary
Review Key Points: Browse the automatically generated bullet points
Ask Questions: Use the Q&A section to query specific information about the document
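
The Q&A step retrieves over document chunks, so the extracted text has to be split before it can be embedded. Below is a rough sketch of one common way to do that, using fixed-size character windows with overlap. The function name, the default sizes, and the strategy itself are illustrative assumptions, not the app's actual settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows.

    Hypothetical helper; sizes are in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary fully visible in at least one chunk, at the cost of embedding some text twice.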

Example Output

Summary: The document outlines a comprehensive strategy for implementing AI-powered document analysis in enterprise environments, focusing on scalability, security, and user experience. It emphasizes the importance of choosing the right LLM model for specific use cases and implementing proper data governance frameworks.

Key Points:

AI document analysis can reduce processing time by 80% compared to manual review
GPT-4o-mini provides optimal cost-performance balance for most use cases
Embedding-based retrieval enables accurate Q&A without full document context
Security considerations include data encryption and access controls
Integration with existing workflows requires careful API design
Performance monitoring and error handling are critical for production deployment

Technical Details

Backend: Python 3.9+ with Streamlit
AI Models: OpenAI GPT-4o-mini for summarization, text-embedding-3-small for retrieval
PDF Processing: PyPDF for text extraction
Vector Search: Cosine similarity for document chunk retrieval
Environment: Virtual environment with pinned dependencies
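
To illustrate the vector search item above, here is a minimal sketch of cosine-similarity retrieval in plain Python. It assumes the query and each chunk have already been turned into embedding vectors of equal length; the function names and the toy vectors in the usage note are hypothetical and not taken from `app.py`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(chunk_vecs)
    ]
    # Highest similarity first; Python's sort is stable, so ties keep chunk order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With toy two-dimensional vectors, `top_k_chunks([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)` ranks chunk 1 first and chunk 2 second. In a real run the vectors would come from the embedding model, and the text of the selected chunks would be passed to the chat model as context for the question.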

Future Improvements

📚 Multi-document support: Process multiple PDFs simultaneously
🔍 Enhanced RAG: Implement more sophisticated retrieval strategies
☁️ Cloud deployment: Deploy to Streamlit Community Cloud or AWS
📊 Analytics: Track usage patterns and document insights
🔐 Authentication: Add user management and document access controls
🌐 API: RESTful API for integration with other applications

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
