This is my custom-built offline AI bot that lets you chat with PDFs and web pages using local embeddings and a local LLM such as LLaMA 3.
I built it step by step with LangChain, FAISS, HuggingFace, and Ollama, with no dependence on the OpenAI or DeepSeek APIs (they kept failing or costing too much).
- 📄 Chat with uploaded PDF files
- 🌍 Ask questions about a webpage URL
- 🧠 Uses local HuggingFace embeddings (all-MiniLM-L6-v2)
- 🦙 Powered by Ollama + LLaMA 3 (fully offline LLM)
- 🗃️ Built-in FAISS vectorstore
- 🧾 PDF inline preview
- 🧮 Built-in calculator + summarizer tools (via LangChain agents; sketched after this list)
- 🧠 Page citation support (know where each answer came from)
- 📜 Chat history viewer with download button (JSON)
- 🎛️ Simple Streamlit UI with dark/light mode toggle
- 👨‍💻 Footer credit: Developed by EzioDEVio
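The agent-tools item above is the only feature that needs special wiring. Here is a minimal sketch of how a calculator tool can be registered with a LangChain agent; the tool name, the toy evaluator, and the agent type are illustrative assumptions, not the repo's actual code:

```python
# Sketch: registering a calculator tool with a LangChain agent
# (illustrative assumptions, not the repo's actual code)
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_community.chat_models import ChatOllama

calculator = Tool(
    name="calculator",
    func=lambda expr: str(eval(expr)),  # toy evaluator, for illustration only
    description="Evaluates a basic arithmetic expression such as '2 + 2'.",
)

agent = initialize_agent(
    tools=[calculator],
    llm=ChatOllama(model="llama3"),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)
print(agent.run("What is 17 * 23?"))
```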
- `langchain`, `langchain-community`
- `sentence-transformers` for local embeddings
- `ollama` for local LLMs (`llama3`)
- `PyPDF2` for PDF parsing
- `FAISS` for vector indexing
- `Streamlit` for the frontend
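As a quick sanity check that the local embedding stack works, here is a minimal sketch using LangChain's community wrappers; the sample texts are placeholders:

```python
# Minimal sketch: local embeddings + FAISS index via LangChain community wrappers
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
texts = [
    "LLaMA 3 runs locally through Ollama.",
    "FAISS stores the embedding vectors in a local index.",
]
index = FAISS.from_texts(texts, embeddings)

# Retrieve the closest chunk for a question
hit = index.similarity_search("Where are the vectors kept?", k=1)[0]
print(hit.page_content)
```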
```bash
git clone https://github.com/EzioDEVio/ai-knowledge-bot.git
cd ai-knowledge-bot
python -m venv venv
.\venv\Scripts\activate       # Windows
source venv/bin/activate      # macOS/Linux
pip install -r requirements.txt
```

Make sure `sentence-transformers` is installed; it is needed for local embeddings.
Download and install Ollama from https://ollama.com.

After installation, verify:

```bash
ollama --version
```

Then pull and run the model:

```bash
ollama run llama3
```

This downloads the LLaMA 3 model (approx. 4–8 GB). You can also try `mistral`, `codellama`, etc.
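Once the model is pulled, you can query it from Python through LangChain's Ollama wrapper. A minimal sketch (the prompt is just an example):

```python
# Minimal sketch: querying the locally running llama3 model
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3")  # requires the Ollama server with llama3 pulled
reply = llm.invoke("Summarize what FAISS is in one sentence.")
print(reply.content)
```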
```bash
streamlit run app.py
```

The app will open at:

http://localhost:8501
```
ai-knowledge-bot/
├── app.py                     # Main Streamlit UI
├── backend/
│   ├── pdf_loader.py          # PDF text extraction
│   ├── web_loader.py          # Webpage scraper
│   ├── vector_store.py        # Embedding + FAISS
│   └── qa_chain.py            # LLM QA logic (Ollama + tools)
├── .env                       # Not used anymore (was for API keys)
├── requirements.txt
└── README.md
```
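For orientation, `backend/qa_chain.py` is where the FAISS retriever meets the local LLM. A hedged sketch of what that wiring might look like (the function name and retriever settings are assumptions, not the file's actual contents):

```python
# Sketch of the QA wiring in backend/qa_chain.py (names and defaults are assumptions)
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOllama

def build_qa_chain(vectorstore):
    """Combine the local LLM with the FAISS retriever built in vector_store.py."""
    llm = ChatOllama(model="llama3")
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,  # source documents enable the page citations
    )
```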
| Component | Mode |
|---|---|
| Embeddings | Local (HuggingFace) |
| Vectorstore | Local (FAISS) |
| LLM Response | Local (Ollama + llama3) |
| Internet Needed? | ❌ Only for first-time model download |
- OpenAI failed with `RateLimitError` and quota issues unless I added billing.
- DeepSeek embedding endpoints didn't work; only chat models are supported.

So I switched to:

- 🔁 Local `HuggingFaceEmbeddings` for vectorization
- 🦙 `ChatOllama` for fully offline AI answers
- ✅ PDF upload + preview
- ✅ URL content QA
- ✅ Chat history with page citations
- ✅ Calculator + summarizer tools
- ✅ Footer attribution
- ✅ JSON export (see the sketch below)
- ✅ 100% offline functionality
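The JSON export above maps directly onto Streamlit's built-in download button. A minimal sketch, assuming the chat turns live in `st.session_state["history"]`:

```python
# Sketch: chat-history download as JSON (the session-state key is an assumption)
import json
import streamlit as st

history = st.session_state.get("history", [])
st.download_button(
    label="Download chat history (JSON)",
    data=json.dumps(history, indent=2),
    file_name="chat_history.json",
    mime="application/json",
)
```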
Build and run the app securely using a multi-stage Dockerfile:
- Build the container:

```bash
docker build -t ai-knowledge-bot .
```

- Run the container. Make sure Ollama is running on the host, then, in PowerShell or a separate terminal:

```bash
docker run -p 8501:8501 \
  --add-host=host.docker.internal:host-gateway \
  ai-knowledge-bot
```
- ✅ Multi-stage build (separates dependencies from runtime)
- ✅ Minimal base image (`python:3.10-slim`)
- ✅ Runs as a non-root `appuser` by default
- ✅ `.env`, `venv`, and logs excluded via `.dockerignore`
- ✅ Exposes only the necessary port (8501)
- ✅ Automatically starts the Streamlit app
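For reference, a multi-stage Dockerfile matching the checklist above might look roughly like this; the package-install steps and paths are assumptions, not the repo's actual file:

```dockerfile
# Sketch only: a multi-stage Dockerfile matching the checklist above
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install --no-cache-dir -r requirements.txt

FROM python:3.10-slim
RUN useradd --create-home appuser
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
USER appuser
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]
```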
MIT — feel free to fork, use, or improve it.
From concept to offline AI — all step by step.