Welcome to the Divar Advertisement Scraper, a web application for searching advertisements on Divar.ir, Iran's leading online marketplace. Enter a search query (e.g., "apartment for rent"), select a city, and the matching ads appear as a bulleted list. The application combines AI-driven query optimization with Selenium-based web scraping, and is aimed both at users who want faster searches and at developers exploring AI and web technologies.
- Save Time: Quickly find relevant advertisements without manually navigating Divar.ir.
- AI-Enhanced Search: Optimize search queries for more accurate and relevant results.
- User-Friendly: Enjoy a simple, intuitive web interface built with Streamlit.
- Customizable: Search across major Iranian cities with support for Persian and English names.
- Developer-Friendly: Open-source with clear setup instructions for contributions.
- Intuitive Interface: Built with Streamlit, offering a simple way to input queries and select cities.
- AI-Powered Query Optimization: Uses Ollama with the "qwen2.5:latest" model to refine search terms, preferably in Persian, for better results.
- City Support: Supports major Iranian cities with a predefined mapping of Persian (e.g., تهران) to English (e.g., Tehran) names for URL generation.
- Efficient Web Scraping: Utilizes Selenium to handle dynamic content and infinite scrolling on Divar.ir.
- Clean Output: Displays advertisements with title, description, and price in a neat, bulleted list.
- Workflow Management: Employs LangGraph to structure the workflow, ensuring a systematic process from input to output.
- Error Handling: Includes robust error checking for AI model initialization, web scraping, and user inputs, with troubleshooting guidance.
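
The city mapping described in the feature list can be illustrated with a small sketch. The `CITY_SLUGS` dictionary, the `build_search_url` helper, and the `https://divar.ir/s/<city>?q=<query>` URL pattern are assumptions made for illustration; the project's actual mapping and URL construction may differ.

```python
from urllib.parse import quote

# Hypothetical subset of a Persian-to-English city mapping (illustrative only).
CITY_SLUGS = {
    "تهران": "tehran",
    "مشهد": "mashhad",
}


def build_search_url(city: str, query: str) -> str:
    """Return a search URL for a city given by its Persian or English name."""
    # Persian names are looked up directly; English names are normalized.
    slug = CITY_SLUGS.get(city, city.strip().lower())
    if slug not in CITY_SLUGS.values():
        raise ValueError(f"Unsupported city: {city!r}")
    # Assumed URL pattern; the query is percent-encoded so Persian text is URL-safe.
    return f"https://divar.ir/s/{slug}?q={quote(query)}"
```

Accepting both spellings in one helper keeps the dropdown free to show Persian names while the URL always uses the English form.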
Technology | Role |
---|---|
Python | Primary programming language for the application. |
Streamlit | Creates the interactive web interface for user input and result display. |
Selenium | Handles web scraping and automates browser interactions. |
BeautifulSoup | Parses HTML content for extracting advertisement details. |
Ollama | Runs local AI models for query optimization. |
LangChain | Manages prompts and AI interactions for structured processing. |
LangGraph | Structures the application's workflow for efficient task management. |
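
For readers curious how Selenium can cope with infinite scrolling, the following is a minimal sketch of one common approach: scroll to the bottom repeatedly until the page height stops growing. The `scroll_to_end` function and its parameters are hypothetical and not taken from the project; `driver` is assumed to be a Selenium WebDriver instance (only its `execute_script` method is used).

```python
import time


def scroll_to_end(driver, max_rounds: int = 10, pause: float = 1.0) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so the page loads its next batch of items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the newly requested content time to render
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end was reached
        last_height = new_height
    return rounds
```

The `max_rounds` cap is a design choice that keeps a very long result list from scrolling indefinitely; the page source can then be handed to BeautifulSoup for parsing.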
Follow these steps to set up the Divar Advertisement Scraper on your local machine:
- Install Python:
  - Ensure Python 3.8 or higher is installed.
- Install Dependencies:
  - Install required Python libraries using pip:
    pip install beautifulsoup4 selenium streamlit langchain-core langchain-ollama langgraph
- Install Chrome and Chrome WebDriver:
  - Install Google Chrome.
  - Download the ChromeDriver matching your Chrome version.
  - Add ChromeDriver to your system's PATH or specify its location in the code.
- Set up Ollama:
  - Install Ollama via its official website or package manager.
  - Pull the required AI model:
    ollama pull qwen2.5
  - Start Ollama:
    ollama serve
- Clone the Repository:
  - Clone the repository:
    git clone https://github.com/armanjscript/divar-ad-scraper.git
  - Navigate to the project directory:
    cd divar-ad-scraper
- Run the Application:
  - Launch the Streamlit app:
    streamlit run main.py
Note: Running AI models locally requires sufficient computational resources (e.g., 16GB RAM and a capable CPU or GPU). A stable internet connection is needed for the initial setup, and ChromeDriver must be compatible with the installed Chrome version.
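
Before launching the app, the prerequisites from the steps above can be checked with a short script. This is only a sketch: the `check_prerequisites` function is hypothetical, and it assumes the `chromedriver` and `ollama` executables are expected on the system PATH under those names.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether each setup prerequisite appears to be satisfied."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "chromedriver": shutil.which("chromedriver") is not None,
        "ollama": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A missing `chromedriver` entry is not necessarily an error if the driver location is specified in the code instead of via PATH.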
- Launch the Application:
  - Run the following command to open the app in your default web browser:
    streamlit run main.py
- Interact with the Interface:
  - Enter a Query: Input a search term (e.g., "apartment for rent").
  - Select a City: Choose a city from the dropdown (e.g., تهران, مشهد).
  - Search: Click the "Search" button to initiate the process.
- View Results:
  - The app displays the optimized query and a bulleted list of advertisements, including title, description, and price.
  - If no results are found, try rephrasing the query or selecting a different city.
The interface features:
- A text input for search queries.
- A dropdown for city selection.
- A search button to trigger the scraping process.
- A display area for results, formatted for clarity.
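
As an illustration of how scraped ads might be turned into the bulleted output shown in the display area, here is a minimal sketch. The `Ad` dataclass and `format_ads` function are hypothetical names introduced for this example, and the exact layout used by the app may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ad:
    title: str
    description: str
    price: str


def format_ads(ads: List[Ad]) -> str:
    """Render ads as a Markdown bulleted list, one top-level bullet per ad."""
    if not ads:
        return "No advertisements found."
    lines = []
    for ad in ads:
        lines.append(f"- **{ad.title}**")
        lines.append(f"  - Description: {ad.description}")
        lines.append(f"  - Price: {ad.price}")
    return "\n".join(lines)
```

In a Streamlit app, a string like this could be passed to `st.markdown` to render the list.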
Issue | Solution |
---|---|
Ollama Not Running | Ensure Ollama is running (ollama serve) and the qwen2.5 model is pulled. |
ChromeDriver Issues | Verify ChromeDriver matches your Chrome version; update Chrome if needed. |
No Results | Rephrase the query or select a different city; check internet connection. |
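
The first troubleshooting item can also be checked programmatically. The sketch below assumes Ollama is listening on its default local address (`http://localhost:11434`) and queries its `/api/tags` endpoint, which lists locally available models; the `ollama_has_model` helper is a hypothetical name, not part of the project.

```python
import json
import urllib.request


def ollama_has_model(
    model: str = "qwen2.5",
    base_url: str = "http://localhost:11434",  # assumed default Ollama address
    timeout: float = 2.0,
) -> bool:
    """Return True if the Ollama server responds and lists the given model."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # Server unreachable, request timed out, or response was not valid JSON.
        return False
    names = [m.get("name", "") for m in data.get("models", [])]
    # Model names usually carry a tag, e.g. "qwen2.5:latest".
    return any(n == model or n.startswith(model + ":") for n in names)
```

A `False` result means either the server is not running or the model has not been pulled, which are the two cases the table row covers.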
We welcome contributions to enhance the project! To contribute:
- Fork the repository.
- Create a new branch for your changes.
- Submit a pull request with a clear description.
- For bugs or feature requests, open an issue on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, contact armannew73@gmail.com or open an issue on GitHub.