# Scraping Google News Feed Using GoLogin & NodeMaven Proxies

Welcome to the Scraping Google News Feed Using GoLogin & NodeMaven Proxies repository! This project provides a stealthy web scraper designed to extract structured data from Google News. It captures article titles, sources, and links in a clean format for tracking, analysis, or automation.
You can find the latest releases here. Please download the necessary files and execute them as instructed.
## Table of Contents

- Introduction
- Features
- Technologies Used
- Installation
- Usage
- Configuration
- Examples
- Contributing
- License
- Contact
## Introduction

In today’s fast-paced world, staying updated with the latest news is crucial. This repository simplifies that process by letting you scrape data from Google News efficiently. Using GoLogin for browser automation and NodeMaven for residential proxies, the scraper operates stealthily to ensure reliable data extraction without getting blocked.
## Features

- Stealthy Scraping: Uses advanced techniques to avoid detection.
- Structured Data: Extracts article titles, sources, and links in a clean format.
- Fast and Scalable: Optimized for speed and can handle multiple requests.
- Automation Ready: Perfect for tracking news articles over time.
- Proxy Rotation: Leverages NodeMaven's residential proxies for better anonymity.
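The proxy-rotation idea can be sketched as cycling through a pool of credentials, so each request can go out through a different exit. This is an illustrative sketch only; the hostnames and credentials below are placeholders, not real NodeMaven values.

```python
from itertools import cycle

# Placeholder pool; substitute your real NodeMaven proxy credentials.
PROXIES = [
    {"host": "proxy-1.example.com", "port": 8080, "username": "user1", "password": "pass1"},
    {"host": "proxy-2.example.com", "port": 8080, "username": "user2", "password": "pass2"},
]

_pool = cycle(PROXIES)

def next_proxy_url():
    """Return the next proxy in rotation as an http://user:pass@host:port URL."""
    p = next(_pool)
    return f"http://{p['username']}:{p['password']}@{p['host']}:{p['port']}"
```

`cycle` wraps around automatically, so the pool never runs dry: after the last proxy is handed out, the next call returns the first one again.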
## Technologies Used

This project employs the following technologies:
- Python: The primary programming language for the scraper.
- GoLogin: For browser automation.
- NodeMaven: Provides residential proxies.
- Playwright: For handling browser interactions.
- Selenium: Used for web scraping tasks.
- Requests: For making HTTP requests.
- BeautifulSoup: For parsing HTML and extracting data.
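To illustrate how BeautifulSoup fits in, here is a minimal parsing sketch. The tag and class names are invented for the example (Google News's real markup changes frequently), and `parse_articles` is our own helper name, not part of the repository.

```python
from bs4 import BeautifulSoup

# Illustrative markup only; real Google News HTML differs and changes often.
SAMPLE_HTML = """
<article>
  <a class="title" href="https://news.example.com/story">Example headline</a>
  <div class="source">Example Times</div>
</article>
"""

def parse_articles(html):
    """Extract Title/Source/Link dicts from article blocks in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for item in soup.find_all("article"):
        link = item.find("a", class_="title")
        source = item.find("div", class_="source")
        articles.append({
            "Title": link.get_text(strip=True),
            "Source": source.get_text(strip=True),
            "Link": link["href"],
        })
    return articles
```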
## Installation

To get started, follow these steps:
1. Clone the Repository:

   ```bash
   git clone https://github.com/parsakeshtkar/Scraping-Google-News-Feed-Using-GoLogin-NodeMaven-Proxies.git
   cd Scraping-Google-News-Feed-Using-GoLogin-NodeMaven-Proxies
   ```
2. Install Required Packages: Make sure you have Python 3.8 or higher installed. Use pip to install the necessary packages:

   ```bash
   pip install -r requirements.txt
   ```
3. Download Latest Release: For the latest release, visit this link to download the files. Execute them as instructed.
## Usage

To run the scraper, follow these steps:
1. Configure Your Proxy: Edit the configuration file to include your NodeMaven proxy details.
2. Run the Scraper: Use the command line to execute the scraper:

   ```bash
   python scraper.py
   ```
3. View the Output: The scraped data will be saved in a CSV file named `news_data.csv`.
## Configuration

Before running the scraper, you need to configure a few settings:
- Proxy Settings: Update the `config.py` file with your NodeMaven proxy credentials.
- Scraping Interval: Set the interval for how often you want to scrape data.
```python
PROXY = {
    'host': 'your_proxy_host',
    'port': 'your_proxy_port',
    'username': 'your_username',
    'password': 'your_password'
}

SCRAPING_INTERVAL = 60  # in seconds
```
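One way the `SCRAPING_INTERVAL` setting might be used is a simple polling loop. This is a sketch, not the repository's actual loop; the `max_runs` cap is our addition so the function can be exercised without running forever.

```python
import time

def run_scraper(scrape_once, interval=60, max_runs=None):
    """Call scrape_once repeatedly, sleeping `interval` seconds between runs.

    max_runs=None loops forever; pass a finite value when testing.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        scrape_once()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval)
```

In production you would call `run_scraper(scrape, interval=SCRAPING_INTERVAL)` and let it loop indefinitely.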
## Examples

To scrape articles from Google News, simply run the scraper. It will collect data based on your configuration.
The output will be in CSV format with the following columns:
- Title: The title of the news article.
- Source: The publication source.
- Link: The URL to the article.
## Contributing

We welcome contributions to improve this project. If you have suggestions or find bugs, please create an issue or submit a pull request.
1. Fork the repository.
2. Create a new branch for your feature or fix.
3. Make your changes.
4. Submit a pull request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Contact

For questions or feedback, please reach out:
- Email: your-email@example.com
- GitHub: your-github-profile
Feel free to visit the Releases section for updates and new features. Thank you for checking out this repository!