Skip to content

This repo offers a stealthy web scraper for extracting structured data from Google News, capturing article sources and links in a clean format for tracking, analysis, or automation. Built with Python and GoLogin for browser automation, it is optimized for speed, scale, and stability, using NodeMaven's residential proxies.

Notifications You must be signed in to change notification settings

parsakeshtkar/Scraping-Google-News-Feed-Using-GoLogin-NodeMaven-Proxies

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 

Repository files navigation

Scraping Google News Feed Using GoLogin & NodeMaven Proxies 🌐📰

GitHub release Python License

Welcome to the Scraping Google News Feed Using GoLogin & NodeMaven Proxies repository! This project provides a stealthy web scraper designed to extract structured data from Google News. It captures article sources and links in a clean format for tracking, analysis, or automation.

You can find the latest releases here. Please download the necessary files and execute them as instructed.

Table of Contents

Introduction

In today’s fast-paced world, staying updated with the latest news is crucial. This repository simplifies that process by allowing you to scrape data from Google News efficiently. Using GoLogin for browser automation and NodeMaven for residential proxies, this scraper operates stealthily to ensure reliable data extraction without getting blocked.

Features

  • Stealthy Scraping: Uses advanced techniques to avoid detection.
  • Structured Data: Extracts article titles, sources, and links in a clean format.
  • Fast and Scalable: Optimized for speed and can handle multiple requests.
  • Automation Ready: Perfect for tracking news articles over time.
  • Proxy Rotation: Leverages NodeMaven's residential proxies for better anonymity.

Technologies Used

This project employs the following technologies:

  • Python: The primary programming language for the scraper.
  • GoLogin: For browser automation.
  • NodeMaven: Provides residential proxies.
  • Playwright: For handling browser interactions.
  • Selenium: Used for web scraping tasks.
  • Requests: For making HTTP requests.
  • BeautifulSoup: For parsing HTML and extracting data.

Installation

To get started, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/parsakeshtkar/Scraping-Google-News-Feed-Using-GoLogin-NodeMaven-Proxies.git
    cd Scraping-Google-News-Feed-Using-GoLogin-NodeMaven-Proxies
  2. Install Required Packages: Make sure you have Python 3.8 or higher installed. Use pip to install the necessary packages:

    pip install -r requirements.txt
  3. Download Latest Release: For the latest release, visit this link to download the files. Execute them as instructed.

Usage

To run the scraper, follow these steps:

  1. Configure Your Proxy: Edit the configuration file to include your NodeMaven proxy details.

  2. Run the Scraper: Use the command line to execute the scraper:

    python scraper.py
  3. View the Output: The scraped data will be saved in a CSV file named news_data.csv.

Configuration

Before running the scraper, you need to configure a few settings:

  • Proxy Settings: Update the config.py file with your NodeMaven proxy credentials.
  • Scraping Interval: Set the interval for how often you want to scrape data.

Example Configuration

PROXY = {
    'host': 'your_proxy_host',
    'port': 'your_proxy_port',
    'username': 'your_username',
    'password': 'your_password'
}

SCRAPING_INTERVAL = 60  # in seconds

Examples

Scraping Articles

To scrape articles from Google News, simply run the scraper. It will collect data based on your configuration.

Output Format

The output will be in CSV format with the following columns:

  • Title: The title of the news article.
  • Source: The publication source.
  • Link: The URL to the article.

Contributing

We welcome contributions to improve this project. If you have suggestions or find bugs, please create an issue or submit a pull request.

Steps to Contribute

  1. Fork the repository.
  2. Create a new branch for your feature or fix.
  3. Make your changes.
  4. Submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or feedback, please reach out:

Feel free to visit the Releases section for updates and new features. Thank you for checking out this repository!

About

This repo offers a stealthy web scraper for extracting structured data from Google News, capturing article sources and links in a clean format for tracking, analysis, or automation. Built with Python and GoLogin for browser automation, it is optimized for speed, scale, and stability, using NodeMaven's residential proxies.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •