A command-line tool to download websites from the Wayback Machine, rewritten in Go.
This program is a Go port of the popular Ruby-based wayback-machine-downloader by hartator (available at https://github.com/hartator/wayback-machine-downloader). It allows you to download all available snapshots of a given URL from the Internet Archive's Wayback Machine, saving them locally.
- Download Entire Websites: Recursively downloads all files associated with a given URL from the Wayback Machine.
- Exact URL Download: Option to download only the exact URL provided, without following links.
- Timestamp Filtering: Specify from and to timestamps to download snapshots within a particular date range.
- Regex Filtering: Include or exclude URLs based on regular expressions.
- All Timestamps: Download all available timestamps for each file, not just the latest.
- Concurrency: Uses multiple concurrent download threads for faster downloads (set with --threads; the default is 1).
- List Only Mode: Preview the list of files that would be downloaded in JSON format without actually downloading them.
- Error Handling: Option to download all files, even those that return errors.
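
To illustrate the concurrency feature, here is a minimal sketch of a bounded worker pool in Go. It is not taken from the wayback-go source: `downloadAll`, `fetchFunc`, and `errNotFound` are hypothetical names, and the "threads" are modelled as goroutines, which is an assumption about how a Go port would typically do this.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound is a hypothetical error used only by this sketch.
var errNotFound = errors.New("not found")

// fetchFunc is a hypothetical stand-in for whatever downloads one snapshot.
type fetchFunc func(url string) error

// downloadAll runs fetch over urls using at most `threads` goroutines
// and returns how many fetches failed.
func downloadAll(urls []string, threads int, fetch fetchFunc) int {
	if threads < 1 {
		threads = 1 // mirror the documented default of one thread
	}
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	failed := 0

	// Start a fixed number of workers that drain the jobs channel.
	for i := 0; i < threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				if err := fetch(u); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}

	// Feed the work, then signal that no more jobs are coming.
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return failed
}

func main() {
	urls := []string{
		"https://example.com/",
		"https://example.com/style.css",
		"https://example.com/missing.png",
	}
	failed := downloadAll(urls, 5, func(u string) error {
		if u == "https://example.com/missing.png" {
			return errNotFound // simulate a snapshot that cannot be fetched
		}
		fmt.Println("would download", u)
		return nil
	})
	fmt.Println("failed:", failed)
}
```

Because the workers run concurrently, the order of the "would download" lines can differ between runs; only the failure count is deterministic.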
To install wayback-go, you need to have Go installed on your system (Go 1.16 or later is recommended).
- Clone the repository:
git clone https://github.com/Cat-Ling/wayback-go.git
cd wayback-go
- Build the executable:
go build -o wayback-go
- Move to your PATH (optional):
sudo mv wayback-go /usr/local/bin/
./wayback-go --url <URL> [options]
- --url <URL>: The base URL to download from the Wayback Machine (required).
- --exact-url: Download only the exact URL.
- --dir <directory>: Directory to save the downloaded files (defaults to websites/<domain>).
- --all-timestamps: Download all available timestamps for each file.
- --from <timestamp>: Download snapshots from this timestamp (e.g., 20060102150405).
- --to <timestamp>: Download snapshots to this timestamp (e.g., 20060102150405).
- --only <regex>: Only download URLs matching this regex filter.
- --exclude <regex>: Exclude URLs matching this regex filter.
- --all: Download all files, even if they return an error.
- --max-pages <number>: Maximum number of snapshot pages to retrieve from the Wayback Machine API (default: 100).
- --threads <number>: Number of concurrent download threads (default: 1).
- --list: Only list file URLs in JSON format; nothing is downloaded.
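
The example timestamp 20060102150405 used for --from and --to is Go's reference time written as YYYYMMDDhhmmss, so it can serve directly as a `time.Parse` layout. The sketch below shows one way such values could be validated and compared. `parseTimestamp` and `inRange` are hypothetical helpers, not functions from wayback-go, and the sketch accepts only full 14-digit values; whether the tool itself accepts shorter timestamps is not stated here.

```go
package main

import (
	"fmt"
	"time"
)

// waybackLayout is Go's reference time in 14-digit YYYYMMDDhhmmss form.
const waybackLayout = "20060102150405"

// parseTimestamp validates a 14-digit timestamp and returns it as a UTC time.
func parseTimestamp(s string) (time.Time, error) {
	return time.Parse(waybackLayout, s)
}

// inRange reports whether ts lies within [from, to], both bounds inclusive.
// An empty from or to leaves that side of the range open.
func inRange(ts, from, to string) (bool, error) {
	t, err := parseTimestamp(ts)
	if err != nil {
		return false, fmt.Errorf("snapshot timestamp %q: %w", ts, err)
	}
	if from != "" {
		lower, err := parseTimestamp(from)
		if err != nil {
			return false, fmt.Errorf("from timestamp %q: %w", from, err)
		}
		if t.Before(lower) {
			return false, nil
		}
	}
	if to != "" {
		upper, err := parseTimestamp(to)
		if err != nil {
			return false, fmt.Errorf("to timestamp %q: %w", to, err)
		}
		if t.After(upper) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := inRange("20200615120000", "20200101000000", "20201231235959")
	fmt.Println(ok, err)

	// An impossible date (month 13) is rejected by time.Parse.
	_, err = parseTimestamp("20201301000000")
	fmt.Println(err != nil)
}
```

Parsing before comparing catches malformed values such as an out-of-range month, which a plain string comparison would silently accept.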
- Download a website:
./wayback-go --url https://example.com
- Download only a specific URL:
./wayback-go --url https://example.com/page.html --exact-url
- Download with a specific output directory:
./wayback-go --url https://example.com --dir my_archive
- Download snapshots from a specific date:
./wayback-go --url https://example.com --from 20200101000000 --to 20201231235959
- List files in JSON format:
./wayback-go --url https://example.com --list
- Download with 5 concurrent threads:
./wayback-go --url https://example.com --threads 5
- Only download CSS files:
./wayback-go --url https://example.com --only "\.css$"
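
To see what a filter like "\.css$" does and does not match, here is a small Go sketch of include/exclude filtering. The `keep` function and the `cssOnly` and `noImages` patterns are hypothetical, and the sketch assumes the pattern is tested against the full URL, query string included; wayback-go may match against a different string.

```go
package main

import (
	"fmt"
	"regexp"
)

// Example patterns for this sketch only.
var (
	cssOnly  = regexp.MustCompile(`\.css$`)
	noImages = regexp.MustCompile(`\.(png|jpe?g|gif)$`)
)

// keep reports whether url passes the filters: it must match `only` when
// that pattern is set, and must not match `exclude` when that one is set.
// A nil pattern means the corresponding filter is disabled.
func keep(url string, only, exclude *regexp.Regexp) bool {
	if only != nil && !only.MatchString(url) {
		return false
	}
	if exclude != nil && exclude.MatchString(url) {
		return false
	}
	return true
}

func main() {
	urls := []string{
		"https://example.com/style.css",
		"https://example.com/style.css?v=2",
		"https://example.com/index.html",
		"https://example.com/logo.png",
	}
	for _, u := range urls {
		fmt.Printf("%-36s only-css=%-5v exclude-images=%v\n",
			u, keep(u, cssOnly, nil), keep(u, nil, noImages))
	}
}
```

Under that assumption, a URL such as style.css?v=2 does not match "\.css$" because the `$` anchor requires the string to end in .css; a pattern like "\.css(\?|$)" would cover both forms. If the tool compiles patterns with Go's standard regexp package (also an assumption), RE2 syntax applies, so lookaheads and backreferences are unavailable.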
Contributions are welcome! Please feel free to open issues or submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.