Wayback-Go

A command-line tool for downloading websites from the Wayback Machine, rewritten in Go.

Overview

This program is a Go port of the popular Ruby-based wayback-machine-downloader by hartator (available at https://github.com/hartator/wayback-machine-downloader). It downloads the archived snapshots of a given URL from the Internet Archive's Wayback Machine and saves them locally. By default it keeps the latest snapshot of each file; pass --all-timestamps to keep every available snapshot.

Features

Download Entire Websites: Downloads all files archived under a given URL from the Wayback Machine.
Exact URL Download: Option to download only the exact URL provided, without following links.
Timestamp Filtering: Specify --from and --to timestamps to download only snapshots within a particular date range.
Regex Filtering: Include or exclude URLs based on regular expressions.
All Timestamps: Download all available timestamps for each file, not just the latest.
Concurrency: Runs several downloads in parallel for faster downloads (see --threads).
List Only Mode: Preview the list of files that would be downloaded, in JSON format, without actually downloading them.
Error Handling: Option to download all files, even those that return errors.

Installation

To install wayback-go, you need to have Go installed on your system (Go 1.16 or later is recommended).

Clone the repository:

```
git clone https://github.com/Cat-Ling/wayback-go.git
cd wayback-go
```

Build the executable:
```
go build -o wayback-go
```
Move the executable into a directory on your PATH (optional):
```
sudo mv wayback-go /usr/local/bin/
```

Usage

```
./wayback-go --url <URL> [options]
```

Options:

--url <URL>: The base URL to download from Wayback Machine (required).
--exact-url: Download only the exact URL.
--dir <directory>: Directory to save the downloaded files (defaults to websites/<domain>).
--all-timestamps: Download all available timestamps for each file.
--from <timestamp>: Only download snapshots captured on or after this timestamp, in YYYYMMDDhhmmss form (e.g., 20060102150405).
--to <timestamp>: Only download snapshots captured on or before this timestamp, in YYYYMMDDhhmmss form (e.g., 20060102150405).
--only <regex>: Only download URLs matching this regex filter.
--exclude <regex>: Exclude URLs matching this regex filter.
--all: Download all files, even if they return an error.
--max-pages <number>: Maximum number of snapshot pages to retrieve from Wayback Machine API (default: 100).
--threads <number>: Number of concurrent download threads (default: 1).
--list: List the file URLs in JSON format without downloading anything.
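
Several of the options above correspond closely to parameters of the Internet Archive's public CDX search API. The sketch below shows one plausible mapping; it is not taken from this project's source, and the `query` struct, the `buildCDXURL` function, and the choice of endpoint and field list are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// cdxEndpoint is the Internet Archive's public CDX search endpoint.
// Assumption: wayback-go may use a different endpoint internally.
const cdxEndpoint = "https://web.archive.org/cdx/search/cdx"

// query is a hypothetical struct mirroring a few of the CLI flags.
type query struct {
	URL      string // --url
	ExactURL bool   // --exact-url
	From, To string // --from / --to (YYYYMMDDhhmmss)
	All      bool   // --all
	Page     int    // one page out of at most --max-pages
}

// buildCDXURL turns a query into a CDX search URL.
func buildCDXURL(q query) string {
	v := url.Values{}
	v.Set("url", q.URL)
	v.Set("output", "json")
	v.Set("fl", "timestamp,original")
	v.Set("page", strconv.Itoa(q.Page))
	if !q.ExactURL {
		// Match every capture whose URL starts with q.URL.
		v.Set("matchType", "prefix")
	}
	if q.From != "" {
		v.Set("from", q.From)
	}
	if q.To != "" {
		v.Set("to", q.To)
	}
	if !q.All {
		// Skip captures that were archived with an error status.
		v.Set("filter", "statuscode:200")
	}
	return cdxEndpoint + "?" + v.Encode()
}

func main() {
	fmt.Println(buildCDXURL(query{
		URL:  "example.com",
		From: "20200101000000",
		To:   "20201231235959",
	}))
}
```

Building the query with `url.Values` keeps the escaping correct for URLs and filters that contain reserved characters.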

Examples:

Download a website:
```
./wayback-go --url https://example.com
```

Download only a specific URL:

```
./wayback-go --url https://example.com/page.html --exact-url
```

Download with a specific output directory:

```
./wayback-go --url https://example.com --dir my_archive
```

Download snapshots from a specific date range:

```
./wayback-go --url https://example.com --from 20200101000000 --to 20201231235959
```
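
The 14-digit timestamps used by --from and --to happen to match Go's reference time written without separators, so they can be parsed directly with the standard library. The following is a minimal sketch of validating and comparing such timestamps; `parseWaybackTimestamp` and `withinRange` are hypothetical helper names, not functions from this project.

```go
package main

import (
	"fmt"
	"time"
)

// waybackLayout is Go's reference time in YYYYMMDDhhmmss form.
const waybackLayout = "20060102150405"

// parseWaybackTimestamp converts a 14-digit timestamp into a time.Time (UTC).
func parseWaybackTimestamp(ts string) (time.Time, error) {
	return time.Parse(waybackLayout, ts)
}

// withinRange reports whether snap lies in the inclusive range [from, to].
func withinRange(snap, from, to time.Time) bool {
	return !snap.Before(from) && !snap.After(to)
}

func main() {
	from, err := parseWaybackTimestamp("20200101000000")
	if err != nil {
		fmt.Println("invalid --from:", err)
		return
	}
	to, err := parseWaybackTimestamp("20201231235959")
	if err != nil {
		fmt.Println("invalid --to:", err)
		return
	}
	snap, err := parseWaybackTimestamp("20200615120000")
	if err != nil {
		fmt.Println("invalid snapshot timestamp:", err)
		return
	}
	fmt.Println(snap.Format(time.RFC3339), "in range:", withinRange(snap, from, to))
}
```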

List files in JSON format:

```
./wayback-go --url https://example.com --list
```

Download with 5 concurrent threads:

```
./wayback-go --url https://example.com --threads 5
```
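
For readers curious about what a --threads style limit usually looks like in Go, here is a minimal worker-pool sketch. It illustrates the general pattern only and is not the downloader's actual code; `fetchAll`, `result`, and the stand-in `fake` function are assumptions introduced for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// result holds the outcome of fetching one URL.
type result struct {
	URL  string
	Body string
	Err  error
}

// fetchAll calls fetch for every URL using at most `workers` goroutines.
// Results are returned in the same order as the input URLs.
func fetchAll(urls []string, workers int, fetch func(string) (string, error)) []result {
	if workers < 1 {
		workers = 1
	}
	results := make([]result, len(urls))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handled by exactly one worker, so the
			// writes to results never touch the same element twice.
			for i := range jobs {
				body, err := fetch(urls[i])
				results[i] = result{URL: urls[i], Body: body, Err: err}
			}
		}()
	}
	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// fake stands in for a real HTTP download.
	fake := func(u string) (string, error) { return "contents of " + u, nil }
	for _, r := range fetchAll([]string{"a.html", "b.css", "c.js"}, 5, fake) {
		fmt.Println(r.URL, "->", r.Body)
	}
}
```

Keeping the worker count small is a reasonable default when talking to a shared public service such as the Wayback Machine.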

Only download CSS files:

```
./wayback-go --url https://example.com --only "\.css$"
```
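
To show how --only and --exclude style filters can combine, the sketch below applies two optional patterns to a list of URLs using Go's standard `regexp` package (RE2 syntax). It assumes the tool compiles its patterns the same way, which the README does not confirm; `filterURLs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterURLs keeps URLs that match `only` (when set) and do not match
// `exclude` (when set). An empty pattern disables that filter.
func filterURLs(urls []string, only, exclude string) ([]string, error) {
	var onlyRe, excludeRe *regexp.Regexp
	var err error
	if only != "" {
		if onlyRe, err = regexp.Compile(only); err != nil {
			return nil, fmt.Errorf("invalid only pattern: %w", err)
		}
	}
	if exclude != "" {
		if excludeRe, err = regexp.Compile(exclude); err != nil {
			return nil, fmt.Errorf("invalid exclude pattern: %w", err)
		}
	}
	var kept []string
	for _, u := range urls {
		if onlyRe != nil && !onlyRe.MatchString(u) {
			continue
		}
		if excludeRe != nil && excludeRe.MatchString(u) {
			continue
		}
		kept = append(kept, u)
	}
	return kept, nil
}

func main() {
	urls := []string{
		"https://example.com/style.css",
		"https://example.com/index.html",
		"https://example.com/old/print.css",
	}
	kept, err := filterURLs(urls, `\.css$`, `/old/`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(kept)
}
```

Quoting the pattern on the command line, as in the example above, stops the shell from interpreting characters such as `$` and `\`.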

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.
