A high-performance web content downloader and localizer built with Rust. Leverages Rust's powerful concurrency to efficiently batch download web pages and save them as local files.

HyFetcher

HyFetcher is an efficient offline website/article batch downloader and index generator written in Rust. It supports concurrent downloading of web pages, automatically localizes images and videos, and generates a browsable index.html page.

Features

  • 🚀 Multi-threaded high-concurrency downloading, significantly faster than the Python version
  • 🖼️ Automatically localizes images and videos in web pages
  • 🗂️ Automatically generates a browsable index page
  • 🛠️ Flexible command-line arguments to specify data directory, output directory, concurrency, etc.
  • 📦 Simple and easy to use, suitable for personal knowledge management, web archiving, and similar scenarios
  • 🔧 Automatic external tool detection and installation

Data and Directory Structure

hyfetcher/
├── src/
│   ├── main.rs
│   ├── model.rs
│   ├── parser/
│   │   └── ...
│   ├── fetcher/
│   │   └── ...
│   └── ...
├── data/
│   ├── <category>
│   │   ├── <sub-category>
│   │   │   ├── hypress.csv
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── outputs/
│   ├── index.html
│   ├── <category>
│   │   ├── <sub-category>
│   │   │   ├── hypress
│   │   │   │   ├── example-page.html
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── Cargo.toml
├── README.md
└── ...
  • You need to prepare a tree-structured input directory (such as data/). Each level of the directory corresponds to a category in the generated index.html. The leaf directories contain CSV files describing the crawl targets. The CSV format is defined in model.rs and must include at least the fields url and title.
  • Each web page will be saved as a local HTML file. The output directory (such as outputs/) will preserve the same hierarchical structure as the input directory.
  • Images, videos, and other resources are automatically downloaded to the local outputs/<category>/<sub-category>/images/ or outputs/<category>/<sub-category>/videos/ directories.
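As a concrete illustration of the layout above, here is a hypothetical leaf CSV (the category path `tech/rust` and the URLs are invented for this example; the authoritative column set is defined in model.rs, with `url` and `title` required):

```shell
# Create a hypothetical leaf CSV under an invented category path.
# Only the required url and title columns are shown here;
# model.rs defines the full schema.
mkdir -p data/tech/rust
cat > data/tech/rust/hypress.csv <<'EOF'
url,title
https://example.com/posts/rust-async,Understanding Rust Async
https://example.com/posts/tokio-intro,Getting Started with Tokio
EOF
head -n 1 data/tech/rust/hypress.csv
```

Running this prints the header row, `url,title`, confirming the file is in place for HyFetcher to pick up.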

The program will automatically generate index.html in the output directory. You can open it directly in your browser to quickly browse all downloaded web pages.

Available options

You can use the following command-line options to configure HyFetcher:

  • -d, --data_dir <DATA_DIR>: Input data directory, default is data
  • -o, --outputs_dir <OUTPUTS_DIR>: Output directory, default is outputs
  • -c, --concurrency <CONCURRENCY>: Number of concurrent tasks, default is 8
  • --skip-tool-check: Skip external tool detection and installation

Example:

./target/release/hyfetcher -d data -o outputs -c 16
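The same run with every option spelled out in long form (a sketch that assumes you have already built the release binary, and prints a hint otherwise):

```shell
# Run HyFetcher with all documented long-form options.
# --skip-tool-check bypasses the external tool (yt-dlp) detection step.
BIN=./target/release/hyfetcher
if [ -x "$BIN" ]; then
  "$BIN" --data_dir data --outputs_dir outputs --concurrency 16 --skip-tool-check
else
  echo "hyfetcher not built yet; run: cargo build --release"
fi
```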

Usage on Different Platforms

HyFetcher provides pre-built executables for Windows, macOS, and Linux. You can download them from the Releases page. No local compilation is required—just download and run.

Windows

  1. Go to the Releases page and download the latest hyfetcher-windows-amd64.zip.

  2. Extract it to obtain hyfetcher-windows-amd64.exe.

  3. Place your data directory (such as data) and output directory (such as outputs) in the same directory or specify their paths.

  4. In the command line (cmd or PowerShell), run:

    .\hyfetcher-windows-amd64.exe -d data -o outputs
  5. After the program finishes, open outputs/index.html in your browser to view the downloaded web pages.

macOS

  1. Go to the Releases page and download the latest hyfetcher-macos-amd64.tar.gz (for Intel chips) or hyfetcher-macos-arm64.tar.gz (for Apple Silicon).

  2. Extract it to obtain the executable (such as hyfetcher-macos-amd64 or hyfetcher-macos-arm64).

  3. Grant execute permission if needed:

    chmod +x hyfetcher-macos-amd64
  4. Run in Terminal:

    ./hyfetcher-macos-amd64 -d data -o outputs
  5. After the program finishes, open outputs/index.html in your browser to view all downloaded web pages.

Linux

  1. Go to the Releases page and download the latest hyfetcher-linux-amd64.tar.gz.

  2. Extract it to obtain hyfetcher-linux-amd64.

  3. Grant execute permission if needed:

    chmod +x hyfetcher-linux-amd64
  4. Run in Terminal:

    ./hyfetcher-linux-amd64 -d data -o outputs
  5. After the program finishes, open outputs/index.html in your browser to view all downloaded web pages.

Dependencies

Rust Crates

External Tools

  • yt-dlp: Required for downloading videos from platforms like Bilibili. The program will automatically detect and install this tool if not found.
    • Windows: Downloaded as executable from GitHub releases
    • macOS: Installed via pip3 install --user yt-dlp
    • Linux: Downloaded as binary from GitHub releases

The program automatically handles external tool installation on first run. You can use --skip-tool-check to bypass this feature if needed.
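The detection step amounts to a PATH lookup; a rough shell equivalent (this sketches the mechanism as an assumption, it is not the program's actual code):

```shell
# Check whether yt-dlp is already available on PATH, as HyFetcher's
# tool check conceptually does before attempting video downloads.
if command -v yt-dlp >/dev/null 2>&1; then
  echo "yt-dlp found at $(command -v yt-dlp)"
else
  echo "yt-dlp missing; HyFetcher would install it (or pass --skip-tool-check)"
fi
```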

Usage from Source

Compilation

Make sure you have installed the Rust toolchain. Then, in the project directory, run:

cargo build --release

The executable will be located at target/release/hyfetcher.

Running

In the project root directory, run:

./target/release/hyfetcher [OPTIONS]

See above for available options.

License

MIT
