HyFetcher is an efficient offline batch downloader and index generator for websites and articles, written in Rust. It downloads web pages concurrently, automatically localizes their images and videos, and generates a browsable `index.html` index page.
- 🚀 Multi-threaded high-concurrency downloading, significantly faster than the Python version
- 🖼️ Automatically localizes images and videos in web pages
- 🗂️ Automatically generates a browsable index page
- 🛠️ Flexible command-line arguments to specify data directory, output directory, concurrency, etc.
- 📦 Simple and easy to use, suitable for personal knowledge management, web archiving, and similar scenarios
- 🔧 Automatic external tool detection and installation
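The concurrent-downloading feature comes down to keeping a fixed number of downloads in flight at once. HyFetcher itself is built on tokio. The following sketch instead uses plain `std::thread` workers that pull jobs from a shared queue. It only illustrates how a concurrency limit (the value passed with `-c`) bounds the work. `run_bounded` is a hypothetical helper, and the `job` closure stands in for a real page download.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `job` on every item using at most `concurrency` worker threads
/// and returns the results in input order.
/// Hypothetical helper: HyFetcher itself uses tokio, not this thread pool.
fn run_bounded<T, R, F>(items: Vec<T>, concurrency: usize, job: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    // Shared queue of (index, item) pairs; the index restores input order later.
    let queue: Arc<Mutex<VecDeque<(usize, T)>>> =
        Arc::new(Mutex::new(items.into_iter().enumerate().collect()));
    let job = Arc::new(job);
    let workers = concurrency.max(1); // treat 0 as 1 so the loop always makes progress

    let mut handles = Vec::with_capacity(workers);
    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let job = Arc::clone(&job);
        handles.push(thread::spawn(move || {
            let mut done = Vec::new();
            loop {
                // Hold the lock only long enough to pop the next item.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some((i, item)) => done.push((i, job(item))),
                    None => break, // queue drained: this worker exits
                }
            }
            done
        }));
    }

    // Collect every worker's results, then sort them back into input order.
    let mut results: Vec<(usize, R)> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    results.sort_by_key(|(i, _)| *i);
    results.into_iter().map(|(_, r)| r).collect()
}
```

A fixed pool of workers, rather than one thread per page, keeps the number of simultaneous downloads equal to the configured concurrency no matter how many URLs the CSV files list.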
Project layout:

```
hyfetcher/
├── src/
│ ├── main.rs
│ ├── model.rs
│ ├── parser/
│ │ └── ...
│ ├── fetcher/
│ │ └── ...
│ └── ...
├── data/
│ ├── <category>
│ │ ├── <sub-category>
│ │ │ ├── hypress.csv
│ │ │ └── ...
│ │ └── ...
│ └── ...
├── outputs/
│ ├── index.html
│ ├── <category>
│ │ ├── <sub-category>
│ │ │ ├── hypress
│ │ │ │ ├── example-page.html
│ │ │ │ └── ...
│ │ │ └── ...
│ │ └── ...
│ └── ...
├── Cargo.toml
├── README.md
└── ...
```
- You need to prepare a tree-structured input directory (such as `data/`). Each level of the directory corresponds to a category in the generated `index.html`. The leaf directories contain CSV files describing the crawl targets. The CSV format is defined in `model.rs` and must include at least the fields `url` and `title`.
- Each web page is saved as a local HTML file. The output directory (such as `outputs/`) preserves the same hierarchical structure as the input directory.
- Images, videos, and other resources are automatically downloaded to the local `outputs/<category>/<sub-category>/images/` or `outputs/<category>/<sub-category>/videos/` directories.
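A minimal std-only sketch of these input/output conventions, under several assumptions. The `Record` struct carries only the two required fields (the real one in `model.rs` may have more). CSV parsing is naive: it splits on commas and does not handle quoted fields, so a real implementation would likely use a CSV crate. The per-CSV output folder is named after the file stem, as the `hypress.csv` → `hypress/` pair in the project layout suggests. The extension lists that route resources to `images/` or `videos/` are guesses. All function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical record shape; the real definition lives in `model.rs`.
#[derive(Debug, PartialEq)]
struct Record {
    url: String,
    title: String,
}

/// Parses CSV text whose header contains at least `url` and `title`.
/// Simplified: splits on commas and does not handle quoted fields.
fn parse_records(csv: &str) -> Result<Vec<Record>, String> {
    let mut lines = csv.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<&str> = lines
        .next()
        .ok_or("empty CSV")?
        .split(',')
        .map(str::trim)
        .collect();
    // Look up a required column by name in the header row.
    let col = |name: &str| {
        header
            .iter()
            .position(|h| *h == name)
            .ok_or_else(|| format!("missing column `{name}`"))
    };
    let (url_i, title_i) = (col("url")?, col("title")?);
    lines
        .map(|line| {
            let fields: Vec<&str> = line.split(',').map(str::trim).collect();
            match (fields.get(url_i), fields.get(title_i)) {
                (Some(u), Some(t)) => Ok(Record { url: u.to_string(), title: t.to_string() }),
                _ => Err(format!("short row: {line}")),
            }
        })
        .collect()
}

/// Maps `data/<cat>/<sub>/hypress.csv` to `outputs/<cat>/<sub>/hypress`,
/// mirroring the input hierarchy (inferred from the project layout).
fn output_dir_for(data_dir: &Path, outputs_dir: &Path, csv_path: &Path) -> Option<PathBuf> {
    let rel = csv_path.strip_prefix(data_dir).ok()?;
    let stem = rel.file_stem()?;
    Some(outputs_dir.join(rel.parent()?).join(stem))
}

/// Picks `images/` or `videos/` under a sub-category output directory
/// from the resource URL's extension; the extension lists are assumptions.
fn media_dir(sub_category_dir: &Path, resource_url: &str) -> Option<PathBuf> {
    // Drop any query string or fragment before reading the extension.
    let path = resource_url.split(|c| c == '?' || c == '#').next()?;
    let ext = path.rsplit('.').next()?.to_ascii_lowercase();
    match ext.as_str() {
        "jpg" | "jpeg" | "png" | "gif" | "webp" | "svg" => Some(sub_category_dir.join("images")),
        "mp4" | "webm" | "mov" => Some(sub_category_dir.join("videos")),
        _ => None, // not a recognized media file in this sketch
    }
}
```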
The program automatically generates `index.html` in the output directory. Open it directly in your browser to quickly browse all downloaded web pages.
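One way such an index page could be assembled, as a hypothetical sketch: group the saved pages by category path and emit one heading and one link list per category. `render_index`, `escape_html`, and the `(category, title, href)` input tuples are assumptions for illustration; HyFetcher's actual template and markup may differ.

```rust
use std::collections::BTreeMap;

/// Escapes characters that are unsafe in HTML text and attribute values.
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;") // must run first so later entities are not double-escaped
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Builds a minimal index page: one heading per category path, one link per saved page.
/// Each tuple is (category path such as "tech/rust", page title, href relative to index.html).
fn render_index(pages: &[(&str, &str, &str)]) -> String {
    // BTreeMap keeps categories sorted, so the output is deterministic.
    let mut by_category: BTreeMap<&str, Vec<(&str, &str)>> = BTreeMap::new();
    for &(category, title, href) in pages {
        by_category.entry(category).or_default().push((title, href));
    }

    let mut html = String::from(
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>Index</title></head><body>\n",
    );
    for (category, links) in by_category {
        html.push_str(&format!("<h2>{}</h2>\n<ul>\n", escape_html(category)));
        for (title, href) in links {
            html.push_str(&format!(
                "<li><a href=\"{}\">{}</a></li>\n",
                escape_html(href),
                escape_html(title)
            ));
        }
        html.push_str("</ul>\n");
    }
    html.push_str("</body></html>\n");
    html
}
```

Relative `href`s keep the generated site portable: the whole output directory can be moved or opened straight from disk without breaking links.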
You can use the following command-line options to configure HyFetcher:
- `-d, --data_dir <DATA_DIR>`: Input data directory (default: `data`)
- `-o, --outputs_dir <OUTPUTS_DIR>`: Output directory (default: `outputs`)
- `-c, --concurrency <CONCURRENCY>`: Number of concurrent tasks (default: `8`)
- `--skip-tool-check`: Skip external tool detection and installation
Example:

```
./target/release/hyfetcher -d data -o outputs -c 16
```
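For reference, the documented flags and defaults can be written down as a small parser. HyFetcher itself uses clap for this. The following std-only version is a hypothetical sketch that mirrors only the option names and defaults listed above, so that the behavior is explicit. `Options` and `parse_options` are illustrative names.

```rust
/// Options mirroring the documented flags and their defaults.
#[derive(Debug, PartialEq)]
struct Options {
    data_dir: String,
    outputs_dir: String,
    concurrency: usize,
    skip_tool_check: bool,
}

/// Std-only sketch of the option parsing (HyFetcher itself uses clap).
/// `args` should exclude the program name, e.g. `std::env::args().skip(1)`.
fn parse_options<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    // Start from the documented defaults.
    let mut opts = Options {
        data_dir: "data".to_string(),
        outputs_dir: "outputs".to_string(),
        concurrency: 8,
        skip_tool_check: false,
    };
    let mut args = args.into_iter();
    while let Some(flag) = args.next() {
        match flag.as_str() {
            "-d" | "--data_dir" => opts.data_dir = args.next().ok_or("missing value for --data_dir")?,
            "-o" | "--outputs_dir" => {
                opts.outputs_dir = args.next().ok_or("missing value for --outputs_dir")?
            }
            "-c" | "--concurrency" => {
                let v = args.next().ok_or("missing value for --concurrency")?;
                opts.concurrency = v.parse().map_err(|_| format!("invalid concurrency: {v}"))?;
            }
            "--skip-tool-check" => opts.skip_tool_check = true,
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    Ok(opts)
}
```

Called as `parse_options(std::env::args().skip(1))`, the example command above would yield `concurrency: 16` with the default directories unchanged.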
HyFetcher provides pre-built executables for Windows, macOS, and Linux, which you can download from the Releases page. No local compilation is required: just download and run.
Windows:
- Go to the Releases page and download the latest `hyfetcher-windows-amd64.zip`.
- Extract it to obtain `hyfetcher-windows-amd64.exe`.
- Place your data directory (such as `data`) and output directory (such as `outputs`) in the same directory, or specify their paths.
- In the command line (cmd or PowerShell), run: `.\hyfetcher-windows-amd64.exe -d data -o outputs`
- After the program finishes, open `outputs/index.html` in your browser to view the downloaded web pages.
macOS:
- Go to the Releases page and download the latest `hyfetcher-macos-amd64.tar.gz` (for Intel chips) or `hyfetcher-macos-arm64.tar.gz` (for Apple Silicon).
- Extract it to obtain the executable (`hyfetcher-macos-amd64` or `hyfetcher-macos-arm64`).
- Grant execute permission if needed: `chmod +x hyfetcher-macos-amd64` (use the `arm64` file name on Apple Silicon)
- Run in Terminal: `./hyfetcher-macos-amd64 -d data -o outputs`
- After the program finishes, open `outputs/index.html` in your browser to view all downloaded web pages.
Linux:
- Go to the Releases page and download the latest `hyfetcher-linux-amd64.tar.gz`.
- Extract it to obtain `hyfetcher-linux-amd64`.
- Grant execute permission if needed: `chmod +x hyfetcher-linux-amd64`
- Run in Terminal: `./hyfetcher-linux-amd64 -d data -o outputs`
- After the program finishes, open `outputs/index.html` in your browser to view all downloaded web pages.
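The archive names in these three walkthroughs follow a simple OS/architecture pattern. This hypothetical helper maps Rust's `std::env::consts::OS` and `std::env::consts::ARCH` values to the archive names listed above. Combinations not listed here (for example Linux on ARM) return `None`, because no such archive is mentioned.

```rust
/// Returns the release archive for a platform, using only the names listed in this README.
/// `os` and `arch` use `std::env::consts` spellings ("windows", "macos", "linux"; "x86_64", "aarch64").
fn release_asset(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("windows", "x86_64") => Some("hyfetcher-windows-amd64.zip"),
        ("macos", "x86_64") => Some("hyfetcher-macos-amd64.tar.gz"), // Intel Macs
        ("macos", "aarch64") => Some("hyfetcher-macos-arm64.tar.gz"), // Apple Silicon
        ("linux", "x86_64") => Some("hyfetcher-linux-amd64.tar.gz"),
        _ => None, // no pre-built archive listed for this platform
    }
}
```

Calling `release_asset(std::env::consts::OS, std::env::consts::ARCH)` on the current machine tells you which archive to download, or that you need to build from source.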
Main dependencies:
- tokio - Async runtime
- reqwest - HTTP client
- scraper - HTML parsing
- clap - Command-line argument parsing
- anyhow - Error handling
- url - URL parsing
- futures - Async utilities
- env_logger - Logging
- See `Cargo.toml` for the complete list
External tools:
- yt-dlp: Required for downloading videos from platforms such as Bilibili. If it is not found, the program detects this and installs it automatically:
  - Windows: downloaded as an executable from GitHub releases
  - macOS: installed via `pip3 install --user yt-dlp`
  - Linux: downloaded as a binary from GitHub releases

The program handles external tool installation automatically on first run. Use `--skip-tool-check` to bypass this step if needed.
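A sketch of the detection step, under stated assumptions: the tool counts as present when `yt-dlp --version` launches and exits successfully, and otherwise one of the per-OS install strategies above is chosen. The `--version` probe, the `InstallPlan` enum, and the function names are hypothetical. This sketch only decides what to do; it does not download or install anything.

```rust
use std::process::{Command, Stdio};

/// Returns true if `<tool> --version` can be started and exits with status 0.
/// Probing with `--version` is an assumption of this sketch.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status() // Err if the executable cannot be found or started
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Hypothetical install strategies matching the per-OS notes above.
#[derive(Debug, PartialEq)]
enum InstallPlan {
    /// Windows and Linux: download a pre-built yt-dlp from GitHub releases.
    GithubRelease,
    /// macOS: run `pip3 install --user yt-dlp`.
    Pip3User,
    /// Any other OS: no automatic installation in this sketch.
    Manual,
}

fn install_plan(os: &str) -> InstallPlan {
    match os {
        "windows" | "linux" => InstallPlan::GithubRelease,
        "macos" => InstallPlan::Pip3User,
        _ => InstallPlan::Manual,
    }
}

/// Returns `None` when nothing needs to be done (check skipped or tool present),
/// otherwise the plan for `os` (e.g. `std::env::consts::OS`).
fn plan_yt_dlp(skip_tool_check: bool, os: &str) -> Option<InstallPlan> {
    if skip_tool_check || tool_available("yt-dlp") {
        return None;
    }
    Some(install_plan(os))
}
```

On macOS, the `Pip3User` plan would correspond to running `Command::new("pip3").args(["install", "--user", "yt-dlp"])`. Checking the skip flag first means `--skip-tool-check` avoids even spawning the probe process.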
Make sure the Rust toolchain is installed. Then, in the project directory, run:

```
cargo build --release
```

The executable will be located at `target/release/hyfetcher`.
In the project root directory, run:

```
./target/release/hyfetcher [OPTIONS]
```

See above for the available options.