simd-minimizers

A SIMD-accelerated library to compute random minimizers.

It can compute all the minimizers of a human genome in 4 seconds using a single thread. It also provides a canonical version, which ensures that a sequence and its reverse complement always select the same positions; this takes 6 seconds on a human genome.
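For readers new to random minimizers, here is a naive scalar sketch of the definition: slide a window of `w` consecutive k-mers over the sequence, pick the k-mer with the smallest hash in each window (leftmost on ties), and report each selected position once. This is NOT the SIMD algorithm used by simd-minimizers; the FNV-1a `kmer_hash` and the function names are hypothetical stand-ins, so the positions it returns will differ from the library's.

```rust
// Naive O(n * w) reference for random minimizers. NOT the library's SIMD
// algorithm; FNV-1a is a hypothetical stand-in for the library's hasher.

/// Hypothetical stand-in hash: 64-bit FNV-1a over the k-mer's ASCII bytes.
fn kmer_hash(kmer: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in kmer {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Distinct minimizer positions of `seq`: each window holds `w` consecutive
/// k-mers, and the k-mer with the smallest hash wins (leftmost on ties).
fn naive_minimizer_positions(seq: &[u8], k: usize, w: usize) -> Vec<usize> {
    assert!(k > 0 && w > 0, "k and w must be positive");
    let mut positions = Vec::new();
    if seq.len() < k + w - 1 {
        return positions; // not even one complete window
    }
    // Hash of the k-mer starting at every position.
    let hashes: Vec<u64> = seq.windows(k).map(kmer_hash).collect();
    for start in 0..=hashes.len() - w {
        let window = &hashes[start..start + w];
        // Leftmost offset with the minimal hash in this window.
        let offset = (0..w).min_by_key(|&i| (window[i], i)).unwrap();
        let pos = start + offset;
        // Consecutive windows often share their minimizer; report it once.
        if positions.last() != Some(&pos) {
            positions.push(pos);
        }
    }
    positions
}

fn main() {
    let seq = b"ACGTGCTCAGAGACTCAGAGGA";
    println!("{:?}", naive_minimizer_positions(seq, 5, 7));
}
```

Deduplicating against only the last pushed position is enough because, with leftmost tie-breaking, the selected position never moves backwards as the window slides. Every window contains a minimizer, so consecutive selected positions are at most `w` apart.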

This crate builds on packed_seq and seq-hash.
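The usage example below builds a `PackedSeqVec` from ASCII. As an illustration of what "packed" means, the sketch below stores each nucleotide in 2 bits, four per byte. The `(byte >> 1) & 3` encoding (A=0, C=1, T=2, G=3) and the bit order are assumptions chosen for illustration; they are not taken from packed_seq's documentation, and `encode`, `pack`, and `get` are hypothetical helpers.

```rust
// Illustrative 2-bit DNA packing, 4 bases per byte. Encoding and bit order
// are assumptions for illustration, not packed_seq's documented layout.

/// Maps an ASCII nucleotide to 2 bits via (byte >> 1) & 3:
/// A/a -> 0, C/c -> 1, T/t -> 2, G/g -> 3.
fn encode(base: u8) -> u8 {
    (base >> 1) & 3
}

/// Packs ASCII DNA; base i occupies bits 2*(i % 4) and 2*(i % 4)+1 of byte i / 4.
fn pack(seq: &[u8]) -> Vec<u8> {
    let mut packed = vec![0u8; (seq.len() + 3) / 4];
    for (i, &b) in seq.iter().enumerate() {
        packed[i / 4] |= encode(b) << (2 * (i % 4));
    }
    packed
}

/// Reads back the 2-bit code of base i.
fn get(packed: &[u8], i: usize) -> u8 {
    (packed[i / 4] >> (2 * (i % 4))) & 3
}

fn main() {
    let packed = pack(b"ACGTGCTCAG");
    // 10 bases fit in 3 bytes instead of 10.
    println!("{} bytes: {:?}", packed.len(), packed);
    println!("base 3 has code {}", get(&packed, 3));
    // With this encoding, complementing a base is XOR with 2 (A<->T, C<->G).
    assert_eq!(encode(b'A') ^ 2, encode(b'T'));
    assert_eq!(encode(b'C') ^ 2, encode(b'G'));
}
```

Besides the 4x space saving, an encoding where the complement is a single XOR makes reverse complements cheap, which matters for canonical minimizers.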

The underlying algorithm is described in the following paper:

SimdMinimizers: Computing random minimizers, fast. Ragnar Groot Koerkamp, Igor Martayan. SEA 2025. doi.org/10.4230/LIPIcs.SEA.2025.20

Requirements

This library supports the AVX2 (x86-64) and NEON (ARM) instruction sets. Make sure to set RUSTFLAGS="-C target-cpu=native" when compiling, so that the compiler can use the instruction sets available on your CPU:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Or set it in your project's or your system-wide .cargo/config.toml:

[build]
rustflags = ["-C", "target-cpu=native"]

Enable the scalar feature (-F scalar) to fall back to a scalar implementation with reduced performance.
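To confirm that the flags above took effect, a small program can check which SIMD target features the compiler enabled. This sketch only inspects compile-time `cfg!(target_feature = ...)` values; it does not query simd-minimizers, and the function name `compiled_simd_features` is hypothetical.

```rust
/// Reports which SIMD target feature was enabled at compile time, e.g. via
/// RUSTFLAGS="-C target-cpu=native". Does not query simd-minimizers itself.
fn compiled_simd_features() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "neon") {
        "neon"
    } else {
        "none"
    }
}

fn main() {
    println!("SIMD enabled at compile time: {}", compiled_simd_features());
}
```

Building this once with and once without `-C target-cpu=native` on an x86-64 machine with AVX2 should show the difference, since the default x86-64 target does not enable AVX2.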

Usage example

Full documentation can be found on docs.rs.

use packed_seq::{PackedSeqVec, SeqVec};
use simd_minimizers::{canonical_minimizer_positions, canonical_minimizers};

let seq = b"ACGTGCTCAGAGACTCAGAGGA";
let packed_seq = PackedSeqVec::from_ascii(seq);

let k = 5;
let w = 7;
let hasher = <seq_hash::NtHasher>::new(k);

// Simple usage with default hasher, returning only positions.
let minimizer_positions = canonical_minimizer_positions(packed_seq.as_slice(), k, w);
assert_eq!(minimizer_positions, vec![0, 7, 9, 15]);

// Advanced usage with custom hasher, super-kmer positions, and minimizer values as well.
let mut minimizer_positions = Vec::new();
let mut super_kmers = Vec::new();
let minimizer_vals: Vec<u64> = canonical_minimizers(k, w)
    .hasher(&hasher)
    .super_kmers(&mut super_kmers)
    .run(packed_seq.as_slice(), &mut minimizer_positions)
    .values_u64()
    .collect();
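The canonical guarantee can be checked directly: the k-mer starting at position `p` in a sequence of length `len` starts at `len - k - p` in its reverse complement. The self-contained helpers below (`revcomp`, `rc_pos`, `same_kmers_selected` are hypothetical names, not part of simd-minimizers) compute that mapping and compare two position lists.

```rust
/// Reverse complement of an uppercase ASCII DNA sequence.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            other => panic!("unexpected base {:?}", other as char),
        })
        .collect()
}

/// The k-mer starting at `pos` in a sequence of length `len` starts at
/// `len - k - pos` in the reverse complement.
fn rc_pos(pos: usize, len: usize, k: usize) -> usize {
    len - k - pos
}

/// True if `rev` (positions found on the reverse complement) selects exactly
/// the same k-mers as `fwd` (positions found on the forward strand).
fn same_kmers_selected(fwd: &[usize], rev: &[usize], len: usize, k: usize) -> bool {
    let mut mapped: Vec<usize> = rev.iter().map(|&p| rc_pos(p, len, k)).collect();
    mapped.sort_unstable();
    let mut fwd_sorted = fwd.to_vec();
    fwd_sorted.sort_unstable();
    mapped == fwd_sorted
}

fn main() {
    let seq = b"ACGTGCTCAGAGACTCAGAGGA";
    let (len, k) = (seq.len(), 5);
    let rc = revcomp(seq);
    // The k-mer at position 0 reappears, reverse-complemented, at rc_pos(0, len, k).
    let p = rc_pos(0, len, k);
    assert_eq!(&rc[p..p + k], &revcomp(&seq[..k])[..]);
    println!("k-mer 0 on the forward strand is k-mer {p} on the reverse complement");
}
```

To test the library, one could run canonical_minimizer_positions on PackedSeqVec::from_ascii(seq) and on PackedSeqVec::from_ascii(&revcomp(seq)) and pass both results to same_kmers_selected; the positions returned by the library may need converting to usize first.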

Benchmarks

Benchmarks can be found in the bench directory of the GitHub repository, rust-seq/simd-minimizers.

bench/benches/bench.rs contains the benchmarks used in the accompanying blog post.

bench/src/bin/paper.rs contains benchmarks used in the paper.

Note that the benchmarks require some nightly features; you can install the latest nightly toolchain with

rustup install nightly

To replicate the results from the paper, go into the bench directory and run

RUSTFLAGS="-C target-cpu=native" cargo +nightly run --release
python eval.py

The human genome we use is from the T2T consortium, and available by following the first link here.
