Speech enhancement based on cascaded two flows

This repository contains the PyTorch implementations for the paper:

Speech enhancement based on cascaded two flows [1]

Presentation video in English, Presentation video in Korean, Post about the work, Paper link

This repository builds upon previous great works:

[FlowSE] https://github.com/seongq/flowmse
[SGMSE] https://github.com/sp-uhh/sgmse
[SGMSE-CRP] https://github.com/sp-uhh/sgmse_crp
[BBED] https://github.com/sp-uhh/sgmse-bbed
[StoRM] https://github.com/sp-uhh/storm

Installation

Create a new virtual environment with Python 3.10 (we have not tested other Python versions, but they may work).
Install the package dependencies via pip install -r requirements.txt.
Weights & Biases (W&B) is required.

Training

To train a model, run train.py. A minimal example with the default settings (as in our paper [1]) can be run with

python train.py --base_dir <your_dataset_dir>

where your_dataset_dir should be a directory containing the subdirectories train/ and valid/ (and optionally test/).

Trained models are saved in a directory named "logs".

Each subdirectory must itself have two subdirectories clean/ and noisy/, with the same filenames present in both. We currently only support training with .wav files.
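
The directory layout described above can be checked before starting a long training run. The following is a minimal sketch, not part of this repository: the helper names check_split and check_dataset are hypothetical, and the check only compares .wav filenames, not audio contents.

```python
from pathlib import Path


def check_split(base_dir, split):
    """Return a list of problems found in <base_dir>/<split>/{clean,noisy}."""
    problems = []
    names = {}
    for kind in ("clean", "noisy"):
        folder = Path(base_dir) / split / kind
        if not folder.is_dir():
            problems.append(f"missing directory: {split}/{kind}")
            names[kind] = set()
        else:
            # Only .wav files are considered, matching the training requirement.
            names[kind] = {p.name for p in folder.glob("*.wav")}
    # Every filename must be present in both clean/ and noisy/.
    for name in sorted(names["clean"] ^ names["noisy"]):
        absent_from = "noisy" if name in names["clean"] else "clean"
        problems.append(f"{split}/{absent_from}/{name} is missing")
    return problems


def check_dataset(base_dir, splits=("train", "valid")):
    """Check all required splits (hypothetical helper, not in the repository)."""
    problems = []
    for split in splits:
        problems.extend(check_split(base_dir, split))
    return problems
```

Calling check_dataset("<your_dataset_dir>") returns an empty list when the layout is consistent; pass splits=("train", "valid", "test") to include the optional test/ directory.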

To obtain the WSJ0+CHiME3 (H), WSJ0+CHiME3 (L), and WSJ0+Reverb training sets, refer to https://github.com/sp-uhh/sgmse and https://github.com/sp-uhh/storm.

To see all available training options, run python train.py --help.

Checkpoints

We provide pretrained checkpoints for the models trained on WSJ0+CHiME3 (H), WSJ0+CHiME3 (L), WSJ0+Reverb, and Voicebank/DEMAND (VB-DMD). All checkpoints can be downloaded here.

Evaluation

To evaluate on a test set, run

python evaluate_cascading.py --test_dir <your_test_dataset_dir> --folder_destination <your_enh_result_save_dir> --ckpt <path_to_model_checkpoint> --N_second <num_of_time_steps_for_the_second_flow>

--N_second sets the number of function evaluations used in the numerical integration of the second flow. For the first flow, the number of evaluations is fixed to 1.

your_test_dataset_dir should contain a subdirectory test/, which in turn contains the subdirectories clean/ and noisy/. Both clean/ and noisy/ should contain .wav files.
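
To illustrate what the number of evaluations means for --N_second, the sketch below integrates an ODE with a fixed-step Euler solver, using one step for a first flow and n_second steps for a second flow. This is only an assumption-laden illustration: the solver type, the time interval [0, 1], the starting point of the second flow, and the function names euler_integrate and cascade are all hypothetical and may differ from what evaluate_cascading.py actually does.

```python
def euler_integrate(velocity, x0, n_steps, t_start=0.0, t_end=1.0):
    """Integrate dx/dt = velocity(x, t) from t_start to t_end with Euler steps."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    dt = (t_end - t_start) / n_steps
    x, t = x0, t_start
    for _ in range(n_steps):
        # Each step calls the velocity model exactly once.
        x = x + dt * velocity(x, t)
        t += dt
    return x


def cascade(first_velocity, second_velocity, noisy, n_second):
    """Run the first flow with one evaluation, then the second with n_second."""
    # Assumption: the second flow starts directly from the first flow's output.
    intermediate = euler_integrate(first_velocity, noisy, n_steps=1)
    return euler_integrate(second_velocity, intermediate, n_steps=n_second)
```

In this sketch the total number of model evaluations is 1 + n_second, so a larger --N_second trades inference time for a finer integration of the second flow.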

Citations / References

[1] Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, and Jong Won Shin, “Speech Enhancement based on cascaded two flows,” in Proceedings of Interspeech, Aug. 2025.
