seongq/cascadingtwoflowmatching

Speech enhancement based on cascaded two flows

This repository contains the PyTorch implementations for the paper:

  • Speech enhancement based on cascaded two flows [1]

[Figure: FlowSE fig1]

Presentation video in English, Presentation video in Korean, Post about the work, Paper link

This repository builds upon previous great works:

Installation

  • Create a new virtual environment with Python 3.10 (we have not tested other Python versions, but they may work).
  • Install the package dependencies via pip install -r requirements.txt.
  • Weights & Biases (W&B) is required.

Training

Training is done by executing train.py. A minimal example with the default settings (as used in our paper [1]) can be run with:

python train.py --base_dir <your_dataset_dir>

where your_dataset_dir should be a directory containing the subdirectories train/ and valid/ (optionally test/ as well). Each of these subdirectories must itself have two subdirectories, clean/ and noisy/, with the same filenames present in both. We currently only support training with .wav files.

Trained models are saved in a directory named "logs".
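The dataset layout described above can be checked before starting a long training run. The following is a minimal sketch, not part of this repository: the function name check_dataset_layout and its message strings are hypothetical, and it only assumes the train/ and valid/ splits with clean/ and noisy/ subfolders of identically named .wav files.

```python
from pathlib import Path


def check_dataset_layout(base_dir, splits=("train", "valid")):
    """Return a list of human-readable problems found under base_dir.

    An empty list means every split has clean/ and noisy/ folders
    holding the same set of .wav filenames.
    """
    base = Path(base_dir)
    problems = []
    for split in splits:
        clean_dir = base / split / "clean"
        noisy_dir = base / split / "noisy"
        missing = [d for d in (clean_dir, noisy_dir) if not d.is_dir()]
        if missing:
            problems.extend(f"missing directory: {d}" for d in missing)
            continue
        # Compare filenames only; audio content is not inspected.
        clean = {p.name for p in clean_dir.glob("*.wav")}
        noisy = {p.name for p in noisy_dir.glob("*.wav")}
        if not clean:
            problems.append(f"no .wav files in {clean_dir}")
        for name in sorted(clean - noisy):
            problems.append(f"{split}: {name} has no noisy counterpart")
        for name in sorted(noisy - clean):
            problems.append(f"{split}: {name} has no clean counterpart")
    return problems
```

Calling check_dataset_layout("<your_dataset_dir>") and printing the result lists any missing folders or unpaired files. Pass splits=("train", "valid", "test") to include the optional test split.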

To obtain the training sets WSJ0+CHiME3 (H), WSJ0+CHiME3 (L) and WSJ0+Reverb, please refer to https://github.com/sp-uhh/sgmse and https://github.com/sp-uhh/storm.

To see all available training options, run python train.py --help.

Checkpoints

We provide pretrained checkpoints for the models trained on WSJ0+CHiME3 (H), WSJ0+CHiME3 (L), WSJ0+Reverb, and Voicebank/DEMAND (VB-DMD). All checkpoints can be downloaded here.

Evaluation

To evaluate on a test set, run

python evaluate_cascading.py --test_dir <your_test_dataset_dir> --folder_destination <your_enh_result_save_dir> --ckpt <path_to_model_checkpoint> --N_second <num_of_time_steps_for_the_second_flow>

"N_second" is the number of evaluations (time steps) used in the numerical integration of the second flow. For the first flow, the number of evaluations is fixed to 1.
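To illustrate what the number of evaluations means, here is a minimal sketch of two flows integrated one after the other. It rests on assumptions that are not taken from this repository: a fixed-step Euler solver, integration over the interval [0, 1], and the second flow starting from the output of the first. The names euler_integrate and cascaded_sample are hypothetical, and the velocity functions stand in for the trained networks.

```python
def euler_integrate(velocity, x0, num_steps, t0=0.0, t1=1.0):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with fixed-step Euler.

    Each step calls velocity once, so num_steps equals the number of
    function evaluations.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    dt = (t1 - t0) / num_steps
    x, t = x0, t0
    for _ in range(num_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x


def cascaded_sample(first_velocity, second_velocity, x0, n_second):
    """Run the first flow with one evaluation, then the second with n_second."""
    # First flow: a single evaluation.
    x_mid = euler_integrate(first_velocity, x0, num_steps=1)
    # Second flow: n_second evaluations (assumed to start from x_mid).
    return euler_integrate(second_velocity, x_mid, num_steps=n_second)
```

The sketch only uses + and *, so x0 may be a float, a NumPy array, or a PyTorch tensor. A larger n_second gives a finer discretization of the second flow at the cost of more network evaluations.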

your_test_dataset_dir should contain a subfolder test/, which in turn contains the subdirectories clean/ and noisy/. Both clean/ and noisy/ should contain .wav files.
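To compare several values of N_second, the evaluation command above can be scripted. This is a minimal sketch that only uses the command-line options shown in this section. The helper names build_eval_command and sweep_n_second, the default values (1, 2, 5), and the N<value> output folder naming are hypothetical.

```python
import subprocess
import sys


def build_eval_command(test_dir, out_dir, ckpt, n_second):
    """Assemble the evaluate_cascading.py command line as an argument list."""
    return [
        sys.executable, "evaluate_cascading.py",
        "--test_dir", str(test_dir),
        "--folder_destination", str(out_dir),
        "--ckpt", str(ckpt),
        "--N_second", str(n_second),
    ]


def sweep_n_second(test_dir, out_root, ckpt, values=(1, 2, 5)):
    """Run one evaluation per N_second value, each into its own folder."""
    for n in values:
        cmd = build_eval_command(test_dir, f"{out_root}/N{n}", ckpt, n)
        # check=True stops the sweep if an evaluation run fails.
        subprocess.run(cmd, check=True)
```

Run sweep_n_second from the repository root so that evaluate_cascading.py is found.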

Citations / References

[1] Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, and Jong Won Shin, “Speech Enhancement based on cascaded two flows,” in Proceedings of Interspeech, Aug. 2025.
