Conversation
@samanklesaria samanklesaria commented Aug 5, 2025

The code for lfilter already has explicit CPU and CUDA implementations. Do we really need a third fallback option? I guess there's an MPS device type too, but shouldn't everything else get handled by CPU and CUDA? If we removed the fallback, we wouldn't have to worry about porting the transpose, squeeze, and index_put operations to the stable ABI.

If it's important to keep support for other devices, we could also port this generic code to Python instead.

@samanklesaria samanklesaria requested a review from a team as a code owner August 5, 2025 23:27

pytorch-bot bot commented Aug 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4023

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 4 Unrelated Failures

As of commit 48f077a with merge base f0a4999:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 5, 2025
Member

@NicolasHug NicolasHug left a comment


Do we really need a third fallback option? I guess there's an mps device type too. But shouldn't everything else get handled by cpu and cuda?

I am not sure, but I don't think MPS devices would trigger either of the CPU/CUDA registered ops. I think it'd be safe to merge the PR if we can confirm that the M1 tests are passing... But we would need to re-enable them first.

@samanklesaria samanklesaria marked this pull request as draft August 7, 2025 20:46
@samanklesaria
Collaborator Author

This PR will be converted back from draft once tests for MPS devices are turned on.

@pearu pearu added this to the 2.9 milestone Sep 5, 2025
@pearu
Collaborator

pearu commented Sep 5, 2025

Heads up: #4091 enables M1 tests.

@pearu
Collaborator

pearu commented Sep 5, 2025

If we plan to land this, then apply also

diff --git a/src/torchaudio/functional/filtering.py b/src/torchaudio/functional/filtering.py
index 76deb04a..6dc488cf 100644
--- a/src/torchaudio/functional/filtering.py
+++ b/src/torchaudio/functional/filtering.py
@@ -923,7 +923,7 @@ def highpass_biquad(waveform: Tensor, sample_rate: int, cutoff_freq: float, Q: f
     return biquad(waveform, b0, b1, b2, a0, a1, a2)
 
 
-def _lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor):
+def _lfilter_core_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor):
     n_order = a_coeffs_flipped.size(1)
     a_coeffs_flipped = a_coeffs_flipped.unsqueeze(2)
     for i_sample, o0 in enumerate(input_signal_windows.permute(2, 0, 1)):
@@ -932,12 +932,6 @@ def _lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: T
         padded_output_waveform[:, :, i_sample + n_order - 1] = o0
 
 
-if _IS_TORCHAUDIO_EXT_AVAILABLE:
-    _lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop
-else:
-    _lfilter_core_loop = _lfilter_core_generic_loop
-
-
 class DifferentiableFIR(torch.autograd.Function):
     @staticmethod
     def forward(ctx, waveform, b_coeffs):

to prevent breaking tests.
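For reference, the diff above elides the middle of the loop body (it falls in the `@@` context). The following is a self-contained sketch of what the Python fallback computes, with the windowed-matmul line reconstructed as an assumption, plus a first-order sanity check:

```python
import torch
from torch import Tensor


def _lfilter_core_generic_loop(input_signal_windows: Tensor,
                               a_coeffs_flipped: Tensor,
                               padded_output_waveform: Tensor) -> None:
    # IIR feedback recurrence, one sample at a time:
    #   y[n] = x[n] - sum_{k=1..N} a_k * y[n-k]
    # (a-coefficients assumed pre-normalized by a0; the newest slot of each
    # window is still zero when read, so the a0 position contributes nothing).
    n_order = a_coeffs_flipped.size(1)
    a_coeffs_flipped = a_coeffs_flipped.unsqueeze(2)  # (n_channel, n_order, 1)
    for i_sample, o0 in enumerate(input_signal_windows.permute(2, 0, 1)):
        # Reconstructed line (elided in the diff): window of past outputs.
        window = padded_output_waveform[:, :, i_sample:i_sample + n_order]
        o0 -= (window.transpose(0, 1) @ a_coeffs_flipped)[..., 0].t()
        padded_output_waveform[:, :, i_sample + n_order - 1] = o0


# Sanity check: first-order filter y[n] = x[n] - 0.5 * y[n-1].
x = torch.tensor([[[1.0, 2.0, 3.0, 4.0]]])   # (n_batch, n_channel, n_sample)
a_flipped = torch.tensor([[0.5, 1.0]])       # flip of a = [1.0, 0.5]
padded = torch.zeros(1, 1, x.size(2) + 1)
_lfilter_core_generic_loop(x, a_flipped, padded)
print(padded[0, 0, 1:].tolist())             # [1.0, 1.5, 2.25, 2.875]
```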

@pearu pearu mentioned this pull request Sep 5, 2025
@samanklesaria
Collaborator Author

@pearu I had originally thought that we'd remove this generic loop from the C++ code and add it back as python code for the special case of mps (or anything that's both not cpu and not cuda). This way, select, matmul, unsqueeze, index_put, etc are no longer necessary in the ABI, and the only thing holding us back is parallel_for. I don't think I understand your previous comment; now that M1 tests exist, to make this work, I'd need to add the python port of the generic code. Does that make sense?

@samanklesaria
Collaborator Author

samanklesaria commented Sep 5, 2025

Just realized that there's already a python implementation of the generic loop.

@samanklesaria
Collaborator Author

If we plan to land this, then apply also [the filtering.py diff above] to prevent breaking tests.

We don't want to always use the generic loop. When it's applicable (CPU or CUDA), we do want to use the extension version of lfilter_core_loop. It's only for Macs (or when the extension doesn't exist) that we want to swap in the Python one.
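That selection logic can be sketched as follows — a hypothetical dispatch, not the PR's actual code, with stand-in stubs for the extension op, the Python loop, and the `_IS_TORCHAUDIO_EXT_AVAILABLE` flag:

```python
import torch

_IS_TORCHAUDIO_EXT_AVAILABLE = False  # stand-in; the real flag lives in filtering.py
calls = []


def _ext_loop(*args):
    # Stand-in for torch.ops.torchaudio._lfilter_core_loop (CPU/CUDA kernels).
    calls.append("extension")


def _generic_loop(*args):
    # Stand-in for the pure-Python fallback loop.
    calls.append("python")


def _lfilter_core_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform):
    # Prefer the compiled op only when the extension exists and the tensors sit
    # on a device it registers kernels for; otherwise (e.g. MPS, or no
    # extension at all) fall back to the Python loop.
    if _IS_TORCHAUDIO_EXT_AVAILABLE and input_signal_windows.device.type in ("cpu", "cuda"):
        return _ext_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)
    return _generic_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)


_lfilter_core_loop(torch.zeros(1, 1, 4), torch.zeros(1, 2), torch.zeros(1, 1, 5))
print(calls)  # ['python'] because the extension flag is False in this sketch
```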

@pearu
Collaborator

pearu commented Sep 5, 2025

@pearu I had originally thought that we'd remove this generic loop from the C++ code and add it back as python code for the special case of mps (or anything that's both not cpu and not cuda). This way, select, matmul, unsqueeze, index_put, etc are no longer necessary in the ABI, and the only thing holding us back is parallel_for. I don't think I understand your previous comment;

The comment originates from reading the existing code: when this PR lands as it is, the following statement

_lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop

will fail.

Just realized that there's already a python implementation of the generic loop.

👍

@samanklesaria
Collaborator Author

samanklesaria commented Sep 5, 2025

when this PR lands as it is, the following statement

_lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop

will fail.

Why? _lfilter_core_loop is still defined on line 80 of lfilter.cpp.

@pearu
Collaborator

pearu commented Sep 5, 2025

We don't want to always use the generic loop. When it's applicable (cpu or cuda), we do want to use the extension version of lfilter_core_loop. It's just for macs (or when the extension doesn't exist) that we want to swap in the python one.

OK, I did not realize that. 🤞 M1 CI will pass.

@pearu
Collaborator

pearu commented Sep 5, 2025

Why? _lfilter_core_loop is still defined on line 80 of lfilter.cpp.

I was misreading:

+ TORCH_LIBRARY_IMPL(torchaudio, CompositeExplicitAutograd, m) {
+   m.impl("torchaudio::_lfilter_core_loop", &lfilter_core_generic_loop);
+ }

Sorry for the noise.

@samanklesaria samanklesaria marked this pull request as ready for review September 5, 2025 15:48
Collaborator

@pearu pearu left a comment


LGTM! Thanks, @samanklesaria!

@pearu pearu force-pushed the no_lfilter_generic_loop branch from 89ee52e to d923531 Compare September 11, 2025 11:41
@pearu pearu modified the milestones: 2.9, 2.10 Sep 11, 2025