Conversation
@samanklesaria samanklesaria commented Aug 5, 2025

The code for lfilter already has explicit CPU and CUDA implementations. Do we really need a third fallback option? I guess there's an MPS device type too, but shouldn't everything else get handled by CPU and CUDA? If we removed the fallback, we wouldn't have to worry about porting the transpose, squeeze, and index_put operations to the stable ABI.

If it's important to keep support for other devices, we could also port this generic code to Python instead.

@samanklesaria samanklesaria requested a review from a team as a code owner August 5, 2025 23:27

pytorch-bot bot commented Aug 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4023

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 4 Unrelated Failures

As of commit 48f077a with merge base f0a4999:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 5, 2025
Member

@NicolasHug NicolasHug left a comment


Do we really need a third fallback option? I guess there's an mps device type too. But shouldn't everything else get handled by cpu and cuda?

I am not sure, but I don't think MPS devices would trigger either of the CPU/CUDA registered ops. I think it'd be safe to merge the PR if we can confirm that the M1 tests are passing... But we would need to re-enable them first.

@samanklesaria samanklesaria marked this pull request as draft August 7, 2025 20:46
@samanklesaria
Collaborator Author

This PR will be converted back from draft once tests for MPS devices are turned on.

@pearu pearu added this to the 2.9 milestone Sep 5, 2025
@pearu
Collaborator

pearu commented Sep 5, 2025

Heads up: #4091 enables M1 tests.

@pearu
Collaborator

pearu commented Sep 5, 2025

If we plan to land this, then apply also

diff --git a/src/torchaudio/functional/filtering.py b/src/torchaudio/functional/filtering.py
index 76deb04a..6dc488cf 100644
--- a/src/torchaudio/functional/filtering.py
+++ b/src/torchaudio/functional/filtering.py
@@ -923,7 +923,7 @@ def highpass_biquad(waveform: Tensor, sample_rate: int, cutoff_freq: float, Q: f
     return biquad(waveform, b0, b1, b2, a0, a1, a2)
 
 
-def _lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor):
+def _lfilter_core_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor):
     n_order = a_coeffs_flipped.size(1)
     a_coeffs_flipped = a_coeffs_flipped.unsqueeze(2)
     for i_sample, o0 in enumerate(input_signal_windows.permute(2, 0, 1)):
@@ -932,12 +932,6 @@ def _lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: T
         padded_output_waveform[:, :, i_sample + n_order - 1] = o0
 
 
-if _IS_TORCHAUDIO_EXT_AVAILABLE:
-    _lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop
-else:
-    _lfilter_core_loop = _lfilter_core_generic_loop
-
-
 class DifferentiableFIR(torch.autograd.Function):
     @staticmethod
     def forward(ctx, waveform, b_coeffs):

to prevent breaking tests.
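For reference, the diff above elides the middle of the loop body (it falls in the `@@` context). The following is a self-contained sketch of what the Python fallback computes, with the windowed-matmul line reconstructed as an assumption, plus a first-order sanity check:

```python
import torch
from torch import Tensor


def _lfilter_core_generic_loop(input_signal_windows: Tensor,
                               a_coeffs_flipped: Tensor,
                               padded_output_waveform: Tensor) -> None:
    # IIR feedback recurrence, one sample at a time:
    #   y[n] = x[n] - sum_{k=1..N} a_k * y[n-k]
    # (a-coefficients assumed pre-normalized by a0; the newest slot of each
    # window is still zero when read, so the a0 position contributes nothing).
    n_order = a_coeffs_flipped.size(1)
    a_coeffs_flipped = a_coeffs_flipped.unsqueeze(2)  # (n_channel, n_order, 1)
    for i_sample, o0 in enumerate(input_signal_windows.permute(2, 0, 1)):
        # Reconstructed line (elided in the diff): window of past outputs.
        window = padded_output_waveform[:, :, i_sample:i_sample + n_order]
        o0 -= (window.transpose(0, 1) @ a_coeffs_flipped)[..., 0].t()
        padded_output_waveform[:, :, i_sample + n_order - 1] = o0


# Sanity check: first-order filter y[n] = x[n] - 0.5 * y[n-1].
x = torch.tensor([[[1.0, 2.0, 3.0, 4.0]]])   # (n_batch, n_channel, n_sample)
a_flipped = torch.tensor([[0.5, 1.0]])       # flip of a = [1.0, 0.5]
padded = torch.zeros(1, 1, x.size(2) + 1)
_lfilter_core_generic_loop(x, a_flipped, padded)
print(padded[0, 0, 1:].tolist())             # [1.0, 1.5, 2.25, 2.875]
```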

@pearu pearu mentioned this pull request Sep 5, 2025
@samanklesaria
Collaborator Author

@pearu I had originally thought that we'd remove this generic loop from the C++ code and add it back as python code for the special case of mps (or anything that's both not cpu and not cuda). This way, select, matmul, unsqueeze, index_put, etc are no longer necessary in the ABI, and the only thing holding us back is parallel_for. I don't think I understand your previous comment; now that M1 tests exist, to make this work, I'd need to add the python port of the generic code. Does that make sense?

@samanklesaria
Collaborator Author

samanklesaria commented Sep 5, 2025

Just realized that there's already a python implementation of the generic loop.

@samanklesaria
Collaborator Author

If we plan to land this, then apply also [the filtering.py diff above] to prevent breaking tests.

We don't want to always use the generic loop. When it's applicable (CPU or CUDA), we do want to use the extension version of lfilter_core_loop. It's only for Macs (or when the extension doesn't exist) that we want to swap in the Python one.
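That selection logic can be sketched as follows — a hypothetical dispatch, not the PR's actual code, with stand-in stubs for the extension op, the Python loop, and the `_IS_TORCHAUDIO_EXT_AVAILABLE` flag:

```python
import torch

_IS_TORCHAUDIO_EXT_AVAILABLE = False  # stand-in; the real flag lives in filtering.py
calls = []


def _ext_loop(*args):
    # Stand-in for torch.ops.torchaudio._lfilter_core_loop (CPU/CUDA kernels).
    calls.append("extension")


def _generic_loop(*args):
    # Stand-in for the pure-Python fallback loop.
    calls.append("python")


def _lfilter_core_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform):
    # Prefer the compiled op only when the extension exists and the tensors sit
    # on a device it registers kernels for; otherwise (e.g. MPS, or no
    # extension at all) fall back to the Python loop.
    if _IS_TORCHAUDIO_EXT_AVAILABLE and input_signal_windows.device.type in ("cpu", "cuda"):
        return _ext_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)
    return _generic_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)


_lfilter_core_loop(torch.zeros(1, 1, 4), torch.zeros(1, 2), torch.zeros(1, 1, 5))
print(calls)  # ['python'] because the extension flag is False in this sketch
```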

@pearu
Collaborator

pearu commented Sep 5, 2025

@pearu I had originally thought that we'd remove this generic loop from the C++ code and add it back as python code for the special case of mps (or anything that's both not cpu and not cuda). This way, select, matmul, unsqueeze, index_put, etc are no longer necessary in the ABI, and the only thing holding us back is parallel_for. I don't think I understand your previous comment;

The comment originates from reading the existing code: when this PR lands as it is, the following statement

_lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop

will fail.

Just realized that there's already a python implementation of the generic loop.

👍

@samanklesaria
Collaborator Author

samanklesaria commented Sep 5, 2025

when this PR lands as it is, the following statement

_lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop

will fail.

Why? _lfilter_core_loop is still defined on line 80 of lfilter.cpp.

@pearu
Collaborator

pearu commented Sep 5, 2025

We don't want to always use the generic loop. When it's applicable (cpu or cuda), we do want to use the extension version of lfilter_core_loop. It's just for macs (or when the extension doesn't exist) that we want to swap in the python one.

OK, I did not realize that. 🤞 M1 CI will pass.

@pearu
Collaborator

pearu commented Sep 5, 2025

Why? _lfilter_core_loop is still defined on line 80 of lfilter.cpp.

I was misreading:

+ TORCH_LIBRARY_IMPL(torchaudio, CompositeExplicitAutograd, m) {
+   m.impl("torchaudio::_lfilter_core_loop", &lfilter_core_generic_loop);
+ }

Sorry for the noise.

@samanklesaria samanklesaria marked this pull request as ready for review September 5, 2025 15:48
Collaborator

@pearu pearu left a comment


LGTM! Thanks, @samanklesaria!

@pearu pearu force-pushed the no_lfilter_generic_loop branch from 89ee52e to d923531 Compare September 11, 2025 11:41
@pearu pearu modified the milestones: 2.9, 2.10 Sep 11, 2025