Remove generic lfilter loop #4023
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4023
Note: links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 4 Unrelated Failures as of commit 48f077a with merge base f0a4999.
NEW FAILURES - the following jobs have failed:
BROKEN TRUNK - the following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Do we really need a third fallback option? I guess there's an mps device type too. But shouldn't everything else get handled by cpu and cuda?
I am not sure, but I don't think MPS devices would trigger either of the CPU/CUDA registered ops. I think it'd be safe to merge the PR if we can confirm that the M1 tests are passing... But we would need to re-enable them first.
This PR will be converted back from draft once tests for MPS devices are turned on.
Heads up: #4091 enables M1 tests.
If we plan to land this, then apply also
to prevent breaking tests.
@pearu I had originally thought that we'd remove this generic loop from the C++ code and add it back as Python code for the special case of MPS (or anything that's both not CPU and not CUDA). This way, select, matmul, unsqueeze, index_put, etc. are no longer necessary in the ABI, and the only thing holding us back is parallel_for. I don't think I understand your previous comment; now that M1 tests exist, to make this work I'd need to add the Python port of the generic code. Does that make sense?
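For illustration, here is a minimal sketch of what such a pure-Python port of the generic loop could look like. The function name, argument names, and tensor shapes are assumptions made for the sketch, not torchaudio's actual code.

```python
import torch
from torch import Tensor


def _lfilter_core_generic_loop(
    input_signal_windows: Tensor,    # assumed shape: (batch, channel, time)
    a_coeffs_flipped: Tensor,        # assumed shape: (channel, n_order)
    padded_output_waveform: Tensor,  # assumed shape: (batch, channel, time + n_order - 1)
) -> None:
    # Sequential IIR recursion built only from generic tensor ops (matmul,
    # transpose, slicing, in-place assignment), so it can run on any device,
    # including MPS, without a dedicated kernel.
    n_order = a_coeffs_flipped.size(1)
    a_coeffs_flipped = a_coeffs_flipped.unsqueeze(2)  # (channel, n_order, 1)
    for i_sample, o0 in enumerate(input_signal_windows.permute(2, 0, 1)):
        # o0 is the feed-forward contribution for this sample, shape (batch, channel).
        window = padded_output_waveform[:, :, i_sample : i_sample + n_order]
        # Subtract the feedback term: previous outputs dotted with the flipped a-coefficients.
        o0 = o0 - (window.transpose(0, 1) @ a_coeffs_flipped)[..., 0].t()
        padded_output_waveform[:, :, i_sample + n_order - 1] = o0
```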
Just realized that there's already a Python implementation of the generic loop.
We don't want to always use the generic loop. When it's applicable (CPU or CUDA), we do want to use the extension version of the loop.
The comment originates from reading the existing code: when this PR lands as it is, the statement `_lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop` will fail.
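One hedged way to avoid that breakage (a sketch of the idea, not necessarily the fix that was actually proposed) is to guard the op lookup and fall back to a Python implementation such as the `_lfilter_core_generic_loop` sketch above when the extension op is not registered:

```python
import torch

# Sketch: prefer the compiled extension op, but fall back to a pure-Python
# loop if the op has been removed or the extension is unavailable.
# `_lfilter_core_generic_loop` is assumed to be defined as in the sketch above.
try:
    _lfilter_core_loop = torch.ops.torchaudio._lfilter_core_loop
except (AttributeError, RuntimeError):
    _lfilter_core_loop = _lfilter_core_generic_loop
```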
👍
Why?
OK, I did not realize that. 🤞 M1 CI will pass.
I was misreading:
Sorry for the noise.
LGTM! Thanks, @samanklesaria!
Force-pushed from 89ee52e to d923531.
The code for lfilter already has explicit CPU and CUDA implementations. Do we really need a third fallback option? I guess there's an mps device type too. But shouldn't everything else get handled by cpu and cuda? If we removed the fallback, we wouldn't have to worry about porting the transpose, squeeze, and index_put operations to the stable ABI.
If it's important to keep support for other devices, we could also port this generic code to Python instead (see the sketch below).
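As a sketch of that alternative, the dispatch below keeps the compiled kernel on CPU/CUDA and routes every other device type (e.g. MPS) to a Python fallback. The dispatch function and the `_lfilter_core_generic_loop` fallback are assumptions for illustration, not torchaudio's existing code.

```python
import torch
from torch import Tensor


def _lfilter_core_loop_dispatch(
    input_signal_windows: Tensor,
    a_coeffs_flipped: Tensor,
    padded_output_waveform: Tensor,
) -> None:
    # Hypothetical dispatch: use the registered extension kernel where one
    # exists (CPU/CUDA), and the generic Python loop everywhere else (e.g. MPS).
    if input_signal_windows.device.type in ("cpu", "cuda"):
        torch.ops.torchaudio._lfilter_core_loop(
            input_signal_windows, a_coeffs_flipped, padded_output_waveform
        )
    else:
        # Pure-Python fallback, as sketched earlier in this thread.
        _lfilter_core_generic_loop(
            input_signal_windows, a_coeffs_flipped, padded_output_waveform
        )
```

With this shape of dispatch, the select/matmul/unsqueeze/index_put calls only live in Python, so the C++ extension no longer needs them in the stable ABI.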