@abhishek-singh591 abhishek-singh591 commented Aug 23, 2025

Optimized ONNX Transform via Class Merging and Thread Pooling

This PR follows up on #539 – Optimized ONNX transform class via multithreading.

It merges the FP16 and Split ONNX transform classes into a single implementation to eliminate redundant tensor loading and iteration. Additionally, the transform logic has been refactored to use a thread pool, replacing the previous sequential loop to parallelize tensor operations.

Performance Benchmarks:

| Model | Original Duration (s) | Optimized Duration (s) |
| --- | --- | --- |
| LLaMA 3.1 8B | 88.35 | 58.55 |
| LLaMA 3.1 70B | 1029.82 | 727.37 |

Note: Thread count is set to os.cpu_count() * 4 to better handle I/O-bound workloads. Performance may vary depending on system hardware and threading capabilities.
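For readers unfamiliar with the pattern, here is a minimal sketch of the idea described above, not the code in this PR. It assumes tensors can be modelled as plain lists of floats; `transform_tensor`, `transform_all`, and `SIZE_THRESHOLD` are hypothetical names chosen for illustration. Each tensor is visited once (FP16 clipping and the split decision happen in the same function), and the visits are dispatched to a thread pool sized at `os.cpu_count() * 4`.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

FP16_MAX = 65504.0      # largest finite float16 value
SIZE_THRESHOLD = 1024   # bytes; hypothetical cut-off for moving a tensor to external data


def transform_tensor(item):
    """Clip to the FP16 range, encode as float16, and flag large tensors in one visit."""
    name, values = item
    clipped = [max(-FP16_MAX, min(FP16_MAX, v)) for v in values]
    raw = struct.pack(f"<{len(clipped)}e", *clipped)  # "e" is IEEE 754 half precision
    store_externally = len(raw) > SIZE_THRESHOLD
    return name, raw, store_externally


def transform_all(tensors, max_workers=None):
    """Apply transform_tensor to every (name, values) pair using a thread pool."""
    if max_workers is None:
        # Mirrors the thread count mentioned in the note above.
        max_workers = (os.cpu_count() or 1) * 4
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input items.
        return list(pool.map(transform_tensor, tensors.items()))


if __name__ == "__main__":
    demo = {"w": [1.0, 1e6, -1e6], "b": [0.5] * 1000}
    for name, raw, external in transform_all(demo):
        print(name, len(raw), external)
```

In this toy version the per-tensor work is pure Python, so the GIL limits the speedup; threads pay off mainly when the per-tensor work is I/O-bound or releases the GIL, which is the case the note above has in mind.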


@ochougul ochougul left a comment

LGTM. Can merge if CI is passing.

@abhishek-singh591 abhishek-singh591 force-pushed the optimized_onnx_tranform branch 2 times, most recently from 00fcaf2 to 9b9c41d Compare September 5, 2025 11:12
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
…added removed comments
