Optimized ONNX Transform via Class Merging and Thread Pooling #546

abhishek-singh591 · 2025-08-23T09:53:23Z

This PR follows up on #539 – Optimized ONNX transform class via multithreading.

It merges the FP16 and Split ONNX transform classes into a single implementation to eliminate redundant tensor loading and iteration. Additionally, the transform logic has been refactored to use a thread pool, replacing the previous sequential loop to parallelize tensor operations.
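The idea of folding two transforms into one pass can be sketched as below. This is a minimal illustration, not the code in `QEfficient/base/onnx_transforms.py`: the tensor representation (plain lists of floats), the function names, and the `size_threshold` split rule are all assumptions. Only the `apply_clip` and `apply_split` flag names come from the commit messages in this PR.

```python
FP16_MAX = 65504.0  # largest finite float16 value


def transform_tensor(values, *, apply_clip=True, apply_split=True, size_threshold=1024):
    """Visit one tensor once and apply both decisions (hypothetical helper)."""
    clipped = False
    if apply_clip:
        # Clamp every element into the finite float16 range.
        new_values = [max(-FP16_MAX, min(FP16_MAX, v)) for v in values]
        clipped = new_values != values
        values = new_values
    # Assumed split rule: tensors larger than size_threshold bytes
    # (4 bytes per float32 element) are marked for external storage.
    external = apply_split and len(values) * 4 > size_threshold
    return {"values": values, "clipped": clipped, "external": external}


def merged_transform(tensors, **flags):
    """Single loop over all tensors instead of one loop per transform."""
    return {name: transform_tensor(values, **flags) for name, values in tensors.items()}


if __name__ == "__main__":
    demo = {"w": [1.0, 1e6, -1e6], "b": [0.5] * 1000}
    for name, info in merged_transform(demo).items():
        print(name, info["clipped"], info["external"])
```

Each tensor is loaded and inspected once, which is where the saving over two separate transform classes would come from.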

Performance benchmarks:

| Model | Original Duration (s) | Optimized Duration (s) |
| --- | --- | --- |
| LLaMA 3.1 8B | 88.35 | 58.55 |
| LLaMA 3.1 70B | 1029.82 | 727.37 |
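To put the durations above in relative terms, the following short sketch derives the speedup factor and the percentage reduction from the reported numbers. The variable and function names are illustrative; the figures are the ones in the table.

```python
# Durations in seconds, copied from the benchmark table.
benchmarks = {
    "LLaMA 3.1 8B": (88.35, 58.55),
    "LLaMA 3.1 70B": (1029.82, 727.37),
}


def speedup(original, optimized):
    """How many times faster the optimized run is."""
    return original / optimized


def reduction_pct(original, optimized):
    """Percentage of the original duration that was saved."""
    return (original - optimized) / original * 100


if __name__ == "__main__":
    for model, (orig, opt) in benchmarks.items():
        print(f"{model}: {speedup(orig, opt):.2f}x faster, "
              f"{reduction_pct(orig, opt):.1f}% less time")
```

This works out to roughly 1.51x (33.7% less time) for the 8B model and 1.42x (29.4% less time) for the 70B model.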

Note: Thread count is set to os.cpu_count() * 4 to better handle I/O-bound workloads. Performance may vary depending on system hardware and threading capabilities.
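The thread-count choice in the note can be illustrated with the standard library's `ThreadPoolExecutor`. This is a sketch under assumptions: `process_tensor` is a hypothetical placeholder for the real per-tensor work, and the fallback to one CPU when `os.cpu_count()` returns `None` is added here for safety rather than taken from the PR.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_tensor(item):
    """Placeholder per-tensor job (hypothetical): returns the name and a checksum."""
    name, values = item
    return name, sum(values)


def run_parallel(tensors, worker=process_tensor):
    """Map the worker over all tensors using a pool of cpu_count * 4 threads."""
    # os.cpu_count() may return None, so fall back to 1 before oversubscribing.
    max_workers = (os.cpu_count() or 1) * 4
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the tensors.
        return dict(pool.map(worker, tensors.items()))


if __name__ == "__main__":
    print(run_parallel({"a": [1, 2], "b": [3]}))
```

Oversubscribing threads relative to cores tends to pay off only when each task spends most of its time waiting on I/O or in code that releases the GIL; for pure-Python CPU-bound work the extra threads would mostly add contention.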

ochougul

LGTM. Can merge if CI is passing.

abhishek-singh591 requested review from ochougul, quic-amitraj, quic-hemagnih and quic-rishinr as code owners August 23, 2025 09:53

ochougul requested changes Sep 1, 2025

QEfficient/base/onnx_transforms.py

abhishek-singh591 force-pushed the optimized_onnx_tranform branch from 2df6780 to a528b29 Compare September 3, 2025 09:40

abhishek-singh591 requested a review from ochougul September 3, 2025 10:06

ochougul requested changes Sep 4, 2025

QEfficient/base/onnx_transforms.py

abhishek-singh591 force-pushed the optimized_onnx_tranform branch 2 times, most recently from 00fcaf2 to 9b9c41d Compare September 5, 2025 11:12

abhishek-singh591 requested a review from ochougul September 5, 2025 18:03

abhishek-singh591 added 6 commits September 10, 2025 06:23

merged fp16 and split in onnx transform

6361518

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>

Add warning when both flags apply_split and apply_clip are false and …

e957632

…added removed comments Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>

Add warning when both flags apply_split and apply_clip are false and …

a9b01c3

…added removed comments Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>

Fixed transform flag and other importing related issue

cfc4809

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>

Deleted run.py file

6910ece

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>

Modified the default model_name argument

f64b429

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>

abhishek-singh591 force-pushed the optimized_onnx_tranform branch from e83ac9e to f64b429 Compare September 10, 2025 06:24

quic-rishinr requested changes Sep 10, 2025

QEfficient/base/onnx_transforms.py

abhishek-singh591 mentioned this pull request Oct 9, 2025

Added memory optimization for onnx transforms #538

Open

abhishek-singh591 closed this Oct 9, 2025
