
Conversation

@Eyoel-gebre Eyoel-gebre commented Apr 22, 2025

Adds low-rank linear layer logic to support the new lite-whisper model:

https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo
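For reference, lite-whisper factors each dense projection into two smaller matrices. A minimal sketch of the idea (the module and parameter names here are illustrative, not the exact ones used in the checkpoint):

```python
import torch

class LowRankLinear(torch.nn.Module):
    """y = x @ W1^T @ W2^T + b, with W1 of shape (rank, in) and W2 of shape (out, rank).

    Parameter count drops from out*in to rank*(in + out), which is the
    source of the size/speed savings in lite-whisper."""

    def __init__(self, in_features: int, out_features: int, rank: int, bias: bool = True):
        super().__init__()
        self.weight1 = torch.nn.Parameter(torch.empty(rank, in_features))
        self.weight2 = torch.nn.Parameter(torch.empty(out_features, rank))
        self.bias = torch.nn.Parameter(torch.zeros(out_features)) if bias else None
        torch.nn.init.xavier_uniform_(self.weight1)
        torch.nn.init.xavier_uniform_(self.weight2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.nn.functional.linear(x, self.weight1)                 # down-project to rank
        return torch.nn.functional.linear(x, self.weight2, self.bias)   # up-project to out_features
```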

bzikst commented Apr 26, 2025

Hey @Eyoel-gebre, thank you for your work. I've successfully converted a model to ct2 format using your code: https://huggingface.co/bzikst/lite-whisper-large-v3-acc-ct2, but when I try to load it, it throws IndexError: variable encoder/layer_0/self_attention/linear_0/weight not found. Any hints on what I'm doing wrong?

Eyoel-gebre commented Apr 27, 2025

> Hey @Eyoel-gebre, thank you for your work. I've successfully converted a model to ct2 format using your code: https://huggingface.co/bzikst/lite-whisper-large-v3-acc-ct2, but when I try to load it, it throws IndexError: variable encoder/layer_0/self_attention/linear_0/weight not found. Any hints on what I'm doing wrong?

There's a small mismatch between what the converted model produces and what the C++ model execution code expects, due to the fused layer logic: #1887

I think I have a workaround and hope to get things wrapped up soon.
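For anyone else hitting the IndexError above: as I understand it, CTranslate2's attention spec stores the self-attention Q/K/V projections as one fused linear_0 weight, which is easy to build when each projection is a full matrix but not when each one is a pair of low-rank factors. An illustrative sketch (the sizes are made up):

```python
import numpy as np

d_model, rank = 1280, 160  # example sizes, not the real model dimensions

# Full-rank case: the converter can concatenate Q, K, V along the output
# dimension to produce the fused linear_0 weight the C++ runtime looks up.
w_q, w_k, w_v = (np.random.randn(d_model, d_model).astype(np.float32) for _ in range(3))
fused_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # shape (3 * d_model, d_model)

# Low-rank case: each projection is a factor pair (W2 @ W1). The three pairs
# cannot be merged into a single low-rank pair, so the converter must either
# keep the projections unfused or expand them back to full rank first,
# which gives up the memory savings.
q1, q2 = np.random.randn(rank, d_model), np.random.randn(d_model, rank)
w_q_expanded = (q2 @ q1).astype(np.float32)  # full-rank equivalent of one factor pair
```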

@MahmoudAshraf97 MahmoudAshraf97 left a comment

Thanks for the effort. If QKV will not be fused, we should evaluate the performance loss from not fusing against the gains from low rank, because the whole approach might be useless if there is no net gain.
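To make that trade-off concrete, a rough back-of-the-envelope parameter count (illustrative rank only; the real per-layer ranks in lite-whisper vary):

```python
d_model = 1280  # hidden size of whisper-large-v3-turbo
rank = 160      # example rank only

fused_full_rank_qkv = 3 * d_model * d_model        # one fused Q/K/V projection
separate_low_rank_qkv = 3 * rank * (2 * d_model)   # three unfused factor pairs

print(f"fused full-rank QKV:  {fused_full_rank_qkv:,} parameters")
print(f"unfused low-rank QKV: {separate_low_rank_qkv:,} parameters")
# The matmul FLOPs scale the same way, so the question is whether that saving
# outweighs the overhead of running three smaller GEMMs instead of one big one.
```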

"""
self._model_name_or_path = model_name_or_path
self._model_processor_name = model_name_or_path
if model_name_or_path.startswith('efficient-speech/lite-whisper'):

I don't think this is the best approach: a different org might upload their own lite-whisper model, and then this condition would evaluate to false.
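One possible alternative (a sketch, assuming the checkpoint's config.json lists a distinguishable architecture name such as LiteWhisperForConditionalGeneration) is to inspect the model config rather than the repo prefix:

```python
import transformers

def is_lite_whisper(model_name_or_path: str) -> bool:
    """Detect lite-whisper checkpoints from any org by reading the config."""
    config = transformers.AutoConfig.from_pretrained(
        model_name_or_path, trust_remote_code=True
    )
    architectures = getattr(config, "architectures", None) or []
    return any("LiteWhisper" in name for name in architectures)
```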

```python
if hasattr(transformers, loader.architecture_name):
    model_class = getattr(transformers, loader.architecture_name)
    model = self.load_model(model_class, self._model_name_or_path, **kwargs)
elif self._model_name_or_path.startswith('efficient-speech/lite-whisper'):
```

same as above

@HuangJian2024

Thanks for your effort! I've successfully converted a model to ct2 format using your code (https://github.com/Eyoel-gebre/CTranslate2.git), but when I used the converted model for inference, the results were not normal. Any hints? Or could you provide an example of how to convert and run inference?
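Not the author, but for reference, the standard CTranslate2 Whisper workflow looks roughly like this (a sketch; it assumes the model directory produced by this PR's converter and reuses the upstream whisper-large-v3-turbo processor for feature extraction and decoding):

```python
import ctranslate2
import librosa
import transformers

# Conversion (CLI form): ct2-transformers-converter \
#   --model efficient-speech/lite-whisper-large-v3-turbo \
#   --output_dir lite-whisper-large-v3-turbo-ct2

processor = transformers.WhisperProcessor.from_pretrained("openai/whisper-large-v3-turbo")
model = ctranslate2.models.Whisper("lite-whisper-large-v3-turbo-ct2")

# Load 16 kHz mono audio and compute log-Mel features.
audio, _ = librosa.load("audio.wav", sr=16000, mono=True)
inputs = processor(audio, sampling_rate=16000, return_tensors="np")
features = ctranslate2.StorageView.from_array(inputs.input_features)

# Standard Whisper prompt, then greedy decoding.
prompt = processor.tokenizer.convert_tokens_to_ids(
    ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|notimestamps|>"]
)
results = model.generate(features, [prompt])
print(processor.decode(results[0].sequences_ids[0]))
```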
