
Conversation

Contributor

@Dong1017 Dong1017 commented Sep 17, 2025

What does this PR do?

Adds

1. QwenImage Pipelines and Required Modules

(Consistent with Diffusers master)

a. Pipelines

  • mindone.diffusers.QwenImagePipeline
  • mindone.diffusers.QwenImageImg2ImgPipeline
  • mindone.diffusers.QwenImageInpaintPipeline
  • mindone.diffusers.QwenImageEditPipeline
  • mindone.diffusers.QwenImageEditInpaintPipeline

b. Modules

  • mindone.diffusers.models.AutoencoderQwenImage
  • mindone.diffusers.models.QwenImageTransformer2DModel
  • mindone.diffusers.loaders.QwenImageLoraLoaderMixin

2. Unit Tests (UTs) for the Pipelines

  • All UTs were set up according to Diffusers master, accessed on Sep 17, 2025.
    • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage.py
    • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_img2img.py
    • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_inpaint.py
    • tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_edit.py
  • With MindSpore 2.7.0, both the fp32 and bf16 UTs pass.
  • With MindSpore 2.6.0, the bf16 UTs pass, while the fp32 UTs raise a TypeError.
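The version-dependent dtype choice above can be sketched as a small helper. This is an illustrative sketch only, not part of the PR; the UTs themselves parametrize dtypes directly.

```python
def pick_dtype(ms_version: str) -> str:
    """Pick a UT dtype known to pass on the given MindSpore version.

    Per the results above, the fp32 UTs fail with a TypeError on 2.6.x,
    so fall back to bf16 there; 2.7.0+ passes both.
    """
    major, minor = (int(p) for p in ms_version.split(".")[:2])
    if (major, minor) < (2, 7):
        return "bf16"  # only the bf16 UTs pass on 2.6.x
    return "fp32"      # 2.7.0+ passes both; prefer full precision
```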

Usage

  • QwenImagePipeline
import mindspore as ms 
from mindone.diffusers import QwenImagePipeline 

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16) 
prompt = "A cat holding a sign that says hello world" 
# Depending on the variant being used, the pipeline call will slightly vary. 
# Refer to the pipeline documentation for more details. 
image = pipe(prompt, num_inference_steps=50)[0][0] 
image.save("qwenimage.png") 
  • QwenImageImg2ImgPipeline
import mindspore as ms 
from mindone.diffusers import QwenImageImg2ImgPipeline
from mindone.diffusers.utils import load_image

pipe = QwenImageImg2ImgPipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16)
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
image = pipe(prompt=prompt, negative_prompt=" ", image=init_image, strength=0.95)[0][0]
image.save("qwenimage_img2img.png")
  • QwenImageInpaintPipeline
import mindspore as ms 
from mindone.diffusers import QwenImageInpaintPipeline 
from mindone.diffusers.utils import load_image 

pipe = QwenImageInpaintPipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16) 
prompt = "Face of a yellow cat, high resolution, sitting on a park bench" 
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" 
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" 
source = load_image(img_url) 
mask = load_image(mask_url) 
image = pipe(prompt=prompt, negative_prompt=" ", image=source, mask_image=mask, strength=0.85)[0][0] 
image.save("qwenimage_inpainting.png") 
  • QwenImageEditPipeline
import mindspore as ms 
from PIL import Image 
from mindone.diffusers import QwenImageEditPipeline 
from mindone.diffusers.utils import load_image 

pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", mindspore_dtype=ms.bfloat16) 
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png").convert("RGB") 
prompt = ("Make Pikachu hold a sign that says 'Qwen Edit is awesome', yarn art style, detailed, vibrant colors") 
# Depending on the variant being used, the pipeline call will slightly vary. 
# Refer to the pipeline documentation for more details. 
image = pipe(image, prompt, num_inference_steps=50)[0][0] 
image.save("qwenimage_edit.png") 
  • QwenImageEditInpaintPipeline
import mindspore as ms 
from PIL import Image
from mindone.diffusers import QwenImageEditInpaintPipeline
from mindone.diffusers.utils import load_image

pipe = QwenImageEditInpaintPipeline.from_pretrained("Qwen/Qwen-Image-Edit", mindspore_dtype=ms.bfloat16)
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
source = load_image(img_url)
mask = load_image(mask_url)
image = pipe(prompt=prompt, negative_prompt=" ", image=source, mask_image=mask, strength=1.0, num_inference_steps=50)[0][0]
image.save("qwenimage_edit_inpainting.png")

Performance

Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0.

| Pipeline | Weight Loading Time | Mode | Speed |
| --- | --- | --- | --- |
| QwenImagePipeline | 15m21s | PyNative | 9.93 s/it |
| QwenImageImg2ImgPipeline | 14m57s | PyNative | 9.56 s/it |
| QwenImageInpaintPipeline | 10m10s | PyNative | 4.80 s/it |
| QwenImageEditPipeline | 13m57s | PyNative | 13.25 s/it |
| QwenImageEditInpaintPipeline | 13m20s | PyNative | 13.98 s/it |

Limitation

QwenImageEditPipeline and QwenImageEditInpaintPipeline load modules from Qwen-Image-Edit. Using these two pipelines currently requires manually changing image_processor_type from Qwen2VLImageProcessorFast to Qwen2VLImageProcessor in Qwen-Image-Edit/processor/preprocessor_config.json
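The manual config edit can be applied with a small script. This is a hedged sketch: the `config_path` below is a hypothetical local path and must be adjusted to wherever `from_pretrained` downloaded the model snapshot.

```python
import json

def patch_image_processor(path: str) -> str:
    """Swap the fast image processor class for the slow one in a
    preprocessor_config.json file, then return the resulting value."""
    with open(path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    if cfg.get("image_processor_type") == "Qwen2VLImageProcessorFast":
        cfg["image_processor_type"] = "Qwen2VLImageProcessor"
        with open(path, "w", encoding="utf-8") as f:
            json.dump(cfg, f, indent=2)
    return cfg["image_processor_type"]

# Hypothetical local snapshot path; adjust to your cache location:
# patch_image_processor("Qwen-Image-Edit/processor/preprocessor_config.json")
```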

Notes

  1. Requires transformers==4.52.1.
  2. With a consistent random seed and identical hidden states from the text encoder, the generated images are nearly identical to those from the PyTorch implementation.
  3. TODO: JIT mode; LoRA test; UTs for the modules.
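For reproducing the comparison in note 2, seeding can be sketched as below. `mindspore.set_seed` is a real API; whether it alone makes the full pipeline deterministic is an assumption, and the PR's actual comparison also fixed the text-encoder hidden states.

```python
import random

def seed_everything(seed: int = 42) -> int:
    """Seed Python's RNG and, when available, MindSpore's global RNG."""
    random.seed(seed)
    try:
        import mindspore as ms
        ms.set_seed(seed)  # seeds MindSpore's global random state
    except ImportError:
        pass  # the sketch still runs without MindSpore installed
    return seed
```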

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx

@SamitHuang
Collaborator

How to fix the requirement of transformers==4.52.1?

@SamitHuang
Collaborator

Consider adding an inference example and a LoRA fine-tuning example in the examples folder, which would help introduce QwenImage.

@SamitHuang SamitHuang mentioned this pull request Sep 22, 2025
@Dong1017
Contributor Author

Dong1017 commented Sep 26, 2025

How to fix the requirement of transformers==4.52.1?

The main reason for requiring transformers==4.52.1 rather than transformers==4.50.0 is to avoid an AttributeError and to stay consistent with the requirements of Qwen-Image.
Using transformers==4.50.0 raises the following AttributeError:

../../transformers/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py:1517: in __init__
    super().__init__(config)
../../transformers/src/transformers/modeling_utils.py:1898: in __init__
    self.generation_config = GenerationConfig.from_model_config(config) if self.can_generate() else None
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        decoder_config = model_config.get_text_config(decoder=True)
        if decoder_config is not model_config:
            default_generation_config = GenerationConfig()
>           decoder_config_dict = decoder_config.to_dict()
                                  ^^^^^^^^^^^^^^^^^^^^^^
E           AttributeError: 'dict' object has no attribute 'to_dict'

../../transformers/src/transformers/generation/configuration_utils.py:1287: AttributeError

Upgrading transformers from 4.50.0 to 4.52.1 or a higher version resolves this error.
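A minimal sketch of checking the installed version against that requirement (hypothetical helper, not part of the PR; a plain `pip install --upgrade "transformers>=4.52.1"` achieves the same end):

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(p) for p in v.split(".")[:3])

def meets_requirement(installed: str, required: str = "4.52.1") -> bool:
    """True if the installed transformers version satisfies the requirement."""
    return version_tuple(installed) >= version_tuple(required)
```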

@vigo999 vigo999 added the new model add new model to mindone label Sep 29, 2025
@vigo999 vigo999 added this to mindone Sep 29, 2025
@vigo999 vigo999 moved this to In Progress in mindone Sep 29, 2025