This repository contains SoTA algorithms, models, and interesting projects in th…
ONE is short for "ONE for all"

## News
- [2025.09.15] We upgrade diffusers to v0.33.1 and transformers to v4.50.1 based on MindSpore. More than 20 generative models are now supported, including Qwen Image, Flux Kontext, Wan 2.2, and OmniGen 2.
- [2025.04.10] We release [v0.3.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.3.0). More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
- [2025.02.21] We support DeepSeek [Janus-Pro](https://huggingface.co/deepseek-ai/Janus-Pro-7B), a SoTA multimodal understanding and generation model. See [here](examples/janus).
- [2024.11.06] [v0.2.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.2.0) is released.

## Quick tour

To install v0.3.0, please install [MindSpore 2.5.0](https://www.mindspore.cn/install) and run `pip install mindone`.

We recommend installing the latest version from the `master` branch, based on MindSpore 2.6.0:
```
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
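
A quick way to confirm the installation is to import both packages and print their versions. This is a minimal sketch; `mindone.__version__` is assumed to be exposed, and the exact version strings depend on your environment:

```py
# Sanity check: both packages should import cleanly on your setup.
import mindspore as ms
import mindone

print("mindspore:", ms.__version__)   # e.g., 2.6.0
print("mindone:", mindone.__version__)
```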

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using [Flux Kontext](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) as an example.

**Hello MindSpore** from **Flux**!
<!-- TODO: add Flux Kontext or QwenImage running result -->

<div>
<img src="https://github.com/townwish4git/mindone/assets/143256262/8c25ae9a-67b1-436f-abf6-eca36738cd17" alt="Flux Kontext" width="512" height="512">

</div>

```py
import mindspore as ms
import numpy as np

from mindone.diffusers import FluxKontextPipeline
from mindone.diffusers.utils import load_image

# Load the Flux Kontext image-editing pipeline in bfloat16
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", mindspore_dtype=ms.bfloat16
)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png"
).convert("RGB")
prompt = "Make Pikachu hold a sign that says 'MindSpore ONE', yarn art style, detailed, vibrant colors"
image = pipe(
    image=image,
    prompt=prompt,
    guidance_scale=2.5,
    generator=np.random.default_rng(42),  # numpy Generator for reproducible seeding
)[0][0]
image.save("flux-kontext.png")
```
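
The same pipeline API also covers plain text-to-image generation. As a minimal sketch (assuming the `FluxPipeline` port mirrors its hf diffusers counterpart), an image like the one above could be produced with:

```py
# Text-to-image with Flux; FluxPipeline is assumed to be ported with the
# same interface as hf diffusers.
import mindspore as ms
from mindone.diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", mindspore_dtype=ms.bfloat16
)
image = pipe(prompt="A cat holding a sign that says 'Hello MindSpore'")[0][0]
image.save("flux.png")
```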

### run hf diffusers on mindspore
- mindone diffusers is under active development; most tasks were tested with mindspore 2.6.0 on Ascend Atlas 800T A2 machines.
- compatible with hf diffusers 0.33.1; diffusers 0.35 support is under development.
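
Because the interfaces match hf diffusers, porting an existing diffusers script is usually just an import swap. A minimal sketch (assuming the pipeline you load is among the ported ones):

```py
# hf diffusers (PyTorch):  from diffusers import DiffusionPipeline
# mindone (MindSpore):     same call signatures, different import.
import mindspore as ms
from mindone.diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", mindspore_dtype=ms.float16
)
image = pipe("A cat holding a sign that says 'Hello MindSpore'")[0][0]
image.save("sd3.png")
```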

| component | features
| :--- | :--
| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support text-to-image, text-to-video, text-to-audio tasks 240+
| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformers base models same as hf diffusers 50+
| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support diffusion schedulers (e.g., ddpm and dpm solver) same as hf diffusers 35+
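
As in hf diffusers, schedulers can be swapped on a loaded pipeline. A minimal sketch, continuing from the Flux Kontext pipeline above (and assuming `DPMSolverMultistepScheduler` is among the ported schedulers):

```py
from mindone.diffusers import DPMSolverMultistepScheduler

# Swap the pipeline's default scheduler for DPM-Solver, keeping its config.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
```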

### supported models under mindone/examples

<!-- TODO: update the links after PR merged-->

| task | model | inference | finetune | pretrain | institute |
| :--- | :--- | :---: | :---: | :---: | :-- |
| Text/Image-to-Image | [Qwen Image](https://github.com/mindspore-lab/mindone/pull/1288) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Image | [Flux Kontext](https://github.com/mindspore-lab/mindone/blob/master/docs/diffusers/api/pipelines/flux.md) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Black Forest Labs |
| Text/Image/Speech-to-Video | [Wan 2.2](https://github.com/mindspore-lab/mindone/pull/1243) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Image | [OmniGen](https://github.com/mindspore-lab/mindone/blob/master/examples/omnigen) 🔥🔥 | ✅ | ✅ | ✖️ | Vector Space Lab |
| Text/Image-to-Image | [OmniGen 2](https://github.com/mindspore-lab/mindone/blob/master/examples/omnigen2) 🔥🔥 | ✅ | ✖️ | ✖️ | Vector Space Lab |
| Image-to-Video | [HunyuanVideo-I2V](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo-i2v) 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | [Wan 2.1](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_1) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | [CogView4](https://github.com/mindspore-lab/mindone/blob/master/examples/cogview) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipu AI |