QuenithAI/T2I-Generation-Paper-List

Awesome Text-to-Image Generation by QuenithAI

A curated collection of papers, models, and resources for the field of Text-to-Image Generation.


Note

This repository is proudly maintained by the frontline research mentors at QuenithAI (应达学术). It aims to provide the most comprehensive and cutting-edge map of papers and technologies in the field of Text-to-Image generation.

Your contributions are also vital: feel free to open an issue or submit a pull request to become a contributor to this repository. We look forward to your participation!

If you require expert 1-on-1 guidance on your submissions to top-tier conferences and journals, we invite you to contact us via WeChat or E-mail.



⚡ Latest Updates

📚 Table of Contents


📜 Papers & Models

✍️ Survey Papers

⇧ Back to ToC

🖼️ Text-to-Image Generation

✨ 2025

✅ Published Papers

  • [CVPR 2025] PreciseCam: Precise Camera Control for Text-to-Image Generation
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] Type‑R: Automatically Retouching Typos for Text‑to‑Image Generation
    Paper GitHub Hugging Face

  • [CVPR 2025] Compass Control: Multi Object Orientation Control for Text‑to‑Image Generation
    Paper Project Page

  • [CVPR 2025] Generative Photography: Scene‑Consistent Camera Control for Realistic Text‑to‑Image Synthesis
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] One‑Way Ticket: Time‑Independent Unified Encoder for Distilling Text‑to‑Image Diffusion Models
    Paper GitHub Hugging Face

  • [CVPR 2025] Text Embedding is Not All You Need: Attention Control for Text‑to‑Image Semantic Alignment with Text Self‑Attention Maps
    Paper Project Page GitHub

  • [CVPR 2025] Towards Uncertainty: Understanding and Quantifying Uncertainty for Text‑to‑Image Generation
    Paper GitHub

  • [CVPR 2025] Responsible Diffusion: Plug‑and‑Play Interpretable Responsible Text‑to‑Image Generation via Dual‑Space Multi‑faceted Concept Control
    Paper Project Page GitHub

  • [CVPR 2025] Make It Count: Text‑to‑Image Generation with an Accurate Number of Objects
    Paper Project Page GitHub

  • [CVPR 2025] MCCD: Multi‑Agent Collaboration‑based Compositional Diffusion for Complex Text‑to‑Image Generation
    Paper

  • [CVPR 2025] Debias‑SD: Rethinking Training for De‑biasing Text‑to‑Image Generation: Unlocking the Potential of Stable Diffusion
    Paper

  • [CVPR 2025] ShapeWords: Guiding Text‑to‑Image Synthesis with 3D Shape‑Aware Prompts
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] SnapGen: Taming High‑Resolution Text‑to‑Image Models for Mobile Devices with Efficient Architectures and Training
    Paper Project Page

  • [CVPR 2025] STORM: Spatial Transport Optimization by Repositioning Attention Map for Training‑Free Text‑to‑Image Synthesis
    Paper Project Page GitHub

  • [CVPR 2025] Focus‑N‑Fix: Region‑Aware Fine‑Tuning for Text‑to‑Image Generation
    Paper

  • [CVPR 2025] SILMM: Self‑Improving Large Multimodal Models for Compositional Text‑to‑Image Generation
    Paper Project Page GitHub

  • [CVPR 2025] GLoCE: Localized Concept Erasure for Text‑to‑Image Diffusion Models Using Training‑Free Gated Low‑Rank Adaptation
    Paper Project Page GitHub

  • [CVPR 2025] Self‑Cross Guidance: Self‑Cross Diffusion Guidance for Text‑to‑Image Synthesis of Similar Subjects
    Paper Project Page GitHub

  • [CVPR 2025] Noise Diffusion: Enhancing Semantic Faithfulness in Text‑to‑Image Synthesis
    Paper GitHub

  • [CVPR 2025] PromptSampler: Learning to Sample Effective and Diverse Prompts for Text‑to‑Image Generation
    Paper GitHub

  • [CVPR 2025] STEREO: A Two‑Stage Framework for Adversarially Robust Concept Erasing from Text‑to‑Image Diffusion Models
    Paper GitHub

  • [CVPR 2025] MinorityPrompt: Minority‑Focused Text‑to‑Image Generation via Prompt Optimization
    Paper GitHub

  • [CVPR 2025] DistillT5: Scaling Down Text Encoders of Text‑to‑Image Diffusion Models
    Paper GitHub

  • [CVPR 2025] TIU: The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text‑to‑Image Diffusion Models
    Paper GitHub

  • [CVPR 2025] Fuse‑DiT: Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text‑to‑Image Synthesis
    Paper GitHub Hugging Face

  • [CVPR 2025] Detect‑and‑Guide: Self‑regulation of Diffusion Models for Safe Text‑to‑Image Generation via Guideline Token Optimization
    Paper

  • [CVPR 2025] Multi‑Group T2I: Multi‑Group Proportional Representations for Text‑to‑Image Models
    Paper GitHub

  • [CVPR 2025] VODiff: Controlling Object Visibility Order in Text‑to‑Image Generation
    Paper GitHub

  • [CVPR 2025] Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] Six‑CD: Benchmarking Concept Removals for Text-to-image Diffusion Models
    Paper GitHub

  • [CVPR 2025] ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
    Paper

  • [CVPR 2025] ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] Improving Long‑Text Alignment: Improving Long‑Text Alignment for Text‑to‑Image Diffusion Models
    Paper GitHub

  • [ICLR 2025] ITTA: Information Theoretic Text‑to‑Image Alignment
    Paper GitHub

  • [ICLR 2025] Meissonic: Revitalizing Masked Generative Transformers for Efficient High‑Resolution Text‑to‑Image Synthesis
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] PaRa: Personalizing Text‑to‑Image Diffusion via Parameter Rank Reduction
    Paper

  • [ICLR 2025] Fluid: Scaling Autoregressive Text‑to‑image Generative Models with Continuous Tokens
    Paper

  • [ICLR 2025] Prompt‑Pruning: Not All Prompts Are Made Equal – Prompt‑based Pruning of Text‑to‑Image Diffusion Models
    Paper GitHub Hugging Face

  • [ICLR 2025] Denoising AR Transformers: Denoising Autoregressive Transformers for Scalable Text‑to‑Image Generation
    Paper

  • [ICLR 2025] Progressive Compositionality: Progressive Compositionality in Text‑to‑Image Generative Models
    Paper Project Page GitHub

  • [ICLR 2025] Classifier Scores: Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text‑to‑Image Diffusion Models
    Paper Project Page

  • [ICLR 2025] Engagement: Measuring and Improving Engagement of Text‑to‑Image Generation Models
    Paper Project Page

  • [ICLR 2025] Residual Gate Eraser: Concept Pinpoint Eraser for Text‑to-image Diffusion Models via Residual Attention Gate
    Paper GitHub

  • [ICLR 2025] Random Seeds: Enhancing Compositional Text‑to‑Image Generation with Reliable Random Seeds
    Paper GitHub

  • [ICLR 2025] One‑Prompt‑One‑Story: Free‑Lunch Consistent Text‑to‑Image Generation Using a Single Prompt
    Paper Project Page GitHub

  • [ICLR 2025] You Only Sample Once: Taming One‑Step Text‑to‑Image Synthesis by Self‑Cooperative Diffusion GANs
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] Copyright Revisiting: Rethinking Artistic Copyright Infringements in the Era of Text‑to‑Image Generative Models
    Paper

  • [ICLR 2025] Concept Combination Erasing: Erasing Concept Combination from Text‑to‑Image Diffusion Model
    Paper

  • [ICLR 2025] Cross‑Attention Patterns: Cross‑Attention Head Position Patterns Can Align with Human Visual Concepts in Text‑to‑Image Generative Models
    Paper GitHub

  • [ICLR 2025] TIGeR: Unifying Text‑to‑Image Generation and Retrieval with Large Multimodal Models
    Paper Project Page GitHub

  • [ICLR 2025] DGQ: Distribution‑Aware Group Quantization for Text‑to‑Image Diffusion Models
    Paper GitHub

  • [ICLR 2025] Jacobi Decoding: Accelerating Auto‑regressive Text‑to‑Image Generation with Training‑free Speculative Jacobi Decoding
    Paper GitHub

  • [ICLR 2025] PT‑T2I/V: An Efficient Proxy‑Tokenized Diffusion Transformer for Text‑to‑Image/Video Task
    Paper Project Page GitHub

  • [ICLR 2025] Gecko Evaluation: Revisiting Text‑to‑Image Evaluation with Gecko: on Metrics, Prompts, and Human Rating
    Paper GitHub

  • [ICLR 2025] SANA: Efficient High‑Resolution Text‑to‑Image Synthesis with Linear Diffusion Transformers
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] Rectified Flow: Text‑to‑Image Rectified Flow as Plug‑and‑Play Priors
    Paper GitHub

  • [ICLR 2025] Human Feedback Filtering: Automated Filtering of Human Feedback Data for Aligning Text‑to‑Image Diffusion Models
    Paper

  • [ICLR 2025] SAFREE: Training‑Free and Adaptive Guard for Safe Text‑to‑Image and Video Generation
    Paper Project Page GitHub

  • [ICLR 2025] IterComp: Iterative Composition‑Aware Feedback Learning from Model Gallery for Text‑to‑Image Generation
    Paper GitHub Hugging Face

  • [ICLR 2025] ScImage: How good are multimodal large language models at scientific text‑to‑image generation?
    Paper Hugging Face

  • [ICLR 2025] Score Distillation: Guided Score Identity Distillation for Data‑Free One‑Step Text‑to‑Image Generation
    Paper GitHub Hugging Face

  • [ICLR 2025] Causal Variation: Evaluating Semantic Variation in Text‑to‑Image Synthesis: A Causal Perspective
    Paper GitHub

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
    Paper GitHub

  • [CVPR 2024] InstanceDiffusion: Instance-level Control for Image Generation
    Paper Project Page GitHub

  • [CVPR 2024] ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
    Paper Project Page GitHub Hugging Face

  • [CVPR 2024] Instruct-Imagen: Image Generation with Multi-modal Instruction
    Paper

  • [CVPR 2024] Continuous 3D Words: Learning Continuous 3D Words for Text-to-Image Generation
    Paper GitHub

  • [CVPR 2024] HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
    Paper

  • [CVPR 2024] Rich Human Feedback: Rich Human Feedback for Text-to-Image Generation
    Paper

  • [CVPR 2024] MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
    Paper

  • [CVPR 2024] Customization Assistant: Customization Assistant for Text-to-image Generation
    Paper

  • [CVPR 2024] ADI: Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
    Paper Project Page

  • [CVPR 2024] UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
    Paper

  • [CVPR 2024] Interpret Diffusion: Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
    Paper

  • [CVPR 2024] Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
    Paper GitHub

  • [CVPR 2024] CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
    Paper Project Page GitHub Hugging Face

  • [CVPR 2024] Arbitrary‑Scale Diffusion: Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
    Paper

  • [CVPR 2024] Human-Centric Priors: Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
    Paper

  • [CVPR 2024] ElasticDiffusion: Training-free Arbitrary Size Image Generation
    Paper Project Page GitHub

  • [CVPR 2024] CosmicMan: A Text-to-Image Foundation Model for Humans
    Paper Project Page GitHub

  • [CVPR 2024] PanFusion: Taming Stable Diffusion for Text to 360° Panorama Image Generation
    Paper Project Page GitHub

  • [CVPR 2024] Intelligent Grimm: Open-ended Visual Storytelling via Latent Diffusion Models
    Paper Project Page GitHub

  • [CVPR 2024] Scalability: On the Scalability of Diffusion-based Text-to-Image Generation
    Paper

  • [CVPR 2024] MuLAn: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
    Paper Project Page Hugging Face

  • [CVPR 2024] Multi-dimensional Preferences: Learning Multi-dimensional Human Preference for Text-to-Image Generation
    Paper

  • [CVPR 2024] Dynamic Prompts: Dynamic Prompt Optimizing for Text-to-Image Generation
    Paper

  • [CVPR 2024] Reinforcement Diversification: Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
    Paper

  • [CVPR 2024] HypercGAN: Adversarial Text to Continuous Image Generation
    Paper Project Page

  • [CVPR 2024] EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
    Paper GitHub

  • [ECCV 2024] LaVi‑Bridge: Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
    Paper Project Page GitHub

  • [ECCV 2024] DiffPNG: Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
    Paper GitHub

  • [ECCV 2024] SPRIGHT: Getting it Right: Improving Spatial Consistency in Text-to-Image Models
    Paper Project Page GitHub

  • [ECCV 2024] IndicTTI: Navigating Text-to-Image Generative Bias across Indic Languages
    Paper Project Page

  • [ECCV 2024] Safeguard T2I: Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
    Paper

  • [ECCV 2024] Reality-and-Fantasy: The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
    Paper Project Page GitHub

  • [ECCV 2024] RECE: Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
    Paper GitHub

  • [ECCV 2024] StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
    Paper GitHub

  • [ECCV 2024] PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
    Paper GitHub

  • [ECCV 2024] Skewed Relations T2I: Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
    Paper GitHub

  • [ECCV 2024] Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
    Paper

  • [ECCV 2024] MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
    Paper

  • [ECCV 2024] PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
    Paper Project Page GitHub

  • [ECCV 2024] CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
    Paper GitHub

  • [ICLR 2024] Patched Diffusion Models: Patched Denoising Diffusion Models For High-Resolution Image Synthesis
    Paper GitHub

  • [ICLR 2024] Relay Diffusion: Unifying diffusion process across resolutions for image synthesis
    Paper GitHub

  • [ICLR 2024] SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
    Paper GitHub

  • [ICLR 2024] Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
    Paper GitHub

  • [ICLR 2024] PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
    Paper Project Page GitHub Hugging Face

  • [SIGGRAPH 2024] RGB↔X: Image Decomposition and Synthesis Using Material- and Lighting-aware Diffusion Models
    Paper Project Page

  • [AAAI 2024] Semantic-aware Augmentation: Semantic-aware Data Augmentation for Text-to-image Synthesis
    Paper

  • [AAAI 2024] Abstract Concepts: Text-to-Image Generation for Abstract Concepts
    Paper

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [CVPR 2023] GigaGAN: Scaling Up GANs for Text-to-Image Synthesis
    Paper Project Page GitHub

  • [CVPR 2023] ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
    Paper

  • [CVPR 2023] Shifted Diffusion: Shifted Diffusion for Text-to-image Generation
    Paper GitHub

  • [CVPR 2023] GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
    Paper GitHub

  • [CVPR 2023] Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models to Learn Any Unseen Style
    Paper GitHub

  • [CVPR 2023] Verifiable Evaluation: Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
    Paper

  • [CVPR 2023] RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation with Natural Prompts
    Paper GitHub

  • [CVPR 2023] Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion
    Paper Project Page GitHub

  • [ICCV 2023] DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
    Paper GitHub

  • [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
    Paper GitHub

  • [NeurIPS 2023] RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
    Paper Project Page

  • [NeurIPS 2023] Linguistic Binding: Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
    Paper GitHub

  • [NeurIPS 2023] DenseDiffusion: Dense Text-to-Image Generation with Attention Modulation
    Paper GitHub

  • [ICLR 2023] Structured Diffusion Guidance: Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
    Paper GitHub

  • [ICML 2023] StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
    Paper Project Page GitHub

  • [ICML 2023] Muse: Text-To-Image Generation via Masked Generative Transformers
    Paper Project Page GitHub

  • [ICML 2023] UniDiffusers: One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
    Paper GitHub

  • [ACM MM 2023] SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
    Paper GitHub

  • [ACM MM 2023] ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors
    Paper

  • [SIGGRAPH 2023] Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
    Paper Project Page GitHub Hugging Face

💡 Pre-Print Papers

⇧ Back to ToC

🕹️ Conditional Image Generation

✨ 2025

✅ Published Papers

  • [AAAI 2025] Simple-ControlNet: Simplifying Control Mechanism in Text-to-Image Diffusion
    Paper GitHub Hugging Face

  • [AAAI 2025] EMControl: Adding Conditional Control to Text-to-Image Diffusion Models via EM
    Paper Project Page GitHub Hugging Face

  • [AAAI 2025] Local Conditional Controlling for Text-to-Image Diffusion Models
    Paper

  • [AAAI 2025] VersaGen: Versatile Visual Control for Text-to-Image Diffusion
    Paper GitHub

  • [AAAI 2025] Fair Text-to-Image Diffusion via Fair Mapping
    Paper

  • [ICLR 2025] IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors
    Paper Project Page

  • [ICLR 2025] Random Seeds: Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
    Paper GitHub

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] PLACE: Adaptive Layout‑Semantic Fusion for Semantic Image Synthesis
    Paper GitHub

  • [CVPR 2024] One‑Shot Structure‑Aware Stylized Image Synthesis
    Paper GitHub

  • [CVPR 2024] Attention Refocusing: Grounded Text‑to‑Image Synthesis with Attention Refocusing
    Paper Project Page GitHub Hugging Face

  • [CVPR 2024] CFLD: Coarse‑to‑Fine Latent Diffusion for Pose‑Guided Person Image Synthesis
    Paper GitHub

  • [CVPR 2024] DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
    Paper

  • [CVPR 2024] CAN: Condition‑Aware Neural Network for Controlled Image Generation
    Paper

  • [CVPR 2024] SceneDiffusion: Move Anything with Layered Scene Diffusion
    Paper

  • [CVPR 2024] Zero‑Painter: Training‑Free Layout Control for Text‑to‑Image Synthesis
    Paper GitHub

  • [CVPR 2024] MIGC: Multi‑Instance Generation Controller for Text‑to‑Image Synthesis
    Paper Project Page GitHub

  • [CVPR 2024] FreeControl: Training‑Free Spatial Control of Any Text‑to‑Image Diffusion Model with Any Condition
    Paper GitHub

  • [ECCV 2024] PreciseControl: Enhancing Text‑To‑Image Diffusion Models with Fine‑Grained Attribute Control
    Paper Project Page GitHub

  • [ECCV 2024] AnyControl: Create Your Artwork with Versatile Control on Text‑to‑Image Generation
    Paper GitHub

  • [NeurIPS 2024] Ctrl‑X: Controlling Structure and Appearance for Text‑To‑Image Generation Without Guidance
    Paper Project Page GitHub

  • [ICLR 2024] PCDMs: Advancing Pose‑Guided Image Synthesis with Progressive Conditional Diffusion Models
    Paper GitHub

  • [WACV 2024] Layout Control with Cross‑Attention Guidance: Training‑Free Layout Control with Cross‑Attention Guidance
    Paper Project Page GitHub Hugging Face

  • [AAAI 2024] SSMG: Spatial‑Semantic Map Guided Diffusion Model for Free‑form Layout‑to‑image Generation
    Paper

  • [AAAI 2024] Attention Map Control: Compositional Text‑to‑Image Synthesis with Attention Map Control of Diffusion Models
    Paper GitHub

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [CVPR 2023] GLIGEN: Open-Set Grounded Text-to-Image Generation
    Paper Project Page GitHub Hugging Face

  • [CVPR 2022] Autoregressive Image Generation using Residual Quantization
    Paper GitHub

  • [CVPR 2023] SpaText: Spatio-Textual Representation for Controllable Image Generation
    Paper Project Page

  • [CVPR 2022] Text to Image Generation with Semantic-Spatial Aware GAN
    Paper

  • [CVPR 2023] ReCo: Region-Controlled Text-to-Image Generation
    Paper GitHub

  • [CVPR 2023] LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
    Paper GitHub

  • [ICLR 2025] Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
    Paper Project Page GitHub

  • [ICCV 2023] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models
    Paper GitHub Hugging Face

  • [ICCV 2023] SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis
    Paper

  • [ICCV 2023] ZestGuide: Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models
    Paper

  • [ICML 2023] Composer: Creative and Controllable Image Synthesis with Composable Conditions
    Paper Project Page GitHub

  • [ICML 2023] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
    Paper Project Page GitHub Hugging Face

  • [SIGGRAPH 2023] Sketch-Guided Text-to-Image Diffusion Models
    Paper Project Page GitHub

  • [NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
    Paper Project Page GitHub

  • [NeurIPS 2023] Prompt Diffusion: In-Context Learning Unlocked for Diffusion Models
    Paper Project Page GitHub

  • [WACV 2023] More Control for Free!: Image Synthesis with Semantic Diffusion Guidance
    Paper

  • [ACM MM 2023] LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
    Paper

💡 Pre-Print Papers

⇧ Back to ToC

🎨 Personalized Image Generation

✨ 2025

✅ Published Papers

  • [CVPR 2025] SerialGen: Personalized Image Generation by First Standardization Then Personalization
    Paper Project Page GitHub

  • [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
    Paper Project Page GitHub

  • [ICLR 2025] MS-Diffusion: Multi-Subject Zero-shot Image Personalization with Layout Guidance
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
    Paper Project Page GitHub

  • [ICLR 2025] DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
    Paper Project Page GitHub

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] Cross Initialization for Personalized Text‑to‑Image Generation
    Paper

  • [CVPR 2024] When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
    Paper Project Page GitHub

  • [CVPR 2024] Style Aligned Image Generation via Shared Attention
    Paper Project Page GitHub

  • [CVPR 2024] InstantBooth: Personalized Text‑to‑Image Generation without Test‑Time Finetuning
    Paper Project Page

  • [CVPR 2024] High‑Fidelity Person‑centric Subject‑to‑Image Synthesis
    Paper

  • [CVPR 2024] RealCustom: Narrowing Real Text Word for Real‑Time Open‑Domain Text‑to‑Image Customization
    Paper Project Page Hugging Face

  • [CVPR 2024] DisenDiff: Attention Calibration for Disentangled Text‑to‑Image Personalization
    Paper GitHub

  • [CVPR 2024] FreeCustom: Tuning‑Free Customized Image Generation for Multi‑Concept Composition
    Paper Project Page GitHub

  • [CVPR 2024] Personalized Residuals for Concept‑Driven Text‑to‑Image Generation
    Paper

  • [CVPR 2024] Subject‑Agnostic Guidance: Improving Subject‑Driven Image Synthesis with Subject‑Agnostic Guidance
    Paper

  • [CVPR 2024] JeDi: Joint‑Image Diffusion Models for Finetuning‑Free Personalized Text‑to‑Image Generation
    Paper

  • [CVPR 2024] Influence Watermarks: Countering Personalized Text‑to‑Image Generation with Influence Watermarks
    Paper

  • [CVPR 2024] PIA: Your Personalized Image Animator via Plug‑and‑Play Modules in Text‑to‑Image Models
    Paper Project Page GitHub

  • [CVPR 2024] SSR‑Encoder: Encoding Selective Subject Representation for Subject‑Driven Generation
    Paper GitHub

  • [ECCV 2024] Be Yourself: Bounded Attention for Multi‑Subject Text‑to‑Image Generation
    Paper Project Page

  • [ECCV 2024] Powerful and Flexible: Personalized Text‑to‑Image Generation via Reinforcement Learning
    Paper GitHub

  • [ECCV 2024] TIGC: Tuning‑Free Image Customization with Image and Text Guidance
    Paper Project Page GitHub

  • [ECCV 2024] MasterWeaver: Taming Editability and Face Identity for Personalized Text‑to‑Image Generation
    Paper Project Page GitHub

  • [NeurIPS 2024] RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
    Paper GitHub

  • [NeurIPS 2024] AttnDreamBooth: Towards Text‑Aligned Personalized Image Generation
    Paper Project Page GitHub

  • [AAAI 2024] Decoupled Textual Embeddings for Customized Image Generation
    Paper

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [CVPR 2023] Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion
    Paper Project Page GitHub Hugging Face

  • [CVPR 2023] DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
    Paper Project Page GitHub Hugging Face

  • [ICCV 2023] ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2023] Textual Inversion: An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
    Paper Project Page GitHub Hugging Face

  • [SIGGRAPH Asia 2023] Break-A-Scene: Extracting Multiple Concepts from a Single Image
    Paper Project Page GitHub

  • [SIGGRAPH 2023] Encoder‑Based Domain Tuning: Encoder‑Based Domain Tuning for Fast Personalization of Text‑to‑Image Models
    Paper Project Page GitHub

  • [SIGGRAPH 2023] LayerDiffusion: Layered Controlled Image Editing with Diffusion Models
    Paper Project Page GitHub Hugging Face

💡 Pre-Print Papers

⇧ Back to ToC

✂️ Image Editing

✨ 2025

✅ Published Papers

  • [CVPR 2025] FDS: Frequency‑Aware Denoising Score for Text‑Guided Latent Diffusion Image Editing
    Paper Project Page GitHub

  • [CVPR 2025] Reference‑Based 3D‑Aware Image Editing with Triplanes
    Paper Project Page GitHub

  • [CVPR 2025] MoEdit: On Learning Quantity Perception for Multi‑object Image Editing
    Paper

  • [ICLR 2025] Lightning‑Fast Image Inversion and Editing for Text‑to‑Image Diffusion Models
    Paper Project Page GitHub

  • [ICLR 2025] Multi‑Reward as Condition for Instruction‑based Image Editing
    Paper GitHub

  • [ICLR 2025] HQ‑Edit: A High‑Quality Dataset for Instruction‑based Image Editing
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] CLIPDrag: Combining Text‑based and Drag‑based Instructions for Image Editing
    Paper GitHub

  • [ICLR 2025] Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
    Paper Project Page GitHub

  • [ICLR 2025] PostEdit: Posterior Sampling for Efficient Zero‑Shot Image Editing
    Paper GitHub

  • [ICLR 2025] OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
    Paper Project Page GitHub Hugging Face

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] InfEdit: Inversion‑Free Image Editing with Natural Language
    Paper Project Page GitHub

  • [CVPR 2024] CrossSelfAttention: Towards Understanding Cross and Self‑Attention in Stable Diffusion for Text‑Guided Image Editing
    Paper GitHub

  • [CVPR 2024] DAC: Doubly Abductive Counterfactual Inference for Text‑based Image Editing
    Paper GitHub

  • [CVPR 2024] FoI: Focus on Your Instruction: Fine‑grained and Multi‑instruction Image Editing by Attention Modulation
    Paper GitHub

  • [CVPR 2024] CDS: Contrastive Denoising Score for Text‑guided Latent Diffusion Image Editing
    Paper Project Page

  • [CVPR 2024] DragDiffusion: Harnessing Diffusion Models for Interactive Point‑based Image Editing
    Paper Project Page GitHub

  • [CVPR 2024] DiffEditor: Boosting Accuracy and Flexibility on Diffusion‑based Image Editing
    Paper GitHub

  • [CVPR 2024] FreeDrag: Feature Dragging for Reliable Point‑based Image Editing
    Paper GitHub

  • [CVPR 2024] Learnable Regions: Text‑Driven Image Editing via Learnable Regions
    Paper Project Page GitHub

  • [CVPR 2024] LEDITS++: Limitless Image Editing using Text‑to‑Image Models
    Paper Project Page GitHub Hugging Face

  • [CVPR 2024] SmartEdit: Exploring Complex Instruction‑based Image Editing with Large Language Models
    Paper Project Page GitHub

  • [CVPR 2024] Edit One for All: Interactive Batch Image Editing
    Paper Project Page GitHub

  • [CVPR 2024] DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
    Paper Project Page GitHub

  • [CVPR 2024] TiNO‑Edit: Timestep and Noise Optimization for Robust Diffusion‑Based Image Editing
    Paper GitHub

  • [CVPR 2024] Person in Place: Generating Associative Skeleton‑Guidance Maps for Human‑Object Interaction Image Editing
    Paper Project Page GitHub

  • [CVPR 2024] Referring Image Editing: Object‑level Image Editing via Referring Expressions
    Paper

  • [CVPR 2024] Prompt Augmentation: Prompt Augmentation for Self‑supervised Text‑guided Image Manipulation
    Paper

  • [CVPR 2024] StyleFeatureEditor: The Devil is in the Details — StyleFeatureEditor for Detail‑Rich StyleGAN Inversion and High Quality Image Editing
    Paper GitHub

  • [ECCV 2024] RegionDrag: Fast Region‑Based Image Editing with Diffusion Models
    Paper Project Page GitHub

  • [ECCV 2024] TurboEdit: Instant Text‑Based Image Editing
    Paper Project Page

  • [ECCV 2024] InstructGIE: Towards Generalizable Image Editing
    Paper Project Page

  • [ECCV 2024] StableDrag: Stable Dragging for Point‑based Image Editing
    Paper Project Page

  • [ECCV 2024] Eta Inversion: Designing an Optimal Eta Function for Diffusion‑based Real Image Editing
    Paper GitHub

  • [ECCV 2024] SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
    Paper Project Page GitHub

  • [ECCV 2024] Guide‑and‑Rescale: Self‑Guidance Mechanism for Effective Tuning‑Free Real Image Editing
    Paper GitHub

  • [ECCV 2024] FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
    Paper GitHub

  • [ECCV 2024] Lazy Diffusion Transformer: Lazy Diffusion Transformer for Interactive Image Editing
    Paper Project Page

  • [ECCV 2024] ByteEdit: Boost, Comply and Accelerate Generative Image Editing
    Paper Project Page

  • [ICLR 2024] MGIE: Guiding Instruction‑based Image Editing via Multimodal Large Language Models
    Paper Project Page GitHub

  • [ICLR 2024] SDE‑Drag: The Blessing of Randomness — SDE Beats ODE in General Diffusion‑based Image Editing
    Paper Project Page GitHub

  • [ICLR 2024] Motion Guidance: Diffusion‑Based Image Editing with Differentiable Motion Estimators
    Paper Project Page GitHub

  • [ICLR 2024] OIR: Object‑Aware Inversion and Reassembly for Image Editing
    Paper Project Page GitHub

  • [ICLR 2024] Noise Map Guidance: Inversion with Spatial Context for Real Image Editing
    Paper

  • [AAAI 2024] TIC: Tuning‑Free Inversion‑Enhanced Control for Consistent Image Editing
    Paper

  • [AAAI 2024] BARET: Balanced Attention based Real Image Editing driven by Target‑text Inversion
    Paper

  • [AAAI 2024] CacheEdit: Accelerating Text‑to‑Image Editing via Cache‑Enabled Sparse Diffusion Inference
    Paper

  • [AAAI 2024] High‑Fidelity Editing: High‑Fidelity Diffusion‑based Image Editing
    Paper

  • [AAAI 2024] AdapEdit: Spatio‑Temporal Guided Adaptive Editing Algorithm for Text‑Based Continuity‑Sensitive Image Editing
    Paper GitHub

  • [AAAI 2024] TexFit: Text‑Driven Fashion Image Editing with Diffusion Models
    Paper

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [CVPR 2023] Diffusion Disentanglement: Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
    Paper Project Page GitHub

  • [CVPR 2023] SINE: SINgle Image Editing with Text-to-Image Diffusion Models
    Paper Project Page GitHub

  • [CVPR 2023] Imagic: Text-Based Real Image Editing with Diffusion Models
    Paper Project Page Hugging Face

  • [CVPR 2023] InstructPix2Pix: Learning to Follow Image Editing Instructions
    Paper Project Page GitHub Hugging Face

  • [CVPR 2023] Null-text Inversion: Null-text Inversion for Editing Real Images using Guided Diffusion Models
    Paper Project Page GitHub

  • [ICCV 2023] MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
    Paper Project Page GitHub

  • [ICCV 2023] Local Prompt Mixing: Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
    Paper Project Page GitHub Hugging Face

  • [ICLR 2022] SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
    Paper Project Page GitHub

💡 Pre-Print Papers

⇧ Back to ToC

🔄 Unified Generation and Understanding

✨ 2025

✅ Published Papers

  • [CVPR 2025] OmniFlow: Any‑to‑Any Generation with Multi‑Modal Rectified Flows
    Paper GitHub

  • [CVPR 2025] TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] UNIC‑Adapter: Unified Image‑instruction Adapter with Multi‑modal Transformer for Image Generation
    Paper Project Page GitHub Hugging Face

  • [CVPR 2025] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] Show‑o: One Single Transformer to Unify Multimodal Understanding and Generation
    Paper GitHub Project Page

  • [ICLR 2025] Transfusion: Predict the Next Token and Diffuse Images with One Multi‑Modal Model
    Paper GitHub

  • [CVPRW 2025] UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
    Paper GitHub

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] Unified‑IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio and Action
    Paper GitHub

  • [CVPR 2024] Emu2: Generative Multimodal Models are In‑Context Learners
    Paper Project Page GitHub Hugging Face

  • [ICLR 2024] LWM: World Model on Million‑Length Video And Language With Blockwise RingAttention
    Paper Project Page GitHub Hugging Face

  • [ICLR 2024] VILA‑U: A Unified Foundation Model Integrating Visual Understanding and Generation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2024] DreamLLM: Synergistic Multimodal Comprehension and Creation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2024] LaVIT: Unified Language‑Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
    Paper GitHub Hugging Face

  • [ICLR 2024] Emu: Generative Pretraining in Multimodality
    Paper GitHub Project Page Hugging Face

  • [ICLR 2024] SEED‑LLaMA: Making LLaMA SEE and Draw with SEED Tokenizer
    Paper GitHub Project Page Hugging Face

  • [ICML 2024] Video‑LaVIT: Unified Video‑Language Pre‑training with Decoupled Visual‑Motional Tokenization
    Paper GitHub Project Page Hugging Face

💡 Pre-Print Papers


🗂️ Datasets

| Dataset Name | Year | Modalities | Task | Paper | Link |
| --- | --- | --- | --- | --- | --- |
| Oxford-120 Flowers | 2008 | Text, Image | Text-to-Image Generation | Paper | Website |
| CUB-200-2011 | 2011 | Text, Image | Text-to-Image Generation | Paper | Website |
| MS COCO | 2014 | Text, Image | Text-to-Image Generation | Paper | Website |
| LAION-5B | 2022 | Text, Image | Text-to-Image Generation | Paper | Website |
| DiffusionDB | 2022 | Text, Image | Text-to-Image Generation | Paper | Website |
| T2I‑FactualBench | 2024 | Text, Image | Text-to-Image Generation | Paper | Website |
| EvalMuse‑40K | 2024 | Text, Image, Rating | Text-to-Image Generation | Paper | Website |
| T2I‑CompBench++ | 2025 | Text | Text-to-Image Generation | Paper | Website |
| Gecko Evaluation | 2025 | Text, Image | Text-to-Image Generation | Paper | Website |
| T2I‑ReasonBench | 2025 | Text, Image | Text-to-Image Generation | Paper | Website |
| ImageNet | 2009 | Image, Class Label | Class-Conditional Generation | Paper | Website |
| CIFAR-10 | 2009 | Image, Class Label | Class-Conditional Generation | Paper | Website |
| LSUN | 2015 | Image, Class/Scene Label | Class-Conditional Generation | Paper | Website |
| 7Bench | 2025 | Text, Image, Bounding Box | Conditional Image Generation | Paper | Website |
| EditInspector | 2024 | Text, Image, Human-Annotated Brush | Conditional Image Generation, Text-to-Image Generation | Paper | Website |
| Cityscapes | 2016 | Image, Segmentation Map | Conditional Image Generation (Segmentation-based) | Paper | Website |
| ADE20K | 2017 | Image, Segmentation Map | Conditional Image Generation (Segmentation-based) | Paper | Website |
| COCO-Stuff | 2017 | Image, Segmentation Map | Conditional Image Generation (Segmentation-based) | Paper | Website |
| EditVal | 2023 | Text, Image | Image Editing | Paper | Website |
| MagicBrush | 2023 | Text, Image | Image Editing | Paper | Website |
| ImgEdit | 2025 | Text, Image | Image Editing | Paper | Website |
| Six‑CD | 2025 | Text, Image | Image Editing | Paper | Website |
| LMM4Edit (EBench‑18K) | 2025 | Text, Question-Answer Pair, Image | Image Editing | Paper | Website |
| InstructPix2Pix Dataset | 2022 | Text, Image | Image Editing (Instruction-Based) | Paper | Website |
| HIVE | 2024 | Text, Image, Human Feedback | Image Editing (Instruction-Based) | Paper | Website |
| HQ-Edit | 2024 | Text, Image | Image Editing (Instruction-Based) | Paper | Website |
| AnyEdit | 2025 | Text, Image | Image Editing (Instruction-Based) | Paper | Website |
| HQ‑Edit | 2025 | Text, Image | Image Editing (Instruction-Based) | Paper | Website |
| OmniEdit | 2025 | Text, Image | Image Editing (Instruction-Based) | Paper | Website |
| VectorEdits | 2025 | Text, SVG Image | Image Editing (Instruction-Based) | Paper | Website |
| ComplexBench‑Edit | 2025 | Text (Multi-Step Instruction), Image | Image Editing (Instruction-Based) | Paper | Website |
| GPT‑IMAGE‑EDIT‑1.5M | 2025 | Text, Image | Image Editing (Instruction-Based) | Paper | Website |
| CustomConcept-101 | 2022 | Text, Image | Personalized Image Generation (Multi-Subject) | Paper | Website |
| DreamEditBench | 2023 | Text, Image | Personalized Image Generation | Paper | Website |
| DreamBench++ | 2024 | Text, Image | Personalized Image Generation | Paper | Website |

⇧ Back to ToC


🎓 About Us

QuenithAI is a professional organization composed of top researchers, dedicated to providing high-quality 1-on-1 research mentoring for university students worldwide. Our mission is to help students bridge the gap from theoretical knowledge to cutting-edge research and publish their work in top-tier conferences and journals.

Maintaining this Awesome Text-to-Image Generation list requires significant effort, just as completing a high-quality paper requires focused dedication and expert guidance. If you are looking for one-on-one support from top scholars on your own research project, whether to identify innovative ideas quickly or to get your work published, we invite you to contact us.

➡️ Contact us via WeChat or E-mail to start your research journey.


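QuenithAI (应达学术) is a professional organization of leading researchers that provides high-quality 1-on-1 research mentoring to university students worldwide. Our mission is to help students develop excellent research skills and publish their work in top conferences and journals.

Maintaining a GitHub survey repository takes a great deal of effort, just as completing a high-quality paper depends on focused commitment and professional guidance. If you would like one-on-one support from top scholars on your own research project, we invite you to get in touch.

➡️ Contact us via WeChat or E-mail to start your research journey.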

⇧ Back to ToC


🤝 Contributing

Contributions are welcome! Please see our Contribution Guidelines for details on how to add new papers, correct information, or improve the repository.
