A web application that uses Google's Gemini 2.0 Flash experimental image generation model to create images based on text prompts and reference images.
- Generate multiple images with a single text prompt
- Optionally include reference images to guide the style and content
- Run multiple generations in parallel for variations
- All generated images are automatically saved with timestamped filenames
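As an illustration of the timestamped saving behavior, here is a minimal sketch of how such a filename could be built. The helper name `timestamped_path`, the `gemini_` prefix, and the `.png` extension are assumptions made for the example, not the naming scheme the app actually uses.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_path(output_dir: str, index: int, now: Optional[datetime] = None) -> Path:
    """Build a path like outputs/gemini_20250102_030405_1.png (assumed naming scheme)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # sortable timestamp, second resolution
    # The index keeps parallel generations from the same second apart.
    return Path(output_dir) / f"gemini_{stamp}_{index}.png"


example = timestamped_path("outputs", 1, datetime(2025, 1, 2, 3, 4, 5))
```

Including the generation index alongside the timestamp avoids collisions when several images finish within the same second.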
The project is organized as follows:
├── app.py # Main application entry point
├── utils/ # Utility modules
│ ├── __init__.py # Package initialization
│ ├── async_utils.py # Async operations utilities
│ ├── file_utils.py # File and directory utilities
│ ├── gemini_utils.py # Gemini API interaction utilities
│ ├── logging_utils.py # Logging utilities
│ └── ui_utils.py # Gradio UI utilities
├── outputs/ # Generated images output directory
├── requirements.txt # Project dependencies
└── .env # Environment variables (API keys)
- Create a .env file with your Gemini API key:
GEMINI_API_KEY=your_api_key_here
- Install the required packages:
pip install -r requirements.txt
- Run the application:
python app.py
For a custom number of reference image inputs:
python app.py --num-images 6
To create a shareable link:
python app.py --share
- Enter a detailed text prompt describing the image you want to generate
- (Optional) Upload one or more reference images to guide the generation
- Set the number of parallel generations (1-8)
- Click "Generate Images"
- All images will be saved in the "outputs" directory
- Enter your prompt text describing the image you want to generate
- Optionally upload up to 4 reference images to guide the style or content
- Adjust the slider for how many parallel generations you want (1-8)
- Click "Generate Images" to start the process
- The generated images will appear below the input form in a grid layout
The app uses the Gemini 2.0 Flash experimental image generation model and can generate up to 8 variations simultaneously.
The app supports batch processing through the "Number of parallel generations" slider:
- Select a value between 1 and 8 to generate that many versions of your prompt simultaneously
- All images are generated in parallel using the same prompt and reference images
- Results are displayed in a grid layout with up to 4 images per row
- Text responses (if any) will be collected in the "Generation Details" section
This feature is useful for:
- Exploring different variations of the same prompt
- Finding the best result among several options
- Saving time compared to generating images one by one
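The parallel generation described above can be sketched with `asyncio.gather`. This is a simplified illustration, not the app's actual code: `generate_once` is a hypothetical stand-in that simulates a request instead of calling the Gemini API, and `generate_parallel` is likewise an assumed name.

```python
import asyncio


async def generate_once(prompt: str, run_id: int) -> str:
    """Hypothetical stand-in for a single Gemini request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"result {run_id} for {prompt!r}"


async def generate_parallel(prompt: str, count: int) -> list:
    """Run `count` generations concurrently; results come back in request order."""
    if not 1 <= count <= 8:
        raise ValueError("count must be between 1 and 8")
    tasks = [generate_once(prompt, i) for i in range(1, count + 1)]
    return await asyncio.gather(*tasks)


results = asyncio.run(generate_parallel("a red fox", 3))
```

Because the requests overlap while waiting on the network, the total time is roughly that of the slowest single request rather than the sum of all of them.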
The app supports several command-line arguments:
- --num-images N: Change the number of reference image inputs (default: 4)
- --share: Create a shareable public link (useful for demos)
Example:
# Run with 2 reference image inputs
python app.py --num-images 2
# Run with 6 reference image inputs
python app.py --num-images 6
# Create a public shareable link
python app.py --share
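For readers curious how these two flags might be parsed, the following is a minimal `argparse` sketch. The flag names and the default of 4 come from the description above; the function name `build_parser` and the help strings are assumptions, and the real app.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the two documented flags (a sketch, not the app's actual parser)."""
    parser = argparse.ArgumentParser(description="Gemini image generation app")
    parser.add_argument("--num-images", type=int, default=4,
                        help="Number of reference image inputs")
    parser.add_argument("--share", action="store_true",
                        help="Create a shareable public link")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--num-images", "6", "--share"])
defaults = build_parser().parse_args([])
```

Note that `argparse` exposes `--num-images` as the attribute `num_images`, replacing the hyphen with an underscore.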
The app includes several example prompts to get you started. Click on any example to load it into the prompt field.
- The Gemini 2.0 model has a maximum context size, so very large images might cause errors
- Generation can take a while depending on server load and how many parallel runs you request
- Running 8 generations simultaneously may be resource-intensive depending on your system
- The model may occasionally return text responses instead of images for certain prompts