Designed and trained a customized CycleGAN architecture with integrated perceptual loss to produce Monet-style artistic transformations.

MonetGAN - Monet-Style Painting Generation

This repository contains an implementation of a Generative Adversarial Network (GAN) aimed at transforming photos into Monet-style paintings. The model was developed for the MonetGAN challenge on Kaggle. The goal of the challenge was to use a GAN to take input photos and generate Monet-style paintings. The challenge serves as an excellent introduction to GANs and image-to-image translation tasks.

Real Photo → Generated Monet Painting
(Input sample image from domain A; Identity lambda = 0.5, Cycle_lambda = 10, adversarial_lambda = 1)

Installation

  • Python 3.x
  • PyTorch 1.x
  • torchvision
  • tqdm
  • matplotlib
  • Pillow

Clone the Repository

git clone https://github.com/igoldshm/MonetGAN-model

To install the dependencies, create a virtual environment and install the requirements with:

pip install -r requirements.txt

Challenges

We trained our model using the Monet/Photo dataset from Kaggle. During preprocessing, we observed a significant imbalance between the number of real-world photos (7,028) and Monet-style paintings (300). This imbalance can hurt model performance by making the photo discriminator disproportionately strong (since it is trained on far more samples). As a result, training of the Monet-to-photo generator may suffer, and the effectiveness of the cycle consistency loss can be reduced, since the network struggles to maintain a balanced bidirectional mapping.

Solution

To address the dataset imbalance, we used a balanced sampling strategy that ensures each training batch contains an equal number of photos and Monet-style images. We paired the two datasets and configured the dataloader to retrieve one photo–painting pair per iteration (real_A, real_B).

Model

In this project we used CycleGAN as our base architecture.

Generator implementation

In CycleGAN, there are two generators: one that converts real photos into fake Monet-style paintings (photo → Monet), and another that performs the reverse (Monet → photo). Both generators have the same architecture, consisting of an encoder and a decoder. Between them, after the encoder downsamples the input image into a lower-resolution feature map, there are 9 ResNet blocks that refine this feature map. These blocks help preserve spatial information while transforming the style and content, preparing it for reconstruction by the decoder.
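The encoder → 9 ResNet blocks → decoder layout described above can be sketched as follows. This is a condensed illustration in the style of the CycleGAN paper's generator (reflection padding, instance normalization, Tanh output); exact layer counts and hyperparameters in the repository may differ.

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block used between the encoder and decoder."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection preserves spatial info

class Generator(nn.Module):
    """Encoder -> 9 ResNet blocks -> decoder, as in the CycleGAN paper."""
    def __init__(self, channels=3, ngf=64, n_blocks=9):
        super().__init__()
        layers = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(channels, ngf, kernel_size=7),
            nn.InstanceNorm2d(ngf), nn.ReLU(inplace=True),
        ]
        # Encoder: two stride-2 convolutions downsample to a feature map.
        for mult in (1, 2):
            layers += [
                nn.Conv2d(ngf * mult, ngf * mult * 2, 3, stride=2, padding=1),
                nn.InstanceNorm2d(ngf * mult * 2), nn.ReLU(inplace=True),
            ]
        # 9 residual blocks refine the downsampled representation.
        layers += [ResnetBlock(ngf * 4) for _ in range(n_blocks)]
        # Decoder: two transposed convolutions upsample back to image size.
        for mult in (4, 2):
            layers += [
                nn.ConvTranspose2d(ngf * mult, ngf * mult // 2, 3,
                                   stride=2, padding=1, output_padding=1),
                nn.InstanceNorm2d(ngf * mult // 2), nn.ReLU(inplace=True),
            ]
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(ngf, channels, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```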

Discriminator implementation

We built the discriminator following the CycleGAN paper, which uses a PatchGAN classifier: rather than emitting a single real/fake score per image, it outputs a grid of scores, each judging one overlapping image patch.
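A minimal PatchGAN sketch in the paper's style is shown below; channel counts (ndf = 64) and the four-layer layout follow the paper's defaults, and may not match the repository's code exactly.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """PatchGAN discriminator in the style of the CycleGAN paper: outputs a
    grid of real/fake scores, one per overlapping image patch."""
    def __init__(self, channels=3, ndf=64):
        super().__init__()
        def block(in_c, out_c, stride):
            return [
                nn.Conv2d(in_c, out_c, kernel_size=4, stride=stride, padding=1),
                nn.InstanceNorm2d(out_c),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        self.model = nn.Sequential(
            # No normalization on the first layer, as in the paper.
            nn.Conv2d(channels, ndf, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            *block(ndf, ndf * 2, 2),
            *block(ndf * 2, ndf * 4, 2),
            *block(ndf * 4, ndf * 8, 1),
            nn.Conv2d(ndf * 8, 1, 4, stride=1, padding=1),  # 1-channel score map
        )

    def forward(self, x):
        return self.model(x)
```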

Loss calculation

Generator loss

The generator loss is the sum of four different losses:

  • Adversarial loss – the generator aims to fool the discriminator, i.e. push pred(fake Monet) toward 1 (MSELoss)
  • Identity loss – a Monet painting passed through the photo → Monet generator should remain unchanged (L1Loss)
  • Cycle consistency loss – penalizes the reconstruction error of the photo → Monet → photo round trip (L1Loss)
  • Perceptual loss – compares feature activations from a pretrained VGG19 network, helping the model preserve high-level features during style transfer (such as brush strokes), in contrast to the lower-level pixel comparisons performed by the identity, cycle-consistency, and adversarial losses

LAMBDA optimization

  • We experimented with different values for the weights of each loss (LAMBDA) to get the best visual outcome; see the visual-quality experiments and quantitative evaluation in the Results section.

Discriminator loss

The discriminator loss is the adversarial loss: the discriminator is trained to output 1 for real Monet paintings and 0 for generated ones, i.e. pred(real Monet)=1 and pred(fake Monet)=0 (MSELoss).

Training

  • Generator optimizer: Adam
  • Discriminator optimizer: Adam
  • Learning rate: 0.0002 (learning rate decay after 100 epochs)
  • Epochs: 200
  • LAMBDA_CYCLE = 1 (weight for cycle consistency loss)
  • LAMBDA_IDENTITY = 1 (weight for identity loss)
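The optimizer and learning-rate setup above can be sketched as follows. The linear shape of the decay and the Adam betas (0.5, 0.999) are taken from the CycleGAN paper's schedule, not confirmed from this repository, and the tiny placeholder modules stand in for the real generators and discriminators.

```python
import itertools
import torch.nn as nn
import torch.optim as optim

EPOCHS = 200
DECAY_START = 100  # constant lr for 100 epochs, then decay (per the settings above)

# Placeholder modules; in the project these are the two CycleGAN
# generators and discriminators.
G_A2B, G_B2A = nn.Conv2d(3, 3, 3), nn.Conv2d(3, 3, 3)
D_A, D_B = nn.Conv2d(3, 1, 3), nn.Conv2d(3, 1, 3)

# One Adam optimizer for both generators, one for both discriminators.
opt_G = optim.Adam(itertools.chain(G_A2B.parameters(), G_B2A.parameters()),
                   lr=0.0002, betas=(0.5, 0.999))
opt_D = optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()),
                   lr=0.0002, betas=(0.5, 0.999))

def lr_lambda(epoch):
    """Constant lr for the first 100 epochs, then linear decay to zero."""
    return 1.0 - max(0, epoch - DECAY_START) / (EPOCHS - DECAY_START)

sched_G = optim.lr_scheduler.LambdaLR(opt_G, lr_lambda)
sched_D = optim.lr_scheduler.LambdaLR(opt_D, lr_lambda)
```

Calling sched_G.step() and sched_D.step() once per epoch applies the schedule.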

Results

Loss function tuning

Identity loss weight (lambda)

In our project, we experimented with different identity loss lambda values to find the setting with the best visual results. We observed that increasing the lambda value caused the generated samples to resemble the original images from domain A more closely, while decreasing it produced outputs that were more stylized and resembled Monet paintings (domain B). This behavior aligns with the intended role of the identity loss: preventing over-stylization of an input image that is already in the target domain.

Results Preview

Real Photo – input sample image from domain A
Identity lambda = 0.5 (Cycle_lambda = 10, adversarial_lambda = 1)
Identity lambda = 1.5 (Cycle_lambda = 10, adversarial_lambda = 1)
Identity lambda = 4.5 (Cycle_lambda = 10, adversarial_lambda = 1)

⚠️ Low Identity Lambda Warning

Setting identity_lambda = 0.5 is insufficient to preserve structural details during training. This results in image degradation, visible as black blobs or structural collapse (see the white arrows in the top-right image, lambda = 0.5). The generator ignores the original image's structure or key features and replaces those areas with "safe" pixels that fool the discriminator more easily.

Conclusion

After testing various values, we found that the best visual results were achieved when the identity loss weight, lambda_identity, was set to 1.5.

License

This project is licensed under the MIT License.

Acknowledgments

  • The CycleGAN model is based on the paper Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (https://arxiv.org/abs/1703.10593).
  • Special thanks to the contributors of the PyTorch library and other open-source tools that made this project possible.
  • Thanks to the MonetGAN challenge on Kaggle for providing an engaging way to learn and experiment with GANs.
