This project is a complete implementation of Neural Style Transfer (NST) using a pretrained feedforward transformer network, enabling fast, real-time stylization of both images and videos.
It includes:
- Style transfer for single images and videos
- A Streamlit-based GUI for a user-friendly experience
- CLI scripts for training and inference pipelines
- Configurable settings like image width, temporal smoothing, and batch processing
- Downloaders for pretrained models and training datasets
| Feature | Description |
|---|---|
| Image NST | Upload or choose images, apply artistic styles using a fast transformer net |
| Video NST | Upload or choose videos, with optional temporal smoothing |
| Streamlit UI | Intuitive web UI for both image and video stylization |
| CLI Support | Script-based style transfer using configurable arguments |
| Custom Model Training | Train your own models using MS-COCO or any dataset |
.
├── app.py                               # Streamlit GUI application
├── image_nst_script.py                  # Script for stylizing images
├── video_nst_script.py                  # Script for stylizing videos
├── model_training_script.py             # Model training entrypoint
├── models/
│   ├── definitions/
│   │   ├── transformer_net.py           # Transformer feedforward network
│   │   └── perceptual_loss_net.py       # VGG16-based perceptual loss extractor
│   └── binaries/                        # Pretrained .pth models
├── utils/
│   ├── utils.py                         # Shared preprocessing, postprocessing, I/O, and dataset utils
│   ├── app_utils.py                     # Utility helpers for the Streamlit app
│   ├── pretrained_models_downloader.py  # Script to download pretrained style models
│   └── training_dataset_downloader.py   # Script to download and extract the COCO dataset
└── data/
    ├── input/                           # Input images and videos
    ├── styles/                          # Style reference images
    └── output/                          # Stylized results
git clone https://github.com/your-username/neural-style-transfer.git
cd neural-style-transfer
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
This must be run before using the GUI or CLI to stylize:
python utils/pretrained_models_downloader.py
This downloads the pretrained `.pth` files and places them in `models/binaries/`.
To train your own style model, download the MS-COCO dataset:
python utils/training_dataset_downloader.py
This downloads and extracts the COCO dataset under `data/train/`.
streamlit run app.py
- Image Tab: Upload or select an image, choose a model, apply style, and download result.
- Video Tab: Upload or select a video, choose a model, optionally tune smoothing, and download result.
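For a sense of how such a two-tab layout can be wired up, here is a minimal sketch (not the actual `app.py`; the widget choices and the `config` dictionary shape are assumptions, while `stylize_static_image` and `stylize_video` are the functions described below):

```python
import streamlit as st
from image_nst_script import stylize_static_image
from video_nst_script import stylize_video

# Illustrative sketch only -- the widgets and config shape are assumptions,
# not the project's exact app.py.
st.title("Neural Style Transfer")
image_tab, video_tab = st.tabs(["Image", "Video"])

with image_tab:
    content = st.file_uploader("Content image", type=["jpg", "jpeg", "png"])
    model = st.selectbox("Style model", ["mosaic.pth", "candy.pth"], key="img_model")
    if content and st.button("Stylize image"):
        config = {"content_input": content, "model_name": model, "img_width": 512}
        styled = stylize_static_image(config, return_pil=True)
        st.image([content, styled], caption=["Original", "Stylized"])

with video_tab:
    video = st.file_uploader("Content video", type=["mp4"])
    model = st.selectbox("Style model", ["mosaic.pth", "candy.pth"], key="vid_model")
    alpha = st.slider("Smoothing alpha", 0.0, 1.0, 0.3)
    if video and st.button("Stylize video"):
        config = {"input_video": video, "model_name": model, "smoothing_alpha": alpha}
        stylize_video(config)  # writes the stylized video to data/output/
```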
Train your own model with a content-style dataset:
python model_training_script.py --dataset_path ./data/train --style_image ./styles/starry_night.jpg --epochs 2 --batch_size 4 --style_weight 5e5 --content_weight 1e0
python image_nst_script.py --content_input lion.jpg --model_name mosaic.pth --img_width 512
python video_nst_script.py --input_video sample.mp4 --model_name mosaic.pth --img_width 500 --smoothing_alpha 0.3
- Streamlit GUI with two tabs: Image and Video
- Image tab: uses `stylize_static_image(config, return_pil=True)` and shows the original + styled image
- Video tab: uses `stylize_video(config)` and applies frame-wise style with smoothing
- Defines `stylize_static_image(config, return_pil=False)`
- Loads the model and processes either:
  - A single image (optionally returned as a PIL image)
  - A directory (batch image processing)
- Frame-by-frame video processing using OpenCV
- Applies style using `TransformerNet`
- Uses `cv2.addWeighted()` if smoothing is enabled (sketched below)
- Saves the stylized video
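The smoothing step simply blends each stylized frame with the previous output frame. A minimal sketch of that idea (the function and frame handling here are illustrative; `cv2.addWeighted()` is the call used for the blend):

```python
import cv2

def smooth_stylized_frames(frames, alpha=0.3):
    """Blend each stylized frame with the previous output to reduce flicker.

    Illustrative sketch: out_t = alpha * out_{t-1} + (1 - alpha) * frame_t,
    computed with cv2.addWeighted on uint8 BGR frames from OpenCV.
    """
    smoothed, prev = [], None
    for frame in frames:
        out = frame if prev is None else cv2.addWeighted(prev, alpha, frame, 1.0 - alpha, 0)
        smoothed.append(out)
        prev = out
    return smoothed
```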
- Loads COCO dataset and chosen style image
- Computes perceptual loss using VGG
- Optimizes `TransformerNet`
- Supports live TensorBoard logs
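In outline, each training step pushes a content batch through `TransformerNet` and scores the result with VGG features. A condensed, illustrative sketch of such a step (the `gram()` helper, feature names, and loss bookkeeping are assumptions, not the project's exact code):

```python
import torch.nn.functional as F

def gram(x):
    # Gram matrix of a (B, C, H, W) feature map, normalized by its size
    b, c, h, w = x.shape
    f = x.view(b, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)

def train_step(transformer, loss_net, content_batch, style_grams, optimizer,
               content_weight=1e0, style_weight=5e5):
    # loss_net is assumed to return a namedtuple of VGG feature maps
    # (e.g. relu1_2, relu2_2, relu3_3); style_grams holds precomputed Gram
    # matrices of the style image at the same layers.
    optimizer.zero_grad()
    stylized = transformer(content_batch)        # one feedforward pass
    feats_stylized = loss_net(stylized)
    feats_content = loss_net(content_batch)
    # Content loss: match a mid-level feature map of the content image
    content_loss = F.mse_loss(feats_stylized.relu2_2, feats_content.relu2_2)
    # Style loss: match Gram matrices of the style image across layers
    style_loss = 0.0
    for f, g in zip(feats_stylized, style_grams):
        gm = gram(f)
        style_loss = style_loss + F.mse_loss(gm, g.expand_as(gm))
    total = content_weight * content_loss + style_weight * style_loss
    total.backward()
    optimizer.step()
    return total.item()
```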
- Feedforward CNN
- Structure:
  - Conv → IN → ReLU
- 5 Residual Blocks
- Upsample + Conv + IN + ReLU
- Outputs stylized image in one pass
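For orientation, a stripped-down version of that structure could look like the sketch below (channel counts, kernel sizes, and strides are illustrative; see `models/definitions/transformer_net.py` for the real network):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Conv -> IN -> ReLU -> Conv -> IN, plus a skip connection
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

class TinyTransformerNet(nn.Module):
    # Stripped-down sketch of the Conv->IN->ReLU / residual / upsample
    # structure described above; layer widths are illustrative.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=9, padding=4),             # Conv -> IN -> ReLU
            nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # downsample
            nn.InstanceNorm2d(64, affine=True), nn.ReLU(inplace=True),
            *[ResidualBlock(64) for _ in range(5)],                 # 5 residual blocks
            nn.Upsample(scale_factor=2, mode="nearest"),            # upsample + Conv + IN + ReLU
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=9, padding=4),             # back to RGB in one pass
        )

    def forward(self, x):
        return self.net(x)
```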
- Loads pretrained VGG16 from torchvision
- Extracts intermediate features (e.g., relu1_2, relu2_2, relu3_3) for loss computation
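A minimal sketch of such an extractor, slicing `torchvision`'s `vgg16.features` at the layers listed above (the namedtuple and class name are illustrative; the real `perceptual_loss_net.py` may differ in detail):

```python
from collections import namedtuple
import torch.nn as nn
from torchvision import models

VggFeatures = namedtuple("VggFeatures", ["relu1_2", "relu2_2", "relu3_3"])

class PerceptualVGG16(nn.Module):
    # Slice vgg16.features at relu1_2 / relu2_2 / relu3_3 and freeze weights.
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.slice1 = vgg[:4]    # conv1_1 .. relu1_2
        self.slice2 = vgg[4:9]   # pool1 .. relu2_2
        self.slice3 = vgg[9:16]  # pool2 .. relu3_3
        for p in self.parameters():
            p.requires_grad = False  # loss network stays fixed during training

    def forward(self, x):
        h1 = self.slice1(x)
        h2 = self.slice2(h1)
        h3 = self.slice3(h2)
        return VggFeatures(h1, h2, h3)
```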
Core helpers:
- `prepare_img(path, width, device)` → tensor
- `post_process_image(tensor)` → RGB image
- `save_and_maybe_display_image(config, img)` → save logic
- `SimpleDataset` → supports batch image processing
- `frame_to_tensor()` and `tensor_to_frame()` for video
- `pil_to_bytes(pil_image)` → converts a PIL object for Streamlit download
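As a rough picture of what the prepare/post-process pair does, here is an illustrative sketch (the resizing behavior and value ranges are assumptions; the real `utils.py` may differ):

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Sketch of the prepare/post-process pair; exact behavior of the real
# utils.py is an assumption here.
def prepare_img(path, width, device):
    img = Image.open(path).convert("RGB")
    height = int(img.height * width / img.width)  # keep aspect ratio
    transform = transforms.Compose([
        transforms.Resize((height, width)),
        transforms.ToTensor(),  # floats in [0, 1], shape (3, H, W)
    ])
    return transform(img).unsqueeze(0).to(device)  # add batch dimension

def post_process_image(tensor):
    # (1, 3, H, W) float tensor -> (H, W, 3) uint8 RGB array
    img = tensor.squeeze(0).detach().cpu().clamp(0, 1).numpy()
    return (img.transpose(1, 2, 0) * 255).astype(np.uint8)
```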
- Downloads multiple pretrained `.pth` style models from known URLs
- Saves them into `models/binaries/`
- Mandatory before the GUI or scripts can run
- Downloads and unzips the MS-COCO dataset
- Extracts `train2014.zip` into `data/train/train2014/`
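The download-and-extract step is conceptually simple; a minimal sketch (the URL is the commonly used official COCO mirror and, like the exact extraction logic, is an assumption here, not necessarily what the script does):

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumed mirror for the COCO train2014 split; this is a large download.
COCO_URL = "http://images.cocodataset.org/zips/train2014.zip"

dest = Path("data/train")
dest.mkdir(parents=True, exist_ok=True)
archive = dest / "train2014.zip"

urllib.request.urlretrieve(COCO_URL, archive)  # fetch the zip archive
with zipfile.ZipFile(archive) as zf:
    zf.extractall(dest)                        # -> data/train/train2014/
```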
| Model File | Style |
|---|---|
| `vg_starry_night.pth` | Vincent van Gogh's Starry Night |
| `candy.pth` | Bright pastel stroke style |
Place these inside `models/binaries/`.
| Input Image | Style | Output |
|---|---|---|
| Uploaded image | Starry Night | Stylized version |
Instead of running the full app immediately, you can explore the project using the interactive Jupyter notebooks:
- `General_NST_Notebook.ipynb`: explains and implements Johnson's Fast Neural Style Transfer using PyTorch
- `Image_NST_Notebook.ipynb`: demonstrates neural style transfer on images using a feedforward transformer network
- `Video_NST_Notebook.ipynb`: applies a feedforward neural style transfer model to a video
- `NST_Model_Training_Notebook.ipynb`: demonstrates how to train a transformer network for fast neural style transfer
- Add batch image GUI support
- Utilize temporally-aware networks instead of the current feedforward model for video stylization
- Add Semantic Segmentation feature for videos
This project is licensed under the MIT License.
- Based on Perceptual Losses for Real-Time Style Transfer (Johnson et al.)
- Uses `torchvision.models.vgg16` for perceptual loss
- Portions of the code and implementation were adapted from and inspired by Aleksa Gordić's excellent repository: gordicaleksa/pytorch-neural-style-transfer-johnson
Chaitanya Malani
Email: contact@chaitanymalani.com