A high-performance PyTorch implementation for character-level text generation using LSTM networks, optimized for GPU training with mixed precision and large batch sizes.
This project implements an optimized LSTM-based character-level text generator trained on Shakespeare's works. The model reaches 55.2% accuracy and trains roughly 3.5x faster than the baseline implementation, thanks to GPU optimizations and mixed precision training.
- 🚀 3.5x faster training with mixed precision (AMP)
- 🎭 High-quality Shakespeare-like text generation
- ⚡ 5.8M parameter model with optimized architecture
- 🔧 GPU memory optimization for large batch sizes
- 📊 Comprehensive evaluation metrics and analysis
## Table of Contents

- [Features](#features)
- [Performance](#performance)
- [Architecture](#architecture)
- [Project Structure](#project-structure)
- [Quick Start](#quick-start)
- [Command Line Options](#command-line-options)
- [Optimizations](#optimizations)
- [Training Results](#training-results)
- [Generated Text Examples](#generated-text-examples)
- [System Requirements](#system-requirements)
- [Notes](#notes)
- [Future Improvements](#future-improvements)
## Features

- **Mixed Precision Training (AMP)** - 2x faster training
- **Large Batch Sizes** - 512 samples per batch for better GPU utilization
- **Optimized Model Architecture** - 5.8M parameters across 3 LSTM layers
- **GPU Memory Optimization** - Efficient memory management
- **High-Quality Text Generation** - Generates Shakespeare-like text
## Performance

- **Training Speed**: 6.4 samples/sec (3.5x faster than the original)
- **Model Size**: 5.8M parameters (20x larger than the original)
- **Accuracy**: 55.2% (2.8% improvement over the original)
- **GPU Utilization**: Optimized for an RTX 4070 with 12GB VRAM
## Architecture

- **Embedding Layer**: 256 dimensions
- **LSTM Layers**: 3 layers with 512 hidden units each
- **Dropout**: 0.2 for regularization
- **Output Layer**: Dense layer with softmax over the character vocabulary
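
For concreteness, here is a minimal sketch of a model with these dimensions, which works out to roughly 5.8M parameters. The actual implementation lives in `models/rnn_model.py`; the class name here is illustrative:

```python
import torch.nn as nn

class CharLSTM(nn.Module):
    """Illustrative character-level LSTM with the dimensions listed above."""

    def __init__(self, vocab_size=65, embed_dim=256, hidden_size=512,
                 num_layers=3, dropout=0.2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) tensor of character indices
        emb = self.embedding(x)               # (batch, seq_len, embed_dim)
        out, hidden = self.lstm(emb, hidden)  # (batch, seq_len, hidden_size)
        return self.fc(out), hidden           # per-step logits over the vocab
```

The softmax itself is typically folded into the loss during training and applied explicitly only when sampling.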
## Project Structure

```
Q2/
├── main.py                  # Main execution script
├── requirements.txt         # Dependencies
├── README.md                # This file
├── shakespeare_data.pkl     # Preprocessed dataset
├── shakespeare.txt          # Raw Shakespeare text
├── src/
│   ├── data_loader.py       # Data loading and preprocessing
│   └── trainer.py           # Optimized training utilities
├── models/
│   ├── rnn_model.py         # Optimized LSTM model
│   └── shakespeare_rnn_optimized_optimized.pth  # Trained model
└── plots/
    └── shakespeare_rnn_optimized_optimized_training_history.png
```
## Quick Start

```bash
# Activate virtual environment
source "/home/umer-farooq/Desktop/Uni/Gen AI/Assignment 1/genai_env_linux/bin/activate"

# Install dependencies
pip install torch numpy matplotlib scikit-learn

# Train with default settings (10 epochs, batch size 512)
python3 main.py --mode train

# Train with custom settings
python3 main.py --mode train --epochs 20 --batch-size 1024 --learning-rate 0.001

# Generate text with the trained model
python3 main.py --mode generate --max-chars 1000 --temperature 0.8

# Generate with a custom seed phrase
python3 main.py --mode generate --seed-phrase "Once upon a time" --max-chars 500

# Run the complete pipeline (preprocess + train + generate)
python3 main.py --mode full --epochs 5 --batch-size 512
```
## Command Line Options

| Option | Default | Description |
|---|---|---|
| `--mode` | `full` | Execution mode: `preprocess`, `train`, `generate`, or `full` |
| `--epochs` | `10` | Number of training epochs |
| `--batch-size` | `512` | Batch size for training |
| `--learning-rate` | `0.001` | Learning rate for the optimizer |
| `--hidden-size` | `512` | Hidden size of the LSTM layers |
| `--num-layers` | `3` | Number of LSTM layers |
| `--model-name` | `shakespeare_rnn_optimized` | Name used when saving the model |
| `--seed-phrase` | `"To be or not to be"` | Seed phrase for text generation |
| `--max-chars` | `1000` | Maximum number of characters to generate |
| `--temperature` | `0.8` | Sampling temperature for text generation |
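
To illustrate what `--temperature` controls: the model's output logits are divided by the temperature before the softmax, so values below 1 make sampling more conservative and values above 1 make it more varied. A minimal sketch (the function name is illustrative, not part of `main.py`):

```python
import torch

def sample_next_char(logits, temperature=0.8):
    """Sample one character index from the model's last-step logits."""
    # temperature < 1 sharpens the distribution; > 1 flattens it
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```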
## Optimizations

- **Mixed Precision Training (AMP)** (combined with the items below in the sketch after this list)
  - Uses 16-bit precision for a ~2x speed boost
  - Maintains 32-bit precision where needed for accuracy
  - Automatic loss scaling
- **Large Batch Sizes**
  - 512 samples per batch (vs. 64 in the original)
  - Better GPU utilization
  - More stable gradients
- **Optimized Model Architecture**
  - Larger embedding dimension (256 vs. 128)
  - More LSTM layers (3 vs. 2)
  - Larger hidden dimension (512 vs. 128)
  - Better weight initialization
- **GPU Memory Optimization**
  - Data moved to the GPU once at the start
  - No CPU-GPU transfers during training
  - Efficient memory management
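
These pieces combine roughly as in the sketch below. It reuses the illustrative `CharLSTM` from the Architecture section and random stand-in data (the real tensors come from `shakespeare_data.pkl`); the project's actual loop lives in `src/trainer.py`:

```python
import torch
import torch.nn as nn

device = torch.device("cuda")
vocab_size, seq_len, batch_size, num_epochs = 65, 100, 512, 10

# Stand-in data, moved to the GPU once so no CPU-GPU copies
# happen inside the training loop.
inputs = torch.randint(vocab_size, (10_000, seq_len), device=device)
targets = torch.randint(vocab_size, (10_000, seq_len), device=device)

model = CharLSTM().to(device)   # sketch from the Architecture section
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.amp.GradScaler("cuda")    # automatic loss scaling

for epoch in range(num_epochs):
    perm = torch.randperm(inputs.size(0), device=device)
    for i in range(0, inputs.size(0), batch_size):
        idx = perm[i:i + batch_size]
        optimizer.zero_grad(set_to_none=True)
        # Forward pass runs in float16 where safe, float32 elsewhere.
        with torch.amp.autocast("cuda"):
            logits, _ = model(inputs[idx])
            loss = criterion(logits.reshape(-1, vocab_size),
                             targets[idx].reshape(-1))
        scaler.scale(loss).backward()   # scale up to avoid fp16 underflow
        scaler.step(optimizer)          # unscale gradients, then step
        scaler.update()
```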
## Training Results

After 3 epochs of training:

- **Training Loss**: 1.3799
- **Validation Loss**: 1.5663
- **Accuracy**: 55.2%
- **Training Time**: ~4 minutes
- **GPU Memory Usage**: 1.9GB
## Generated Text Examples

The model generates high-quality Shakespeare-like text with:

- Proper character names (ANGELO, ISABELLA, LUCIO)
- Dramatic dialogue structure
- Coherent sentence flow
- Shakespearean vocabulary and style
## System Requirements

- **GPU**: NVIDIA RTX 4070 or better (12GB+ VRAM recommended)
- **CUDA**: Version 12.8+
- **PyTorch**: Version 2.8.0+
- **Python**: 3.12+
- **RAM**: 8GB+ recommended
## Notes

- The model uses character-level tokenization (65 unique characters); a minimal sketch follows this list
- Training data: Tiny Shakespeare dataset (1.1M characters)
- The model is saved automatically after training
- Training plots are generated and saved to the `plots/` directory
- Generated text quality improves with more training epochs
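
A minimal sketch of the character-level tokenization (the project's actual preprocessing lives in `src/data_loader.py`):

```python
# Build the character vocabulary directly from the raw text.
with open("shakespeare.txt", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                     # the 65 unique characters
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

encoded = [char_to_idx[ch] for ch in text]    # text -> integer indices
decoded = "".join(idx_to_char[i] for i in encoded[:40])  # and back again
print(len(chars), decoded)
```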
## Future Improvements

- Increase the batch size to 1024-2048 for even better GPU utilization
- Implement learning rate scheduling
- Add gradient accumulation for very large effective batch sizes (see the sketch below)
- Use data parallelism for multi-GPU training
- Implement early stopping to prevent overfitting
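
As a sketch of the scheduling and gradient-accumulation items (not part of the current code), both could be layered onto the AMP loop from the Optimizations section. Here `model`, `optimizer`, `criterion`, `scaler`, and `num_epochs` are as defined in that sketch, and `loader` stands for any iterable of `(input, target)` batches already on the GPU:

```python
import torch

accum_steps = 4   # effective batch = 512 * 4 = 2048 without extra VRAM
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=num_epochs)

for epoch in range(num_epochs):
    optimizer.zero_grad(set_to_none=True)
    for step, (x, y) in enumerate(loader):  # loader: assumed batch iterable
        with torch.amp.autocast("cuda"):
            logits, _ = model(x)
            # Average the loss so accumulated gradients match one large batch.
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             y.reshape(-1)) / accum_steps
        scaler.scale(loss).backward()
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
    scheduler.step()   # decay the learning rate once per epoch
```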