This repository provides a real-time solution for automated pallet counting in operational warehouses using deep learning ensemble models. Our system predicts the total, CHEP, and EPAL pallet counts from RGB images through a multi-target regression approach with explainable AI.
Manual pallet counting is still widely used despite being error-prone and labor-intensive. We propose an ensemble of CNN backbones to replace this with a vision-based system that is:
- Accurate: MAE as low as 1.15 pallets
- Explainable: Grad-CAM for model transparency
Model | MAE (Total) | MAE (CHEP) | MAE (EPAL) | R² |
---|---|---|---|---|
EfficientNet-B3 | 1.401 | 0.991 | 1.009 | 0.840 |
ResNet-50 | 1.918 | 1.682 | 1.475 | 0.729 |
ConvNeXt-Tiny | 1.696 | 0.721 | 1.426 | 0.702 |
Ensemble | 1.151 | 0.918 | 0.857 | 0.875 |
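The ensemble row above comes from combining the per-model predictions at inference time. As a rough illustration (the exact combination scheme is not specified here, so the simple mean below is an assumption):

```python
# Minimal ensemble-inference sketch, assuming a simple mean of the per-model
# [CHEP, EPAL] outputs; a weighted average would work the same way.
import torch

@torch.no_grad()
def ensemble_predict(models, image_batch):
    """models: list of nn.Module, each mapping (N, 3, 224, 224) -> (N, 2) = [CHEP, EPAL]."""
    preds = torch.stack([m.eval()(image_batch) for m in models])  # (M, N, 2)
    chep_epal = preds.mean(dim=0)                                 # average over the M models
    total = chep_epal.sum(dim=1, keepdim=True)                    # total = CHEP + EPAL
    return torch.cat([total, chep_epal], dim=1)                   # (N, 3): [total, CHEP, EPAL]
```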
- Backbones: EfficientNet-B3, ResNet-50, ConvNeXt-Tiny
- Task: Multi-target regression for `[CHEP, EPAL]`, with total = CHEP + EPAL
- Loss: Smooth L1
- Explainability: Grad-CAM visualizations on final conv layers
- Framework: PyTorch
- Image size: 224×224
- Augmentations: Crop, flip, affine, color jitter, blur, compression, dropout
- Optimizer: AdamW with One-Cycle LR
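A minimal training sketch for the configuration above, assuming torchvision's EfficientNet-B3 as one of the backbones; the hyperparameters (learning rate, epochs, weight decay) are placeholders, not the values used in the notebook:

```python
# Sketch of one backbone adapted for the two-target regression task, trained
# with Smooth L1 loss and AdamW + One-Cycle LR as listed above.
# Hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn
from torchvision import models

def build_regressor():
    backbone = models.efficientnet_b3(weights="IMAGENET1K_V1")
    in_features = backbone.classifier[1].in_features
    backbone.classifier[1] = nn.Linear(in_features, 2)  # predicts [CHEP, EPAL]
    return backbone

def train(model, train_loader, epochs=30, max_lr=3e-4, device="cuda"):
    model.to(device)
    criterion = nn.SmoothL1Loss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, weight_decay=1e-2)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=len(train_loader)
    )
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:      # targets: (N, 2) = [CHEP, EPAL]
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
            scheduler.step()                      # One-Cycle LR steps once per batch
    return model
```

The total count is not predicted directly; it is derived as the sum of the two head outputs, which keeps the three reported counts internally consistent.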
A proprietary dataset of 130 real-world warehouse images with manual pallet-count annotations. Due to privacy restrictions, the dataset is not publicly available; contact the authors for potential access or collaboration.
├── CNN-Pallet.ipynb
├── dataset/
│   ├── images/
│   └── labels.csv
└── README.md
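A minimal loading sketch for the layout above; the labels.csv column names (`filename`, `chep`, `epal`) are assumptions, since the schema is not documented here:

```python
# Hypothetical loader for dataset/images + dataset/labels.csv.
# Column names are assumed; adjust to the actual CSV schema.
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset

class PalletDataset(Dataset):
    def __init__(self, csv_path="dataset/labels.csv", image_dir="dataset/images", transform=None):
        self.labels = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        row = self.labels.iloc[idx]
        image = Image.open(f"{self.image_dir}/{row['filename']}").convert("RGB")
        if self.transform:
            image = self.transform(image)
        target = torch.tensor([row["chep"], row["epal"]], dtype=torch.float32)
        return image, target
```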
Grad-CAM visualizations show attention regions for each model:
- EfficientNet focuses on stack edges and shadows
- ResNet highlights CHEP markings and textures
- ConvNeXt captures broader stack areas with robustness to lighting
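For reference, a bare-bones Grad-CAM can be computed with forward/backward hooks on an assumed final convolutional layer; the notebook may instead use a dedicated library, so treat this as an illustration of the idea rather than the exact implementation:

```python
# Minimal Grad-CAM sketch for one regression output (0 = CHEP, 1 = EPAL).
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, output_index=0):
    """image: (1, 3, 224, 224) tensor; target_layer: final conv module of the backbone."""
    activations, gradients = {}, {}
    fwd = target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

    model.eval()
    output = model(image)                    # (1, 2) = [CHEP, EPAL]
    model.zero_grad()
    output[0, output_index].backward()       # gradient of the chosen count w.r.t. the activations

    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)   # channel-wise importance
    cam = F.relu((weights * activations["a"]).sum(dim=1))     # weighted activation map
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]

    fwd.remove()
    bwd.remove()
    return cam.squeeze().detach()
```

The resulting heatmap can be overlaid on the input image to inspect which stack regions drive each count, as described in the bullets above.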
- Expand dataset across multiple facilities
- Integrate RGB-D or stereo vision for depth-aware counting
- Explore temporal modeling with video streams