A lightweight Python utility for estimating the computational complexity of PyTorch models. It hooks into a model's forward pass to count floating point operations (FLOPs), number of activations, memory usage, frames per second (FPS), and trainable parameters.
- Name: flopsmeter
- Language: Python 3.10+
- Dependencies: torch 2.2.1+ (PyTorch)
This package helps deep learning practitioners quickly gauge the computational cost of their PyTorch models, aiding in model optimization, benchmarking, and resource planning.
- FLOPs Estimation: Supports convolution, normalization, pooling, activation, and more.
- Activation Count: Measures total activations produced in a forward pass.
- Memory Usage: Estimates memory footprint (in MB) during training.
- FPS (Frames per Second): Benchmarks inference speed.
- Trainable Parameters: Calculates total learnable weights.
- Module Exclusion Alerts: Warns if unsupported layers are skipped.
The following PyTorch layers are currently supported by flopsmeter:
- Convolution: nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose1d, nn.ConvTranspose2d, nn.ConvTranspose3d, nn.LazyConv1d, nn.LazyConv2d, nn.LazyConv3d, nn.LazyConvTranspose1d, nn.LazyConvTranspose2d, nn.LazyConvTranspose3d
- Normalization: nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.LazyBatchNorm1d, nn.LazyBatchNorm2d, nn.LazyBatchNorm3d, nn.SyncBatchNorm, nn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d, nn.LazyInstanceNorm1d, nn.LazyInstanceNorm2d, nn.LazyInstanceNorm3d, nn.GroupNorm, nn.LayerNorm, nn.LocalResponseNorm
- Activation: nn.ELU, nn.ReLU, nn.ReLU6, nn.LeakyReLU, nn.PReLU, nn.RReLU, nn.GELU, nn.SELU, nn.Tanh, nn.Tanhshrink, nn.Hardtanh, nn.Sigmoid, nn.LogSigmoid, nn.SiLU, nn.Mish, nn.Hardswish, nn.Softplus, nn.Softshrink, nn.Softsign, nn.Hardsigmoid, nn.Hardshrink, nn.Threshold, nn.GLU, nn.Softmin, nn.Softmax, nn.Softmax2d, nn.LogSoftmax, nn.AdaptiveLogSoftmaxWithLoss
- Pooling: nn.MaxPool1d, nn.MaxPool2d, nn.MaxPool3d, nn.AvgPool1d, nn.AvgPool2d, nn.AvgPool3d, nn.FractionalMaxPool2d, nn.FractionalMaxPool3d, nn.AdaptiveMaxPool1d, nn.AdaptiveMaxPool2d, nn.AdaptiveMaxPool3d, nn.AdaptiveAvgPool1d, nn.AdaptiveAvgPool2d, nn.AdaptiveAvgPool3d, nn.LPPool1d, nn.LPPool2d
- Linear: nn.Linear, nn.LazyLinear, nn.Bilinear
- Dropout: nn.Dropout, nn.Dropout1d, nn.Dropout2d, nn.Dropout3d, nn.AlphaDropout, nn.FeatureAlphaDropout
- Upsampling: nn.Upsample (with mode: nearest, linear, bilinear, bicubic, trilinear), nn.UpsamplingNearest2d, nn.UpsamplingBilinear2d
- Other: nn.Identity, nn.Flatten, nn.PixelShuffle, nn.PixelUnshuffle, nn.ChannelShuffle, nn.ZeroPad*, nn.ConstantPad*, nn.ReflectionPad*, nn.ReplicationPad*, nn.CircularPad*
More layers may be supported in the future.
Note: Unsupported layers will be ignored during FLOPs calculation.
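As a rough illustration of how hook-based counting can treat supported and unsupported layers, the sketch below attaches forward hooks to leaf modules, counts multiply-accumulates (MACs) for nn.Conv2d only, and records every other leaf layer type by name. It is not flopsmeter's implementation: the `count_conv2d_macs` helper, the choice to count MACs without bias additions, and the one-entry "supported" set are assumptions made for this example.

```python
import torch
import torch.nn as nn


def count_conv2d_macs(model: nn.Module, sample: torch.Tensor):
    """Return (total Conv2d MACs, names of leaf module types that were skipped).

    Hypothetical helper for illustration; bias additions are not counted.
    """
    total = 0
    skipped = set()
    handles = []

    def conv_hook(module, inputs, output):
        nonlocal total
        # MACs per output element: (C_in / groups) * kernel_h * kernel_w
        kh, kw = module.kernel_size
        macs_per_elem = (module.in_channels // module.groups) * kh * kw
        total += output.numel() * macs_per_elem

    def skip_hook(module, inputs, output):
        # Anything that is not Conv2d is treated as unsupported in this sketch.
        skipped.add(type(module).__name__)

    for m in model.modules():
        if len(list(m.children())) > 0:
            continue  # only hook leaf modules, not containers
        hook = conv_hook if isinstance(m, nn.Conv2d) else skip_hook
        handles.append(m.register_forward_hook(hook))

    with torch.no_grad():
        model(sample)

    for h in handles:
        h.remove()  # leave the model without lingering hooks
    return total, skipped


model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3), nn.ReLU())
macs, skipped = count_conv2d_macs(model, torch.zeros(1, 3, 8, 8))
print(macs, skipped)
```

Here the 8x8 input yields a 16x6x6 output, and each output element costs 3 * 3 * 3 = 27 MACs. The ReLU ends up in the skipped set, which mirrors the idea that ignored layers lead to an underestimate.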
Install via pip:
pip install flopsmeter
(Alternatively, copy the Complexity_Calculator class file into your project.)
import torch
import torch.nn as nn
from flopsmeter import Complexity_Calculator

# Example: a simple CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.bn(self.conv(x)))
        return x

# Initialize the calculator with a dummy input shape (C, H, W)
calculator = Complexity_Calculator(model=SimpleCNN(), dummy=(3, 224, 224), device=torch.device('cuda'))
# Print the complexity report
calculator.log(order='G', num_input=1, batch_size=16)

Constructor arguments:
- model (torch.nn.Module): Your PyTorch model.
- dummy (tuple[int]): Input tensor shape for a single sample. For 2D input: (C, H, W); for 3D: (D, C, H, W); for 1D: (L, D).
- device (torch.device, optional): Computation device ('cpu' or 'cuda'). Defaults to CPU.
The log method generates and prints a detailed report. Its arguments are:
- order (Literal['G', 'M', 'k']): Scale for FLOPs (Giga, Mega, kilo).
- num_input (int): How many inputs to simulate concurrently (for multi-input models).
- batch_size (int): Size of the input batch used to estimate memory.
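To make the order argument concrete, the snippet below scales a raw operation count into the chosen unit. The `scale_count` helper and its factor table are hypothetical, written only for this example, and assume the usual decimal prefixes (k = 10^3, M = 10^6, G = 10^9).

```python
# Decimal prefixes assumed for the 'order' argument (illustrative only).
FACTORS = {"k": 1e3, "M": 1e6, "G": 1e9}


def scale_count(raw: float, order: str) -> float:
    """Convert a raw operation count into kilo, Mega, or Giga units."""
    if order not in FACTORS:
        raise ValueError(f"order must be one of {sorted(FACTORS)}, got {order!r}")
    return raw / FACTORS[order]


raw_flops = 1_397_000_000
print(f"{scale_count(raw_flops, 'G'):.3f} G FLOPs")
print(f"{scale_count(raw_flops, 'M'):.1f} M FLOPs")
```

Picking the prefix that keeps the value between 1 and 1000 usually makes reports easiest to compare across models.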
Result Log:
-----------------------------------------------------------------------------------------------
   G FLOPs   |   G FLOPS   |   M Acts   |    FPS     |  Memory (MB)  |    Params
-----------------------------------------------------------------------------------------------
    1.397    |   109.197   |   67.19    |   78.176   |     8,201     |  88,591,464
- FLOPs: Floating Point Operations, the total number of arithmetic operations performed during a single forward pass.
- FLOPS: Floating Point Operations Per Second, i.e. how many floating point operations the model executes per second (a measure of speed).
- Acts: Total number of elements in all intermediate feature maps produced during a forward pass. This roughly indicates how much data the model processes internally and helps estimate memory usage and training cost.
- FPS: Frames Per Second, i.e. how many input samples the model can process per second during inference.
- Memory (MB): Estimated GPU memory usage during training, based on the number of activations.
- Params: Total number of trainable parameters in the model.
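The throughput column can be sanity-checked from two of the others: if FLOPs is the cost of one forward pass and FPS is the number of passes per second, FLOPS should be roughly their product. The check below uses the figures from the sample row above. Treating FLOPS as FLOPs times FPS is an assumption about how the report is derived, although the sample numbers agree with it to within rounding.

```python
gflops_per_pass = 1.397           # "G FLOPs" column of the sample row
fps = 78.176                      # "FPS" column of the sample row
reported_gflops_per_s = 109.197   # "G FLOPS" column of the sample row

# Work per pass times passes per second gives work per second.
estimated = gflops_per_pass * fps
print(f"estimated {estimated:.3f} G FLOPS vs reported {reported_gflops_per_s}")

# Average latency per sample follows directly from FPS.
latency_ms = 1000.0 / fps
print(f"about {latency_ms:.1f} ms per sample")
```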
Warning Log:
A warning will be printed if any modules are skipped in FLOPs estimation. For example:
***********************************************************************************************
Warning !! Above Estimations Ignore Following Modules !! The FLOPs Would be Underestimated !!
***********************************************************************************************
{'StochasticDepth', 'Permute'}
A warning block prints any unsupported modules that were excluded from FLOPs calculation.
- Hook Registration: Recursively attaches forward hooks to all submodules.
- FLOPs Computation: Implements formulas for convolutions, normalization, pooling, activations, etc.
- Warm-up & Timing: Runs 100 warm-up passes, then times 100 forward passes for stable metrics.
- Memory Estimation: Based on activation count and tensor element size.
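The warm-up and timing step could look something like the following. This is a minimal sketch, not the package's code: `measure_fps` and its parameters are hypothetical names, and the optional `sync` argument is an assumption standing in for a device barrier such as torch.cuda.synchronize, which is needed for meaningful timings when work runs asynchronously on a GPU.

```python
import time
from typing import Callable, Optional


def measure_fps(
    step: Callable[[], object],
    warmup: int = 100,
    iters: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return calls per second of `step`, averaged over `iters` timed calls."""
    for _ in range(warmup):
        step()                      # untimed passes so caches and kernels settle
    if sync is not None:
        sync()                      # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()                      # wait for queued work before stopping the clock
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return iters / max(elapsed, 1e-12)


# Stand-in workload; with PyTorch this would be something like
# `lambda: model(dummy_batch)` executed under `torch.no_grad()`.
fps = measure_fps(lambda: sum(range(1000)), warmup=10, iters=50)
print(f"{fps:.1f} calls per second")
```

Keeping the warm-up passes outside the timed region matters because the first few calls often include one-off costs such as memory allocation.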
- This tool is currently focused on CNN-based models for computer vision. Transformer-based models (e.g., Vision Transformers, Swin Transformers) are not yet supported in FLOPs estimation.
- Unsupported modules are recorded in exclude; you may need to extend formulas for custom layers.
- Memory estimation is rough and assumes no activation checkpointing or optimizer states.
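One plausible reading of an estimate "based on activation count and tensor element size" is the back-of-the-envelope calculation below. The formula and the `estimate_activation_mb` name are assumptions for illustration; the package may compute its figure differently, for example by also accounting for gradients.

```python
def estimate_activation_mb(
    acts_per_sample: int,
    batch_size: int,
    bytes_per_element: int = 4,  # 4 bytes assumes float32 activations
) -> float:
    """Rough activation memory in MB: elements * batch size * bytes per element."""
    total_bytes = acts_per_sample * batch_size * bytes_per_element
    return total_bytes / (1024 ** 2)


# One million activations per sample, batch of 16, float32 vs float16.
print(f"{estimate_activation_mb(1_000_000, 16):.1f} MB")
print(f"{estimate_activation_mb(1_000_000, 16, bytes_per_element=2):.1f} MB")
```

Because the estimate scales linearly with batch size and element width, halving either one halves the figure, which is a quick way to reason about mixed precision or smaller batches.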
MIT License. Feel free to modify and distribute.