So what do mobile blocks save?
- Brief Intro
- Simple ConvNet Block
- MobileNet Block
- MobileNetv2 Block
- Comparison
- Doing all the above with torchvision classes
This blog assumes the reader has some understanding of MobileNet. It is just a lazy illustration of how much the different MobileNet blocks save; the actual papers have the real numbers. If you want to know more about the models, please go through the MobileNetV1 and MobileNetV2 papers. Here we will briefly see how many parameters and multiply-add operations are required by a normal convolution block, a MobileNetV1 block and a MobileNetV2 block to produce the same output shape from the same input. We will use the torchinfo library to get the summaries.
#collapse-hide
import torch
import torch.nn as nn
from torchinfo import summary
import numpy as np
# we will use the same input and outputs for all the conv blocks and mobile blocks
input_filters = 64
output_filters = 128
input_size = (3,input_filters,224,224)
#collapse-hide
def printInputAndOutput(model,input_filters=64):
    # push a random batch through the model and report the tensor shapes
    rand_tensor = torch.rand((3,input_filters,224,224))
    out = model(rand_tensor)
    print("Input shape = ", rand_tensor.shape)
    print("Output shape = ", out.shape)
simple_convBlock = nn.Sequential(
    #STANDARD 3x3 CONV
    nn.Conv2d(in_channels=input_filters,out_channels=output_filters,kernel_size=3,
              stride=2,padding=1,bias=False),
    nn.BatchNorm2d(output_filters),
    nn.ReLU(inplace=True)
)
printInputAndOutput(simple_convBlock)
summary(simple_convBlock,input_size=input_size,col_names=["kernel_size", "output_size", "num_params", "mult_adds"])
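The numbers torchinfo reports for this block can be roughly reproduced by hand. The sketch below is plain arithmetic, not torchinfo's own implementation; the helper names `conv2d_weights` and `conv_out_size` are made up for this illustration, and only the convolution is counted for mult-adds (the BatchNorm contribution is negligible).

```python
def conv2d_weights(c_in, c_out, k, groups=1):
    """Number of weights in a bias-free 2D convolution (hypothetical helper)."""
    return (c_in // groups) * c_out * k * k

def conv_out_size(size, k, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - k) // stride + 1

batch, c_in, c_out, size = 3, 64, 128, 224

out = conv_out_size(size, k=3, stride=2, padding=1)   # 112
weights = conv2d_weights(c_in, c_out, k=3)            # 64 * 128 * 3 * 3
bn_params = 2 * c_out                                 # BatchNorm scale and shift
total_params = weights + bn_params

# every output position of every sample reuses all the weights once
conv_mult_adds = weights * out * out * batch

print(out, total_params, conv_mult_adds)
# 112 73984 2774532096  (about 2.77 G mult-adds)
```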
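The main idea is to replace the standard convolution with a depthwise separable convolution: a depthwise 3x3 convolution that filters each input channel separately, followed by a pointwise 1x1 convolution that mixes the channels. This reduces both the parameters and the multiply-add operations required. For more info please read the paper or watch this tutorial by Prof Maziar Raissi.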
mobileNetBlock = nn.Sequential(
    #DEPTHWISE CONV
    #we get the depthwise convolution by specifying groups same as in_channels
    nn.Conv2d(in_channels=input_filters,out_channels=input_filters,kernel_size=3,
              stride=2,padding=1,groups=input_filters,bias=False),
    nn.BatchNorm2d(input_filters),
    nn.ReLU(inplace=True),
    #POINTWISE CONV
    nn.Conv2d(in_channels=input_filters,out_channels=output_filters,kernel_size=1,
              stride=1,padding=0,bias=False),
    nn.BatchNorm2d(output_filters),
    nn.ReLU(inplace=True)
)
printInputAndOutput(mobileNetBlock)
summary(mobileNetBlock,input_size=input_size,col_names=["kernel_size", "output_size", "num_params", "mult_adds"])
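The saving of the depthwise separable block can also be worked out by hand. The sketch below counts only convolution weights for the ratio and compares it with the closed form 1/N + 1/k^2 (N output channels, k kernel size); the helper `conv2d_weights` is a hypothetical name used for this illustration.

```python
def conv2d_weights(c_in, c_out, k, groups=1):
    """Number of weights in a bias-free 2D convolution (hypothetical helper)."""
    return (c_in // groups) * c_out * k * k

batch, c_in, c_out, k, out = 3, 64, 128, 3, 112   # out: spatial size after stride 2

depthwise = conv2d_weights(c_in, c_in, k, groups=c_in)   # one 3x3 filter per channel
pointwise = conv2d_weights(c_in, c_out, 1)               # 1x1 conv mixing channels
bn_params = 2 * c_in + 2 * c_out                         # two BatchNorm layers

total_params = depthwise + pointwise + bn_params
conv_mult_adds = (depthwise + pointwise) * out * out * batch

# cost relative to a standard 3x3 convolution with the same input and output
standard = conv2d_weights(c_in, c_out, k)
ratio = (depthwise + pointwise) / standard
closed_form = 1 / c_out + 1 / k**2

print(total_params, conv_mult_adds)   # 9152 329957376 (about 329.96 M)
print(round(ratio, 4), round(closed_form, 4))   # 0.1189 0.1189
```

So for a 3x3 kernel the depthwise separable block needs roughly one eighth to one ninth of the weights and multiply-adds of the standard convolution.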
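The idea here is the inverted residual block: a 1x1 convolution first expands the channels, a depthwise 3x3 convolution filters them, and a final 1x1 convolution projects them back down without an activation (the linear bottleneck). A residual connection is added around the block when the stride is 1 and the input and output channels match, so it is not active in the stride-2, 64-to-128 example below. With this, better performance was obtained at the cost of more parameters per block than the MobileNetV1 block above. For more info please read the paper or watch this tutorial by Prof Maziar Raissi.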
class MobileNetv2Block(nn.Module):
    def __init__(self,in_channels,out_channels,expand_ratio,stride=1):
        super(MobileNetv2Block,self).__init__()
        #EXPANSION: 1x1 conv that widens the channels by expand_ratio
        self.conv1x1Begin = nn.Sequential(
            nn.Conv2d(in_channels,in_channels*expand_ratio,kernel_size=1,stride=1,bias=False),
            nn.BatchNorm2d(in_channels*expand_ratio),
            nn.ReLU6(inplace=True))
        #DEPTHWISE CONV on the expanded channels
        self.convDepthWise = nn.Sequential(
            nn.Conv2d(in_channels*expand_ratio,in_channels*expand_ratio,kernel_size=3,stride=stride,
                      padding=1,groups=in_channels*expand_ratio,bias=False),
            nn.BatchNorm2d(in_channels*expand_ratio),
            nn.ReLU6(inplace=True)
        )
        #PROJECTION: 1x1 conv back down to out_channels
        #no activation here, MobileNetV2 uses a linear bottleneck after the projection
        self.conv1x1Last = nn.Sequential(
            nn.Conv2d(in_channels*expand_ratio,out_channels,kernel_size=1,stride=1,bias=False),
            nn.BatchNorm2d(out_channels))
        self.stride = stride
        #the residual connection is only valid when input and output shapes match
        self.use_res_connect = self.stride == 1 and in_channels == out_channels

    def forward(self,x):
        input_ = x
        x = self.conv1x1Begin(x)
        x = self.convDepthWise(x)
        x = self.conv1x1Last(x)
        if self.use_res_connect:
            return x+input_
        else:
            return x
mobileNetV2Block = MobileNetv2Block(input_filters,output_filters,expand_ratio=2,stride=2)
printInputAndOutput(mobileNetV2Block)
summary(mobileNetV2Block,input_size=input_size,col_names=["kernel_size", "output_size", "num_params", "mult_adds"])
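As a sanity check, the cost of this inverted residual block can be sketched as a function of the expansion ratio. The function `inverted_residual_cost` below is a hypothetical helper written for this illustration; it assumes a 3x3 depthwise kernel with padding 1, bias-free convolutions and one BatchNorm after each convolution, as in the block above.

```python
def inverted_residual_cost(c_in, c_out, expand_ratio, stride, size, batch=1):
    """Parameters and conv mult-adds of an expand -> depthwise -> project block."""
    hidden = c_in * expand_ratio
    out = (size + 2 * 1 - 3) // stride + 1      # spatial size after the depthwise conv

    expand_w = c_in * hidden                    # 1x1 expansion
    depthwise_w = hidden * 3 * 3                # one 3x3 filter per hidden channel
    project_w = hidden * c_out                  # 1x1 projection
    bn_params = 2 * hidden + 2 * hidden + 2 * c_out

    params = expand_w + depthwise_w + project_w + bn_params
    # the expansion runs at full resolution, the other two after the stride
    mult_adds = batch * (expand_w * size * size
                         + depthwise_w * out * out
                         + project_w * out * out)
    return params, mult_adds

params, mult_adds = inverted_residual_cost(64, 128, expand_ratio=2, stride=2,
                                           size=224, batch=3)
print(params, mult_adds)   # 26496 1893040128 (about 1.89 G)
```

Most of the multiply-adds come from the expansion convolution, because it runs on the full 224x224 input before the stride is applied.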
Now we can compare the summaries of each block. From the above cells we can observe that the input and output shapes remain the same for all three blocks.
1)SimpleConvBlock
Total params: 73,984
Trainable params: 73,984
Non-trainable params: 0
Total mult-adds (G): 2.77
2)MobileNetV1
Total params: 9,152
Trainable params: 9,152
Non-trainable params: 0
Total mult-adds (M): 329.96
3)MobileNetV2
Total params: 26,496
Trainable params: 26,496
Non-trainable params: 0
Total mult-adds (G): 1.89
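To make the comparison easier to read, the reported totals can be turned into ratios relative to the simple conv block. The snippet below only reuses the numbers listed above; the dictionary layout and variable names are just for illustration.

```python
# (total params, total mult-adds) as reported in the summaries above
summaries = {
    "SimpleConvBlock": (73_984, 2.77e9),
    "MobileNetV1": (9_152, 329.96e6),
    "MobileNetV2": (26_496, 1.89e9),
}

base_params, base_mult_adds = summaries["SimpleConvBlock"]

# how many times smaller each block is than the simple conv block
savings = {
    name: (base_params / params, base_mult_adds / mult_adds)
    for name, (params, mult_adds) in summaries.items()
}

for name, (param_ratio, mult_add_ratio) in savings.items():
    print(f"{name:16s} {param_ratio:5.2f}x fewer params, "
          f"{mult_add_ratio:5.2f}x fewer mult-adds")
```

With these inputs the MobileNetV1 block comes out at roughly 8x fewer parameters and mult-adds, and the MobileNetV2 block at roughly 2.8x fewer parameters and 1.5x fewer mult-adds.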
If you look at the outputs of torchinfo you can see that the estimated total size is larger for the MobileNet blocks than for the simple conv block. This is because the mobile blocks have more layers and therefore more intermediate values, and during training these are counted twice (once for the forward pass and once for the gradients). This is much less of a problem for inference, where the intermediate values do not have to be kept for backpropagation and we mainly need to store the parameters and the architecture. Looking at the numbers above, the mobile blocks need far fewer parameters and fewer multiplications and additions, which generally helps inference speed. If you want more info please read the papers, which are well written. If you want to read about how torchinfo works, please read this blog by Jacob C. Kimmel.
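A rough way to see where the extra activation memory comes from is to count the elements each convolution produces. This is only a back-of-the-envelope sketch, not torchinfo's exact accounting: it counts convolution outputs only (each BatchNorm and activation output has the same shape as the convolution before it) and assumes float32 values.

```python
def conv_out_size(size, k, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - k) // stride + 1

size = 224
out = conv_out_size(size, k=3, stride=2, padding=1)   # 112

# (channels, height, width) of every conv output, per sample
conv_outputs = {
    "simple conv": [(128, out, out)],
    "MobileNetV1": [(64, out, out), (128, out, out)],
    "MobileNetV2": [(128, size, size), (128, out, out), (128, out, out)],
}

def activation_elements(shapes):
    """Total number of output elements over a list of (C, H, W) shapes."""
    return sum(c * h * w for c, h, w in shapes)

batch, bytes_per_float = 3, 4
for name, shapes in conv_outputs.items():
    elements = activation_elements(shapes)
    megabytes = elements * batch * bytes_per_float / 1e6
    print(f"{name:12s} {elements:>10,d} elements/sample {megabytes:8.2f} MB per batch")
```

The MobileNetV2 block stands out because its expansion output is kept at the full 224x224 resolution with 128 channels.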
Actually, all of the above was adapted from torchvision, and we can do the same easily with the torchvision classes as shown below. All credit goes to the amazing torchvision library.
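#only InvertedResidual is needed here (the other imports were unused, and ConvNormActivation is not available under that name in every torchvision release)
from torchvision.models.mobilenetv2 import InvertedResidual
#setting expand_ratio to 1 skips the expansion conv, which reduces this to roughly a mobilenetV1 block (depthwise + pointwise)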
TorchMobileNetV1Block = InvertedResidual(64,128,stride=2,expand_ratio=1)
TorchMobileNetV2Block = InvertedResidual(64,128,stride=2,expand_ratio=2)
printInputAndOutput(TorchMobileNetV1Block)
printInputAndOutput(TorchMobileNetV2Block)
summary(TorchMobileNetV1Block,input_size=input_size,col_names=["kernel_size", "output_size", "num_params", "mult_adds"])
summary(TorchMobileNetV2Block,input_size=input_size,col_names=["kernel_size", "output_size", "num_params", "mult_adds"])