An IOU explanation and implementaion walk through

In this blogpost i will explain what is IOU, where is it used , how is it implemented

What is IOU

IOU is pretty much clear by the name intersection over union. The formula is

  • IOU = Area of Intersection / Area of union
  • Area of union = First Box Area + Second Box Area -Intersection Area

How is it implemented(basic)

Here i will show a simple implementation in pytorch.If you look at the below picture we will get a basic idea of how to get the intersection between two boxes, the rest are simple

For the basic implementation of this can be found in this nice blogpost and from that is basic implemenation is like this

#collapse-hide
def batch_iou(a, b, epsilon=1e-5):
    """ Given two arrays `a` and `b` where each row contains a bounding
        box defined as a list of four numbers:
            [x1,y1,x2,y2]
        where:
            x1,y1 represent the upper left corner
            x2,y2 represent the lower right corner
        It returns the Intersect of Union scores for each corresponding
        pair of boxes.

    Args:
        a:          (numpy array) each row containing [x1,y1,x2,y2] coordinates
        b:          (numpy array) each row containing [x1,y1,x2,y2] coordinates
        epsilon:    (float) Small value to prevent division by zero

    Returns:
        (numpy array) The Intersect of Union scores for each pair of bounding
        boxes.
    """
    # COORDINATES OF THE INTERSECTION BOXES
    x1 = np.array([a[:, 0], b[:, 0]]).max(axis=0)
    y1 = np.array([a[:, 1], b[:, 1]]).max(axis=0)
    x2 = np.array([a[:, 2], b[:, 2]]).min(axis=0)
    y2 = np.array([a[:, 3], b[:, 3]]).min(axis=0)

    # AREAS OF OVERLAP - Area where the boxes intersect
    width = (x2 - x1)
    height = (y2 - y1)

    # handle case where there is NO overlap
    width[width < 0] = 0
    height[height < 0] = 0

    area_overlap = width * height

    # COMBINED AREAS
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    area_combined = area_a + area_b - area_overlap

    # RATIO OF AREA OF OVERLAP OVER COMBINED AREA
    iou = area_overlap / (area_combined + epsilon)
    return iou

Where is it used and how to implement for that use case

But the above implementation assumes that both the bounding boxes have the same set of batches,which is rarely the case. IOU is mainly used in object detection tasks.

  1. We will have a set of anchors for each position in the feature map,for eg say if we have a feature map of shape 5x5 and there are 3 anchors per position then there will be 5x5x3=75 total anchors
  2. The Ground trouth boxes for that feature map may be much less the number of anchors
  3. We need to find the matching anchors to the bounding boxes, so we can select that portion of the feature map for the downstream predictions.

Implementing for the above use case

Basically when we get two boxes say

a- B,M,4 -- the anchor boxes after reshaping(B,A,H,W,4) where A is number of anchors

b- B,N,4 --the real bboxes. N is the max number of boxes in certain image and the other images will be padded with -1.

we need to compute iou between a and b so each box in a is compare with each box in b. So we should make N copies of copies of each box in a to be compare with N bboxes. Also if we want to vectorise this operation then we need to make M copies of b. So the final dimensions will be

a - B,M,N,4 b - B,M,N,4

Now we can say like each slice of the both a and b can be compared

import torch
#say the given anchors and bboxes are in shape x_top,y_top,x_btm,y_btm
sample_anchors = torch.tensor([[[[[5.,5,15,15], [25,25,35,35],[1,1,9,9]]]]]) #only 1 batch
bboxes = torch.tensor([[[1.,1,11,11], [20,20,30,30]]]) 
B = bboxes.shape[0]
no_of_bboxes = bboxes.shape[1]
print('sample anchors \n', sample_anchors,'\n')
print('sample bboxes \n', bboxes,'\n')
print('sample number of anchors shape ',sample_anchors.shape)
print('sample bboxes shape ',bboxes.shape,'\n')
sample anchors 
 tensor([[[[[ 5.,  5., 15., 15.],
           [25., 25., 35., 35.],
           [ 1.,  1.,  9.,  9.]]]]]) 

sample bboxes 
 tensor([[[ 1.,  1., 11., 11.],
         [20., 20., 30., 30.]]]) 

sample number of anchors shape  torch.Size([1, 1, 1, 3, 4])
sample bboxes shape  torch.Size([1, 2, 4]) 

Here we need to compare the 3 anchor boxes with the two bboxes, first we reshape the anchors to be of shape batch,total_anchors,4,

we need to compute iou between sample_anchors and bboxes so each of the 3 anchors are compared with the bboxes which is 2 here. So for vectorized implementation we should make 3 copies of copies of each anchor in sample_anchors to be compare with 2 bboxes. Also if we should make 3 copies of b to aid in vectorized implementation. So the final dimensions will be

  • sample_anchors - B,3,2,4
  • b=boxes - B,3,2,4
sample_anchors = sample_anchors.reshape(B,-1,4)
no_of_anchors = sample_anchors.shape[1]
sample_anchors = sample_anchors.unsqueeze(2).expand(-1,-1,no_of_bboxes,-1)
print(sample_anchors)
print(sample_anchors.shape)
tensor([[[[ 5.,  5., 15., 15.],
          [ 5.,  5., 15., 15.]],

         [[25., 25., 35., 35.],
          [25., 25., 35., 35.]],

         [[ 1.,  1.,  9.,  9.],
          [ 1.,  1.,  9.,  9.]]]])
torch.Size([1, 3, 2, 4])
bboxes = bboxes.unsqueeze(1).expand(-1,no_of_anchors,-1,-1)
print(bboxes)
print(bboxes.shape)
tensor([[[[ 1.,  1., 11., 11.],
          [20., 20., 30., 30.]],

         [[ 1.,  1., 11., 11.],
          [20., 20., 30., 30.]],

         [[ 1.,  1., 11., 11.],
          [20., 20., 30., 30.]]]])
torch.Size([1, 3, 2, 4])
#first we need to find the intersection for that width and height of the intersection area
#this inturn can be obtained by finding the lefttop and bottom corner cordinates and subtracting them

left_top = torch.max(sample_anchors[:,:,:,:2],bboxes[:,:,:,:2])
right_bottom = torch.min(sample_anchors[:,:,:,2:],bboxes[:,:,:,2:])
delta = right_bottom - left_top
print(delta)
tensor([[[[  6.,   6.],
          [ -5.,  -5.]],

         [[-14., -14.],
          [  5.,   5.]],

         [[  8.,   8.],
          [-11., -11.]]]])
#The first element of delta is width and the next element is height, we can remove negative values 
#since this will be boxes that are not intersecting 
#(remember the the image top left if (0,0) and bottom y is positive downwards)
delta[delta<0]=0
#now find the intersection area
interesection_area = delta[:,:,:,0]*delta[:,:,:,1]
print(interesection_area)
print(interesection_area.shape)
tensor([[[36.,  0.],
         [ 0., 25.],
         [64.,  0.]]])
torch.Size([1, 3, 2])

A small picture represntation is tried below,we can see that first and 3rd anchors intersect with first bounding box while the 2nd anchor intersect with the next one

From the intersection area above we can see that the where there are no itersection the area is zero and thus in this case the first and last anchor mathces with the first bbox while the second anchor mathces with the second one

#now we need to find the Area of union which is 
#Area of union = First Box Area + Second Box Area -Intersection Area
sample_anchors_area = (sample_anchors[:,:,:,2]-sample_anchors[:,:,:,0])*(sample_anchors[:,:,:,3] -
                                                                        sample_anchors[:,:,:,1])
bbox_area = (bboxes[:,:,:,2] - bboxes[:,:,:,0]) * (bboxes[:,:,:,3] - bboxes[:,:,:,1])
iou = interesection_area/(sample_anchors_area+bbox_area - interesection_area)
print(iou)
print(iou.shape)
tensor([[[0.2195, 0.0000],
         [0.0000, 0.1429],
         [0.6400, 0.0000]]])
torch.Size([1, 3, 2])

so the final iou matrix will have shape (Batch,no_of_anchors,no_of_bboxes)

Downstream usage of this iou

This iou matrix will be used for calculation the regression offsets, negative anchors,ground truth class . The other place where iou is used is for mean Average Precision at the end which if possible i will explain in another post

Complete code

Below i will provide a small code for implementing this in a batch

def IOU(anchors,bboxes):
    #anchors B,A,H,W,4
    #bboxes B,N,4
    B = anchors.shape[0]
    anchors = anchors.reshape(B,-1,4)
    M,N = anchors.shape[1],bboxes.shape[1]
    
    #expanding
    anchors = anchors.unsqueeze(2).expand(-1,-1,N,-1)
    bboxes = bboxes.unsqueeze(1).expand(-1,M,-1,-1)
    
    left_top = torch.max(anchors[:,:,:,:2],bboxes[:,:,:,:2])
    right_bottom = torch.min(anchors[:,:,:,2:],bboxes[:,:,:,2:])
    
    delta = right_bottom - left_top
    delta[delta<0] = 0
    
    intersection_area = delta[:,:,:,0]*delta[:,:,:,1]
    
    anchors_area = (anchors[:,:,:,2]-anchors[:,:,:,0])*(anchors[:,:,:,3] -anchors[:,:,:,1])
    bbox_area = (bboxes[:,:,:,2] - bboxes[:,:,:,0])* (bboxes[:,:,:,3] - bboxes[:,:,:,1])
    iou = interesection_area/(anchors_area+bbox_area - interesection_area)
    
    return iou