Explaining IoU
For the basic implementation of this can be found in this nice blogpost and from that is basic implemenation is like this
#collapse-hide
def batch_iou(a, b, epsilon=1e-5):
""" Given two arrays `a` and `b` where each row contains a bounding
box defined as a list of four numbers:
[x1,y1,x2,y2]
where:
x1,y1 represent the upper left corner
x2,y2 represent the lower right corner
It returns the Intersect of Union scores for each corresponding
pair of boxes.
Args:
a: (numpy array) each row containing [x1,y1,x2,y2] coordinates
b: (numpy array) each row containing [x1,y1,x2,y2] coordinates
epsilon: (float) Small value to prevent division by zero
Returns:
(numpy array) The Intersect of Union scores for each pair of bounding
boxes.
"""
# COORDINATES OF THE INTERSECTION BOXES
x1 = np.array([a[:, 0], b[:, 0]]).max(axis=0)
y1 = np.array([a[:, 1], b[:, 1]]).max(axis=0)
x2 = np.array([a[:, 2], b[:, 2]]).min(axis=0)
y2 = np.array([a[:, 3], b[:, 3]]).min(axis=0)
# AREAS OF OVERLAP - Area where the boxes intersect
width = (x2 - x1)
height = (y2 - y1)
# handle case where there is NO overlap
width[width < 0] = 0
height[height < 0] = 0
area_overlap = width * height
# COMBINED AREAS
area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
area_combined = area_a + area_b - area_overlap
# RATIO OF AREA OF OVERLAP OVER COMBINED AREA
iou = area_overlap / (area_combined + epsilon)
return iou
But the above implementation assumes that both the bounding boxes have the same set of batches,which is rarely the case. IOU is mainly used in object detection tasks.
- We will have a set of anchors for each position in the feature map,for eg say if we have a feature map of shape 5x5 and there are 3 anchors per position then there will be 5x5x3=75 total anchors
- The Ground trouth boxes for that feature map may be much less the number of anchors
- We need to find the matching anchors to the bounding boxes, so we can select that portion of the feature map for the downstream predictions.
Basically when we get two boxes say
a- B,M,4
-- the anchor boxes after reshaping(B,A,H,W,4) where A is number of anchors
b- B,N,4
--the real bboxes. N is the max number of boxes in certain image and the other images will be padded with -1.
we need to compute iou between a
and b
so each box in a
is compare with each box
in b
. So we should make N copies of copies of each box in a
to be compare with
N bboxes
. Also if we want to vectorise this operation then we need to make M
copies of b
. So the final dimensions will be
a - B,M,N,4
b - B,M,N,4
Now we can say like each slice of the both a
and b
can be compared
import torch
#say the given anchors and bboxes are in shape x_top,y_top,x_btm,y_btm
sample_anchors = torch.tensor([[[[[5.,5,15,15], [25,25,35,35],[1,1,9,9]]]]]) #only 1 batch
bboxes = torch.tensor([[[1.,1,11,11], [20,20,30,30]]])
B = bboxes.shape[0]
no_of_bboxes = bboxes.shape[1]
print('sample anchors \n', sample_anchors,'\n')
print('sample bboxes \n', bboxes,'\n')
print('sample number of anchors shape ',sample_anchors.shape)
print('sample bboxes shape ',bboxes.shape,'\n')
Here we need to compare the 3 anchor boxes with the two bboxes, first we reshape the anchors to be of shape batch,total_anchors,4
,
we need to compute iou between sample_anchors
and bboxes
so each of the 3
anchors are
compared with the bboxes which is 2
here. So for vectorized implementation we should make 3 copies of copies of each anchor in sample_anchors
to be compare with
2 bboxes
. Also if we should make 3
copies of b
to aid in vectorized implementation. So the final dimensions will be
sample_anchors - B,3,2,4
b=boxes - B,3,2,4
sample_anchors = sample_anchors.reshape(B,-1,4)
no_of_anchors = sample_anchors.shape[1]
sample_anchors = sample_anchors.unsqueeze(2).expand(-1,-1,no_of_bboxes,-1)
print(sample_anchors)
print(sample_anchors.shape)
bboxes = bboxes.unsqueeze(1).expand(-1,no_of_anchors,-1,-1)
print(bboxes)
print(bboxes.shape)
#first we need to find the intersection for that width and height of the intersection area
#this inturn can be obtained by finding the lefttop and bottom corner cordinates and subtracting them
left_top = torch.max(sample_anchors[:,:,:,:2],bboxes[:,:,:,:2])
right_bottom = torch.min(sample_anchors[:,:,:,2:],bboxes[:,:,:,2:])
delta = right_bottom - left_top
print(delta)
#The first element of delta is width and the next element is height, we can remove negative values
#since this will be boxes that are not intersecting
#(remember the the image top left if (0,0) and bottom y is positive downwards)
delta[delta<0]=0
#now find the intersection area
interesection_area = delta[:,:,:,0]*delta[:,:,:,1]
print(interesection_area)
print(interesection_area.shape)
A small picture represntation is tried below,we can see that first and 3rd anchors intersect with first bounding box while the 2nd anchor intersect with the next one
From the intersection area above we can see that the where there are no itersection the area is zero and thus in this case the first and last anchor mathces with the first bbox while the second anchor mathces with the second one
#now we need to find the Area of union which is
#Area of union = First Box Area + Second Box Area -Intersection Area
sample_anchors_area = (sample_anchors[:,:,:,2]-sample_anchors[:,:,:,0])*(sample_anchors[:,:,:,3] -
sample_anchors[:,:,:,1])
bbox_area = (bboxes[:,:,:,2] - bboxes[:,:,:,0]) * (bboxes[:,:,:,3] - bboxes[:,:,:,1])
iou = interesection_area/(sample_anchors_area+bbox_area - interesection_area)
print(iou)
print(iou.shape)
so the final iou matrix will have shape (Batch,no_of_anchors,no_of_bboxes)
def IOU(anchors,bboxes):
#anchors B,A,H,W,4
#bboxes B,N,4
B = anchors.shape[0]
anchors = anchors.reshape(B,-1,4)
M,N = anchors.shape[1],bboxes.shape[1]
#expanding
anchors = anchors.unsqueeze(2).expand(-1,-1,N,-1)
bboxes = bboxes.unsqueeze(1).expand(-1,M,-1,-1)
left_top = torch.max(anchors[:,:,:,:2],bboxes[:,:,:,:2])
right_bottom = torch.min(anchors[:,:,:,2:],bboxes[:,:,:,2:])
delta = right_bottom - left_top
delta[delta<0] = 0
intersection_area = delta[:,:,:,0]*delta[:,:,:,1]
anchors_area = (anchors[:,:,:,2]-anchors[:,:,:,0])*(anchors[:,:,:,3] -anchors[:,:,:,1])
bbox_area = (bboxes[:,:,:,2] - bboxes[:,:,:,0])* (bboxes[:,:,:,3] - bboxes[:,:,:,1])
iou = interesection_area/(anchors_area+bbox_area - interesection_area)
return iou