akashBlog - Simple Online RealTime Tracking

Overview

This is an overview of the implementation details of SORT tracking algorithm. The official implemenation of the paper is present in this repo . The paper pretty much explains it straightforward , i will be more walking through the implemenation details. In a top level SORT is a tracking algorithm which falls in the class of tracking by detection and the detection, assoaciation , tracking cycle is happening in the 2D image domain.

Detection

Sort is a tracking by detection algorithm. So the quality of the tracking will inturn depends on the quality of detector. In the officiall implementation repo, the author have already porvided detections from the MOT Benchmark. So in the implementation present in the repo we can treat the detector as a blackbox and use the detection information already present.

Motion Model and why we need it

So we get detections in the current frame and we need to some how associate it to the detections from the previous frame. Seems like a place where we can put a Kalman filter into good use. So we use a kalman filter with constant velocity motion model and then we will treat the detections as measurements. The official implementation uses filterpy which is a python library with different kalman filter implementations.

State Variables in the constant velocity model

So what are the state variables

u –> the horizontal pixel location of center of target
v –> the vertical pixel location of the center of target
s –> area (width_bbox*height_bbox)
r –> aspect ration (width_bbox/height_bbox)

We assume a constant velocity model and also assume the aspect ratio also remains constant, our process and measurement models will be based on that.
The state variables are [u,v,s,r,u_dot,v_dot,s_dot] where u_dot,v_dot and s_dot represent the corresponding velocities. Since we are assuming a constant velocity model we can use the normal newtons equations.

u = u + u_dot * t
v = v + v_dot * t
s = s + s_dot * t
r = r
u_dot = u_dot
v_dot = v_dot
s_dot = s_dot

The final process model will look like
x_(t+1) = F * x_(t) + ProcessNoise , For constant velocity model like above the process model will look something like shown below. As its shown from the output of the dot product, we get what we expected

from sympy import symbols , Matrix
u,v,s,r,u_dot,v_dot,s_dot = symbols('u,v,s,r,u_dot,v_dot,s_dot')
F  = Matrix([[1,0,0,0,1,0,0],[0,1,0,0,0,1,0],[0,0,1,0,0,0,1],[0,0,0,1,0,0,0],  [0,0,0,0,1,0,0],[0,0,0,0,0,1,0],[0,0,0,0,0,0,1]])
x_ = Matrix([u,v,s,r,u_dot,v_dot,s_dot])
out = F.dot(x_)

\(\displaystyle \left[\begin{matrix}1 & 0 & 0 & 0 & 1 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 1 & 0\\0 & 0 & 1 & 0 & 0 & 0 & 1\\0 & 0 & 0 & 1 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 1 & 0\\0 & 0 & 0 & 0 & 0 & 0 & 1\end{matrix}\right]\)

out

[u + u_dot, v + v_dot, s + s_dot, r, u_dot, v_dot, s_dot]

Measurement Model

We use the variables u,v,s and r as measurements. We get these from the bounding box coordinates of each detections. So we need the measurement model to convert form the state space to the measurement space and the model is very simple 4x7 matrix with

H = Matrix([[1,0,0,0,0,0,0],[0,1,0,0,0,0,0],[0,0,1,0,0,0,0],[0,0,0,1,0,0,0]])
H

\(\displaystyle \left[\begin{matrix}1 & 0 & 0 & 0 & 0 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 1 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 1 & 0 & 0 & 0\end{matrix}\right]\)

out = H.dot(x_)
out

[u, v, s, r]

So thats about the kalman filter motion and measurement model and regarding the process noise since we are not observing velocties they are given high variances in the process matrix

Core Sort Loop

#create instance of the SORT tracker , the min_hits are the minimum times the object needed to be redetected to be considered as a valid object
# max_age is the maximum age above which the object is ignored
mot_tracker = Sort(max_age=args.max_age, 
                   min_hits=args.min_hits,
                   iou_threshold=args.iou_threshold) 


# we loop through each frame
for frame in range(int(seq_dets[:,0].max())):
frame += 1 #detection and frame numbers begin at 1
dets = seq_dets[seq_dets[:, 0]==frame, 2:7]
dets[:, 2:4] += dets[:, 0:2] #convert to [x1,y1,w,h] to [x1,y1,x2,y2]
total_frames += 1

# this part is only needed if we are displaying
if(display):
  fn = os.path.join('mot_benchmark', phase, seq, 'img1', '%06d.jpg'%(frame))
  im =io.imread(fn)
  ax1.imshow(im)
  plt.title(seq + ' Tracked Targets')

start_time = time.time()
#The the tracker update, the kalman predict and update are done within this update method.
trackers = mot_tracker.update(dets)
cycle_time = time.time() - start_time
total_time += cycle_time

Here we initially create an instance of the tracker and then loop through each frame and get the detections in those frames ,those detections are passed to the update method of the SORT. One point to note here is that within this update method the actual predict and update of the kalman is called.

Update Sort

For each unmatched detections an new kalmanfitler will be created,and in the very first loop all the kalman tracks will be created from the detections since we are not having any trackers to match against, but from the next frame onwards these trackers will be used to match against them. When a new kalman filter object is created for an unmatched detection the kalmans state vector (the first four states which we get from measurement) is with the intial measurement itself.
If already initialized trackers are there the predict method for each of the existing trackers is called which is explained in detail below.

for t, trk in enumerate(trks):
    pos = self.trackers[t].predict()[0]
    trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
    if np.any(np.isnan(pos)):
        to_del.append(t)

predict method of Sort

def predict(self):
    """
    Advances the state vector and returns the predicted bounding box estimate.
    """
    if((self.kf.x[6]+self.kf.x[2])<=0):
      self.kf.x[6] *= 0.0
    self.kf.predict()
    self.age += 1
    if(self.time_since_update>0):
      self.hit_streak = 0
    self.time_since_update += 1
    self.history.append(convert_x_to_bbox(self.kf.x))
    return self.history[-1]

Initially we check for negative area and if so we set the rate of change of area as zero
Then we do the kalman predict method which is F@x_state, and here the covariance also gets udpated.
Then we increase the age of the tracker by one and check for time since the last update was called, if it was called in the last cycle we set the hit_streak to 0.
Increase the time_since_update by 1. We set this back to zero in the update method of the kalman, this is means to know the if we had a valid kalman update for this tracker or not.
We return the converted bounding box from measurement space x_center,y_center,scale,aspect ratio to x_top,y_top,x_bottom,y_bottom.

Now we associate the predicted trackers to detections.
Associate predicted tracks to detections in the current frame

def associate_detections_to_trackers(detections,trackers,iou_threshold = 0.3): 
"""
Assigns detections to tracked object (both represented as bounding boxes)

Returns 3 lists of matches, unmatched_detections and unmatched_trackers
"""
# if the trackers is empty(happens in the begining of the cycle), we return all detections as unmatched
if(len(trackers)==0):
    return np.empty((0,2),dtype=int), np.arange(len(detections)), np.empty((0,5),dtype=int)

#here we find the iou between the detections and tracker
iou_matrix = iou_batch(detections, trackers)

#the iou_matrix will be a one with shape detection_number x tracker_number
if min(iou_matrix.shape) > 0:
    a = (iou_matrix > iou_threshold).astype(np.int32)

    #if all detection is only associated with only one tracker we can simply return where
    if a.sum(1).max() == 1 and a.sum(0).max() == 1:
        matched_indices = np.stack(np.where(a), axis=1)
    else:
        #if more than one traker is associated with any detection we use the hungarian algo and find the indexes
        matched_indices = linear_assignment(-iou_matrix)
else:
    matched_indices = np.empty(shape=(0,2))

unmatched_detections = []
#here we loop through the detections and see if there any unmatched detections
for d, det in enumerate(detections):
    if(d not in matched_indices[:,0]):
        unmatched_detections.append(d)
unmatched_trackers = []

 #here we loop through the trackers  and see if there any unmatched detections
for t, trk in enumerate(trackers):
    if(t not in matched_indices[:,1]):
        unmatched_trackers.append(t)

#filter out matched with low IOU
matches = []
for m in matched_indices:
    if(iou_matrix[m[0], m[1]]<iou_threshold):
        unmatched_detections.append(m[0])
        unmatched_trackers.append(m[1])
else:
    matches.append(m.reshape(1,2))
if(len(matches)==0):
    matches = np.empty((0,2),dtype=int)
else:
    matches = np.concatenate(matches,axis=0)

return matches, np.array(unmatched_detections), np.array(unmatched_trackers)

a. We first find the iou_matrix , this will have detections along the row and trackers along the column  
b.Now if every row and column of the iou matrix only have one value above the iou_threshold then that row,col pair will be the match with with row for      detection id and col for tracked id.
c.But if more than one mathces are there in every column then we do the linear_assignment using the hungarian alogrithm
d.Then we check for the unmatched detection by seeing if there are any rows in the iou matches without the detection.
e.Similarly we look for the unmatched tracks and see if there any column in the iou matches that are not there.
f.Then we check for the iou_threshold and see and add to mathces and non matches accordingly , then finally return the matches((det,tracker) as (row,col)), unmatched_detection,unmatched_trackers

Now update each of the tracker with the corresponding matched detections , the update method is explained in detail below. update

def update(self,bbox):
    """
    Updates the state vector with observed bbox.
    """
    self.time_since_update = 0
    self.history = []
    self.hits += 1
    self.hit_streak += 1
    self.kf.update(convert_bbox_to_z(bbox))

a.Initially we set the time since update to zero.
b.Then we set the history as an empty list
c.Then we increase the hits by 1
d. Then we call the kalman update but first have the change the bounding box form x_top,y_top,x_bottom,y_bottom to the x_center,y_center,scale,aspectRation

Now we reverse the trackers and get the state, then we check if time_since_update is < 1 , we set it to zero in the update part so here we are checking whether we have done update and only if we have done an update we append it to the output, also we check if the hit_streak(which also increase by one in the update method) is greater than the minimum hit streak unless its the begining frames.

i = len(self.trackers)
for trk in reversed(self.trackers):
    d = trk.get_state()[0]
    if (trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits):
         ret.append(np.concatenate((d,[trk.id+1])).reshape(1,-1)) # +1 as MOT benchmark requires positive
    i -= 1
    # remove dead tracklet
    if(trk.time_since_update > self.max_age):
          self.trackers.pop(i)
if(len(ret)>0):
      return np.concatenate(ret)
return np.empty((0,5))

We remove the dead trackers, meaning trackers that have not been assigned to any detections, by checking the time_since_update, the time_since_update is set to zero in the udpate method and is incremented in the predict method, so if we are only doing prediction without any update the time_since_update will increase and pass the maximum age and we will pop it from the trackers
Finally we concatenate the detections and give them as out.
Like this we keep updating looping through each frame and detection in it and the same detections should ideally have the same id until they disappear from the frame.