Description
Supplementary Info
Lucas-Kanade Tracker
A Lucas-Kanade tracker uses a warp function W(x;p) to align an image I to a template T. The optimal parameters p∗ of the warp W(x;p) are found by minimizing the loss L. The loss L is the pixel-wise sum of squared differences between the warped image I(W(x;p)) and the template T; refer to eqs. 1 and 2.
Note: We denote pixel locations by x, so I(x) is the pixel value (e.g. RGB) at location x in image I. For simplicity, I and T are treated as column vectors instead of matrices (i.e. linearized matrices). W(x;p) is the point obtained by warping x, and I(W(x;p)) is the pixel value (e.g. RGB) after warping.
L = Σ_x [T(x) − I(W(x;p))]²  (1)
p∗ = argmin_p L  (2)
In general, this is a difficult non-linear optimization of L in p, even for simple warps like affine warps, but the Lucas-Kanade tracker circumvents this with a key idea – forward additive alignment. The idea assumes we already have a very close estimate p0 of the correct warp p∗; then we can assume that a small linear change ∆p is enough to get the best alignment, i.e. p∗ = p0 + ∆p. This is the forward additive form of the warp. The loss L can then be written as:
L = Σ_x [T(x) − I(W(x;p0 + ∆p))]²  (3)
∆p∗ = argmin_∆p L  (4)
p∗ = p0 + ∆p∗  (5)
Expanding this to the first order with a Taylor series:
L ≈ Σ_x [T(x) − I(W(x;p0)) − ∇I (∂W/∂p) ∆p]²  (6)
Here the Jacobian of I, ∇I = [∂I/∂x, ∂I/∂y], is the vector containing the horizontal and vertical gradients at pixel location x, and ∂W/∂p is the Jacobian of the warp. Rearranging the Taylor expansion, it can be rewritten as a typical least-squares approximation ∆p∗ = argmin_∆p ||A∆p − b||²; note we pull out a negative sign thanks to the squaring:
∆p∗ = argmin_∆p Σ_x ||∇I (∂W/∂p) ∆p − [T(x) − I(W(x;p0))]||²  (7)
This can be solved with ∆p∗ = (AᵀA)⁻¹Aᵀb, where AᵀA is the Hessian H:
(AᵀA) = H = Σ_x [∇I (∂W/∂p)]ᵀ [∇I (∂W/∂p)]  (8)
∆p∗ = H⁻¹ Σ_x [∇I (∂W/∂p)]ᵀ b  (9)
b = T(x) − I(W(x;p0))  (10)
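The solve in eqs. 8–10 is an ordinary linear least-squares problem. A minimal NumPy sketch, using a randomly generated steepest-descent matrix A and error vector b as stand-ins for the real image quantities:

```python
import numpy as np

# Hypothetical steepest-descent matrix A (one row per pixel, one column per
# warp parameter) and error vector b, standing in for the real quantities.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 6))   # e.g. 100 pixels, 6 affine parameters
b = rng.standard_normal(100)

# Normal-equation solution: dp = (A^T A)^{-1} A^T b, where H = A^T A.
H = A.T @ A
dp = np.linalg.solve(H, A.T @ b)

# np.linalg.lstsq solves the same least-squares problem, more stably.
dp_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

In practice `np.linalg.solve(H, ...)` is fine when H is well conditioned (a textured template); `lstsq` is the safer default.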
Once ∆p∗ is computed, the best estimate of the warp p∗ can be obtained using eq 5. However, in practice, our initial estimate p0 can be far off the optimal p∗, thus violating the forward additive alignment assumption. As a fix, we perform the optimization of L in an iterative fashion using an error threshold ϵ, as described in Algorithm 1, which follows [?].
Algorithm 1 The Lucas-Kanade Algorithm
(0) p ← p0
(1) Iterate until ||∆p|| ≤ ϵ:
(2) Warp I with W(x;p) to compute I(W(x;p))
(3) Compute the error image E(x) = T(x) − I(W(x;p))
(4) Warp the gradient ∇I with W(x;p)
(5) Evaluate the Jacobian ∂W/∂p at (x;p)
(6) Compute the steepest descent image ∇I (∂W/∂p)
(7) Compute the Hessian matrix H = Σ_x [∇I (∂W/∂p)]ᵀ [∇I (∂W/∂p)]
(8) Compute ∆p = H⁻¹ Σ_x [∇I (∂W/∂p)]ᵀ E(x)
(9) Update the parameters p ← p + ∆p
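The steps of Algorithm 1 can be sketched in NumPy for the simplest case, a pure-translation warp W(x;p) = x + p, where the warp Jacobian ∂W/∂p is the identity. The function names are illustrative, and the bilinear sampler is a minimal stand-in for a proper image-warping routine:

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img at float coordinates (ys, xs) with bilinear interpolation."""
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    dy, dx = ys - y0, xs - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0]
            + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0]
            + dy * dx * img[y0 + 1, x0 + 1])

def lucas_kanade_translation(T, I, p0=(0.0, 0.0), eps=1e-4, max_iters=100):
    """Algorithm 1 for the pure-translation warp W(x;p) = x + p."""
    p = np.asarray(p0, dtype=float)
    yy, xx = np.mgrid[0:T.shape[0], 0:T.shape[1]]
    Gy, Gx = np.gradient(I)  # image gradients, warped inside the loop (step 4)
    for _ in range(max_iters):
        # (2) Warp I with W(x;p): I(W(x;p)) = I(x + p).
        Iw = bilinear_sample(I, yy + p[0], xx + p[1])
        # (3) Error image E(x) = T(x) - I(W(x;p)).
        E = (T - Iw).ravel()
        # (4)-(6) Steepest descent images; dW/dp is the identity here.
        A = np.stack([bilinear_sample(Gy, yy + p[0], xx + p[1]).ravel(),
                      bilinear_sample(Gx, yy + p[0], xx + p[1]).ravel()], axis=1)
        # (7)-(8) Hessian H = A^T A and update dp = H^{-1} A^T E.
        dp = np.linalg.solve(A.T @ A, A.T @ E)
        # (9) Update the parameters.
        p = p + dp
        if np.linalg.norm(dp) <= eps:
            break
    return p

# Synthetic check: I is the template Gaussian shifted so that the optimal
# translation is p = (-2, -3).
yy, xx = np.mgrid[0:64, 0:64]
T = np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / 50.0)
I = np.exp(-((yy - 30.0) ** 2 + (xx - 29.0) ** 2) / 50.0)
p = lucas_kanade_translation(T, I)  # approximately (-2, -3)
```

For a general affine warp, A gains one column per warp parameter and ∂W/∂p must be evaluated per pixel.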
Matthews-Baker Tracker
The key to the Matthews-Baker tracker's efficiency is switching the roles of the image I and the template T in the algorithm. In contrast to the Lucas-Kanade tracker (which warps image I to template T), the Matthews-Baker tracker warps the template T to the image I. This key difference leads to the computational gains because, unlike the image I which changes with each frame of the video, the template T remains fixed throughout the video. If we can write the Hessian and Jacobian involved in the optimization of L in terms of the template T instead of image I, then we only need to compute them once at the start of the tracking.
Here is the inverse compositional form of the Matthews-Baker tracker, given without the proof of equivalence to the forward additive form of the Lucas-Kanade tracker. Please refer to [?] for the proof. Given an initial estimate p0, we want to find the ∆p∗ that minimizes L as follows:
L = Σ_x [T(W(x;∆p)) − I(W(x;p0))]²  (11)
∆p∗ = argmin_∆p L  (12)
This completes our role switch of the template T and the image I. Please note that ∆p are the parameters of the warp W applied to T, while p0 are the parameters of the warp W applied to I.
Another key difference between the Matthews-Baker tracker and the Lucas-Kanade tracker is how we combine the two warps W(x;p0) and W(x;∆p). In the Lucas-Kanade tracker we combined the two warps by simply adding the parameter ∆p to p0, producing a new warp W(x;p0 + ∆p); specifically, p∗ = p0 + ∆p∗. In the Matthews-Baker tracker, we instead combine the warps through composition as follows:
W(x;p∗) = W(x;p0) ◦ W(x;∆p∗)⁻¹  (13)
If we are using affine warps, the warps can be implemented as matrix multiplications, and we can simplify eq 13 as follows:
W(x;p∗) = W(W(x;∆p∗)⁻¹; p0)
W(x;p∗) = W(p0) W(∆p∗)⁻¹ x
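For affine warps in homogeneous coordinates, the composition and inversion in eq 13 become 3×3 matrix operations. A minimal sketch, assuming a common six-parameter affine layout (the parameter ordering here is a hypothetical convention, not fixed by the text):

```python
import numpy as np

def affine_matrix(p):
    """3x3 homogeneous matrix for an affine warp. The six-parameter layout
    (deviations from the identity) is a hypothetical convention."""
    return np.array([[1.0 + p[0], p[2], p[4]],
                     [p[1], 1.0 + p[3], p[5]],
                     [0.0, 0.0, 1.0]])

p0 = np.array([0.05, 0.00, 0.01, -0.02, 2.0, 1.0])
dp = np.array([0.01, 0.00, 0.00, 0.01, -0.5, 0.2])

# Eq 13: W(x;p*) = W(x;p0) o W(x;dp)^{-1}  ->  M* = M(p0) @ inv(M(dp)).
M_star = affine_matrix(p0) @ np.linalg.inv(affine_matrix(dp))

# Applying the composed warp to a homogeneous pixel coordinate is a single
# matrix-vector product.
x = np.array([10.0, 20.0, 1.0])
x_warped = M_star @ x
```

Composing warps as matrices also makes the inversion in eq 17 a plain `np.linalg.inv`.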
Let us simplify L as before using a first-order linear approximation of T(W(x;∆p)). Please note that, for tracking a template T, the summation over x is performed only over the pixels lying inside T.
L ≈ Σ_x [T(x) + ∇T (∂W/∂p) ∆p − I(W(x;p0))]²  (14)
where the Jacobian ∂W/∂p is evaluated at (x;0), and W(x;0) = x is the identity warp.
We now proceed to find a closed-form solution for ∆p∗ by equating the derivative ∂L/∂∆p to zero:
∂L/∂∆p = 2 Σ_x [∇T (∂W/∂p)]ᵀ [T(x) + ∇T (∂W/∂p) ∆p − I(W(x;p0))]  (15)
Setting to zero, switching from summation to vector notation and solving for ∆p we get
∆p = H⁻¹Jᵀ [I(W(x;p0)) − T]  (16)
where J is the Jacobian of T(W(x;∆p)), J = ∇T (∂W/∂p) evaluated at (x;0), H is the approximated Hessian H = JᵀJ, and
I(W(x;p0)) is the warped image. Note that for a given template, the Jacobian J and Hessian H are independent of p0. This means they only need to be computed once and then they can be reused during the entire tracking sequence.
Once ∆p∗ has been solved for, it needs to be inverted and composed with p0 to get the new warp parameters for the next iteration.
W(x;p0) ← W(x;p0) ◦ W(x;∆p∗)⁻¹  (17)
The next iteration solves Equation 16 starting with the new value of p0. Possible termination criteria include the absolute value of ∆p∗ falling below some value or running for some fixed number of iterations.
Algorithm 2 The Matthews-Baker Algorithm
(0) p ← p0
Pre-compute:
(1) Evaluate the gradient ∇T of the template T(x)
(2) Evaluate the Jacobian ∂W/∂p at (x;0)
(3) Compute the steepest descent images ∇T (∂W/∂p)
(4) Compute the Hessian matrix H = JᵀJ
(5) Compute the inverse Hessian H⁻¹
(6) Iterate until ||∆p|| ≤ ϵ:
(7) Warp I with W(x;p) to compute I(W(x;p))
(8) Compute the error image E(x) = I(W(x;p)) − T(x)
(9) Compute ∆p = H⁻¹ Σ_x [∇T (∂W/∂p)]ᵀ E(x)
(10) Update the parameters p ← p ◦ (∆p)⁻¹
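Algorithm 2 can likewise be sketched in NumPy for the pure-translation warp W(x;p) = x + p, where composing with the inverse incremental warp reduces to p ← p − ∆p and ∂W/∂p is the identity. The point of the sketch is the structure: the gradient, Jacobian, and Hessian are computed from T once, outside the loop. The bilinear sampler is repeated so the sketch is self-contained; all names are illustrative:

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img at float coordinates (ys, xs) with bilinear interpolation."""
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    dy, dx = ys - y0, xs - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0]
            + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0]
            + dy * dx * img[y0 + 1, x0 + 1])

def matthews_baker_translation(T, I, p0=(0.0, 0.0), eps=1e-4, max_iters=100):
    """Algorithm 2 for W(x;p) = x + p, where p o (dp)^{-1} reduces to p - dp."""
    # --- Pre-compute (steps 1-5): template-side only, done once. ---
    Gy, Gx = np.gradient(T)                         # (1) gradient of T
    J = np.stack([Gy.ravel(), Gx.ravel()], axis=1)  # (2)-(3) dW/dp = identity
    H_inv = np.linalg.inv(J.T @ J)                  # (4)-(5) Hessian + inverse
    yy, xx = np.mgrid[0:T.shape[0], 0:T.shape[1]]
    p = np.asarray(p0, dtype=float)
    # --- Iterate (steps 6-10): only cheap per-frame work remains. ---
    for _ in range(max_iters):
        Iw = bilinear_sample(I, yy + p[0], xx + p[1])  # (7) warp I
        E = (Iw - T).ravel()                           # (8) error image
        dp = H_inv @ (J.T @ E)                         # (9) eq 16
        p = p - dp                                     # (10) p o (dp)^{-1}
        if np.linalg.norm(dp) <= eps:
            break
    return p

# Synthetic check: I is the template Gaussian shifted so that the optimal
# translation is p = (-2, -3).
yy, xx = np.mgrid[0:64, 0:64]
T = np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / 50.0)
I = np.exp(-((yy - 30.0) ** 2 + (xx - 29.0) ** 2) / 50.0)
p = matthews_baker_translation(T, I)  # approximately (-2, -3)
```

Compared with the Lucas-Kanade loop, each iteration here needs only one warp, one matrix-vector product, and no Hessian solve, which is where the per-frame savings come from.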