OBJECT TRACKING SYSTEMS AND METHODS
An object tracking method may include: receiving frames of data containing image information of an object; performing an object segmentation to obtain an object motion result; and using the object motion result to conduct an object tracking. In particular, the object segmentation may include: extracting motion vectors from the frames of data; estimating a global motion using the motion vectors; and subtracting the global motion from the motion vectors to generate an object motion result.
This application is a divisional of U.S. application Ser. No. 11/419,600, filed May 22, 2006, which claims the benefit of U.S. Provisional Application No. 60/754,915 filed Dec. 29, 2005, both of which are hereby incorporated herein in their entirety.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to object tracking systems and methods. More particularly, the present invention relates to object tracking systems and methods capable of identifying one or more objects from images and tracking the movement of the object(s).
2. Background
Object tracking generally refers to the technique of identifying one or more objects in an image or a series of images, including a video sequence, for various purposes. As an example, object tracking can apply to security, surveillance, and personnel, process, or production management applications. Typically, tracking methods can be divided into two main classes: bottom-up and top-down approaches. Under a bottom-up approach, an image is segmented into objects, which are used for object tracking. In contrast, a top-down approach generates object hypotheses and tries to verify them using the image contents. Mean shift and particle filter are two common object tracking methods using the top-down approach.
In many applications, object representation may become an important part of an object tracking process. For example, a feature space, such as color histograms, edges, or contours, may be chosen to describe a target, which typically may come from the first image of a series of images or a video. A color histogram can represent a target for object tracking as it achieves robustness against non-rigidity, rotation, and partial occlusion. In some examples, an elliptical area may be used as a tracking area, which may surround an object to be tracked. In some cases, to reduce computational complexity during real-time processing, m-bin histograms may be used. In one example, the color histogram distribution p(y) at location y inside an elliptic region may be determined by the following:
where $n_h$ represents the number of pixels in the region and δ denotes the Kronecker delta function. The parameter h is used to adapt the size of the region. The normalization factor ensures that $\sum_{u=1}^{m} p_u(y) = 1$.
To increase the reliability of the color distribution, smaller weights may be assigned to the pixels that are further away from the ellipse center as in Eq. (2).
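As an illustration only, a kernel-weighted m-bin color histogram of this kind may be sketched in Python as follows. The sketch assumes the weighting profile k(r) = 1 − r² for r < 1 (and 0 otherwise) used in the Nummiaro et al. paper cited below; the function names, the `(x, y, bin_index)` pixel format, and the `half_axes` parameter are hypothetical and do not come from this disclosure.

```python
def kernel_profile(r):
    """Assumed profile: weight falls off with normalized distance r from the center."""
    return 1.0 - r * r if r < 1.0 else 0.0


def color_histogram(pixels, center, half_axes, m):
    """Return an m-bin histogram for the elliptic region around `center`.

    pixels:    iterable of (x, y, bin_index) with 0 <= bin_index < m
               (how a color is mapped to a bin is left to the caller)
    half_axes: (hx, hy) half-axes of the ellipse; they adapt the region size
    """
    cx, cy = center
    hx, hy = half_axes
    hist = [0.0] * m
    for x, y, u in pixels:
        # Normalized distance: r = 1 on the boundary of the ellipse.
        r = (((x - cx) / hx) ** 2 + ((y - cy) / hy) ** 2) ** 0.5
        hist[u] += kernel_profile(r)
    total = sum(hist)
    # Normalize so that the m bin probabilities sum to 1.
    return [v / total for v in hist] if total > 0 else hist


# Usage: three pixels, two color bins, a circular region of radius 2.
p = color_histogram([(0, 0, 0), (1, 0, 1), (3, 0, 1)], (0, 0), (2, 2), 2)
```

Pixels outside the ellipse receive zero weight, so the third pixel in the usage example does not contribute to the histogram.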
A similarity function may define or identify the similarity between two targets. As an example, the Bhattacharyya distance is a similarity function used to measure the similarity between two color histogram probability distributions. It can be expressed:
$$d(p,q)=\sqrt{1-\rho[p,q]}, \qquad \rho[p,q]=\sum_{u=1}^{m}\sqrt{p_u q_u}, \tag{3}$$
where d(•) is the Bhattacharyya distance, ρ(•) is the Bhattacharyya coefficient, m is the number of bins, and $p_u$ and $q_u$ respectively represent the probabilities of the u-th histogram bin for a candidate target and an initial target model.
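Eq. (3) may be computed directly from two normalized histograms. The following Python sketch is illustrative only; the function names are hypothetical, and the clamp to zero is an added guard against floating-point rounding rather than part of Eq. (3).

```python
import math


def bhattacharyya_coefficient(p, q):
    """rho[p, q]: sum over the m bins of sqrt(p_u * q_u)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """d(p, q) = sqrt(1 - rho[p, q]) for two normalized m-bin histograms."""
    rho = bhattacharyya_coefficient(p, q)
    # Rounding can push rho slightly above 1, so clamp before the square root.
    return math.sqrt(max(0.0, 1.0 - rho))


# Usage: identical histograms are at distance 0, disjoint ones at distance 1.
same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
disjoint = bhattacharyya_distance([1.0, 0.0], [0.0, 1.0])
```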
Mean shift is generally a recursive object tracking method. To locate an object in each frame, mean shift starts from the position of the tracking result in the previous frame and then follows a direction of increasing similarity function to identify the next recursion starting point. Recursion usually terminates when the gradient value approaches or becomes zero, with the point of termination as the tracking result, i.e. the new location of the object being tracked. The steps identified below illustrate an example of an iterative procedure of mean shift tracking method.
Given the target model $\{q_u\}_{u=1,\ldots,m}$ and its location $y_0$ in the previous frame:
1. Initialize the location of the target in current frame with y0.
2. Calculate the weight according to Eq. (4).
3. Find the next location y1 of the target candidate according to Eq. (5).
4. If ∥y1−y0∥<ε, stop; else set y0=y1 and go to step 2.
Under such an approach, color histograms may be used to characterize a target and a Bhattacharyya distance function may be used to measure the similarity between two distributions. A target candidate most similar to the initial target model should have the smallest distance value. Minimizing the Bhattacharyya distance $d = (1-\rho(y))^{0.5}$ is equivalent to maximizing the Bhattacharyya coefficient ρ(y). Using a Taylor expansion around the value $p_u(y_0)$, the linear approximation of the Bhattacharyya coefficient is obtained as:
Applying Bayes' rule to Eq. (4) may lead to the following equation:
To minimize the distance, the second term may be maximized, with the first term being independent of y. The kernel is recursively moved from the current location y0 to the new location y1 according to the relation:
where g(x)=−k′(x). The definitions of these equations are illustrated by D. Comaniciu et al. in “Kernel-based object tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, May 2003. Mean shift is a recursive method, and the number of recursions needed for each tracking process is usually small. However, the initial state of each process is based on the last tracking result. Under certain conditions, the approach may cause error propagation, especially when the previous tracking result is not correct or accurate.
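The iterative procedure listed above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that the weight and update equations take the form given in the cited Comaniciu et al. paper, namely a weight sqrt(q_u / p_u(y0)) for each pixel falling in bin u and a new location equal to the weighted mean of the pixel positions, and it assumes a kernel profile k(x) = 1 − x on the squared normalized distance, so that g(x) = −k′(x) is constant inside the window. The function names, the `(x, y, bin_index)` pixel format, and the circular window of radius `h` are hypothetical.

```python
import math


def histogram(pixels, center, h, m):
    """Kernel-weighted m-bin histogram of the pixels within radius h of center."""
    hist = [0.0] * m
    for x, y, u in pixels:
        r2 = ((x - center[0]) ** 2 + (y - center[1]) ** 2) / (h * h)
        if r2 < 1.0:
            hist[u] += 1.0 - r2  # assumed profile k(x) = 1 - x
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def mean_shift(pixels, q, y0, h, eps=0.5, max_iter=20):
    """Move from y0 toward the location whose histogram best matches model q."""
    m = len(q)
    for _ in range(max_iter):
        p = histogram(pixels, y0, h, m)  # candidate histogram at y0
        sx = sy = sw = 0.0
        for x, y, u in pixels:
            inside = (x - y0[0]) ** 2 + (y - y0[1]) ** 2 < h * h
            if inside and p[u] > 0.0:
                w = math.sqrt(q[u] / p[u])  # step 2: pixel weight
                sx += w * x
                sy += w * y
                sw += w
        if sw == 0.0:
            return y0  # nothing in the window matches the model
        y1 = (sx / sw, sy / sw)  # step 3: g is constant, so a weighted mean
        if math.hypot(y1[0] - y0[0], y1[1] - y0[1]) < eps:
            return y1  # step 4: converged
        y0 = y1
    return y0


# Usage: a 5x5 object (bin 1) on a 20x20 background (bin 0).
image = [(x, y, 1 if 10 <= x <= 14 and 8 <= y <= 12 else 0)
         for x in range(20) for y in range(20)]
location = mean_shift(image, q=[0.0, 1.0], y0=(9.0, 9.0), h=5.0)
```

Because each run starts from the previous result `y0`, an inaccurate previous result places the window in the wrong area, which is the error propagation noted above.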
The particle filter technique represents a different approach. As an example, the technique may involve choosing new target candidates from the previous target candidates based on their weights in the preceding frame. Target candidates with high weights may be repeatedly selected so that a candidate with a higher weight may be chosen more than one time. Additionally, those new target candidates are updated with some feature vectors to ensure that they would be more similar to the initial target model and to give them suitable weights according to their similarity to the initial target model. Finally, the tracking result usually includes the target candidates and their weights, which would be used in the next frame for choosing new target candidates.
Assume that $x_t$ represents the modeled object at time t and the vector $X_t = \{x_1, \ldots, x_t\}$ is the history of the modeled object. In the same way, $z_t$ is the set of image features at time t and the history set of image features is $Z_t = \{z_1, \ldots, z_t\}$. Observations $z_t$ are assumed to be independent, both mutually and with respect to the dynamical process. This may be expressed probabilistically as follows:
The conditional state-density pt at time t may be:
pt(xt)≡p(xt|Zt). (9)
Applying Bayes' rule to Eq. (9) may lead to the following equation:
p(x|z)=kp(z|x)p(x). (10)
In one example, because the probability p(z|x) is sufficiently complex that it cannot be evaluated simply in a closed form, iterative sampling techniques may be used. We generate a random variate x from a distribution that approximates the posterior p(x|z). First, a sample-set $\{s_1, \ldots, s_N\}$ is generated from the prior density p(x), and a sample is then chosen with probability $\pi_i$, where
The value $x_i$ chosen in this fashion has a distribution which approximates the posterior p(x|z) increasingly accurately as N increases. The steps identified below illustrate an example of an iterative procedure of a particle filter approach. A similar example is described by K. Nummiaro et al. in “An adaptive color-based particle filter,” Image and Vision Computing, vol. 21, pp. 99-110, 2003.
Given the sample set $S_{t-1}$ and the initial object model:
- 1. Select N samples from the set $S_{t-1}$ with weights $\pi_{t-1}$:
  - (a) Calculate the normalized cumulative probabilities $C_{t-1}^{(n)}$: let $C_{t-1}^{(0)} = 0$ and $C_{t-1}^{(n)} = C_{t-1}^{(n-1)} + \pi_{t-1}^{(n)}$.
  - (b) Generate a random number $r \in [0,1]$.
  - (c) Find the smallest j for which $C_{t-1}^{(j)} > r$.
  - (d) Set $s_t^{(n)} = s_{t-1}^{(j)}$.
- 2. Update the target candidate states with some feature vectors.
- 3. Give a suitable weight to each new candidate according to the similarity between the initial target model and the candidate.
- 4. Estimate the mean state of the set $S_t$: $E[S_t] = \sum_{n=1}^{N} \pi_t^{(n)} s_t^{(n)}$.
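The selection in step 1 and the mean-state estimate in step 4 may be sketched in Python as follows. The sketch is illustrative only; the function names and the tuple representation of a state are hypothetical, and steps 2 and 3 (state update and re-weighting) are left out because they depend on the chosen features.

```python
import bisect
import itertools
import random


def resample(samples, weights, rng=random):
    """Step 1: draw N samples, each with probability proportional to its weight."""
    total = sum(weights)
    # (a) normalized cumulative probabilities C(1), ..., C(N)
    cumulative = list(itertools.accumulate(w / total for w in weights))
    chosen = []
    for _ in samples:
        r = rng.random()  # (b) random number in [0, 1)
        # (c) smallest j with C(j) > r; the min() guards against rounding
        j = min(bisect.bisect_right(cumulative, r), len(samples) - 1)
        chosen.append(samples[j])  # (d) copy the selected sample
    return chosen


def mean_state(samples, weights):
    """Step 4: weighted mean of the sample states."""
    total = sum(weights)
    dims = len(samples[0])
    return tuple(sum(w * s[k] for s, w in zip(samples, weights)) / total
                 for k in range(dims))


# Usage: the heavier sample tends to be selected more than once.
states = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
new_states = resample(states, [0.1, 0.8, 0.1], rng=random.Random(1))
center = mean_state(states, [0.1, 0.8, 0.1])
```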
Compared with the mean shift technique, the tracking results of a particle filter technique are updated during the tracking process based on the target candidates instead of the last tracking results. In general, the particle filter technique may present a more robust object tracking method when many target candidates are used. However, depending on the implementation, it may increase the computational complexity and require a tradeoff between efficiency and accuracy.
A hybrid tracker technique combining mean shift and particle filter was also proposed. The first step of this technique is to generate target candidates and re-sample these candidates. The second step applies the mean shift technique independently to each target candidate until all target candidates are stabilized. The third step recalculates the weight for each target candidate using the Bhattacharyya distance. Finally, the average is calculated to obtain the tracking result. Because all target candidates are stabilized, the number of target candidates could be reduced without losing accuracy.
BRIEF SUMMARY OF THE INVENTION
Examples consistent with the invention may provide an object tracking method. The method may include: receiving frames of data containing image information of an object; performing an object segmentation to obtain an object motion result; and using the object motion result to conduct an object tracking. In particular, the object segmentation may include: extracting motion vectors from the frames of data; estimating a global motion using the motion vectors; and subtracting the global motion from the motion vectors to generate an object motion result.
Examples consistent with the invention may provide another object tracking method. The method may include: receiving frames of data containing information of an object; performing an object segmentation based on motion vectors of the series of frames of data to generate an object segmentation result with the effect of an estimated global motion removed from the object segmentation result; and conducting a similarity analysis of the object segmentation result and an initial object model.
Examples consistent with the present invention may also provide an object tracking system. The system may include: a data receiving device for receiving frames of data containing image information of an object; an object segmentation processor for performing an object segmentation; and an object tracking processor for conducting an object tracking based on the object motion result. In particular, the object segmentation performed by the object segmentation processor may include: extracting motion vectors from the frames of data; estimating a global motion using the motion vectors; and subtracting the global motion from the motion vectors to generate an object motion result.
The above summary, as well as the following detailed description of a preferred embodiment of the invention, will be better understood when read in conjunction with the following drawings. For the purpose of illustrating the invention, there is shown in the drawings illustrative examples of the invention. It should be understood that the invention is not limited to the precise arrangements and instrumentalities shown. In the drawings:
Certain terminology is used herein for convenience only and is not to be taken as a limitation on the present invention. In the drawings, the same reference letters are employed for designating the same elements throughout the several figures.
Examples of the invention provide segmentation-guided object tracking systems and methods. An exemplary method combines compressed-domain motion-based segmentation and particle filtering for tracking one or more objects. In one example, global motion parameters may be estimated to distinguish local object movements from camera movements so as to obtain a rough object mask. Based on the rough segmentation results, a particle filter may be used with a small number of target candidates to refine the tracking results.
In some examples, compressed-domain segmentation based on motion vectors is first performed to obtain the rough segmentation masks of the moving objects. Because the segmentation method may be based on motion vectors, it can avoid the possible confusion due to cluttered scenes. A label set method may then be used to determine the number of objects and the corresponding location of each object in each frame. After the rough segmentation process, the similarity between the extracted object(s) and the initial target model may be calculated to determine the reliability of the segmentation result. If the segmentation result is considered reliable, the starting state of the tracking method is based on the segmentation result. Otherwise, the object is presumed to be static and we may use the last tracking result as the beginning state. Guided by a reliable initial object location, the method needs only a much smaller number of target candidates for applying a particle filter approach to reach a tracking result. In some examples, this reduces the computational complexity of a system without sacrificing the robustness of the tracking results.
Referring to the example illustrated in
In some examples, when multiple objects are being tracked, the method illustrated in
where $N_c$ and $N_p$ represent the numbers of objects in the current and previous frames, respectively. The value of $R_{ij}$ would become 1 if object i and object j in two consecutive frames overlap with each other; otherwise $R_{ij}$ would be 0. Table 1 below illustrates an example of the SOC and SOR values corresponding to different object states, including an object leaving a frame, an object entering a frame, merging of multiple objects into a single object, and splitting of one object into multiple objects.
In one example, after receiving a compressed video stream, such as one in an MPEG-4 or other compressed formats, a system may extract block motion vectors from the compressed video to perform object segmentation. The block size in one example is 8×8. A simplified affine motion model may be applied to estimate global motion from these motion vectors. In one example, the simplified affine model is modeled by four parameters, i.e. (a, b, c, d), and it can be written as
wherein the parameters a and b control the scaling and rotation, c and d control the translation, and (x, y) and (x′, y′) respectively represent the position in the previous and current frames.
In one example, it is assumed that a total of M motion vectors with corresponding positions (xi, yi) and (xi′, yi′), i=1, . . . , M, in adjacent frames are provided to the affine motion model to estimate the four parameters by solving the following over-determined linear system:
Motion vectors reflecting both global motion and local motion may be used to estimate the four parameters. The local motion vectors, however, may make the estimated parameters of the affine motion model incorrect. To address this, an iterative least squares algorithm may be used: we may apply the least squares estimation first and then compute the fitting errors $U_i = a x_i - b y_i + c - x_i'$ and $V_i = a y_i + b x_i + d - y_i'$ for every data point. The standard deviation of the error statistics is used to identify data points as outliers, and the outliers are discarded from the data set. This process can be repeated until no new outlier is identified. Thus, the converged motion parameters may represent the global motion, such as the motion of a moving or rotating camera, well. Using the estimated global motion, we may extract object motion(s) for each block. Finally, a system may cluster the object segmentation results for multi-object tracking, which may be characterized as the label set process described above.
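The estimation described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions: it takes the simplified affine model to be x′ = ax − by + c and y′ = bx + ay + d, which is the form implied by the fitting errors $U_i$ and $V_i$; it uses the root-mean-square fitting error as the error statistic; and the rejection factor `k`, the floor `min_error`, and all function names are hypothetical choices rather than values from this disclosure. Writing each point as a complex number z = x + iy turns the model into z′ = (a + ib)z + (c + id), which has a closed-form least squares solution.

```python
def fit_simplified_affine(src, dst):
    """Least squares fit of (a, b, c, d) from matched positions src -> dst."""
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    spread = sum(abs(p - z_mean) ** 2 for p in z)
    alpha = sum((p - z_mean).conjugate() * (q - w_mean)
                for p, q in zip(z, w)) / spread
    beta = w_mean - alpha * z_mean
    return alpha.real, alpha.imag, beta.real, beta.imag


def fitting_error(params, p, q):
    """Magnitude of the fitting error (U, V) for one matched pair."""
    a, b, c, d = params
    u = a * p[0] - b * p[1] + c - q[0]
    v = a * p[1] + b * p[0] + d - q[1]
    return (u * u + v * v) ** 0.5


def estimate_global_motion(src, dst, k=2.0, min_error=0.5, max_rounds=20):
    """Refit repeatedly, discarding outliers until none are found."""
    keep = list(range(len(src)))
    for _ in range(max_rounds):
        params = fit_simplified_affine([src[i] for i in keep],
                                       [dst[i] for i in keep])
        errors = [fitting_error(params, src[i], dst[i]) for i in keep]
        sigma = (sum(e * e for e in errors) / len(errors)) ** 0.5
        # Errors below min_error are never outliers (guards against noise).
        limit = max(k * sigma, min_error)
        survivors = [i for i, e in zip(keep, errors) if e <= limit]
        if len(survivors) == len(keep) or len(survivors) < 2:
            break
        keep = survivors
    params = fit_simplified_affine([src[i] for i in keep],
                                   [dst[i] for i in keep])
    return params, keep


def object_motion(params, src, dst):
    """Per-block motion left after subtracting the global motion."""
    a, b, c, d = params
    return [(q[0] - (a * p[0] - b * p[1] + c),
             q[1] - (b * p[0] + a * p[1] + d)) for p, q in zip(src, dst)]
```

Here `src` would hold the block positions in the previous frame and `dst` the positions obtained by adding each block's motion vector; blocks whose residual from `object_motion` is clearly non-zero are candidates for the rough object mask.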
As illustrated above, a similarity analysis may be applied to evaluate the correctness of the object segmentation result by measuring the similarity between the segmentation region and the initial target model. In one example, the color histogram is used to characterize the target objects. The Bhattacharyya distance is used to measure the similarity between two color histogram distributions. If the Bhattacharyya distance between the two distributions of the segmented object and the initial object model is less than a threshold, the segmentation is considered reliable. Otherwise, the result may be regarded as unreliable or less reliable. The threshold for determining the correctness of segmentation results can be obtained empirically. As an example, a threshold value of 0.425 may be used. Because the segmentation result is larger than the object area, we may randomly select N target candidates having a radius larger than that of the initial target model. The number N=15 is used in exemplary computer simulations.
In one example, we may apply a particle filter to refine the tracking results. A small number of target candidates may be used because a rough segmentation result has been obtained. The method randomly selects N target candidates according to the segmentation results if the results are considered correct. In contrast, the last tracking results are referred to if the segmentation results are considered incorrect. The number of target candidates here may be obtained empirically.
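The choice of starting state and the random selection of target candidates may be sketched in Python as follows. The threshold 0.425 and N=15 are the example values given above; everything else is an assumption made for illustration, including the function names, the representation of a candidate as a position only, and the uniform sampling within `spread` pixels of the starting location.

```python
import math
import random


def choose_start_state(segment_hist, segment_center, model_hist, last_result,
                       threshold=0.425):
    """Start from the segmentation result only when it resembles the model."""
    rho = sum(math.sqrt(p * q) for p, q in zip(segment_hist, model_hist))
    distance = math.sqrt(max(0.0, 1.0 - rho))  # Bhattacharyya distance
    return segment_center if distance < threshold else last_result


def draw_candidates(center, spread, n=15, rng=random):
    """Randomly place n target candidates around the starting location."""
    cx, cy = center
    return [(cx + rng.uniform(-spread, spread),
             cy + rng.uniform(-spread, spread)) for _ in range(n)]


# Usage: a reliable segmentation result seeds the particle filter.
start = choose_start_state([0.6, 0.4], (50.0, 40.0), [0.5, 0.5], (47.0, 41.0))
candidates = draw_candidates(start, spread=4.0, rng=random.Random(0))
```

The candidates produced this way would then be weighted and refined by the particle filter procedure described earlier.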
In some examples, a system may be designed or configured to perform object tracking based on the exemplary methods illustrated above. For example, an object tracking system may include a device, such as a data receiving interface, for receiving frames of data in a coded format containing image information of an object; an object segmentation processor for performing an object segmentation; and an object tracking processor for conducting an object tracking based on the object motion result. As an example, the coded format may be any of the formats that compress raw or original data of images or video into a format that reduces the size of the original data. In one example, a format such as one of the MPEG (Moving Picture Experts Group) standards, mpg, mov, rm, wmv, etc. may be used.
Specifically, as an example, one or more steps of the method described above can be implemented with a system, such as a computer system or a system having a processor.
Accordingly, a data receiving interface, which is for receiving frames of data containing image information of an object, can be one of video or image input 46, an interface with network 48, or an interface with computer-readable medium 44. An object segmentation processor, which is for performing an object segmentation, can be implemented with processor or computer 42. In one example, processor or computer 42 may be configured, by hardware, software, firmware, or processor instructions, to perform functions such as extracting motion vectors from the frames of data; estimating a global motion using the motion vectors; and subtracting the global motion from the motion vectors to generate an object motion result. An object tracking processor, which is for conducting an object tracking based on the object motion result, can also be implemented with processor or computer 42.
Specifically, the object segmentation done by the object segmentation processor may include: extracting motion vectors from the frames of data; estimating a global motion using the motion vectors; and subtracting the global motion from the motion vectors to generate an object motion result. In some examples, the object segmentation processor may estimate the global motion by distinguishing local object motions from camera motions to derive a rough object mask, as illustrated above. Also, the object segmentation processor may estimate the global motion by applying a simplified affine motion model to process the motion vectors. In some examples, the object segmentation processor may further perform a label set process after generating the object motion result. Specifically, the label set process may include distinguishing the object from a second object for conducting multiple-object tracking. Similar to the method illustrated above, the object tracking processor may conduct the object tracking by performing a similarity analysis between sampling points identified based on the object motion result and an initial object model of the frames of data. Also, the object tracking processor may conduct the object tracking by refining the object motion result by applying particle filtering. As described above, in one example, the frames of data may belong to a portion of a compressed video stream and the object segmentation processor may perform the object segmentation in a compressed domain.
The object tracking system as illustrated above may be configured in various ways. For example, the object segmentation processor and the object tracking processor each may be implemented with an embedded processor or a function-specific hardware. Alternatively, one or both of the object segmentation processor and the object tracking processor may be implemented with one processor, such as a single processor configured with instructions to perform one or both of the object segmentation and the object tracking functions.
Referring to
Referring to
Applying the technique identified above, we may use a few exemplary video sequences for performance evaluation. In particular, four video sequences respectively showing a swimming fish, two fruits, a transparent object, and a moving apple, and respectively including 39, 30, 46, and 15 frames are used in one example. Object tracking results using known mean shift, particle filter, and hybrid tracker methods are also obtained for comparison. In one example, the separate methods were implemented as computer programs written in the C language and executed on a computing system with a 1.8 GHz Pentium® 4 processor. Other types of implementations or systems may be used depending on various design, system, or cost considerations. In one example, the following equation is used for measuring the accuracy of tracking results:
where $M_t^{ref}$ represents the ground-truth mask of the t-th frame, $M_t^{track}$ represents the tracked object mask of the t-th frame, and (x, y) represents the index of a pixel.
Table 2 below illustrates an example comparing the average computation time of the four tracking methods for all of the test sequences. A larger frames-per-second value generally suggests better performance in terms of speed. As illustrated by Table 2, mean shift may be the fastest of the four methods. However, the experiments discussed above suggested that the object tracking capability of mean shift is poor. In contrast, the proposed implementation may provide fairly good tracking capability and is generally faster than the particle filter and hybrid tracker methods in this example.
As illustrated above, examples of the invention provide segmentation-guided object tracking methods and systems. In one example, the motion vectors are first extracted as features from compressed video for video object segmentation. Global motion estimation is performed using the extracted motion vectors to obtain a rough object mask. The proposed technique may measure the similarity between the segmentation result and an initial target model using the Bhattacharyya distance to decide the beginning state. Finally, a particle filter with a relatively small number of target candidates may be used to refine the tracking result. Experimental results in one example suggest that an exemplary implementation using the proposed technique can achieve good tracking accuracy and reliability and may be comparable to or more advantageous than the mean shift, particle filter, and hybrid tracker methods.
It will be appreciated by those skilled in the art that changes could be made to the examples described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the particular examples disclosed, but it is intended to cover modifications within the spirit and scope of the present invention.
Claims
1. An object tracking method comprising:
- receiving frames of data containing compressed-domain information of an object;
- performing an object segmentation based on motion vectors of the series of frames of data in a compressed domain to generate an object segmentation result with the effect of an estimated global motion removed from the object segmentation result; and
- conducting a similarity analysis of the object segmentation result and an initial object model.
2. The method of claim 1, further comprising refining the object segmentation result by applying particle filtering.
3. The method of claim 1, wherein performing the object segmentation comprises:
- extracting the motion vectors from the frames of data;
- estimating the global motion using the motion vectors;
- subtracting the global motion from the motion vectors to generate the object segmentation result.
4. The method of claim 3, further comprising performing a label set process after generating the object segmentation result, wherein the label set process comprises distinguishing the object from a second object for conducting multiple-object tracking.
5. The method of claim 3, wherein the frames of data belong to a portion of a compressed video stream and performing the object segmentation comprises performing the object segmentation in the compressed domain.
Type: Application
Filed: Apr 21, 2010
Publication Date: Aug 12, 2010
Applicant:
Inventors: Chia-Wen Lin (Chiayi County), Zhi-Hong Ling (Kaohsiung City), Kual-Zheng Lee (Hsinchu County)
Application Number: 12/764,563
International Classification: G06K 9/00 (20060101);