METHOD, SYSTEM AND APPARATUS FOR PERFORMING RE-IDENTIFICATION IN IMAGES
A method of performing person re-identification for images captured by at least two camera pairs operating with different environmental factors. Descriptors representing characteristics of objects corresponding to a person in the images are clustered. A probability distribution of the clustered descriptors is determined. A coupling map for the images is determined based on the probability distribution. A cross-correlation between at least two of the coupling maps is determined. A similarity of the images captured by the camera pairs is determined according to the cross-correlation. Person re-identification is performed for the different environmental factors using the descriptors, based on the determined similarity.
The present invention relates generally to image processing and, in particular, to matching objects between two captured images to determine whether a candidate object is an object of interest. The present invention also relates to a method, apparatus and system for performing person re-identification for images captured by at least two camera pairs, and to a computer program product including a computer readable medium having recorded thereon a computer program for performing person re-identification for images captured by at least two camera pairs.
BACKGROUND
Public venues such as shopping centres, parking lots and train stations are increasingly subject to surveillance using large-scale networks of video cameras. Application domains of large-scale video surveillance include security, safety, traffic management and business analytics. In one example application from the security domain, a security officer may want to view a video feed containing a particular suspicious person in order to identify undesirable activities. In another example from the business analytics domain, a shopping centre may wish to track customers across multiple cameras in order to build a profile of shopping habits.
Many surveillance applications require methods, known as “video analytics”, to detect, track, match and analyse multiple objects of interest across multiple camera views. In one example, referred to as a “hand-off” application, object matching is used to persistently track multiple objects across first and second cameras with overlapping fields of view. In another example application, referred to as “re-identification”, object matching is used to locate a specific object of interest across multiple cameras in the network with non-overlapping fields of view.
Cameras at different locations may have different viewing angles and work under different lighting conditions, such as indoor and outdoor. The different viewing angles and lighting conditions may cause the visual appearance of a person to change significantly between different camera views. In addition, a person may appear in a different orientation in different camera views, such as facing towards or away from the camera, depending on the placement of the camera relative to the flow of pedestrian traffic. Robust person re-identification in the presence of appearance change due to camera viewing angle, lighting and person orientation is difficult.
A person re-identification (ReID) model consists of an appearance descriptor extractor and a distance metric model. An appearance descriptor is a feature vector representing the appearance of a person, i.e., a derived value or set of derived values determined from the pixel values in an image of the person. An appearance descriptor may be directly extracted from an image. One example of an appearance descriptor is a histogram of colour values. Another example of an appearance descriptor is a histogram of quantized image gradient responses. An appearance descriptor extractor may also be learned from a set of training images containing different persons using a supervised learning method or an unsupervised learning method. For example, a deep convolutional neural network may be learned in a supervised manner to separate training images based on the persons' identities, and an appearance descriptor is then derived from one or more top layers of the learned network. A deep neural network may also be learned in an unsupervised manner to reconstruct input training images without any knowledge of the persons' identities; again, an appearance descriptor is derived from one or more top layers of the learned network.
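By way of illustration only, the following sketch shows how a simple hand-crafted descriptor of the histogram kind described above might be computed; the function name, the choice of RGB colour space and the bin count are illustrative assumptions, not part of the described arrangements.

```python
import numpy as np

def colour_histogram_descriptor(image, bins=8):
    """Appearance descriptor: a normalised joint colour histogram.

    `image` is an H x W x 3 uint8 array (an RGB crop of a person's
    bounding box). Returns a feature vector of length bins**3.
    """
    # Quantise each colour channel into `bins` levels (0..bins-1).
    quantised = (image.astype(np.uint32) * bins) // 256
    # Fold the three per-channel indices into one bin index per pixel.
    flat = (quantised[..., 0] * bins + quantised[..., 1]) * bins + quantised[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()  # normalise so the descriptor sums to one
```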
Given a person's image in a camera view, a distance metric model may be used to determine the distances from the given image to a set of images in another camera view. The image with the smallest distance to the given image is considered the closest match. The performance of person re-identification depends on the distance metric selected. General-purpose distance metrics, e.g., Euclidean distance and cosine distance, are commonly used by a distance metric model. A distance metric model may also be learned from a training dataset using a supervised or an unsupervised learning method. In most known supervised and unsupervised learning methods, a projection is learned from appearance descriptors extracted from pairs of training images of people captured from a pair of cameras. In each pair of images, the first image is captured from the first camera and the second image is captured from the second camera. During the matching process, the learned projection is used to project appearance descriptors to a subspace, and the distances between the projected appearance descriptors are calculated. Supervised learning methods require training images to be labelled as “positive” or “negative” training images: pairs of images of the same person are “positive” training images, and pairs of images of different persons are “negative” training images. Unsupervised learning methods do not require labelled training images. Supervised and unsupervised learning methods fail when the distribution of appearance descriptors corresponding to training images is vastly different from the distribution of appearance descriptors corresponding to testing images. The training images are referred to as source domain images and the testing images are referred to as target domain images. Further, the disparity in the distributions of appearance descriptors between the source and target domain is referred to as the domain shift problem, and the degree of the disparity is referred to as the domain gap. If the domain gap between the source and target domain is large, the domain similarity between the source and target domain is small and a person re-identification model learned on source domain images does not perform well on target domain images. For example, if a person re-identification model is learned on images captured from a pair of cameras in a shopping mall (an indoor environment) and then used on images captured from a pair of cameras in a park (an outdoor environment), the learned re-identification model will not perform well, because the change in appearance caused by the changes in lighting and other environmental conditions deteriorates the performance of the re-identification model.
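The matching step described above can be illustrated with a short sketch. The following is a minimal illustration using the general-purpose metrics mentioned (Euclidean and cosine distance), not the learned projection-based metric; names are illustrative.

```python
import numpy as np

def closest_match(query, gallery, metric="euclidean"):
    """Return the index of the gallery descriptor closest to `query`.

    `query` is a 1-D descriptor from one camera view; `gallery` is an
    N x D array of descriptors from the other camera view.
    """
    if metric == "euclidean":
        distances = np.linalg.norm(gallery - query, axis=1)
    elif metric == "cosine":
        similarity = gallery @ query / (
            np.linalg.norm(gallery, axis=1) * np.linalg.norm(query) + 1e-12)
        distances = 1.0 - similarity
    else:
        raise ValueError("unknown metric: " + metric)
    return int(np.argmin(distances))  # smallest distance is the closest match
```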
If the domain gap between the source and target domain is large, the person re-identification model needs to be updated using training images collected from cameras in the target domain. To determine whether or not a person re-identification model needs to be updated when the model is deployed to a target domain, a domain gap measure that measures the domain similarity between the source and target domain is required. Given a target domain, a domain gap measure may also be used to select, from a set of learned re-identification models, the model having the highest domain similarity to the target domain. The selected re-identification model is more robust to the domain shift problem than the other re-identification models and can be directly deployed to the target domain without any update.
One known method for measuring the domain gap, known as “maximum mean discrepancy” (MMD), is to determine a distance between the arithmetic means of appearance descriptors from the source and target domain in a reproducing kernel Hilbert space. Maximum mean discrepancy is designed for dealing with the domain shift problem in image classification tasks. Maximum mean discrepancy cannot be directly used for a person re-identification task, mainly because a person re-identification task involves images from two pairs of cameras, where one pair of cameras is from the source domain and the other pair of cameras is from the target domain.
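For reference, a minimal sketch of the (biased) squared-MMD estimate under an RBF kernel is given below; the kernel bandwidth `gamma` is an assumed parameter.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy between
    samples X (n x d) and Y (m x d) under the RBF kernel
    k(a, b) = exp(-gamma * ||a - b||^2)."""
    def kernel(A, B):
        sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                    + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T)
        return np.exp(-gamma * sq_dists)
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()
```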
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements relating to measuring the domain similarity between a source domain and a target domain by using images captured in the source and target domain.
According to one aspect of the present disclosure, there is provided a method of performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the method comprising:
- clustering descriptors representing characteristics of objects corresponding to a person in the images;
- determining a probability distribution of the clustered descriptors;
- determining a coupling map for the images based on the probability distribution;
- determining a cross-correlation between at least two of the coupling maps;
- determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- performing person re-identification for the different environmental factors using the descriptors, based on the determined similarity.
According to another aspect of the present disclosure, there is provided an apparatus for performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the apparatus comprising:
- means for clustering descriptors representing characteristics of objects corresponding to a person in the images;
- means for determining a probability distribution of the clustered descriptors;
- means for determining a coupling map for the images based on the probability distribution;
- means for determining a cross-correlation between at least two of the coupling maps;
- means for determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- means for performing person re-identification for the different environmental factors using the descriptors, based on the similarity.
According to still another aspect of the present disclosure, there is provided a system for performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the system comprising:
- a memory for storing data and a computer program;
- a processor coupled to the memory for executing the computer program, the computer program having instructions for:
- clustering descriptors representing characteristics of objects corresponding to a person in the images;
- determining a probability distribution of the clustered descriptors;
- determining a coupling map for the images based on the probability distribution;
- determining a cross-correlation between at least two of the coupling maps;
- determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- performing person re-identification for the different environmental factors using the descriptors, based on the similarity.
According to still another aspect of the present disclosure, there is provided a computer readable medium having stored on the medium a computer program for performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the program comprising:
- code for clustering descriptors representing characteristics of objects corresponding to a person in the images;
- code for determining a probability distribution of the clustered descriptors;
- code for determining a coupling map for the images based on the probability distribution;
- code for determining a cross-correlation between at least two of the coupling maps;
- code for determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- code for performing person re-identification for the different environmental factors using the descriptors, based on the similarity.
Other aspects are also disclosed.
One or more embodiments of the invention will now be described with reference to the accompanying drawings.
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the “Background” section and the section above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the present inventors or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
An image, such as the image 110, is made up of visual elements, such as pixels.
A “descriptor” or “feature” represents a derived value or set of derived values determined from the pixel values in an image region. One example of an appearance descriptor is a histogram of pixel colours and image gradients within predefined spatial cells of a rectified image. In one example, a feature is a histogram of colour values in the image region. In another example, a feature is an “edge” response value determined by determining an intensity gradient in the region. In yet another example, a feature is a filter response, such as a Gabor filter response, determined by the convolution of pixel values in the region with a filter kernel. Furthermore, a “feature map” assigns a feature value to each pixel in an image region. In one example, a feature map assigns an intensity value to each pixel in an image region. In another example, a feature map assigns a hue value to each pixel in an image region. In yet another example, a feature map assigns a Gabor filter response to each pixel in an image region. Finally, a “feature distribution” refers to the relative frequency of feature values in a feature map, normalized by the total number of feature values. In one arrangement, a feature distribution is a colour histogram (e.g., RGB or HSV) or a histogram of oriented gradients. Another example of an appearance descriptor is a “bag-of-words” model of quantized keypoint descriptors.
A “bounding box” refers to a rectilinear image region enclosing an object in an image captured by a camera.
The term “foreground mask” refers to a binary image with non-zero values at pixel locations corresponding to an object of interest. A non-zero pixel location in a foreground mask is known as a “foreground pixel”. The term “background pixel” refers to those pixels in an image (or within the corresponding bounding box) that are not foreground pixels. The set of “background pixels” in a “foreground mask” is the “scene”.
The present description provides a method and system for determining a domain gap measure (DGM). The domain gap measure measures the domain similarity between two sets of images without any label information. Each set contains images captured at different times by two cameras within the same domain, or images captured by two different camera pairs from two different domains (e.g., one pair from the training or source domain and the other from a target domain). If the domain gap measure determines a low domain similarity value, the domain gap is large and the person re-identification model trained on the source domain images may need to be updated using new training images from the target domain where the model will be deployed. In other words, the domain gap measure may determine whether a person re-identification model is easily deployable or needs more training images from the target domain to improve the performance of the person re-identification model.
After the person re-identification model 190 is trained, the person re-identification model 190 is deployed to the target domain 170 to match images in the target domain dataset 181 captured from the cameras 135 and 145, which correspond to two non-overlapping viewpoints 130 and 140, respectively. The cameras 135 and 145 are connected to the computer system 150. Before deploying the person re-identification model 190 to the target domain outdoor scene 170, there is a need to determine the domain gap between the source domain 160 and the target domain 170, to determine whether the person re-identification model 190 will perform well in the target domain 170. The domain gap measure (DGM) module 195 measures the domain gap between the source domain indoor scene 160 and the target domain outdoor scene 170 using images from the source domain data 180 and the target domain data 181. The appearance descriptors of the images are extracted using the person re-identification model 190. A distribution of appearance descriptors extracted from images captured by each individual camera is determined. For example, the distribution of appearance descriptors from the camera 115 is determined using the images captured by the camera 115. Consequently, two distributions are generated for the cameras 115 and 125 in the source domain. Another two distributions are generated for the cameras 135 and 145 in the target domain. A coupling map between the two distributions of appearance descriptors from the cameras 115 and 125 in the source domain, known as the “source domain coupling map”, is determined. In a similar manner, a coupling map between the two distributions of appearance descriptors from the cameras 135 and 145 in the target domain, known as the “target domain coupling map”, is also determined. Then a cross correlation between the source domain coupling map and the target domain coupling map is determined. The cross correlation between the source and target domain coupling maps may be used as a domain gap measure of the domain similarity between the source domain 160 and the target domain 170. In one arrangement, a threshold may be used to determine whether the two domains are similar. If the similarity value determined by the domain gap measure is smaller than the threshold, the source domain 160 and the target domain 170 are considered dissimilar. Consequently, the person re-identification model 190 may need to be updated using additional target domain data 181.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 115 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 150 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN).
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 150.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 150 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
The methods to be described may be implemented using the computer system 150, wherein the described processes may be implemented as one or more software application programs 233 executable within the computer system 150.
The software application programs 233 are typically stored in the HDD 210 or the memory 206, and may be stored in a computer readable medium, including the storage devices described below. The software is loaded into the computer system 150 from the computer readable medium, and then executed by the computer system 150. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212. A computer readable medium having such software or computer program recorded on the medium is a computer program product. The use of the computer program product in the computer system 150 preferably effects an advantageous apparatus for implementing the described methods.
In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 150 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 150 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 150 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206.
The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 150 must be used properly so that each process can run effectively.
The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209, or data retrieved from a storage medium 225 inserted into the corresponding reader 212.
The described arrangements use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The described arrangements produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Each fetch, decode, and execute cycle performed by the processor 205 comprises:
- a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
- a decode operation in which the control unit 239 determines which instruction has been fetched; and
- an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the described processes is associated with one or more segments of the program 233, and is performed by the processor 205 executing fetch, decode, and execute cycles for every instruction in the associated segments of the program 233.
The described methods may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories, and may reside on platforms such as video cameras.
The method 300 starts at receiving step 310, where pairs of training images are received by the system 150 from a source domain, under execution of the processor 205, and may be stored in the memory 206.
The method 300, after determining the person re-identification model 190 at step 320 using either supervised or unsupervised learning methods, then proceeds to applying step 330. At step 330, the person re-identification model 190 is deployed to a target domain 170, under execution of the processor 205, based on the domain gap measure. The deployment of the model 190 is described in detail below with reference to the method 400.
In the arrangement of the method 300, a single person re-identification model 190 is determined based on source domain data 180. In another arrangement, numerous person re-identification models may be trained by using training datasets collected from different source domains.
The method 400 starts at extracting step 410, where appearance descriptors are extracted from images 180 collected from the cameras 115 and 125 in the source domain 160 using an appearance descriptor extractor in the person re-identification model 190 determined at step 320 of the method 300. The appearance descriptors extracted from the images 180 may be stored in the memory 206 under execution of the processor 205. In one arrangement, the WHOS appearance descriptor extractor is used to determine appearance descriptors from the source domain 160. In another arrangement, the appearance descriptor extractor learned from source domain data is used to determine the appearance descriptors from the source domain 160.
Then the method 400 proceeds to extracting step 420. At step 420, appearance descriptors are extracted from images 181 collected from the cameras 135 and 145 in the target domain 170 using an appearance descriptor extractor in the person re-identification model 190 determined at step 320 of the method 300. The appearance descriptors extracted from the images 181 may be stored in the memory 206 under execution of the processor 205. The appearance descriptors at step 420 are determined using the same algorithm as at step 410.
Then the method 400 proceeds to determining step 430, where a domain gap measure is determined between the source domain 160 and the target domain 170 according to the appearance descriptors determined at step 410 and step 420 using the person re-identification model 190. The domain gap measurement is determined in accordance with the method 500, which will be described in detail below.
After step 430, the method 400 proceeds to determining step 440, where the method 400 determines whether the person re-identification model 190 needs to be updated, or whether an appropriate person re-identification model needs to be selected, based on the domain gap measurement determined at step 430. Step 440 will be further described below.
The method 500 starts at determining step 530, where a coupling map C1 is determined, under execution of the processor 205, using appearance descriptors of images from the source domain data 180. The coupling map is determined at step 530 in accordance with the method 700, which will be described in detail below. The method 500 then proceeds to determining step 540, where a coupling map C2 is determined in the same manner using appearance descriptors of images from the target domain data 181. Then, at determining step 550, a cross correlation between the coupling maps C1 and C2 is determined, in accordance with Equation (1), as follows:

$$\rho(C_1, C_2) = \frac{1}{M N \, \sigma(C_1) \, \sigma(C_2)} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( C_1(i,j) - \overline{C}_1 \right) \left( C_2(i,j) - \overline{C}_2 \right) \qquad (1)$$

where C1(i,j) and C2(i,j) represent the values at the entry (i,j) of the coupling maps C1 and C2, respectively, and the overbar denotes the mean value of the corresponding coupling map. The numbers of rows and columns of each coupling map are denoted by M and N. The standard deviations of the coupling maps C1 and C2 are represented by σ(C1) and σ(C2), respectively.
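A minimal sketch of Equation (1), assuming the two coupling maps are equally sized NumPy arrays:

```python
import numpy as np

def coupling_cross_correlation(C1, C2):
    """Normalised cross correlation between two coupling maps per
    Equation (1). Mathematically the value lies in [-1, 1]; the
    arrangements above use it as a similarity score, with values
    near 1 indicating similar coupling structure."""
    assert C1.shape == C2.shape
    M, N = C1.shape
    numerator = np.sum((C1 - C1.mean()) * (C2 - C2.mean()))
    # np.std uses the 1/(M*N) convention, matching Equation (1).
    return numerator / (M * N * C1.std() * C2.std())
```

For two identical coupling maps the sketch returns 1, the maximum similarity.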
Then the method 500 proceeds to determining step 560, where a domain similarity is determined, under execution of the processor 205, based on the cross correlation value determined at step 550. In one arrangement, the correlation score is used as a domain similarity measurement. The correlation value is a positive number between zero (0) and one (1). A small correlation value (e.g., 0.06) indicates that the domain similarity between the source domain 160 and the target domain 170 is low. Therefore, a person re-identification model 190, which is learned on source domain data 180, may not perform well on the target domain data 181. In another arrangement, the correlation score is used to select a person re-identification model from a set of learned person re-identification models.
The method 600 starts at comparing step 610, where, if the target domain 170 is similar to the source domain 160, the method 600 concludes. Otherwise, the method 600 proceeds to step 620. The comparison is made at step 610 based on the domain similarity value determined in accordance with the method 500.
At comparing step 610, the domain similarity value is compared against a pre-determined threshold value, under execution of the processor 205. If the domain similarity value is greater than the pre-determined threshold, then the method 600 concludes. Otherwise, if the domain similarity value is less than the pre-determined threshold value, then the method 600 proceeds to step 620. The threshold used at step 610 may be selected to be a numerical value between zero (0) and one (1) (e.g., 0.3). The threshold may also be determined based on the domain similarity between two source domains that have similar characteristics. For example, if the domain similarity value between two source domains is 0.8, then the threshold may be selected to be a fraction of that domain similarity value (e.g., 20%). In one arrangement, the data for the two source domains may be determined by dividing the source domain data 180 into two subsets. The two subsets are used as two source domain datasets, and the method 500 is then performed on the two subsets to compute the domain similarity between them.
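A minimal sketch of this calibration, assuming a callable `domain_similarity` that implements the method 500 pipeline end to end; the callable, the names and the default fraction are illustrative assumptions.

```python
def calibrate_threshold(subset_a, subset_b, domain_similarity, fraction=0.2):
    """Derive a decision threshold from two subsets of one source domain.

    The similarity between two halves of the same domain approximates
    the similarity achievable when no domain gap is present; the
    threshold is taken as a fraction of that value.
    """
    within_domain = domain_similarity(subset_a, subset_b)
    return fraction * within_domain
```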
At determining step 620, the method 600 determines if the person re-identification model 190 can be updated using target domain data 181 to improve performance of the model 190. In one arrangement, determining whether a person re-identification model 190 can be updated is based on the availability of target domain data 181, the amount of target domain data 181 needed and the time needed to update the person re-identification model 190. If the target domain data 181 is not available or the amount of target domain data is not sufficient, then the method 600 proceeds to determining step 650. Otherwise, if the target domain data 181 is available, then the method 600 proceeds to collecting step 630.
At step 650, the domain gap measure is determined for all available person re-identification models 1210 and the person re-identification models are ranked in decreasing order of the domain similarity score determined in accordance with the method 500. The ranked person re-identification models may be stored in a list in the memory 206 under execution of the processor 205.
From the ranked list of person re-identification models, a top-ranked person re-identification model 1230 is selected, provided that the domain similarity score of the top-ranked person re-identification model 1230 is greater than the predetermined threshold. In one arrangement, several person re-identification models 1210 are trained using source domain data with different characteristics. In one arrangement, the characteristics may be based on environmental factors such as sunny days, cloudy days, daytime, night time, or rainy conditions. In another arrangement, the characteristics may be based on location, such as indoors or outdoors, or on the type of location, such as shopping malls or airports. In one arrangement, a large set of such person re-identification models may be available for deployment to a target domain 170. By matching the environmental factors or other factors associated with the source and target domains, a subset of person re-identification models 1210 trained on different source domain data may be selected and evaluated using the domain gap measure.
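A minimal sketch of this selection step (names are illustrative):

```python
def select_model(models, similarities, threshold):
    """Return the model with the highest domain similarity to the
    target domain, provided it clears the threshold; otherwise return
    None, signalling that an update or retraining is needed."""
    if not models:
        return None
    best_score, best_model = max(zip(similarities, models),
                                 key=lambda pair: pair[0])
    return best_model if best_score > threshold else None
```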
At collecting step 630, labelled or unlabelled training images are collected, under execution of the processor 205, depending on whether the person re-identification model 190 is updated using a supervised learning method or an unsupervised learning method, respectively. The labelled or unlabelled training images are collected at step 630 from the target domain data 181. Then the method 600 proceeds to step 640. At step 640, the person re-identification model 190 is updated using the training images collected at step 630. In one arrangement, if the person re-identification model is trained using a deep convolutional neural network, then the labelled training images from the target domain are used to refine the deep convolutional neural network. In another arrangement, if the person re-identification model is a dictionary learnt using “dictionary learning”, then unlabelled training images from the target domain may be used to update the dictionary. After step 640, the method 600 concludes.
The method 700 starts at receiving step 710, where the appearance descriptors extracted for a pair of cameras are received as input, under execution of the processor 205.
The method 700 then proceeds to clustering step 720. At step 720, the appearance descriptors input at step 710 are clustered using any suitable clustering method, such as K-Means.
After step 720, the method 700 proceeds to determining step 730, where a probability distribution of the features 930 and 940 is determined under execution of the processor 205. The probability distribution determined at step 730 directly follows from the output of the K-Means clustering of the feature descriptors, as described in the examples 935, 945 or 1310.
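A minimal sketch of steps 720 and 730, assuming the descriptors are NumPy arrays and using scikit-learn's K-Means with cluster centres shared across the two cameras; the number of clusters `k` is an assumed parameter.

```python
import numpy as np
from sklearn.cluster import KMeans

def descriptor_distributions(descriptors_cam_a, descriptors_cam_b, k=32, seed=0):
    """Cluster two cameras' descriptors with common cluster centres,
    then return each camera's normalised histogram over the k clusters."""
    kmeans = KMeans(n_clusters=k, random_state=seed, n_init=10)
    kmeans.fit(np.vstack([descriptors_cam_a, descriptors_cam_b]))

    def distribution(descriptors):
        labels = kmeans.predict(descriptors)
        counts = np.bincount(labels, minlength=k).astype(np.float64)
        return counts / counts.sum()  # relative cluster frequencies

    return distribution(descriptors_cam_a), distribution(descriptors_cam_b)
```

The two returned histograms are the per-camera distributions from which the coupling map of step 740 can then be computed.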
After step 730, the method 700 proceeds to determining step 740, where a coupling map of the two distributions is determined under execution of the processor 205.
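The extract above does not fix a particular construction for the coupling map. One common way to couple two discrete probability distributions, offered here purely as an illustrative assumption rather than as the described arrangement, is an entropically regularised optimal-transport plan computed by Sinkhorn iterations, with the cost matrix taken, for example, as pairwise squared distances between the common cluster centres:

```python
import numpy as np

def sinkhorn_coupling(p, q, cost, reg=0.1, iterations=200):
    """Entropically regularised optimal-transport coupling between two
    discrete distributions p (length M) and q (length N), given an
    M x N cost matrix. The returned M x N coupling map has rows summing
    (approximately) to p and columns summing to q."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(iterations):      # alternating marginal projections
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]
```

In practice, very small values of `reg` can underflow the Gibbs kernel, in which case a log-domain Sinkhorn implementation is preferable.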
The arrangements described are applicable to the computer and data processing industries and particularly for image processing.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
Claims
1. A method of performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the method comprising:
- clustering descriptors representing characteristics of objects corresponding to a person in the images;
- determining a probability distribution of the clustered descriptors;
- determining a coupling map for the images based on the probability distribution;
- determining a cross-correlation between at least two of the coupling maps;
- determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- performing person re-identification for the different environmental factors using the descriptors, based on the determined similarity.
2. The method according to claim 1, further comprising determining common cluster centres.
3. The method according to claim 1, wherein the descriptors are clustered using a K-Means clustering method.
4. The method according to claim 1, wherein clusters are determined in individual feature spaces.
5. The method according to claim 1, wherein clusters are determined in a common feature space.
6. The method according to claim 1, wherein the probability distribution is a histogram.
7. The method according to claim 1, wherein the coupling map is determined for at least two distributions of the descriptors.
8. The method according to claim 1, wherein the cross-correlation is determined based on a threshold.
9. The method according to claim 1, further comprising training a distance metric using the descriptors.
10. The method according to claim 9, wherein the distance metric is trained using a supervised learning method.
11. The method according to claim 9, wherein the distance metric is trained using an unsupervised learning method.
12. An apparatus for performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the apparatus comprising:
- means for clustering descriptors representing characteristics of objects corresponding to a person in the images;
- means for determining a probability distribution of the clustered descriptors;
- means for determining a coupling map for the images based on the probability distribution;
- means for determining a cross-correlation between at least two of the coupling maps;
- means for determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- means for performing person re-identification for the different environmental factors using the descriptors, based on the similarity.
13. A system for performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the system comprising:
- a memory for storing data and a computer program;
- a processor coupled to the memory for executing the computer program, the computer program having instructions for:
- clustering descriptors representing characteristics of objects corresponding to a person in the images;
- determining a probability distribution of the clustered descriptors;
- determining a coupling map for the images based on the probability distribution;
- determining a cross-correlation between at least two of the coupling maps;
- determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- performing person re-identification for the different environmental factors using the descriptors, based on the similarity.
14. A computer readable medium having stored on the medium a computer program for performing person re-identification for images captured by at least two camera pairs operating with different environmental factors, the program comprising:
- code for clustering descriptors representing characteristics of objects corresponding to a person in the images;
- code for determining a probability distribution of the clustered descriptors;
- code for determining a coupling map for the images based on the probability distribution;
- code for determining a cross-correlation between at least two of the coupling maps;
- code for determining a similarity of the images captured by the camera pairs according to the cross-correlation; and
- code for performing person re-identification for the different environmental factors using the descriptors, based on the similarity.