SYSTEMS AND METHODS FOR LARGE SCALE, HIGH-DIMENSIONAL SEARCHES
Methods and systems for fast, large scale, high-dimensional searches are described. In some embodiments, a method comprises transforming components of a high-dimensional image descriptor into transformed components in a transform domain, allocating one or more bits available within a bit budget to a given transformed component within a first subset of transformed components as a function of a variance of the given transformed component, independently quantizing each transformed component within the first subset of transformed components, generating a compact representation of the high-dimensional image descriptor based, at least in part, on the independently quantized components, and evaluating a nearest neighbor search operation based, at least in part, on the compact representation of the high-dimensional image descriptor.
1. Field of the Invention
This specification relates to computer technologies, and, more particularly, to systems and methods for performing fast, large scale, high-dimensional searches.
2. Description of the Related Art
Finding nearby points among a large set in high dimensions is at the heart of many important applications. These applications include, for example, nearest neighbor classification, similarity search, and feature matching, to name a few.
For instance, consider an image matching application where an input image is compared against a large database of stored images in order to find a match. Each image may be represented by “descriptors,” such as scale-invariant feature transform (“SIFT”) descriptors, Speeded Up Robust Features (“SURF”) descriptors, global image feature (“GIST”) descriptors, or the like. In a typical case, each image may have tens or hundreds of descriptors, and each descriptor may in turn contain hundreds or thousands of dimensions or features. In this type of environment, finding a match invariably involves performing one or more large scale, high-dimensional searches.
Despite prolonged study, the problem of efficiently finding nearby points in high dimensions remains open. This long-standing difficulty in finding an exact nearest neighbor in high dimensions has led to the use of approximate algorithms, as well as domain-specific approaches. Recently, image and video retrieval have been the subject of numerous practical applications. For video retrieval tasks, for example, the number of points to search is usually much larger than can be held in a computer system's memory. This has led to the development of certain compressed representations, each being typically custom-designed for a specific application.
Mathematically, search problems may be generically posed as follows. First, consider a finite set of points $X \subset \mathbb{R}^n$, $|X| = N$, drawn from the probability distribution p(x) defined over $\mathbb{R}^n$, where $\mathbb{R}^n$ refers to an n-dimensional space with real coordinates. Point proximity may then be determined by a metric d(x, x′), where x is a query point and x′ is a point in a database. In this context, two fundamental proximity queries are known as Radial and Nearest-k. A Radial search returns the set of points within a given radius of a query, whereas a Nearest-k search returns the k points of X that are closest to the query.
One approach to the search problem involves hash-based retrieval. Hash-based retrieval may include performing a quantization operation followed by a look-up operation based on the quantized representation. Quantization aims to identify a unique partition containing x within a finite partitioning of $\mathbb{R}^n$, whereas an index look-up attempts to return all $x' \in X$ contained within the given partition. This technique may be used for near-duplicate search, where one can rely on a hash collision even when the representation consists of a large number of bits. However, for proximity searches, such as Radial and Nearest-k, hash-based retrieval becomes ineffective as the sparseness of the code space increases.
SUMMARY
The present specification is related to computer technologies. Certain embodiments of methods and systems disclosed herein may explore the relationship between nearest neighbor techniques and that of signal representation and quantization to enable fast, large scale, high-dimensional searches. These types of searches are often the fundamental components of various applications including, for example, object recognition, 3D modeling, mapping, navigation, gesture recognition, etc.
An illustrative, non-limiting method may provide efficient techniques that employ transform coding, non-uniform data-driven bit allocation, and distortion-reducing or minimizing non-uniform product quantization to create a compact representation for a high-dimensional image descriptor. This compact representation may then be used in a nearest neighbor search operation as part of, for example, a k-nearest neighbor image classification process, an image retrieval process, and/or a local image feature matching process.
In some embodiments, one or more look-up tables may be constructed to speed up a nearest neighbor search operation. These look-up tables may be, for example, one-dimensional look-up tables created at query time. Additionally or alternatively, the look-up tables may be two-dimensional look-up tables created prior to a query. The former may be particularly useful in the context of k-nearest neighbor image classification and retrieval operations, whereas the latter may find applicability in a local image feature matching process or the like.
The effectiveness of the systems and methods disclosed herein is demonstrated in a range of applications, including large scale retrieval, scene classification, feature matching, and image similarity using a non-Euclidean metric. Through experiments on standard data sets, it is shown that these systems and methods are competitive with current state-of-the-art methods, and in fact provide greater speed and effectiveness.
While this specification provides several embodiments and illustrative drawings, a person of ordinary skill in the art will recognize that the present specification is not limited only to the embodiments or drawings described. It should be understood that the drawings and detailed description are not intended to limit the specification to the particular form disclosed, but, on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used herein, the word “may” is meant to convey a permissive sense (i.e., meaning “having the potential to”), rather than a mandatory sense (i.e., meaning “must”). Similarly, the words “include,” “including,” and “includes” mean “including, but not limited to.”
DETAILED DESCRIPTION OF EMBODIMENTS
Introduction
This specification first presents an illustrative computer system or device, as well as an illustrative image analysis module that may implement certain embodiments of methods disclosed herein. The specification then provides several techniques for: (1) generating a compact representation of a high-dimensional descriptor; (2) estimating distances among components of the high-dimensional descriptor; and (3) performing fast, high-dimensional searches based at least in part on those estimated distances. The final portion of the specification discusses various applications and experiments where the systems and methods described herein have been employed.
Some of the embodiments disclosed herein are in the field of digital image processing and computer vision, and therefore are suitable for use in image searches. It should be understood, however, that the techniques described herein are not limited to use with digital image data. Where suitable, these techniques may be employed in any application where other types of high-dimensional searches may be performed, such as, for example and without limitation, medicine (e.g., microarray DNA analysis), Internet portals (e.g., searching among millions or billions of records), financial data (e.g., information about stock exchange data), etc.
In the following detailed description, numerous specific details are set forth to provide a thorough understanding of claimed subject matter. However, it will be understood by a person of ordinary skill in the art in light of this specification that claimed subject matter may be practiced without necessarily being limited to these specific details. In some instances, methods, apparatuses or systems that would be known by a person of ordinary skill in the art have not been described in detail so as not to obscure claimed subject matter.
Some portions of the detailed description which follow are presented in terms of algorithms or symbolic representations of operations on binary digital signals stored within a memory of a specific apparatus or special purpose computing device or platform. In the context of this particular specification, the term specific apparatus or the like includes a general purpose computer once it is programmed to perform particular functions pursuant to instructions from program software. Algorithmic descriptions or symbolic representations are examples of techniques used by those of ordinary skill in the signal processing or related arts to convey the substance of their work to others skilled in the art. An algorithm is here, and is generally, considered to be a self-consistent sequence of operations or similar signal processing leading to a desired result. In this context, operations or processing involve physical manipulation of physical quantities. Typically, although not necessarily, such quantities may take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared or otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to such signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals or the like. It should be understood, however, that all of these or similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining” or the like refer to actions or processes of a specific apparatus, such as a special purpose computer or a similar special purpose electronic computing device. 
In the context of this specification, therefore, a special purpose computer or a similar special purpose electronic computing device is capable of manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the special purpose computer or similar special purpose electronic computing device.
A Computer System or Device
In one embodiment, a specialized graphics card or other graphics component 156 may be coupled to the processor(s) 110. The graphics component 156 may include a graphics processing unit (GPU) 170, which in some embodiments may be used to perform at least a portion of the techniques described below. Additionally, the computer system 100 may include one or more imaging devices 152. The one or more imaging devices 152 may include various types of raster-based imaging devices such as monitors and printers. In one embodiment, one or more display devices 152 may be coupled to the graphics component 156 for display of data provided by the graphics component 156.
In one embodiment, program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120 at the computer system 100 at any point in time. The memory 120 may be implemented using any appropriate medium such as any of various types of ROM or RAM (e.g., DRAM, SDRAM, RDRAM, SRAM, etc.), or combinations thereof. The program instructions may also be stored on a storage device 160 accessible from the processor(s) 110. Any of a variety of storage devices 160 may be used to store the program instructions 140 in different embodiments, including any desired type of persistent and/or volatile storage devices, such as individual disks, disk arrays, optical devices (e.g., CD-ROMs, CD-RW drives, DVD-ROMs, DVD-RW drives), flash memory devices, various types of RAM, holographic storage, etc. The storage 160 may be coupled to the processor(s) 110 through one or more storage or I/O interfaces. In some embodiments, the program instructions 140 may be provided to the computer system 100 via any suitable computer-readable storage medium including the memory 120 and storage devices 160 described above.
The computer system 100 may also include one or more additional I/O interfaces, such as interfaces for one or more user input devices 150. In addition, the computer system 100 may include one or more network interfaces 154 providing access to a network. It should be noted that one or more components of the computer system 100 may be located remotely and accessed via the network. The program instructions may be implemented in various embodiments using any desired programming language, scripting language, or combination of programming languages and/or scripting languages, e.g., C, C++, C#, Java™, Perl, etc. The computer system 100 may also include numerous elements not shown in
Image analysis module 200 may be implemented as or within a stand-alone application or as a module of or plug-in for an image processing application. Examples of types of applications in which embodiments of module 200 may be implemented may include, but are not limited to, image (including video) analysis, characterization, search, processing, and/or presentation applications, as well as applications in security or defense, educational, scientific, medical, publishing, digital photography, digital films, games, animation, marketing, and/or other applications in which digital image analysis, characterization, representation, or presentation may be performed. Specific examples of applications in which embodiments may be implemented include, but are not limited to, Adobe® Photoshop® and Adobe® Illustrator®. Module 200 may also be used to display, manipulate, modify, classify, and/or store images, for example to a memory medium such as a storage device or storage medium.
Compact Representation of High-Dimensional Image Descriptors
Referring to
Particularly, method 300 may receive one or more high-dimensional image descriptors at 302. At 304, method 300 may transform components of an image descriptor into transformed components (“transform coding”). At 306, method 300 may allocate bits to a subset of the transformed components (“bit allocation”). At 308, method 300 may quantize the subset of transformed components (“quantization”). Method 300 may then concatenate two or more of the quantized components into a word at 310 to generate a compact representation of the received high-dimensional image descriptor(s).
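As a non-limiting illustration of the concatenation at 310, the following minimal sketch packs per-component quantizer indices into a single integer word and recovers them again. The function names `pack` and `unpack`, and the choice of placing the first component in the most significant bits, are assumptions made for this example only and are not taken from the specification.

```python
def pack(indices, bits):
    """Concatenate quantizer indices into one integer word.

    indices[i] is the quantization level chosen for component i and
    bits[i] is the number of bits allocated to that component.
    The first component ends up in the most significant bits.
    """
    word = 0
    for z, b in zip(indices, bits):
        if not 0 <= z < (1 << b):
            raise ValueError("index does not fit in its allocated bits")
        word = (word << b) | z
    return word


def unpack(word, bits):
    """Inverse of pack: recover the per-component indices."""
    out = []
    for b in reversed(bits):
        out.append(word & ((1 << b) - 1))  # lowest b bits
        word >>= b
    return out[::-1]


# Four components with 3, 2, 1 and 0 bits (6 bits in total).
# A component with 0 bits always has index 0 and occupies no space.
bits = [3, 2, 1, 0]
word = pack([5, 2, 1, 0], bits)   # 0b101101 == 45
indices = unpack(word, bits)      # [5, 2, 1, 0]
```

In this sketch a component that received zero bits contributes nothing to the word, which mirrors the dimensionality reduction that results when components are omitted.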
Still referring to
As noted above, operation 302 of method 300 may receive an image descriptor. In some embodiments, each image descriptor may denote a single image (one-to-one representation). For example, descriptors that are suitable for one-to-one representation include global image feature/spatial envelope (“GIST”) descriptors and the like. In other embodiments, each image (or portion thereof) may be described by multiple local descriptors (one-to-many representation). In those embodiments, each image may have tens or hundreds of descriptors, and each descriptor may in turn contain hundreds or thousands of dimensions or features. Examples of high-dimensional image descriptors suitable for one-to-many representation include the Scale-Invariant Feature Transform (“SIFT”) descriptors, the Speeded Up Robust Features (“SURF”) descriptors, etc.
Each of the remaining operations shown in
In some embodiments, a quantizer operation—such as the one depicted as operation 308 of FIG. 3—may depend on an assumption that the components of a high-dimensional image descriptor x (i.e., a query point in vector format) are statistically independent so that each vector component may be quantized independently. This assumption may be addressed at 304 of method 300 through transform coding. Transform coding may seek a (typically linear) transformation to reduce statistical dependence among the components.
In a non-limiting example, a transform operation may be achieved through principal component analysis (PCA), although a person of ordinary skill in the art will recognize in light of this specification that other suitable alternatives exist. Specifically, operation 304 may compute eigenvectors and eigenvalues of a training sample covariance, and mean value(s) may be removed. A matrix of eigenvectors is a unitary transformation U that may be applied to some or all points prior to quantization. Given that the statistical dependence among the components is reduced through PCA (or other) transformation, the transformed components may be quantized independently. As such, a product quantizer may be designed for points y=Ux.
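As a non-limiting illustration of the decorrelating effect of such a transformation, the following minimal sketch performs PCA on two-dimensional toy data using the closed-form rotation that diagonalizes a symmetric 2x2 covariance matrix. The restriction to two dimensions, the function names `pca_2d` and `transform`, and the sample data are assumptions made for this example only; a practical embodiment would typically use a general eigen-decomposition routine on descriptors with hundreds of dimensions.

```python
import math


def pca_2d(points):
    """Return (mean, U) for 2-D points.

    The rows of U are orthonormal eigenvectors of the sample
    covariance, with the larger-variance direction first.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Rotation angle that diagonalizes the symmetric 2x2 covariance.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    return (mx, my), [[c, s], [-s, c]]


def transform(point, mean, U):
    """Compute y = U (x - mean) for a single 2-D point."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (U[0][0] * dx + U[0][1] * dy, U[1][0] * dx + U[1][1] * dy)


# Strongly correlated toy data.
pts = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
mean, U = pca_2d(pts)
ys = [transform(p, mean, U) for p in pts]
# The transformed components are uncorrelated (up to rounding error).
cross_cov = sum(a * b for a, b in ys) / len(ys)
```

After the transformation, the cross-covariance of the two components is zero up to rounding, and most of the variance is concentrated in the first component. Note that PCA removes correlation; statistical independence is only approximated in general.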
As such, the quantizer design problem may be reduced to a set of n independent 1D problems; that is, each $q_i$ may be designed independently to minimize the expected distortion $D_i = E[d_i(x_i, c_i(q_i(x_i)))]$. Because $D = \sum_i D_i$, minimizing each $D_i$ independently also minimizes $D$.
As shown in
As an example,
In cases where operation 304 of
Referring back to
In general, a minimum distortion criterion may be sufficient to design a product quantizer if the number of distinct quantization levels per component is known. Determining the number of levels per component is referred to herein as bit allocation. In some embodiments, bit allocation involves reducing or minimizing:
$D(b) = \sum_i D_i(b_i)$
such that $\sum_i b_i = \log_2 m$, where $b_i$ is the number of bits allocated to the i-th component.
An exact solution of the distortion equation for general distributions may involve a computationally prohibitive numerical search. In some embodiments, however, it may be assumed that the components are identically distributed once their variances are normalized, and that the per-component distortion functions are identical. In this case, optimal bit allocation may be achieved when:
$b_i \propto \log_2 \sigma_i$
where $\sigma_i$ is the standard deviation of the i-th component. Therefore, $\sigma_i$ may be estimated from the training data to allocate bits to each component proportionally.
In some embodiments, it may be preferable that each bi be integer-valued so that the components qi(xi) may be concatenated to encode q(x) as a contiguous bit vector. This may ensure that an overall bit budget is met, while proportionally allocating an integer number of bits to each component. For example, one suitable sequential distribution procedure is shown below:
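As a non-limiting sketch of one possible sequential procedure, the following code hands out the bit budget one bit at a time to the component with the largest remaining standard deviation and then halves that value. Halving reflects the assumption that each additional bit doubles the number of levels for that component, which is consistent with bits being allocated in proportion to the logarithm of the standard deviation. The function name `allocate_bits` and the halving rule are assumptions made for this example and are not necessarily the procedure of any particular embodiment.

```python
def allocate_bits(sigmas, total_bits):
    """Greedy integer bit allocation.

    sigmas[i] is the standard deviation of transformed component i,
    estimated from training data. Returns a list of integer bit
    counts whose sum equals total_bits.
    """
    remaining = list(sigmas)
    bits = [0] * len(sigmas)
    for _ in range(total_bits):
        # Component with the largest remaining spread (first one on ties).
        i = max(range(len(remaining)), key=lambda j: remaining[j])
        bits[i] += 1
        # One more bit doubles the levels, halving the residual spread.
        remaining[i] /= 2.0
    return bits


# Four components and a 6-bit budget.
allocation = allocate_bits([8.0, 4.0, 2.0, 1.0], 6)  # [3, 2, 1, 0]
```

In this toy case the lowest-variance component receives no bits at all, so it would simply be dropped from the encoded representation.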
After the bit allocation operation, it may be that some components are allocated no bits at all, thus resulting in a dimensionality reduction since those particular components may be omitted from z (i.e., the set of integer index values corresponding to quantizer levels as discussed above).
As a non-limiting example, consider a 128D SIFT descriptor compressed to 64 bits.
It may be seen from
In summary, operation 306 may provide a principled method to select a subset of dimensions, and simultaneously allocate bits to the remaining dimensions, given a fixed overall bit budget, while minimizing D.
Component Specific, Non-Uniform Quantization
Still referring to
$Q(z) = \{x : q(x) = z\}$,
for $z \in Z$, and the codebook values associated with each z, $c(z) \in \mathbb{R}^n$.
The quality of a given quantizer may be measured in terms of its average distortion,
$D = E[d(x, c(q(x)))]$,
where the distortion function d can take on a variety of forms. For retrieval, for example, an appropriate distortion function to be minimized may be the metric d(x, x′). In fact, application of the triangle inequality yields:
$E[\,|d(x, x') - d(x, c(q(x')))|\,] \le D$.
Therefore, D may be seen as an upper bound on the expected error in estimating inter-point distances when one of the two points is approximated by its quantized codebook value. Consequently, a quantizer that minimizes D for a fixed m may be effective from the standpoint of near neighbor search.
A quantizer that minimizes D subject to the underlying distribution p(x) may be characterized by the following two properties:
1. $Q(z) = \{x : d(x, c(z)) \le d(x, c(z')),\ \forall z' \in Z\}$, and
2. $c(z) = \arg\min_{x'} E_x[d(x, x') \mid x \in Q(z)]$.
For instance, in certain embodiments, the Lloyd-Max algorithm may be used to obtain a one-dimensional minimum distortion quantizer. This particular algorithm is described, for example, at “Vector quantization and signal compression,” A. Gersho and R. Gray, Kluwer, 1991 and “Quantization,” R. Gray and D. Neuhoff, IEEE Trans. on Inf. Th., 44(6):2325-2383, 1998. Moreover, in some embodiments, a number of quantization levels allocated to a given component may be a function of a statistic of the given component as determined from a training sample.
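As a non-limiting illustration, the following minimal sketch applies Lloyd-Max style iterations to one-dimensional training samples under a squared-error distortion. It alternates the two properties listed above: cell boundaries are placed midway between neighboring centroids (nearest-centroid condition), and each centroid is replaced by the mean of the samples in its cell (centroid condition). The function names `lloyd_max` and `quantize`, the quantile-based initialization, and the fixed iteration count are assumptions made for this example only.

```python
import bisect


def lloyd_max(samples, levels, iterations=50):
    """Return a sorted list of centroids for a 1-D quantizer."""
    xs = sorted(samples)
    # Initialize centroids at evenly spaced sample quantiles.
    centroids = [xs[(2 * k + 1) * len(xs) // (2 * levels)]
                 for k in range(levels)]
    for _ in range(iterations):
        # Nearest-centroid condition: boundaries are midpoints.
        bounds = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
        cells = [[] for _ in range(levels)]
        for x in xs:
            cells[bisect.bisect_right(bounds, x)].append(x)
        # Centroid condition: the cell mean minimizes squared error.
        # An empty cell keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(cells, centroids)]
    return centroids


def quantize(x, centroids):
    """Return the index of the quantization level assigned to x."""
    bounds = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return bisect.bisect_right(bounds, x)


# Two well-separated clusters quantized with one bit (two levels).
training = [0.0, 0.1, 0.2, 9.8, 10.0, 10.2]
codebook = lloyd_max(training, levels=2)
index = quantize(9.0, codebook)  # falls in the upper cell
```

Because the boundaries adapt to the training distribution, the resulting levels are generally non-uniform, and a component that has been allocated b bits would use 2^b levels.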
Distance Estimation
In some embodiments, systems and methods described herein may perform an estimation operation that estimates distances among components of high-dimensional image descriptors described in the preceding section. These estimated distances may enable fast, high-dimensional queries, as will be described in a later section.
Generally, both Radial and Nearest-k queries may involve an estimation of d(x, x′) for each retrieved point $x' \in X$. In practice, however, it may often be too expensive to physically retrieve the points or to evaluate the exact distances. Thus, in some embodiments, retrieved points may be ranked based at least in part on the distances from the query point to the centroids for each retrieved point.
The centroid for a particular quantization cell may be constructed by inverting the projection as depicted in
$d_{IA}(x, x') = d(x, U^T c(q(Ux')))$
Meanwhile, the distance between centroids, measured in the input space, may be denoted as:
$d_{IS}(x, x') = d(U^T c(q(Ux)), U^T c(q(Ux')))$
Because dIS may include quantization noise for both points rather than just one, it may be a poorer estimate of d than dIA. However, dIS may have the advantage over dIA of being static for a given quantizer, independent of the query, and consequently can be pre-computed.
For a large number of bits, it may be impractical to enumerate and store pairwise distances between all centroids. However, this problem may be circumvented by computing distances in the transform domain instead of the input domain. Therefore, in some embodiments, dTA and dTS may be calculated as follows:
$d_{TA}(x, x') = d(Ux, c(q(Ux')))$
and
$d_{TS}(x, x') = d(c(q(Ux)), c(q(Ux')))$.
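As a non-limiting illustration of the symmetric transform-domain estimate, the following minimal sketch pre-computes, for each component, a table of squared differences between every pair of centroids and then evaluates the distance between two encoded points by summing table entries. The use of a squared Euclidean metric (which decomposes into per-component terms), the function names `build_static_luts` and `d_ts`, and the toy codebooks are assumptions made for this example only.

```python
def build_static_luts(codebooks):
    """Pre-compute one 2-D table per component.

    codebooks[i][z] is the centroid of level z for component i.
    luts[i][za][zb] holds the squared difference between the
    centroids of levels za and zb of component i.
    """
    return [[[(a - b) ** 2 for b in cb] for a in cb] for cb in codebooks]


def d_ts(code_a, code_b, luts):
    """Squared symmetric distance between two encoded points."""
    return sum(lut[za][zb] for lut, za, zb in zip(luts, code_a, code_b))


# Two components: one with 1 bit (2 levels), one with 2 bits (4 levels).
codebooks = [[-1.0, 1.0], [0.0, 2.0, 4.0, 6.0]]
luts = build_static_luts(codebooks)
distance = d_ts((0, 0), (1, 3), luts)  # (-1 - 1)^2 + (0 - 6)^2 = 40.0
```

Because the tables depend only on the codebooks, they may be built once and reused for any pair of encoded points, and their size grows with the square of the number of levels per component rather than with the total number of centroids of the product quantizer.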
Referring back to
At 314, method 300 may perform a multi-dimensional search using the pre-calculated look-up tables to compute distances dTA and/or dTS as discussed in preceding sections. In some embodiments, evaluation of dTA may be very fast using 1D look-up tables constructed at query time. This may be appropriate in a large-scale retrieval setting where the cost of look-up table construction is small compared to the cost of search. Meanwhile, evaluation of dTS may be implemented using static (query independent) 2D lookup tables, and therefore may be more practical in an image matching setting.
In some embodiments, distance evaluation for a single point at operation 314 of method 300 may involve summing look-up table values for the given quantized point, and appending the point index to a row in a pre-allocated 2D output buffer indexed by the quantized distance value. The number of rows in the output buffer may determine the maximum search radius, and the number of columns may limit the total number of points that are kept for a given quantized distance. After passing over the entire data set, the output buffer may be scanned to extract the closest k indices.
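As a non-limiting illustration of the asymmetric case, the following minimal sketch builds one-dimensional look-up tables for a single query and then scans a list of encoded points, summing one table entry per component for each point. For brevity, the closest k points are selected here with a heap from the Python standard library rather than with the quantized-distance output buffer described above; that substitution, the squared Euclidean metric, and the names `build_query_luts` and `search` are assumptions made for this example only.

```python
import heapq


def build_query_luts(y, codebooks):
    """Build per-component 1-D tables for a transformed query y.

    luts[i][z] is the squared difference between the i-th query
    component and the centroid of level z of component i.
    """
    return [[(yi - c) ** 2 for c in cb] for yi, cb in zip(y, codebooks)]


def search(y, codebooks, codes, k):
    """Return the k (squared distance, index) pairs closest to y."""
    luts = build_query_luts(y, codebooks)
    scored = ((sum(lut[z] for lut, z in zip(luts, code)), idx)
              for idx, code in enumerate(codes))
    return heapq.nsmallest(k, scored)


# Two components with 2 and 4 levels; three encoded database points.
codebooks = [[-1.0, 1.0], [0.0, 2.0, 4.0, 6.0]]
codes = [(0, 0), (1, 1), (1, 3)]
nearest = search((0.9, 2.2), codebooks, codes, k=2)
# The point encoded as (1, 1) ranks first, followed by (0, 0).
```

The tables are rebuilt for every query, but their cost is proportional to the total number of quantization levels across components, which is typically negligible next to a scan over a large search set.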
As a person of ordinary skill in the art will recognize in light of this specification, a variety of other techniques exists for asymptotically sub-linear search using hierarchical structures of one sort or another. However, an advantage of linear search is that it is ideally suited to modern system architectures optimized for high memory locality and streaming data.
Applications
This portion of the specification illustrates the performance of the systems and methods disclosed herein in a wide range of applications. In each experiment discussed below, the training and encoding algorithms are identical, parameterized only by the desired number of bits and the applicable metric. The method of search may vary depending on the application.
Large-Scale Retrieval
In some embodiments, the systems and methods disclosed herein may be used in large-scale retrieval applications. An illustrative large-scale retrieval experiment was performed employing the French National Institute for Research in Computer Science and Control (INRIA) Holidays dataset comprising 128D SIFT descriptors divided among a training set, search set, and query set. The training set was separately collected from a random sampling of images obtained on the Internet. The search set contains 1 million points, and the query set contains 10 thousand. Descriptor similarity is the Euclidean metric.
The systems and methods disclosed herein may be relatively simple and efficient to train and to encode high-dimensional image descriptors for large-scale retrieval applications. Specifically, the computational complexity involved in training and encoding grows slowly as the number of bits is increased, unlike methods in which computational complexity grows exponentially with respect to the number of bits (unless the descriptor is further decomposed into smaller units, which may be less advantageous with respect to the structure of the data).
Another characteristic of the systems and methods disclosed herein is that linear search is faster. And, as shown in
k-Nearest Neighbor Classification
In some embodiments, the systems and methods disclosed herein may be used in k-nearest neighbor (k-NN) classification applications. For example, some embodiments may provide nearest neighbor search techniques that remain practical as the number of object or image categories increases to thousands or even tens of thousands. In one experiment, the Massachusetts Institute of Technology (MIT) scene category dataset was evaluated. The dataset includes 2,688 images distributed among 8 scene categories. The training data has 100 randomly selected images from each category. The remaining images constitute the test set. Nearest neighbor search is based on the Euclidean metric applied to the 960D GIST descriptor computed for each image.
The results shown in
In some embodiments, the systems and methods disclosed herein may be used in local feature matching classification applications. Local feature matching typically refers to the process of forming the correspondence between two images using local feature descriptors, for instance using a SIFT algorithm to identify candidate corresponding feature point pairs, followed by Random Sample Consensus (“RANSAC”) to determine a geometrically consistent subset of the candidate pairs, which are identified as inliers. As noted above, the systems and methods described herein may provide compression of high-dimensional image descriptors without loss of expressiveness. Accordingly, in some embodiments, these operations may accelerate the local feature matching process and may be especially suited for bandwidth-limited environments.
In one experiment, panorama image sets were collected and registered to obtain a ground truth homography between each overlapping image pair. The images were of varying resolution and subject matter, including natural and man-made settings. In total, the test set included 891 registered image pairs. Feature points were obtained from the images using a Difference of Gaussians (“DoG”) detector. Each feature point was represented using a standard 128D SIFT descriptor.
The experiment encoded the SIFT descriptors at varying bit rates and measured the effectiveness of the matching process using the compressed representation in comparison to using the uncompressed representation. The methodology involved considering each pair of images in the roles of “source” and “target” for matching. For each source/target image pair, the experiment defined the “true inliers” as the set of feature pairs (fs, ft) such that fs is the closest source feature point to the target feature point ft after it is mapped to the source image under the known homography, and such that the distance between the two feature points in the source image is less than a fixed radius (e.g., 5 pixels). The experiment then used a distance ratio criterion to identify candidate matches based on the exact descriptor values and the Euclidean metric. The distance ratio criterion required that the ratio of the descriptor-space distance to the closest descriptor to the descriptor-space distance to the second closest descriptor be less than a given fixed threshold (e.g., 0.8). The fraction of true inlier pairs that are in the matched set is the “inlier ratio,” and it is related to the likelihood that the images can be registered using RANSAC. The experiment then compressed the descriptors using a transform coder that was trained on a disjoint set of images and applied the same distance ratio criterion based on the dTS distance.
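As a non-limiting illustration of a distance ratio criterion, the following minimal sketch accepts a candidate match only when the closest source descriptor is sufficiently closer than the second closest. The direction of matching (each target descriptor is compared against all source descriptors), the exhaustive search, the function name `ratio_test_matches`, and the two-dimensional toy descriptors are assumptions made for this example only; the same test could be applied with an estimated distance in place of the exact Euclidean distance.

```python
import math


def ratio_test_matches(src_desc, tgt_desc, threshold=0.8):
    """Return (source index, target index) pairs passing the ratio test."""
    matches = []
    for ti, t in enumerate(tgt_desc):
        # Distances from this target descriptor to every source descriptor.
        dists = sorted((math.dist(t, s), si) for si, s in enumerate(src_desc))
        # Accept only if the best distance is clearly below the runner-up.
        if len(dists) >= 2 and dists[0][0] < threshold * dists[1][0]:
            matches.append((dists[0][1], ti))
    return matches


# Toy descriptors: the first target is unambiguous, the second is
# equidistant from all sources and is therefore rejected.
source = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = [(0.5, 0.0), (5.0, 5.0)]
candidates = ratio_test_matches(source, target)  # [(0, 0)]
```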
(Quantized distances tend to underestimate the exact distance.) Nevertheless, it may be concluded based on these results that the quantized descriptor of 80 bits may be sufficient for local feature matching in panoramas in some applications.
As discussed above, evaluation of dTS may be extremely fast using static 2D lookup tables. Also, compression obtained by use of the quantized descriptor may significantly improve performance in circumstances where the descriptors must be transmitted over a communications network.
Spatial Pyramid Bag-of-Words Retrieval
In some embodiments, the systems and methods disclosed herein may be used in spatial pyramid bag-of-words retrieval applications. The spatial pyramid bag-of-words scene representation has been shown to be effective for scene category classification, and scene similarity retrieval. However, conventional approaches are problematic to use in a large scale because the descriptors have high dimension (typically thousands of components), but are typically not sufficiently sparse for sparse methods, such as min-hash or inverted files, to be effective.
In one experiment, spatial pyramid bag-of-words descriptors were computed for each of the images in the MIT Indoor Scene Category dataset. Specifically, for each image, the experiment collected dense SIFT features in a grid pattern, quantized the feature descriptors into a vocabulary of 200 visual words, and then formed a three-level spatial pyramid of histograms, resulting in a descriptor of 4,200 dimensions. (The vocabulary was learned on a disjoint image set.) On average, 25% of the descriptor components were non-zero. The similarity metric was the term frequency-inverse document frequency (“TFIDF”)-weighted histogram intersection metric.
The experiment implemented a transform coder on this very high-dimensional descriptor and non-Euclidean metric in a retrieval setting. The entire dataset included 15,620 images. In each trial, 5,000 were randomly selected for training, 10,000 for the search set, and the remainder formed the query set.
The various methods as illustrated in the figures and described herein represent example embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc. Various modifications and changes may be made as would be obvious to a person of ordinary skill in the art having the benefit of this specification. It is intended that the invention embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.
Claims
1. A computer-readable storage medium excluding signals per se, comprising instructions stored thereon that, responsive to execution by a computing device, direct the computing device to perform operations comprising:
- transforming components of an image descriptor into transformed components in a transform domain;
- quantizing the transformed components;
- generating a compact representation of the image descriptor based, at least in part, on the quantized components; and
- responsive to the generating, constructing, at query time, a one-dimensional look-up table that stores a partial distance between two or more of the quantized components.
2. The computer-readable storage medium of claim 1, wherein the image descriptor is a SIFT descriptor.
3. The computer-readable storage medium of claim 1, wherein the quantizing reduces a distortion.
4. The computer-readable storage medium of claim 1, wherein the generating comprises concatenating the two or more of the quantized components into a word such that the quantized components do not straddle a word boundary.
5. The computer-readable storage medium of claim 4, the operations further comprising:
- calculating a partial distance between the quantized components within the word.
6. (canceled)
7. The computer-readable storage medium of claim 27, wherein the nearest neighbor search operation is performed based, at least in part, on the one-dimensional look-up table as part of a process selected from the group consisting of: a k-nearest neighbor image search, an image retrieval process, and a spatial pyramid bag-of-words retrieval process.
8. (canceled)
9. (canceled)
10. A method, comprising:
- performing, by one or more computing devices: transforming components of an image descriptor into transformed components; allocating bits to a subset of the transformed components; quantizing the subset of transformed components; concatenating two or more of the quantized components into a word; and constructing a two-dimensional look-up table, prior to a query, that stores a partial distance determined by the concatenated components within the word.
11. The method of claim 10, wherein concatenating comprises permuting the two or more quantized components within the word such that no quantized component straddles a word boundary.
12-14. (canceled)
15. The method of claim 21, wherein the nearest neighbor search operation is performed based, at least in part, on the two-dimensional look-up table as part of a local feature image matching process.
16. A system, comprising:
- at least one processor; and
- memory, communicatively coupled to the at least one processor, storing instructions that responsive to execution by the at least one processor, cause the at least one processor to perform operations comprising: quantizing components of a plurality of image descriptors; concatenating two or more of the quantized components into a word such that the quantized components do not straddle a word boundary; calculating a partial distance between the concatenated components; and constructing, at query time, a one-dimensional look-up table that stores the partial distance between the concatenated components of the word.
17. The system of claim 16, the operations further comprising:
- evaluating a nearest neighbor search based, at least in part, on the one-dimensional look-up table.
18. The system of claim 16, the operations further comprising, prior to quantizing:
- transforming components of the image descriptors to reduce a correlation among the components.
19. The system of claim 16, where a number of quantization levels allocated to a given component is a function of a statistic of the given component as determined from a training sample.
20. (canceled)
21. The method of claim 10, further comprising applying the two-dimensional look-up table in a nearest neighbor search operation.
22. The method of claim 10, further comprising applying the two-dimensional look-up table in an image feature matching process.
23. The method of claim 22, the image feature matching process including an Internet search or microarray DNA analysis.
24. The system of claim 16, the operations further comprising performing an Internet search based on the one-dimensional look-up table.
25. The computer-readable storage medium of claim 1, the operations further comprising allocating one or more bits available within a bit budget to a given transformed component within a first subset of transformed components as a function of a variance of the given transformed component, wherein a second subset of the transformed components receives zero bits.
26. The computer-readable storage medium of claim 1, wherein the transforming reduces a correlation among the components.
27. The computer-readable storage medium of claim 1, the operations further comprising evaluating a nearest neighbor search operation based, at least in part, on the compact representation of the image descriptor.
Type: Application
Filed: Aug 26, 2010
Publication Date: May 16, 2013
Inventor: Jonathan W. Brandt (Santa Cruz, CA)
Application Number: 12/869,133
International Classification: G06K 9/64 (20060101); G06K 9/46 (20060101); G06K 9/72 (20060101); G06K 9/40 (20060101);