Inverted Index Patents (Class 707/742)
  • Publication number: 20110258198
    Abstract: Systems and methods for applying user behavior data to improve search query result ranking are provided. When an update file indicates that recent, significant user behavior data is available for a document associated with an inverted index, the update file is published periodically and frequently to an index server. After filtering out the relevant update information from the update file, the index server extracts identifiers of the documents having the associated user behavior data. The update file and the identifiers of the documents are utilized to update an in-memory index containing representations of metadata indicative of the user behavior. The in-memory index is continuously updated and utilized to serve search query results in response to user search queries. Search query results from the in-memory index are ranked using the user behavior data prior to serving. Thus, results associated with recent, significant user-behavior metadata receive prominent placement on the search results page.
    Type: Application
    Filed: June 27, 2011
    Publication date: October 20, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: WALTER SUN, JAY KUMAR GOYAL, PRATIBHA PERMANDLA, YINZHE YU, JINGFENG LI
  • Patent number: 8032532
    Abstract: A method and system for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Andrei Z. Broder, Nadav Eiron, Felipe Marcus Fontoura, Ronny Lempel, Ning Li, John Ai McPherson, Jr., Andreas Neumann, Shila Ofek-Koifman, Runping Qi, Eugene J. Shekita
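    A minimal Python sketch of the idea in this abstract, under stated assumptions: facet annotations are written as slash-separated paths (a hypothetical convention), every path prefix is indexed as its own token, and a query is a list of facet constraints answered by intersecting sorted posting lists. The function names are illustrative and are not taken from the patent.

```python
from collections import defaultdict


def facet_tokens(path):
    """Return the facet token and all of its path prefixes,
    e.g. 'loc/us/ca' -> ['loc', 'loc/us', 'loc/us/ca']."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_index(docs):
    """docs maps doc_id -> list of facet paths annotated on that document."""
    index = defaultdict(set)
    for doc_id, paths in docs.items():
        for path in paths:
            for token in facet_tokens(path):
                index[token].add(doc_id)
    # Posting lists are kept sorted by document id for merge intersection.
    return {token: sorted(ids) for token, ids in index.items()}


def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def query(index, constraints):
    """Each constraint is a facet path or path prefix; all must hold."""
    lists = [index.get(c, []) for c in constraints]
    if not lists:
        return []
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result
```

    With this layout, a constraint on a broad category such as "loc/us" is a single posting-list lookup rather than a union over all of its subcategories.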
  • Patent number: 8032521
    Abstract: Embodiments of the present invention address deficiencies of the art in respect to structured content storage and provide a novel and non-obvious method, system and computer program product for managing structured content stored in a BLOB. In an embodiment of the invention, a performance optimized structured content management system can include a content repository, a content manager configured to provide access to structured content in the content repository and multiple different performance optimized containers disposed in the content repository. Each of the containers can store a portion of the structured content, and each of the containers can include a flattened form of original structured content in a primary binary large object (BLOB) and a parsed form of the original structured content in a secondary BLOB, the parsed form of the original structured content in the secondary BLOB indexing the flattened form of the original structured content in the primary BLOB.
    Type: Grant
    Filed: August 8, 2007
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Stephen J. Garward, Mark C. Hampton, Eric Martinez de Morentin, Kenneth Sabir
  • Publication number: 20110219008
    Abstract: A method and indexing system indexes the content of a body of documents into a content index, and the metadata of the documents into a metadata index which is a parallel index to the content index. The metadata is copied into a data store that is easily accessible by the indexing system and is stored in native form. The indexing system can dynamically re-index the metadata from the native metadata in the data store to produce a new metadata index which is used to replace the original metadata index. Search queries received by a search engine associated with the indexing system are applied to both the content and metadata index and the results are merged for return.
    Type: Application
    Filed: March 8, 2010
    Publication date: September 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: DAVID O. BEEN, MICHAEL BUSCH, OSAMU FURUSAWA, FREDERICK S. GRENNAN, FUMIHIKO TERUI, JUSTO L. PEREZ
  • Publication number: 20110218989
    Abstract: The present disclosure provides an information search method and system, applicable in an information search system in which each document has corresponding forward index data, that addresses the low search efficiency of existing information search techniques. In one aspect, the method may include: receiving an inquiry word and obtaining one or more keywords contained in the inquiry word by segmentation; searching one or more documents matching the one or more keywords and forward index data corresponding to the one or more documents through the information search system's inverted index data; and determining an abstract of each of the one or more documents according to a corresponding document's forward index data, and outputting the abstract and information of the one or more documents as a search result. The proposed techniques can increase the efficiency of information search and, at the same time, guarantee the accuracy of the search to a certain extent.
    Type: Application
    Filed: August 27, 2010
    Publication date: September 8, 2011
    Applicant: ALIBABA GROUP HOLDING LIMITED
    Inventor: Yi Luo
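    A minimal sketch of how the two structures divide the work described above, assuming whitespace tokenization in place of real word segmentation and a fixed token window around the first keyword hit as the abstract; the patent does not specify either choice, and all names here are illustrative.

```python
from collections import defaultdict


def build_forward_and_inverted(docs):
    """docs maps doc_id -> text. Returns (inverted, forward), where
    inverted[term] is a set of doc ids and forward[doc_id] is the token list."""
    inverted = defaultdict(set)
    forward = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        forward[doc_id] = tokens
        for tok in tokens:
            inverted[tok].add(doc_id)
    return inverted, forward


def search(query, inverted, forward, window=3):
    # Whitespace splitting stands in for keyword segmentation (assumption).
    keywords = query.lower().split()
    if not keywords:
        return []
    # Inverted index: find documents containing every keyword.
    matches = set.intersection(*(inverted.get(k, set()) for k in keywords))
    results = []
    for doc_id in sorted(matches):
        # Forward index: build the abstract without re-reading the document.
        tokens = forward[doc_id]
        pos = tokens.index(keywords[0])  # first hit of the first keyword
        start = max(0, pos - window)
        abstract = " ".join(tokens[start:pos + window + 1])
        results.append((doc_id, abstract))
    return results
```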
  • Patent number: 8010534
    Abstract: Techniques for grouping related objects such as documents and files using quantum clustering are disclosed. A method may include constructing a feature-object database of multiple objects. The feature-object database may have quantized selected features as keys. A connected objects database may be built. Clusters of connected objects may be identified in the connected objects database. The clusters of identified objects may be evaluated to determine groups of related objects. The method may be implemented on a computing device.
    Type: Grant
    Filed: August 31, 2007
    Date of Patent: August 30, 2011
    Assignee: Orcatec LLC
    Inventors: Herbert L. Roitblat, Brian Golbère
  • Publication number: 20110202541
    Abstract: Systems and methods for performing an updating process to an in-memory index are provided. Upon receiving notice of document modifications covered by an inverted index associated with a search engine, in the form of an update file, a representation of the modification is published onto various index serving machines. Each index serving machine receiving the update file determines if the modifications are applicable to the index serving machine. If an index serving machine determines that it contains mapping information corresponding to the modified documents, the index serving machine utilizes the update file and associated mapping information to update an in-memory index. In embodiments, the in-memory index is used to provide results to user queries in tandem with the inverted index. In some embodiments, an extra in-memory index is maintained that is revised with constantly incoming metadata updates and the existing in-memory index is periodically swapped with the revised in-memory index.
    Type: Application
    Filed: February 12, 2010
    Publication date: August 18, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: PRATIBHA PERMANDLA, YINZHE YU, GAURAV SAREEN, ABHAS KUMAR
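    The swap described in the last sentence of this abstract amounts to double buffering. A minimal sketch, assuming per-document metadata stored in a plain dict and a hypothetical SwappableIndex class: updates accumulate in a standby copy, and swap() publishes that copy, so queries never read a dictionary that is being modified.

```python
import threading


class SwappableIndex:
    """Hypothetical double-buffered in-memory metadata index."""

    def __init__(self):
        self._lock = threading.Lock()
        self._serving = {}   # doc_id -> metadata, read by queries, never mutated
        self._standby = {}   # revised copy that receives incoming updates

    def apply_update(self, doc_id, metadata):
        with self._lock:
            self._standby[doc_id] = metadata

    def swap(self):
        with self._lock:
            # Publish the revised index; the next standby starts as a copy of it,
            # so the dict now being served is not touched by later updates.
            self._serving = self._standby
            self._standby = dict(self._standby)

    def lookup(self, doc_id):
        # Readers see a consistent snapshot of whichever dict is being served.
        return self._serving.get(doc_id)
```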
  • Patent number: 7984041
    Abstract: Methods and apparatus provide for a local search indexer to allow for an optimized search within a web server that returns accurate search results while maintaining independent control as to defining search patterns, search prioritization, and updated content available for search. Specifically, the local search indexer organizes content according to a hierarchical directory structure at a web server. The hierarchical directory structure includes at least one directory level that provides at least one directory for storing the content. The local search indexer builds a search index associated with the directory and stores the search index at the web server. The search index is populated with indexed content based on an update of the content stored in the directory. The local search indexer employs a search engine, at the web server, to process search queries against the indexed content to provide a search result that includes the update of the content.
    Type: Grant
    Filed: July 9, 2007
    Date of Patent: July 19, 2011
    Assignee: Oracle America, Inc.
    Inventor: Yogesh Y Patil
  • Patent number: 7984029
    Abstract: In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold.
    Type: Grant
    Filed: June 23, 2008
    Date of Patent: July 19, 2011
    Assignee: AOL Inc.
    Inventors: Joshua Alspector, Aleksander Kolcz, Abdur R. Chowdhury
  • Patent number: 7984036
    Abstract: System and computer program product for processing a text search query in a collection of documents. A full posting index is generated that has first index terms and a full posting list for each first index term, enumerating occurrences of the first index terms in the documents of the collection. A text search query includes search conditions on search terms. The search conditions are translated into conditions on the first index terms to provide translated conditions. At least one short posting index is generated, which includes second index terms and a short posting list for each second index term, enumerating documents in which the second index terms occur. Filter conditions and complementary conditions are generated to represent the translated conditions. The filter conditions approximate the translated conditions, and are processed using the short posting index. The complementary conditions are processed using the full posting index to provide a query result.
    Type: Grant
    Filed: January 25, 2008
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventors: Jochen Doerre, Monika Matschke, Roland Seiffert, Matthias Tschaffler
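    One concrete instance of the filter/complementary split, sketched in Python under the assumption that the query is a phrase query: the short posting index (documents only) narrows the candidates to documents containing every term, and the full posting index (with positions) checks that the terms are adjacent. This specific decomposition is an illustration chosen here, not the patent's general translation procedure.

```python
from collections import defaultdict


def build_posting_indexes(docs):
    full = defaultdict(lambda: defaultdict(list))  # term -> doc_id -> positions
    short = defaultdict(set)                       # term -> doc_ids
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            full[term][doc_id].append(pos)
            short[term].add(doc_id)
    return full, short


def phrase_query(phrase, full, short):
    terms = phrase.lower().split()
    if not terms:
        return []
    # Filter condition: every term occurs in the document (short index only).
    candidates = set.intersection(*(short.get(t, set()) for t in terms))
    hits = []
    for doc_id in sorted(candidates):
        # Complementary condition: terms sit at consecutive positions (full index).
        if any(all(p + k in full[t][doc_id] for k, t in enumerate(terms))
               for p in full[terms[0]][doc_id]):
            hits.append(doc_id)
    return hits
```

    The cheap document-level intersection discards most documents before any position list is read, which is the point of approximating the query with filter conditions first.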
  • Patent number: 7974976
    Abstract: A system and method for deriving user intent from a query. The system includes a query engine, and an advertisement engine. The query engine receives a query from the user. The query engine analyzes the query to determine a query intent that is matched to a domain. The query may be further analyzed to derive predicate values based on the query and the domain hierarchy. The domain and associated information may then be matched to a list of advertisements. The advertisement may be assigned an ad match score based on a correlation between the query information and various listing information provided in the advertisement.
    Type: Grant
    Filed: May 18, 2007
    Date of Patent: July 5, 2011
    Assignee: Yahoo! Inc.
    Inventors: Sihem Amer Yahia, Jayavel Shanmugasundaram, Utkarsh Srivastava, Erik Vee
  • Patent number: 7958138
    Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
    Type: Grant
    Filed: April 13, 2009
    Date of Patent: June 7, 2011
    Inventor: Philip R Krause
  • Patent number: 7937394
    Abstract: A method and system are provided of processing a search query entered by a user of a device having a text input interface with overloaded keys. The search query is directed at identifying an item from a set of items. Each of the items has a name including one or more words. The system receives from the user an ambiguous search query directed at identifying a desired item. The search query comprises a prefix substring of at least one word in the name of the desired item. The system dynamically identifies a group of one or more items from the set of items having one or more words in the names thereof matching the search query as the user enters each character of the search query. The system also orders the one or more items of the group in accordance with given criteria.
    Type: Grant
    Filed: August 2, 2010
    Date of Patent: May 3, 2011
    Assignee: Veveo, Inc.
    Inventors: Sashikumar Venkataraman, Rakesh Barve, Murali Aravamudan, Ajit Rajasekharan
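    A minimal sketch of the overloaded-key prefix search, assuming a standard phone keypad layout and alphabetical order as a stand-in for the unspecified ordering criteria. Each word prefix is indexed under its key sequence, so the lookup can be repeated after every key press. All names are hypothetical.

```python
from collections import defaultdict

# Standard phone keypad: each digit key is overloaded with several letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(word):
    """Map a word to the key sequence that types it, e.g. 'good' -> '4663'."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower() if ch in LETTER_TO_KEY)


def build_prefix_index(items):
    """Index every prefix of every word of every item name by key sequence."""
    index = defaultdict(set)
    for item in items:
        for word in item.split():
            keys = to_keys(word)
            for i in range(1, len(keys) + 1):
                index[keys[:i]].add(item)
    return index


def incremental_search(index, key_presses, rank=lambda item: item):
    """Items with a word matching the ambiguous prefix, ordered by `rank`."""
    return sorted(index.get(key_presses, set()), key=rank)
```

    Because "good" and "home" share the key sequence 4663, both titles match that input; the ordering step is what decides which one the user sees first.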
  • Patent number: 7917516
    Abstract: Systems and methods for processing an index are described. To ensure that the most up-to-date index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result.
    Type: Grant
    Filed: June 8, 2007
    Date of Patent: March 29, 2011
    Assignee: Apple Inc.
    Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yun-chih S. Li
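    A minimal sketch of supplementing an unchanged base index at retrieval time, assuming a hypothetical IndexWithDeltas class that keeps per-term sets of added and removed item ids in place of the patent's specially marked postings lists. The base postings are never rewritten; lookup() corrects them on the fly.

```python
class IndexWithDeltas:
    """Hypothetical base inverted index plus per-term change records."""

    def __init__(self, base):
        self.base = {term: set(ids) for term, ids in base.items()}
        self.added = {}    # term -> item ids that now contain the term
        self.removed = {}  # term -> item ids that no longer contain the term

    def record_change(self, item_id, old_terms, new_terms):
        old, new = set(old_terms), set(new_terms)
        for term in old - new:
            self.removed.setdefault(term, set()).add(item_id)
            self.added.get(term, set()).discard(item_id)
        for term in new - old:
            self.added.setdefault(term, set()).add(item_id)
            self.removed.get(term, set()).discard(item_id)

    def lookup(self, term):
        # Merge the untouched base postings with the change records.
        result = set(self.base.get(term, set()))
        result -= self.removed.get(term, set())
        result |= self.added.get(term, set())
        return sorted(result)
```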
  • Patent number: 7917517
    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
    Type: Grant
    Filed: February 28, 2008
    Date of Patent: March 29, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20110066623
    Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
    Type: Application
    Filed: September 20, 2010
    Publication date: March 17, 2011
    Inventor: Adam J. Weissman
  • Patent number: 7895185
    Abstract: A method, computer program product, and system for managing row identifier (RID) list processing on an index are provided. The method, computer program product, and system provide for accessing one or more key values in the index based on one or more keys specified in a query, retrieving a plurality of row identifiers corresponding to the one or more key values from the index, and predicting an actual number of row identifiers to be retrieved from the index based on the one or more key values accessed and the plurality of row identifiers retrieved.
    Type: Grant
    Filed: September 28, 2006
    Date of Patent: February 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ying-Lin Chen, You-Chin Fuh, Fen-Ling Lin, Terence Patrick Purcell, Ying Zeng
  • Publication number: 20110040905
    Abstract: A method of buffered reading of data is provided. A read request for data is received by a buffered reader, and in response to the read request, a main memory input buffer is partially filled with the data by the buffered reader to a predetermined amount that is less than a fill capacity of the input buffer. Corresponding computer system and program products are also provided.
    Type: Application
    Filed: August 11, 2010
    Publication date: February 17, 2011
    Applicant: GLOBALSPEC, INC.
    Inventors: Steinar Flatland, Mark Richard Gaulin
  • Publication number: 20110022600
    Abstract: A method of data retrieval from a data repository in response to a query containing a list of keywords, a list of attribute-value pairs, or both, the method comprising the steps of: providing an inverted index generated from the data repository, the inverted index indicating the attribute with which each term is encountered in each entity when such an attribute is available; retrieving data from the inverted index by searching said inverted index based on said attribute-value pairs or keywords; and assigning scores to entities. A method of forming an inverted index from a data repository and a search engine for retrieval of data from a data repository are also provided.
    Type: Application
    Filed: July 22, 2009
    Publication date: January 27, 2011
    Applicant: Ecole Polytechnique Federale de Lausanne EPFL
    Inventors: Saket SATHE, Gleb Skobeltsyn
  • Patent number: 7870114
    Abstract: Described is a technology by which high dimensional source data corresponding to rows of records with identifiers, and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index corresponding to any dimension is built by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped value; a count and/or an offset may be maintained for locating each of the subgroups. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null. A data manager provides access to data in the data files, such as by offering various functions, using caching for efficiency.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: January 11, 2011
    Assignee: Microsoft Corporation
    Inventors: Haidong Zhang, Guowei Liu, Yantao Li, Bing Sun, Jian Wang
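    A minimal sketch of the per-dimension layout, assuming the dimension table is a plain dict from raw value to mapped value (for example, city to country) and that each subgroup is located by an (offset, count) pair into one array of record identifiers. The function names are hypothetical.

```python
def build_dimension_index(raw_values, dimension_table):
    """raw_values[record_id] is the raw value for one dimension (None = null).
    dimension_table maps raw value -> mapped value."""
    mapped = [(dimension_table[v], rid) for rid, v in enumerate(raw_values)
              if v is not None]
    # Group by mapped value; record ids stay ascending inside each group.
    mapped.sort()
    record_ids = [rid for _, rid in mapped]
    groups = {}  # mapped value -> (offset, count) into record_ids
    for offset, (value, _) in enumerate(mapped):
        if value in groups:
            start, count = groups[value]
            groups[value] = (start, count + 1)
        else:
            groups[value] = (offset, 1)
    return record_ids, groups


def records_for(record_ids, groups, mapped_value):
    """Return the subgroup of record ids for one mapped value."""
    offset, count = groups.get(mapped_value, (0, 0))
    return record_ids[offset:offset + count]


def compress_sparse(raw_values):
    """Sparse raw value file: keep (record_id, value) only for non-null values."""
    return [(rid, v) for rid, v in enumerate(raw_values) if v is not None]
```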
  • Publication number: 20100318519
    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.
    Type: Application
    Filed: June 10, 2009
    Publication date: December 16, 2010
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Marios Hadjieleftheriou, Nick Koudas, Divesh Srivastava
  • Patent number: 7853598
    Abstract: A system may include a provider database, a reader database, and a database management system. The provider database may include a provider data area having a plurality of provider block addresses, and the reader database may include a reader data area having a plurality of reader block addresses, and a mapping of provider-specific identifiers to block addresses of the plurality of provider data pages and of reader-specific identifiers to block addresses of the plurality of reader data pages. The database management system may modify a database object of the reader database, the object being associated with a provider-specific identifier, and modify the mapping to map the provider-specific identifier to a first block address of one of the plurality of reader data pages.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: December 14, 2010
    Assignee: SAP AG
    Inventors: Frederik Transier, Peter Sanders
  • Patent number: 7849063
    Abstract: Systems and methods for query processing and indexing of documents in connection with a content store in a computing system are provided. In various embodiments, an indexing model is provided that is optimized for fast, efficient and scalable retrieval of documents satisfying a query, including the mixed use of forward and inverted indexing representations, including algorithms for achieving a balance between the two representations. When processing queries, fast and efficient generation of reverse chronologically ordered posting lists is enabled for efficient execution of logical operators on query result sets. A term expand index is also provided wherein the overall terms included in the term expand index are decomposed into a plurality of lexicon files, which are combined when convenient for fast, scalable efficiency when performing queries of the content in the content store.
    Type: Grant
    Filed: October 15, 2004
    Date of Patent: December 7, 2010
    Assignee: Yahoo! Inc.
    Inventors: Raymond P. Stata, Patrick David Hunt, Thiruvalluvan Mg
  • Patent number: 7836043
    Abstract: A data acquisition and perusal system and method which enable: selection of a plurality of files for inclusion into at least one selectable database; generation of a searchable index of the data contained in the selectable database; and searches of the searchable index according to search criteria. This invention allows users to view, acquire, and generate single- or multiple-data sources locally or remotely, and to compile, index, modify, and append the data sources according to default or user defined criteria. This invention can: selectively acquire and display data contained within remote databases; capture automatically indexed HTML data; and automatically “pinpoint,” and highlight specific text or groups of text designated by the user within the resulting database. This invention contains a link module enabling custom links to be defined between selected terms of selected files of the selectable database including the custom links so that the searchable index includes only valid links.
    Type: Grant
    Filed: July 8, 2004
    Date of Patent: November 16, 2010
    Inventors: Robert Leland Jensen, Daniel Victor Smith
  • Patent number: 7831428
    Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian I. Chelba, Alejandro Acero, Jorge F. Silva Sanchez
  • Patent number: 7827181
    Abstract: An efficient determination of a click distance value is made for each document in a corpus of documents from data included in a locally-stored inverted index. The click distance is a measurement of the number of clicks or user navigations from a first document on the network to another document. Specialized words are included in the locally-stored inverted index. The specialized words relate source documents to a set of target documents. A click distance is assigned to a source document when an inverted index is queried for the corresponding set of target documents according to a query that passes in one of the specialized words. The process is repeated for each document in the corpus of documents.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: November 2, 2010
    Assignee: Microsoft Corporation
    Inventor: Mihai Petriuc
  • Patent number: 7818333
    Abstract: A method and system for parsing of input addresses for further automated processing. A relevant locale for an input address is determined. Based on the locale, an applicable parsing tree is provided so that different address formats can be tested against the input address. The parsing tree is generated from a local address format specification that defines permissible formats for the locale. The local address format specification and the local address component rules are provided to a parsing engine to determine one or more potential parsed addresses based on compliance with specifications. The local address component rules specification is applied to the input address to determine one or more branches of the parsing tree for which the input address matches criteria of the component rules specification. Penalties may be assigned to branches of the tree when disfavored matches occur.
    Type: Grant
    Filed: June 6, 2007
    Date of Patent: October 19, 2010
    Assignee: Pitney Bowes Software Inc.
    Inventors: John R. Biard, Freddie J. Bourland, II
  • Publication number: 20100262607
    Abstract: A method for indexing advertising contracts for rapid retrieval and matching in order to match satisfying contracts to advertising slots. The descriptions of the advertising contracts include logical predicates indicating applicability to a particular demographic. Also, the descriptions of advertising slots contain logical predicates indicating applicability to a particular demographic, thus matches can be performed using at least matches on the basis of intersecting demographics. The disclosure contains structure and techniques for receiving a set of contracts with predicates, preparing a data structure index of the set of contracts, receiving an advertising slot with predicates, and structure and techniques for retrieving from the data structure contracts that satisfy a match to the advertising slot predicates.
    Type: Application
    Filed: April 10, 2009
    Publication date: October 14, 2010
    Inventors: Sergei Vassilvitskii, Ramana Yerneni, Jayavel Shanmugasundaram, Erik Vee, Chad Brower, Steven Whang
  • Publication number: 20100262608
    Abstract: Systems and methods for processing an index are described. An index may be merged with another index of comparable age and size into a single index. Since older indexes are less likely to need updating, they are “set aside” to age based on certain adaptive criteria such as the age and size of the index, percentage of deletions, and how long it takes to update the index. An index that has been set aside may be compacted into a format that is optimized for fast searching.
    Type: Application
    Filed: May 28, 2010
    Publication date: October 14, 2010
    Inventor: John Martin Hornkvist
  • Patent number: 7801898
    Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
    Type: Grant
    Filed: December 30, 2003
    Date of Patent: September 21, 2010
    Assignee: Google Inc.
    Inventor: Adam J. Weissman
  • Patent number: 7792840
    Abstract: Disclosed is a two-level n-gram inverted index structure, together with methods of building it, processing queries against it, and deriving it, that reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index using subsequences extracted from documents as terms and a front-end inverted index using n-grams extracted from the subsequences as terms. The back-end inverted index uses as terms subsequences of a specific length, extracted from the documents so that they overlap each other by n-1 (n: the length of the n-gram), and stores position information of the subsequences occurring in the documents in a posting list for each subsequence.
    Type: Grant
    Filed: August 9, 2006
    Date of Patent: September 7, 2010
    Assignee: Korea Advanced Institute of Science and Technology
    Inventors: Kyu-Young Whang, Min-Soo Kim, Jae-Gil Lee, Min-Jae Lee
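    A minimal sketch of the two levels, assuming character n-grams and a subsequence length m >= n. Consecutive subsequences overlap by n-1 characters, so every n-gram falls entirely inside one subsequence; a position is recovered by adding the n-gram's offset inside the subsequence (front end) to the subsequence's offset in the document (back end). Function names are illustrative.

```python
from collections import defaultdict


def subsequences(text, m, n):
    """Split text into length-m pieces overlapping by n-1 characters."""
    if m < n:
        raise ValueError("subsequence length m must be at least n")
    step = m - (n - 1)
    return [(start, text[start:start + m])
            for start in range(0, max(len(text) - (n - 1), 1), step)]


def build_two_level(docs, m, n):
    back_end = defaultdict(list)   # subsequence -> [(doc_id, offset_in_doc)]
    for doc_id, text in docs.items():
        for start, piece in subsequences(text, m, n):
            back_end[piece].append((doc_id, start))
    front_end = defaultdict(list)  # n-gram -> [(subsequence, offset_in_piece)]
    for piece in back_end:         # each distinct subsequence is scanned once
        for i in range(len(piece) - n + 1):
            front_end[piece[i:i + n]].append((piece, i))
    return front_end, back_end


def ngram_positions(gram, front_end, back_end):
    """Recover (doc_id, position) pairs of an n-gram by joining both levels."""
    hits = set()
    for piece, i in front_end.get(gram, []):
        for doc_id, start in back_end[piece]:
            hits.add((doc_id, start + i))
    return sorted(hits)
```

    The saving comes from subsequences that recur: each of their n-grams is listed once in the front-end index instead of once per occurrence in the documents.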
  • Publication number: 20100211534
    Abstract: In one embodiment, generating an ontology includes accessing an inverted index comprising a plurality of inverted index lists. An inverted index list may correspond to a term of a language. Each inverted index list may comprise a term identifier of the term and one or more document identifiers indicating one or more documents of a document set in which the term appears. The embodiment also includes generating a term identifier index according to the inverted index. The term identifier index comprises a plurality of sections and each section corresponds to a document. Each section may comprise one or more term identifiers of one or more terms that appear in the document.
    Type: Application
    Filed: February 10, 2010
    Publication date: August 19, 2010
    Applicant: Fujitsu Limited
    Inventors: Stergios Stergiou, Yannis Labrou, Jawahar Jain
  • Publication number: 20100211572
    Abstract: Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of <value, path, type, jdewey> tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using <path, type, value> as index term, and jdewey as payload. A method is also described to search the inverted index.
    Type: Application
    Filed: February 13, 2009
    Publication date: August 19, 2010
    Applicant: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Jun Rao, Eugene J. Shekita
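    A simplified sketch of the shredding step, assuming that the jdewey code is reduced to the tuple of array indices on the path from the root (empty when the value has no array ancestor). This is a simplification of the patent's partial Dewey code, and the names are hypothetical.

```python
from collections import defaultdict


def shred(node, path="", jdewey=()):
    """Yield (value, path, type, jdewey) tuples for every atomic value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from shred(child, f"{path}.{key}" if path else key, jdewey)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            # Array elements share the array's path; their position goes into jdewey.
            yield from shred(child, path, jdewey + (i,))
    else:
        yield node, path, type(node).__name__, jdewey


def build_json_index(docs):
    """Index term is (path, type, value); the payload is (doc_id, jdewey)."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        for value, path, typ, jdewey in shred(doc):
            index[(path, typ, value)].append((doc_id, jdewey))
    return index
```

    Keeping jdewey as the payload lets a query check that two matching values came from the same array element by comparing their codes.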
  • Patent number: 7779011
    Abstract: A method and system are provided of processing a search query entered by a user of a device having a text input interface with overloaded keys. The search query is directed at identifying an item from a set of items. Each of the items has a name including one or more words. The system receives from the user an ambiguous search query directed at identifying a desired item. The search query comprises a prefix substring of at least one word in the name of the desired item. The system dynamically identifies a group of one or more items from the set of items having one or more words in the names thereof matching the search query as the user enters each character of the search query. The system also orders the one or more items of the group in accordance with given criteria.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: August 17, 2010
    Assignee: Veveo, Inc.
    Inventors: Sashikumar Venkataraman, Rakesh Barve, Murali Aravamudan, Ajit Rajasekharan
  • Publication number: 20100205172
    Abstract: A method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface, comprising using a computing device (89) to: access (101) an inverted index (103) to obtain an initial retrieval of results in response to a query, and to generate a rank list of the results, the results referring to information units (IUs) where the query occurs; and determine (105) a number of “N” IUs in the results that are assumed to be relevant by accessing a forward index (104); wherein the forward index (104) and inverted index (103) have pointers to locations in the IUs where terms of the query occur, and the forward index (104) retrieves a term frequency vector of the IU or a set of contexts of the IU.
    Type: Application
    Filed: February 9, 2009
    Publication date: August 12, 2010
    Inventor: Robert Wing Pong LUK
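A toy version of the dual-index flow above, written as a Python sketch. The inverted index produces the initial ranking, the top N results are assumed relevant (blind feedback), and their term-frequency vectors are read from the forward index to choose expansion terms. Raw term-frequency scoring, whitespace tokenization, and the helper names are assumptions for illustration; the relevance/non-relevance models in the patent are not reproduced.

```python
from collections import Counter, defaultdict


def build_indices(docs: dict[str, str]):
    """Return (inverted, forward) indices over whitespace-tokenized docs."""
    inverted = defaultdict(dict)   # term -> {doc_id: [positions]}
    forward = {}                   # doc_id -> Counter of term frequencies
    for doc_id, text in docs.items():
        terms = text.lower().split()
        forward[doc_id] = Counter(terms)
        for pos, term in enumerate(terms):
            inverted[term].setdefault(doc_id, []).append(pos)
    return inverted, forward


def expand_query(query, inverted, forward, n=2, k=2):
    """Rank with the inverted index, treat the top-n results as relevant,
    and add the k most frequent new terms from their forward vectors."""
    q_terms = query.lower().split()
    scores = Counter()
    for term in q_terms:
        for doc_id, positions in inverted.get(term, {}).items():
            scores[doc_id] += len(positions)   # simple tf scoring (assumption)
    ranked = [doc_id for doc_id, _ in scores.most_common()]
    feedback = Counter()
    for doc_id in ranked[:n]:
        feedback.update(forward[doc_id])       # forward-index lookup
    expansion = [t for t, _ in feedback.most_common() if t not in q_terms][:k]
    return ranked, q_terms + expansion
```

Keeping both indices lets the first pass stay term-driven (inverted) while the feedback pass stays document-driven (forward), so neither pass has to reconstruct the other's view from scratch.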
  • Patent number: 7765214
    Abstract: Provided are techniques for computer-based electronic Information Retrieval (IR). An extended inverted index structure is built by generating one or more lexical affinities (LAs), wherein each lexical affinity comprises two or more search items found in proximity in one or more documents in a pool of documents, and by generating a posting list for each lexical affinity, wherein each posting list is associated with a specific lexical affinity and contains, for each document in the pool that contains that lexical affinity, document-identifying information and a location within the document where the lexical affinity occurs.
    Type: Grant
    Filed: January 18, 2006
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: Peter Altevogt, Marcus Felipe Fontoura, Silvio Wiedrich, Jason Yeong Zien
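The lexical-affinity posting lists described above can be sketched as follows. Assumptions for illustration only: an LA is an unordered pair of distinct terms (the abstract allows two or more items), "in proximity" means at most `window` token positions apart, and the recorded location is the position of the pair's first term. `build_la_index` is a hypothetical name.

```python
from collections import defaultdict


def build_la_index(docs: dict[int, str], window: int = 3) -> dict[tuple, list]:
    """Posting lists keyed by lexical affinity: an unordered pair of distinct
    terms occurring at most `window` positions apart.

    Each posting is (doc_id, position of the pair's first term).
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        terms = text.lower().split()
        for i, left in enumerate(terms):
            # Look ahead at most `window` tokens for a partner term.
            for j in range(i + 1, min(i + window + 1, len(terms))):
                right = terms[j]
                if left != right:
                    la = tuple(sorted((left, right)))  # canonical pair order
                    index[la].append((doc_id, i))
    return dict(index)
```

A proximity query for two terms then reads one precomputed posting list instead of intersecting two single-term lists and comparing positions. The window size trades index size against how loose a "proximity" match may be.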
  • Patent number: 7765216
    Abstract: Described is a technology by which high dimensional data may be efficiently analyzed, including by filtering, grouping, aggregating and/or sorting operations to provide an analysis result. For efficiency in the analysis, an inverted index may be built (e.g., as part of filtering), and/or a hash structure (e.g., as part of grouping). Analysis parameters specify dimensions, on which union and/or intersection operations are performed to provide a final dataset. The analysis tool provides a user interface for inputting analysis parameters and outputting information corresponding to an analysis result. The analysis tool may sort the information corresponding to the analysis result, e.g., to output the topmost or bottommost results.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: July 27, 2010
    Assignee: Microsoft Corporation
    Inventors: Yantao Li, Guowei Liu, Haidong Zhang, Adnan Azfar Mahmud, Bing Sun, Min Wang, Wenli Zhu, Jian Wang
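The filter, group, aggregate, and sort pipeline described above can be sketched in Python. The function name, the row-of-dicts data layout, and the use of a sum as the aggregate are assumptions for illustration. Filtering uses an inverted index from (dimension, value) to row ids, with a union over the allowed values of one dimension and an intersection across dimensions; grouping uses a hash map.

```python
from collections import defaultdict


def analyze(rows, filters, group_by, measure, top=None):
    """Filter rows with an inverted index, group them with a hash map, sum the
    measure per group, and sort groups from largest to smallest total.

    filters maps a dimension to its allowed values: values within one
    dimension are unioned, and the per-dimension results are intersected.
    """
    inverted = defaultdict(set)              # (dimension, value) -> row ids
    for rid, row in enumerate(rows):
        for dim, value in row.items():
            if dim != measure:
                inverted[(dim, value)].add(rid)

    selected = set(range(len(rows)))
    for dim, allowed in filters.items():
        matching = set().union(*(inverted.get((dim, v), set()) for v in allowed))
        selected &= matching

    totals = defaultdict(float)              # hash-based grouping
    for rid in selected:
        row = rows[rid]
        totals[tuple(row[d] for d in group_by)] += row[measure]

    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top is None else ranked[:top]
```

Passing `top` returns the topmost groups; the bottommost ones follow from taking the tail of the same sorted list.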
  • Patent number: 7765215
    Abstract: A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: Windsor Wee Sun Hsu, Soumyadeb Mitra
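The abstract does not spell out the jump-pointer rule. The sketch below assumes one plausible rule: pointer k of an entry with value v targets the first later entry whose value falls in [v + 2^k, v + 2^(k+1)). Under that rule the list stays append-only and each pointer slot is written at most once, which would suit write-once storage (an assumption about the "trustworthy" setting). Lookups skip over most entries instead of scanning. The class and method names are hypothetical.

```python
class JumpPostingList:
    """Append-only posting list with write-once jump pointers (a sketch).

    Assumed rule: pointer k of an entry with value v refers to the first later
    entry whose value lies in [v + 2**k, v + 2**(k + 1)). Values must be
    appended in strictly increasing order; no pointer slot is ever rewritten.
    """

    def __init__(self):
        self.values = []     # entry values (e.g. record ids), increasing
        self.pointers = []   # per entry: {k: index of target entry}

    def append(self, value: int) -> None:
        if self.values and value <= self.values[-1]:
            raise ValueError("values must be strictly increasing")
        self.values.append(value)
        self.pointers.append({})
        new_idx = len(self.values) - 1
        if new_idx == 0:
            return
        node = 0
        while True:
            # k = floor(log2(distance)) selects the pointer range holding value.
            k = (value - self.values[node]).bit_length() - 1
            target = self.pointers[node].get(k)
            if target is None:
                self.pointers[node][k] = new_idx   # first and only write
                return
            node = target

    def contains(self, value: int) -> tuple[bool, int]:
        """Return (found, number of entries visited)."""
        if not self.values or value < self.values[0]:
            return False, 0
        node, visited = 0, 1
        while self.values[node] != value:
            k = (value - self.values[node]).bit_length() - 1
            target = self.pointers[node].get(k)
            if target is None or self.values[target] > value:
                return False, visited
            node, visited = target, visited + 1
        return True, visited
```

Because each followed pointer at least halves the remaining distance to the target value, a lookup visits roughly a logarithmic number of entries rather than scanning the posting list.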
  • Publication number: 20100179933
    Abstract: A system and method for determining a similarity between a document and a query includes building a weight vector for each of a plurality of documents in a corpus of documents stored in memory and building a weight vector for a query input into a document retrieval system. A weight matrix is generated which distinguishes between relevant documents and lower ranked documents by comparing document/query tuples using a gradient step approach. A similarity score is determined between weight vectors of the query and documents in a corpus by determining a product of a document weight vector, a query weight vector and the weight matrix.
    Type: Application
    Filed: September 18, 2009
    Publication date: July 15, 2010
    Applicant: NEC Laboratories America, Inc.
    Inventors: BING BAI, Jason Weston, Ronan Collobert, David Grangier
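The similarity described above is a bilinear form, f(q, d) = q^T W d. The sketch below computes it with plain Python lists and takes one gradient step on a margin ranking (hinge) loss over a (query, relevant document, lower-ranked document) triple. The hinge loss, learning rate, margin, and function names are assumptions. The abstract says only that a gradient step approach compares document/query tuples.

```python
def score(q: list[float], W: list[list[float]], d: list[float]) -> float:
    """Similarity f(q, d) = q^T W d for dense weight vectors q and d."""
    return sum(q[i] * W[i][j] * d[j] for i in range(len(q)) for j in range(len(d)))


def sgd_step(W, q, d_pos, d_neg, lr=0.1, margin=1.0) -> float:
    """One gradient step on max(0, margin - f(q, d_pos) + f(q, d_neg)).

    Updates W in place and returns the loss measured before the update.
    """
    loss = max(0.0, margin - score(q, W, d_pos) + score(q, W, d_neg))
    if loss > 0:
        # The loss gradient w.r.t. W is -q (d_pos - d_neg)^T, so step against it.
        for i in range(len(q)):
            for j in range(len(d_pos)):
                W[i][j] += lr * q[i] * (d_pos[j] - d_neg[j])
    return loss
```

Repeating the step over many sampled triples pushes relevant documents above lower-ranked ones by at least the margin; triples that already satisfy the margin leave W unchanged.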
  • Patent number: 7756877
    Abstract: Systems and methods for compressing an index are described. In one exemplary method, the results of a search are annotated and then encoded into one or more chunks of compressed data in accordance with the annotations of the results. The annotations include an indication of a best encoding method selected from a set of available encoding methods, and an indication of whether to switch to a new chunk during encoding or to continue encoding in the current chunk. Other methods are described and data processing systems and machine readable media are also described.
    Type: Grant
    Filed: August 4, 2006
    Date of Patent: July 13, 2010
    Assignee: Apple Inc.
    Inventor: Wayne Loofbourrow
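The per-chunk choice of a "best encoding method" can be sketched in Python. Assumptions, all for illustration: postings are sorted document ids stored as gaps; the set of available methods is a LEB128-style varint and a fixed-width encoding; and chunk boundaries fall every `chunk_size` gaps rather than being chosen by a separate "switch chunk" annotation as in the abstract. All function names are hypothetical.

```python
def varint(n: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = 'more follows'."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_varint(gaps: list[int]) -> bytes:
    return b"".join(varint(g) for g in gaps)


def encode_fixed(gaps: list[int]) -> bytes:
    """One width byte, then every gap stored in that many big-endian bytes."""
    width = max(1, (max(gaps).bit_length() + 7) // 8)
    return bytes([width]) + b"".join(g.to_bytes(width, "big") for g in gaps)


ENCODERS = {"varint": encode_varint, "fixed": encode_fixed}


def compress(doc_ids: list[int], chunk_size: int = 4) -> list[tuple[str, bytes]]:
    """Delta-encode sorted doc ids, then annotate each chunk with whichever
    available encoding produces the fewest bytes."""
    gaps = [b - a for a, b in zip([0] + doc_ids[:-1], doc_ids)]
    chunks = []
    for start in range(0, len(gaps), chunk_size):
        part = gaps[start:start + chunk_size]
        encoded = {name: enc(part) for name, enc in ENCODERS.items()}
        best = min(encoded, key=lambda name: len(encoded[name]))
        chunks.append((best, encoded[best]))
    return chunks


def decompress(chunks: list[tuple[str, bytes]]) -> list[int]:
    ids, current = [], 0
    for method, payload in chunks:
        if method == "varint":
            gaps, n, shift = [], 0, 0
            for byte in payload:
                n |= (byte & 0x7F) << shift
                shift += 7
                if not byte & 0x80:      # last byte of this integer
                    gaps.append(n)
                    n, shift = 0, 0
        else:
            width = payload[0]
            gaps = [int.from_bytes(payload[i:i + width], "big")
                    for i in range(1, len(payload), width)]
        for gap in gaps:
            current += gap
            ids.append(current)
    return ids
```

Small gaps favor the varint, while runs of gaps just above 127 favor the fixed width. The annotation lets each chunk take whichever is cheaper locally.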
  • Patent number: 7752193
    Abstract: An indexing engine generates a full text index of English and non-English files provided to the indexing engine. The indexing engine receives an input file for indexing, and normalizes the unique words contained in the input file. The normalizing includes stripping the words of any diacritical marks, taking into account different multilingual issues, case folding the words into lowercase, and the like. The normalized words are stored in a dictionary, and a word record is generated for each stored word. Each word record includes a flag that indicates whether one or more variations exist in the input file for the normalized word. One or more tables store information on the variations for the normalized words. When a query engine is invoked to search for an input query word, the variations are searched only if the user has set an option to consider such variations.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 6, 2010
    Assignee: Guidance Software, Inc.
    Inventor: Dominik Weber
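The normalization and variation flag described above can be sketched with Python's standard `unicodedata` module. Diacritics are stripped by decomposing to NFD and dropping combining marks, then the word is case-folded. The names `build_dictionary` and `matches` are hypothetical, and treating "consider variations" as an exact surface-form match is an assumption; the abstract does not define the option's matching semantics.

```python
import unicodedata
from collections import defaultdict


def normalize(word: str) -> str:
    """Strip diacritics (NFD, then drop combining marks) and case-fold."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold()


def build_dictionary(text: str) -> dict[str, dict]:
    """One word record per normalized word: a flag telling whether the input
    held any surface form that differs from the normalized word, plus a
    variations table (filled in only when the flag is set)."""
    forms = defaultdict(set)
    for word in text.split():
        forms[normalize(word)].add(word)
    dictionary = {}
    for norm, seen in forms.items():
        has_variations = any(form != norm for form in seen)
        dictionary[norm] = {
            "has_variations": has_variations,
            "variations": sorted(seen) if has_variations else [],
        }
    return dictionary


def matches(dictionary: dict, query: str, consider_variations: bool = False) -> bool:
    record = dictionary.get(normalize(query))
    if record is None:
        return False
    if not consider_variations:
        return True                       # accent- and case-insensitive match
    if not record["has_variations"]:
        return query == normalize(query)  # only the normalized form occurs
    return query in record["variations"]  # exact surface-form match
```

The flag lets the query engine skip the variations table entirely for the common case of words that never appeared in any other form.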
  • Patent number: 7734671
    Abstract: A method of sorting text for memory-efficient searching is disclosed. An FM-index is created on received text, and a number of rows are marked. The locations of the marked rows are stored in data buckets, as is the last column of the FM-index, which is stored as a wavelet tree. Data blocks containing the data buckets are created; each data block contains the number of times each character appears in the data block before each data bucket. A header block is created comprising an array of the number of times each character appears in the last column of the FM-index before each data block, the location of the end of the data blocks, and the location of the end of the data; the header block is appended to the data blocks. The header and data blocks are stored. The search process loads data buckets into memory as needed to find the required text.
    Type: Grant
    Filed: October 9, 2007
    Date of Patent: June 8, 2010
    Assignee: The United States of America as represented by the Director, National Security Agency
    Inventor: Michael P. Ferguson
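A toy FM-index in Python, to make the pieces named above concrete. Per-bucket occurrence counts ("checkpoints") play the role of the per-bucket character counts, and "marked rows" keep suffix-array entries only for every `sample`-th text position, which is enough to locate matches by walking backwards with the LF-mapping. Simplifying assumptions: the last column is kept as a plain string rather than a wavelet tree, construction is naive, `$` must not occur in the text, and the on-disk block/header layout is not modeled.

```python
class FMIndex:
    """Toy FM-index: BWT last column, bucketed occurrence counts, and a sampled
    ("marked") suffix array for locating matches.

    Assumptions: '$' does not occur in the text; the last column is a plain
    string instead of a wavelet tree; construction is naive O(n^2 log n).
    """

    def __init__(self, text: str, bucket: int = 4, sample: int = 3):
        s = text + "$"
        sa = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
        self.L = "".join(s[i - 1] for i in sa)          # last column of the BWT
        self.bucket = bucket
        chars = sorted(set(s))
        # C[c]: number of characters in s that sort before c.
        self.C, total = {}, 0
        for c in chars:
            self.C[c] = total
            total += s.count(c)
        # checkpoints[b][c]: occurrences of c in L[:b * bucket].
        self.checkpoints, running = [], dict.fromkeys(chars, 0)
        for i in range(len(self.L) + 1):
            if i % bucket == 0:
                self.checkpoints.append(dict(running))
            if i < len(self.L):
                running[self.L[i]] += 1
        # Marked rows: suffix-array entries kept only for every `sample`-th position.
        self.marked = {row: pos for row, pos in enumerate(sa) if pos % sample == 0}

    def occ(self, c: str, i: int) -> int:
        """Occurrences of c in L[:i]: checkpoint count plus a short scan."""
        if c not in self.C:
            return 0
        b = i // self.bucket
        return self.checkpoints[b][c] + self.L[b * self.bucket:i].count(c)

    def _range(self, pattern: str) -> tuple[int, int]:
        """Backward search: rows [lo, hi) whose suffixes start with pattern."""
        lo, hi = 0, len(self.L)
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo, hi = self.C[c] + self.occ(c, lo), self.C[c] + self.occ(c, hi)
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern: str) -> list[int]:
        lo, hi = self._range(pattern)
        positions = []
        for row in range(lo, hi):
            r, steps = row, 0
            while r not in self.marked:      # walk backwards via LF-mapping
                c = self.L[r]
                r = self.C[c] + self.occ(c, r)
                steps += 1
            positions.append(self.marked[r] + steps)
        return sorted(positions)
```

Only one checkpoint per bucket needs to be resident to answer an `occ` query, which mirrors the idea of loading data buckets into memory as needed.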
  • Patent number: 7730071
    Abstract: A file system transfer designation section for transferring a file system that matches file system transfer rules from the first volume of the first storage apparatus to the second volume of the second storage apparatus based on the first file system transfer rules, a file system storage information manager for updating the storage information of the file system in accordance with the transfer of the file system by the file system transfer designation section and transmitting the updated file system storage information, and a search information manager for updating search information for searching the files based on a file search request from the client apparatus, using the file system storage information sent by the file system storage information manager, are provided.
    Type: Grant
    Filed: November 28, 2006
    Date of Patent: June 1, 2010
    Assignee: Hitachi, Ltd.
    Inventors: Masaaki Iwasaki, Kiyotake Kumazawa
  • Publication number: 20100131515
    Abstract: A device, computer program product and a method for computing the similarity of a set of documents that avoids the large, wasted computational effort involved in calculating very small similarity scores by using thresholds to stop a similarity calculation between documents, thus ensuring that, with high probability, all document pairs with higher similarity than the thresholds have been found.
    Type: Application
    Filed: November 13, 2009
    Publication date: May 27, 2010
    Applicant: TELENOR ASA
    Inventors: Geoffrey CANRIGHT, Kenth ENGO-MONSEN
  • Patent number: 7716211
    Abstract: A system and method for facilitating full text searching utilizing inverted keyword indices in shared memory are provided. An inverted keyword index and an inverted keyword attribute index are created from keyword tokens from a set of documents. The keyword indices are stored in a shared memory buffer and accessed by a query processing component. Shared memory pointers corresponding to the indices are dynamically adjusted according to the addressing schema of the query processing component. The query processing component then processes data queries from the keyword indices stored in the shared memory buffer.
    Type: Grant
    Filed: February 10, 2004
    Date of Patent: May 11, 2010
    Assignee: Microsoft Corporation
    Inventors: Kyle G. Peltonen, Michael M. H. Cheng, David J. Lee
  • Publication number: 20100114845
    Abstract: Embodiments of prime indexing and/or other related operations are disclosed.
    Type: Application
    Filed: November 5, 2009
    Publication date: May 6, 2010
    Applicant: Skyler Technology, Inc.
    Inventors: Richard Crandall, Sam Noble
  • Publication number: 20100106706
    Abstract: A method of generating a search result list also provides related searches for use by a searcher. Search listings that generate a match with a search request submitted by the searcher are identified in a pay-for-placement database that includes a plurality of search listings. Related search listings contained in a related search database generated from the pay-for-placement database are identified as relevant to the search request. A search result list is returned to the searcher, including the identified search listings and one or more of the identified related search listings.
    Type: Application
    Filed: December 17, 2009
    Publication date: April 29, 2010
    Applicant: Yahoo! Inc.
    Inventors: Phillip G. Rorex, Thomas A. Soulanille, Bradley R. Haugaard
  • Publication number: 20100100552
    Abstract: A vast amount of information currently accessible over the Web and in corporate networks is stored in a variety of databases and is being exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic, because this set of databases is dynamic and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases and receiving timely answers. Decentralized architectures are proposed, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes (which export their data as XML and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data.
    Type: Application
    Filed: December 22, 2009
    Publication date: April 22, 2010
    Inventors: Nikolaos Koudas, Divesh Srivastava, Michael Rabinovich
  • Patent number: 7702677
    Abstract: A method of accessing information from a collection of data includes receiving a query, generating an inverse index of the collection of data and generating results to the query in conjunction with the inverse index.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: April 20, 2010
    Assignee: International Business Machines Corporation
    Inventors: Jane Wen Chang, Raymond Lau, Michael Kyle McCandless
  • Publication number: 20100094877
    Abstract: Methods and systems are provided for efficient search in a peer-to-peer (P2P) network topology. In various embodiments, the search methods and systems provide response times and network traffic that are independent of the number of query terms, so that search run time and bandwidth use stay constant in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring the exchange of posting lists between network nodes.
    Type: Application
    Filed: October 13, 2009
    Publication date: April 15, 2010
    Inventor: Wolf Garbe
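A single-process Python simulation of the Bloom-filter idea above. Each term's posting list lives on one peer chosen by hashing the term, and each posting carries a Bloom filter of its document's keywords. A multi-term query therefore contacts only the peer holding the first term and checks the remaining terms against each posting's filter locally, so no other posting lists cross the network. Assumptions, all for illustration: the filter holds every term of the document (the abstract says "selected keywords"); the query is routed by its first term (a real system might pick the rarest); and the filter size m=256 with k=3 hash functions is arbitrary. The classes are hypothetical. Results can contain Bloom-filter false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int = 256, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str):
        # k salted SHA-256 hashes, reduced modulo the filter size.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        return all(self.bits >> p & 1 for p in self._positions(item))


class Network:
    """Peers as in-memory dicts: term -> [(doc_id, BloomFilter of the doc)]."""

    def __init__(self, num_peers: int):
        self.peers = [dict() for _ in range(num_peers)]

    def _peer_for(self, term: str) -> dict:
        h = int.from_bytes(hashlib.sha256(term.encode()).digest()[:8], "big")
        return self.peers[h % len(self.peers)]

    def publish(self, doc_id: int, text: str) -> None:
        terms = set(text.lower().split())
        bloom = BloomFilter()
        for term in terms:
            bloom.add(term)
        for term in terms:
            self._peer_for(term).setdefault(term, []).append((doc_id, bloom))

    def search(self, query: str) -> list[int]:
        first, *rest = query.lower().split()
        postings = self._peer_for(first).get(first, [])  # one peer contacted
        return [doc for doc, bloom in postings if all(t in bloom for t in rest)]
```

The per-query work stays on one node regardless of how many terms the query has, at the cost of storing a filter with every posting and tolerating occasional false positives.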