Inverted Index Patents (Class 707/742)

USING BEHAVIOR DATA TO QUICKLY IMPROVE SEARCH RANKING

Publication number: 20110258198

Abstract: Systems and methods for applying user behavior data to improve search query result ranking are provided. Upon receiving an update file indicating that recent, significant user behavior data is available for a document associated with an inverted index, the update file is published periodically and frequently to an index server. After filtering out the relevant update information from the update file, the index server extracts identifiers of the documents having the associated user behavior data. The update file and the identifier of the documents are utilized to update an in-memory index containing representations of metadata indicative of the user behavior. The in-memory index is continuously updated and utilized to serve search query results in response to user search queries. Search query results from the in-memory index are ranked using the user behavior data prior to serving. Thus, results associated with recent, significant user-behavior metadata receive prominent placement on the search results page.

Type: Application

Filed: June 27, 2011

Publication date: October 20, 2011

Applicant: MICROSOFT CORPORATION

Inventors: WALTER SUN, JAY KUMAR GOYAL, PRATIBHA PERMANDLA, YINZHE YU, JINGFENG LI
Efficient multifaceted search in information retrieval systems

Patent number: 8032532

Abstract: A method and system for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.

Type: Grant

Filed: May 21, 2008

Date of Patent: October 4, 2011

Assignee: International Business Machines Corporation

Inventors: Andrei Z. Broder, Nadav Eiron, Felipe Marcus Fontoura, Ronny Lempel, Ning Li, John Ai McPherson, Jr., Andreas Neumann, Shila Ofek-Koifman, Runping Qi, Eugene J. Shekita
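
To illustrate the abstract above (patent 8032532), here is a minimal Python sketch of indexing facet tokens together with their path prefixes and answering a query by intersecting posting lists. The slash-separated path encoding, the function names, and the sample documents are assumptions made for illustration; they are not taken from the patent.

```python
from collections import defaultdict

def path_prefixes(facet_path):
    """Yield every path prefix of a facet token: 'a/b/c' -> 'a', 'a/b', 'a/b/c'."""
    parts = facet_path.split("/")
    for i in range(1, len(parts) + 1):
        yield "/".join(parts[:i])

def build_index(docs):
    """docs maps doc_id -> facet paths; returns indexed token -> sorted posting list."""
    postings = defaultdict(set)
    for doc_id, facet_paths in docs.items():
        for path in facet_paths:
            for token in path_prefixes(path):
                postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

def query(index, constraints):
    """Intersect the posting lists of all constraint tokens."""
    lists = [set(index.get(token, ())) for token in constraints]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

# Hypothetical documents annotated with facet paths.
docs = {
    1: ["location/europe/france", "type/report"],
    2: ["location/europe/spain", "type/memo"],
    3: ["location/asia/japan", "type/report"],
}
index = build_index(docs)
print(query(index, ["location/europe", "type/report"]))  # [1]
```

Because every prefix is indexed as its own token, a constraint on an inner category such as location/europe needs only one posting-list lookup rather than a walk over the tree.
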
Managing structured content stored as a binary large object (BLOB)

Patent number: 8032521

Abstract: Embodiments of the present invention address deficiencies of the art in respect to structured content storage and provide a novel and non-obvious method, system and computer program product for managing structured content stored in a BLOB. In an embodiment of the invention, a performance optimized structured content management system can include a content repository, a content manager configured to provide access to structured content in the content repository and multiple different performance optimized containers disposed in the content repository. Each of the containers can store a portion of the structured content, and each of the containers can include a flattened form of original structured content in a primary binary large object (BLOB) and a parsed form of the original structured content in a secondary BLOB, the parsed form of the original structured content in the secondary BLOB indexing the flattened form of the original structured content in the primary BLOB.

Type: Grant

Filed: August 8, 2007

Date of Patent: October 4, 2011

Assignee: International Business Machines Corporation

Inventors: Stephen J. Garward, Mark C. Hampton, Eric Martinez de Morentin, Kenneth Sabir
INDEXING MULTIPLE TYPES OF DATA TO FACILITATE RAPID RE-INDEXING OF ONE OR MORE TYPES OF DATA

Publication number: 20110219008

Abstract: A method and indexing system indexes the content of a body of documents into a content index, and the metadata of the documents into a metadata index which is a parallel index to the content index. The metadata is copied into a data store that is easily accessible by the indexing system and is stored in native form. The indexing system can dynamically re-index the metadata from the native metadata in the data store to produce a new metadata index which is used to replace the original metadata index. Search queries received by a search engine associated with the indexing system are applied to both the content and metadata index and the results are merged for return.

Type: Application

Filed: March 8, 2010

Publication date: September 8, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: DAVID O. BEEN, MICHAEL BUSCH, OSAMU FURUSAWA, FREDERICK S. GRENNAN, FUMIHIKO TERUI, JUSTO L. PEREZ
Information Search Method and System

Publication number: 20110218989

Abstract: The present disclosure provides an information search method and system applicable in an information search system wherein each document has corresponding forward index data to address the issue of low search efficiency suffered by existing information search techniques. In one aspect, the method may include: receiving an inquiry word and obtaining one or more keywords contained in the inquiry word by segmentation; searching one or more documents matching the one or more keywords and forward index data corresponding to the one or more documents through the information search system's inverted index data; and determining an abstract of each of the one or more documents according to a corresponding document's forward index data, and outputting the abstract and information of the one or more documents as a search result. The proposed techniques can increase efficiency of information search and, at the same time, guarantee accuracy of the search to a certain extent.

Type: Application

Filed: August 27, 2010

Publication date: September 8, 2011

Applicant: ALIBABA GROUP HOLDING LIMITED

Inventor: Yi Luo
Identifying related objects using quantum clustering

Patent number: 8010534

Abstract: Techniques for grouping related objects such as documents and files using quantum clustering are disclosed. A method may include constructing a feature-object database of multiple objects. The feature-object database may have quantized selected features as keys. A connected objects database may be built. Clusters of connected objects may be identified in the connected objects database. The clusters of identified objects may be evaluated to determine groups of related objects. The method may be implemented on a computing device.

Type: Grant

Filed: August 31, 2007

Date of Patent: August 30, 2011

Assignee: Orcatec LLC

Inventors: Herbert L. Roitblat, Brian Golbère
RAPID UPDATE OF INDEX METADATA

Publication number: 20110202541

Abstract: Systems and methods for performing an updating process to an in-memory index are provided. Upon receiving notice of document modifications covered by an inverted index associated with a search engine, in the form of an update file, a representation of the modification is published onto various index serving machines. Each index serving machine receiving the update file determines if the modifications are applicable to the index serving machine. If an index serving machine determines that it contains mapping information corresponding to the modified documents, the index serving machine utilizes the update file and associated mapping information to update an in-memory index. In embodiments, the in-memory index is used to provide results to user queries in tandem with the inverted index. In some embodiments, an extra in-memory index is maintained that is revised with constantly incoming metadata updates and the existing in-memory index is periodically swapped with the revised in-memory index.

Type: Application

Filed: February 12, 2010

Publication date: August 18, 2011

Applicant: MICROSOFT CORPORATION

Inventors: PRATIBHA PERMANDLA, YINZHE YU, GAURAV SAREEN, ABHAS KUMAR
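
The update flow in the abstract above (publication 20110202541) can be sketched roughly as follows. This is a simplified Python illustration, not the applicants' implementation: the class name, the dict-based update file, and the merge-then-swap policy are all assumptions.

```python
import threading

class SwappableMetadataIndex:
    """One index serving machine holding a serving index plus an extra pending index."""

    def __init__(self, local_doc_ids):
        self._local = set(local_doc_ids)  # documents this machine has mapping info for
        self._serving = {}                # in-memory index used to answer queries
        self._pending = {}                # extra index absorbing incoming updates
        self._lock = threading.Lock()

    def apply_update_file(self, updates):
        """updates: doc_id -> metadata dict. Returns how many entries applied here."""
        applied = 0
        with self._lock:
            for doc_id, metadata in updates.items():
                if doc_id in self._local:  # skip modifications meant for other machines
                    self._pending.setdefault(doc_id, {}).update(metadata)
                    applied += 1
        return applied

    def swap(self):
        """Periodically replace the serving index with the revised one."""
        with self._lock:
            merged = {doc_id: dict(meta) for doc_id, meta in self._serving.items()}
            for doc_id, metadata in self._pending.items():
                merged.setdefault(doc_id, {}).update(metadata)
            self._serving = merged
            self._pending = {}

    def lookup(self, doc_id):
        """Return the metadata currently being served for a document."""
        return self._serving.get(doc_id, {})

server = SwappableMetadataIndex(local_doc_ids=[1, 2])
server.apply_update_file({1: {"clicks": 5}, 9: {"clicks": 1}})
server.swap()
print(server.lookup(1))  # {'clicks': 5}
```

Queries keep reading the old serving index until swap() installs the revised one, so updates never become visible half-applied.
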
Domain specific local search

Patent number: 7984041

Abstract: Methods and apparatus provide for a local search indexer to allow for an optimized search within a web server that returns accurate search results while maintaining independent control as to defining search patterns, search prioritization, and updated content available for search. Specifically, the local search indexer organizes content according to a hierarchical directory structure at a web server. The hierarchical directory structure includes at least one directory level that provides at least one directory for storing the content. The local search indexer builds a search index associated with the directory and stores the search index at the web server. The search index is populated with indexed content based on an update of the content stored in the directory. The local search indexer employs a search engine, at the web server, to process search queries against the indexed content to provide a search result that includes the update of the content.

Type: Grant

Filed: July 9, 2007

Date of Patent: July 19, 2011

Assignee: Oracle America, Inc.

Inventor: Yogesh Y Patil
Reliability of duplicate document detection algorithms

Patent number: 7984029

Abstract: In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold.

Type: Grant

Filed: June 23, 2008

Date of Patent: July 19, 2011

Assignee: AOL Inc.

Inventors: Joshua Alspector, Aleksander Kolcz, Abdur R. Chowdhury
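
A rough Python sketch of the idea in the abstract above (patent 7984029): a document is projected onto a primary lexicon, and a secondary lexicon supplements the projection only when it is too small. Measuring the projection by its term count, hashing with SHA-1, and the sample lexicons are assumptions for illustration only.

```python
import hashlib

def project(text, primary, secondary, threshold):
    """Return the set of lexicon terms used to sign the document."""
    tokens = set(text.lower().split())
    projection = tokens & primary
    if len(projection) < threshold:
        # Projection too small: supplement with the secondary attributes.
        projection |= tokens & secondary
    return projection

def signature(text, primary, secondary, threshold):
    """Single signature: hash of the sorted projected terms."""
    terms = sorted(project(text, primary, secondary, threshold))
    return hashlib.sha1(" ".join(terms).encode("utf-8")).hexdigest()

# Hypothetical lexicons.
primary = {"invoice", "payment", "account"}
secondary = {"please", "regards", "today"}

a = signature("Please send payment for the invoice today", primary, secondary, 2)
b = signature("payment invoice regards", primary, secondary, 2)
c = signature("payment due please", primary, secondary, 2)
d = signature("payment due regards", primary, secondary, 2)
print(a == b, c == d)  # True False
```

In this sketch the last two documents would collide on the single primary term "payment"; the secondary terms keep them apart, which is the precision gain the abstract describes.
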
Processing a text search query in a collection of documents

Patent number: 7984036

Abstract: System and computer program product for processing a text search query in a collection of documents. A full posting index is generated that has first index terms and a full posting list for each first index term, enumerating occurrences of the first index terms in the documents of the collection. A text search query includes search conditions on search terms. The search conditions are translated into conditions on the first index terms to provide translated conditions. At least one short posting index is generated, which includes second index terms and a short posting list for each second index term, enumerating documents in which the second index terms occur. Filter conditions and complementary conditions are generated to represent the translated conditions. The filter conditions approximate the translated conditions, and are processed using the short posting index. The complementary conditions are processed using the full posting index to provide a query result.

Type: Grant

Filed: January 25, 2008

Date of Patent: July 19, 2011

Assignee: International Business Machines Corporation

Inventors: Jochen Doerre, Monika Matschke, Roland Seiffert, Matthias Tschaffler
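
One way to picture the filter/complementary split in the abstract above (patent 7984036) is a phrase query, sketched below in Python. Treating the filter condition as "all terms occur in the document" and the complementary condition as a positional adjacency check is an assumption for illustration; the abstract does not restrict itself to phrase queries.

```python
from collections import defaultdict

def build_indexes(docs):
    """Return (full, short): term -> doc -> positions, and term -> set of docs."""
    full = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            full[term][doc_id].append(pos)
    short = {term: set(by_doc) for term, by_doc in full.items()}
    return full, short

def phrase_query(full, short, phrase):
    terms = phrase.lower().split()
    if not terms:
        return []
    # Filter condition (short posting index): documents containing every term.
    candidates = set.intersection(*(short.get(t, set()) for t in terms))
    results = []
    for doc_id in sorted(candidates):
        # Complementary condition (full posting index): terms must be adjacent.
        starts = full[terms[0]][doc_id]
        if any(all(p + i in full[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            results.append(doc_id)
    return results

docs = {
    1: "inverted index update",
    2: "index of inverted lists",
    3: "the inverted index",
}
full, short = build_indexes(docs)
print(phrase_query(full, short, "inverted index"))  # [1, 3]
```

Document 2 passes the cheap filter but fails the positional check, so the larger full posting lists are consulted only for the surviving candidates.
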
Deriving user intent from a user query

Patent number: 7974976

Abstract: A system and method for deriving user intent from a query. The system includes a query engine, and an advertisement engine. The query engine receives a query from the user. The query engine analyzes the query to determine a query intent that is matched to a domain. The query may be further analyzed to derive predicate values based on the query and the domain hierarchy. The domain and associated information may then be matched to a list of advertisements. The advertisement may be assigned an ad match score based on a correlation between the query information and various listing information provided in the advertisement.

Type: Grant

Filed: May 18, 2007

Date of Patent: July 5, 2011

Assignee: Yahoo! Inc.

Inventors: Sihem Amer Yahia, Jayavel Shanmugasundaram, Utkarsh Srivastava, Erik Vee
Method and apparatus for enhancing electronic reading by identifying relationships between sections of electronic text

Patent number: 7958138

Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.

Type: Grant

Filed: April 13, 2009

Date of Patent: June 7, 2011

Inventor: Philip R Krause
Method and system for dynamically processing ambiguous, reduced text search queries and highlighting results thereof

Patent number: 7937394

Abstract: A method and system are provided of processing a search query entered by a user of a device having a text input interface with overloaded keys. The search query is directed at identifying an item from a set of items. Each of the items has a name including one or more words. The system receives from the user an ambiguous search query directed at identifying a desired item. The search query comprises a prefix substring of at least one word in the name of the desired item. The system dynamically identifies a group of one or more items from the set of items having one or more words in the names thereof matching the search query as the user enters each character of the search query. The system also orders the one or more items of the group in accordance with given criteria.

Type: Grant

Filed: August 2, 2010

Date of Patent: May 3, 2011

Assignee: Veveo, Inc.

Inventors: Sashikumar Venkataraman, Rakesh Barve, Murali Aravamudan, Ajit Rajasekharan
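
The matching described in the abstract above (patent 7937394) can be sketched for a phone-style keypad as follows. The keypad layout, the popularity-based ordering, and the sample items are assumptions for illustration; the abstract only says the results are ordered "in accordance with given criteria".

```python
# Overloaded keys: each digit stands for several letters.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
CHAR_TO_KEY = {ch: key for key, chars in KEYPAD.items() for ch in chars}

def encode(word):
    """Map a word to its key sequence; characters without a key are dropped."""
    return "".join(CHAR_TO_KEY.get(ch, "") for ch in word.lower())

def match(items, keys):
    """items: list of (name, popularity). Return names with a word whose
    key sequence starts with the ambiguous query, most popular first."""
    hits = [(name, popularity) for name, popularity in items
            if any(encode(word).startswith(keys) for word in name.split())]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return [name for name, _ in hits]

# Hypothetical catalog with popularity scores.
items = [("Tom Hanks", 90), ("Toy Story", 80), ("Van Helsing", 50)]

# Results are recomputed as each key press arrives.
for typed in ("8", "86", "869"):
    print(typed, match(items, typed))
```

A production system would precompute the encoded prefixes in an index rather than scanning every item on each key press; the scan here just keeps the sketch short.
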
Updating an inverted index

Patent number: 7917516

Abstract: Systems and methods for processing an index are described. To ensure that the most updated index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result.

Type: Grant

Filed: June 8, 2007

Date of Patent: March 29, 2011

Assignee: Apple Inc.

Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yun-chih S. Li
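
A minimal Python sketch of the approach in the abstract above (patent 7917516): the main inverted index is left untouched when an item changes, and a separately kept "marked" postings structure overrides it at retrieval time. The class and method names and the whitespace tokenization are assumptions for illustration.

```python
class DeltaIndex:
    def __init__(self, items):
        # Main inverted index: term -> set of item ids. Never rewritten below.
        self.main = {}
        for item_id, text in items.items():
            for term in set(text.lower().split()):
                self.main.setdefault(term, set()).add(item_id)
        self.marked = {}      # specially marked postings: term -> changed item ids
        self.changed = set()  # items whose main-index entries are stale

    def record_change(self, item_id, new_text):
        """Record a change without touching the main index."""
        for ids in self.marked.values():
            ids.discard(item_id)  # drop marked postings from an earlier change
        self.changed.add(item_id)
        for term in set(new_text.lower().split()):
            self.marked.setdefault(term, set()).add(item_id)

    def search(self, term):
        """Main postings minus stale items, plus the marked postings."""
        term = term.lower()
        still_valid = self.main.get(term, set()) - self.changed
        return sorted(still_valid | self.marked.get(term, set()))

index = DeltaIndex({1: "red apple", 2: "green apple"})
index.record_change(1, "red cherry")
print(index.search("apple"), index.search("cherry"))  # [2] [1]
```

Item 1 still appears under "apple" in the main index, but the marked postings cause it to be filtered out, so results stay accurate until the main index is eventually rebuilt.
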
Method and apparatus for query processing of uncertain data

Patent number: 7917517

Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.

Type: Grant

Filed: February 28, 2008

Date of Patent: March 29, 2011

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Methods and Systems for Compressing Indices

Publication number: 20110066623

Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.

Type: Application

Filed: September 20, 2010

Publication date: March 17, 2011

Inventor: Adam J. Weissman
Row-identifier list processing management

Patent number: 7895185

Abstract: A method, computer program product, and system for managing row identifier (RID) list processing on an index are provided. The method, computer program product, and system provide for accessing one or more key values in the index based on one or more keys specified in a query, retrieving a plurality of row identifiers corresponding to the one or more key values from the index, and predicting an actual number of row identifiers to be retrieved from the index based on the one or more key values accessed and the plurality of row identifiers retrieved.

Type: Grant

Filed: September 28, 2006

Date of Patent: February 22, 2011

Assignee: International Business Machines Corporation

Inventors: Ying-Lin Chen, You-Chin Fuh, Fen-Ling Lin, Terence Patrick Purcell, Ying Zeng
EFFICIENT BUFFERED READING WITH A PLUG-IN FOR INPUT BUFFER SIZE DETERMINATION

Publication number: 20110040905

Abstract: A method of buffered reading of data is provided. A read request for data is received by a buffered reader, and in response to the read request, a main memory input buffer is partially filled with the data by the buffered reader to a predetermined amount that is less than a fill capacity of the input buffer. Corresponding computer system and program products are also provided.

Type: Application

Filed: August 11, 2010

Publication date: February 17, 2011

Applicant: GLOBALSPEC, INC.

Inventors: Steinar Flatland, Mark Richard Gaulin
METHOD OF DATA RETRIEVAL, AND SEARCH ENGINE USING SUCH A METHOD

Publication number: 20110022600

Abstract: A method of data retrieval from a data repository in response to a query having a list of keywords and/or a list of attribute-value pairs, the method comprising the steps of: providing an inverted index generated from the data repository, the inverted index indicating the attribute with which each term is encountered in each entity when such an attribute is available; retrieving data from the inverted index by searching said inverted index based on said attribute-value pairs or keywords; providing scores to entities. A method of forming an inverted index from a data repository and a search engine for retrieval of data from a data repository is also provided.

Type: Application

Filed: July 22, 2009

Publication date: January 27, 2011

Applicant: Ecole Polytechnique Federale de Lausanne EPFL

Inventors: Saket SATHE, Gleb Skobeltsyn
Efficient data infrastructure for high dimensional data analysis

Patent number: 7870114

Abstract: Described is a technology by which high dimensional source data corresponding to rows of records with identifiers, and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index corresponding to any dimension is built by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped value; a count and/or an offset may be maintained for locating each of the subgroups. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null. A data manager provides access to data in the data files, such as by offering various functions, using caching for efficiency.

Type: Grant

Filed: June 15, 2007

Date of Patent: January 11, 2011

Assignee: Microsoft Corporation

Inventors: Haidong Zhang, Guowei Liu, Yantao Li, Bing Sun, Jian Wang
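
The per-dimension inverted index in the abstract above (patent 7870114) might look roughly like this in Python: raw values are mapped through a dimension table, record identifiers are grouped by mapped value, and an (offset, count) pair locates each subgroup. The flat-list layout and all names are assumptions for illustration.

```python
def build_dimension_index(raw_values, dimension_table):
    """raw_values[i] is the raw value of record i (None for null).
    Returns (record_ids, directory) where directory maps a mapped value
    to the (offset, count) of its subgroup inside record_ids."""
    groups = {}
    for record_id, raw in enumerate(raw_values):
        if raw is None:
            continue  # sparse data: nulls are simply not indexed
        groups.setdefault(dimension_table[raw], []).append(record_id)
    record_ids, directory = [], {}
    for mapped in sorted(groups):
        directory[mapped] = (len(record_ids), len(groups[mapped]))
        record_ids.extend(groups[mapped])
    return record_ids, directory

def lookup(record_ids, directory, mapped):
    """Return the record identifiers whose dimension has this mapped value."""
    offset, count = directory.get(mapped, (0, 0))
    return record_ids[offset:offset + count]

# Hypothetical "country" dimension.
dimension_table = {"DE": 0, "FR": 1, "US": 2}
raw_values = ["US", "FR", None, "US", "DE"]
record_ids, directory = build_dimension_index(raw_values, dimension_table)
print(lookup(record_ids, directory, dimension_table["US"]))  # [0, 3]
```

Storing subgroups back to back with an offset and count means a lookup is a single slice, which is the kind of access pattern that suits a file-backed layout.
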
Incremental Maintenance of Inverted Indexes for Approximate String Matching

Publication number: 20100318519

Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

Type: Application

Filed: June 10, 2009

Publication date: December 16, 2010

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Marios Hadjieleftheriou, Nick Koudas, Divesh Srivastava
Compressed storage of documents using inverted indexes

Patent number: 7853598

Abstract: A system may include a provider database, a reader database, and a database management system. The provider database may include a provider data area having a plurality of provider block addresses, and the reader database may include a reader data area having a plurality of reader block addresses, and a mapping of provider-specific identifiers to block addresses of the plurality of provider data pages and of reader-specific identifiers to block addresses of the plurality of reader data pages. The database management system may modify a database object of the reader database, the object being associated with a provider-specific identifier; and modify the mapping to map the provider-specific identifier to a first block address of one of the plurality of reader data pages.

Type: Grant

Filed: March 27, 2008

Date of Patent: December 14, 2010

Assignee: SAP AG

Inventors: Frederik Transier, Peter Sanders
Systems and methods for indexing content for fast and scalable retrieval

Patent number: 7849063

Abstract: Systems and methods for query processing and indexing of documents in connection with a content store in a computing system are provided. In various embodiments, an indexing model is provided that is optimized for fast, efficient and scalable retrieval of documents satisfying a query, including the mixed use of forward and inverted indexing representations, including algorithms for achieving a balance between the two representations. When processing queries, fast and efficient generation of reverse chronologically ordered posting lists is enabled for efficient execution of logical operators on query result sets. A term expand index is also provided wherein the overall terms included in the term expand index are decomposed into a plurality of lexicon files, which are combined when convenient for fast, scalable efficiency when performing queries of the content in the content store.

Type: Grant

Filed: October 15, 2004

Date of Patent: December 7, 2010

Assignee: Yahoo! Inc.

Inventors: Raymond P. Stata, Patrick David Hunt, Thiruvalluvan Mg
Database system and method for data acquisition and perusal

Patent number: 7836043

Abstract: A data acquisition and perusal system and method which enable: selection of a plurality of files for inclusion into at least one selectable database; generation of a searchable index of the data contained in the selectable database; and searches of the searchable index according to search criteria. This invention allows users to view, acquire, and generate single- or multiple-data sources locally or remotely, and to compile, index, modify, and append the data sources according to default or user defined criteria. This invention can: selectively acquire and display data contained within remote databases; capture automatically indexed HTML data; and automatically “pinpoint,” and highlight specific text or groups of text designated by the user within the resulting database. This invention contains a link module enabling custom links to be defined between selected terms of selected files of the selectable database including the custom links so that the searchable index includes only valid links.

Type: Grant

Filed: July 8, 2004

Date of Patent: November 16, 2010

Inventors: Robert Leland Jensen, Daniel Victor Smith
Speech index pruning

Patent number: 7831428

Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.

Type: Grant

Filed: November 9, 2005

Date of Patent: November 9, 2010

Assignee: Microsoft Corporation

Inventors: Ciprian I. Chelba, Alejandro Acero, Jorge F. Silva Sanchez
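
A small Python sketch of the pruning step in the abstract above (patent 7831428). Estimating a word's probability of appearing in a segment as the summed probability of the alternative sequences that contain it is an assumption made here; the abstract does not say how the probability is computed.

```python
def build_pruned_index(segments, threshold):
    """segments: segment_id -> list of (word_sequence, probability) alternatives.
    Returns word -> {segment_id: probability}, keeping only entries whose
    probability reaches the threshold."""
    index = {}
    for segment_id, alternatives in segments.items():
        word_prob = {}
        for words, prob in alternatives:
            for word in set(words):  # count each alternative once per word
                word_prob[word] = word_prob.get(word, 0.0) + prob
        for word, prob in word_prob.items():
            if prob >= threshold:    # prune unlikely words from the index
                index.setdefault(word, {})[segment_id] = prob
    return index

# Hypothetical recognizer output: two alternative word sequences for one segment.
segments = {
    "s1": [(["recognize", "speech"], 0.6),
           (["wreck", "a", "nice", "beach"], 0.4)],
}
index = build_pruned_index(segments, threshold=0.5)
print(sorted(index))  # ['recognize', 'speech']
```

Raising the threshold shrinks the index at the cost of recall for words that the recognizer considered possible but unlikely.
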
Click distance determination

Patent number: 7827181

Abstract: An efficient determination of a click distance value is made for each document in a corpus of documents from data included in a locally-stored inverted index. The click distance is a measurement of the number of clicks or user navigations from a first document on the network to another document. Specialized words are included in the locally-stored inverted index. The specialized words relate source documents to a set of target documents. A click distance is assigned to a source document when an inverted index is queried for the corresponding set of target documents according to a query that passes in one of the specialized words. The process is repeated for each document in the corpus of documents.

Type: Grant

Filed: September 29, 2005

Date of Patent: November 2, 2010

Assignee: Microsoft Corporation

Inventor: Mihai Petriuc
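
The computation in the abstract above (patent 7827181) can be approximated by a breadth-first traversal that finds each document's link targets by querying the inverted index for a specialized word. The "linkedfrom:" token format, the single root document, and the use of breadth-first search are assumptions for illustration.

```python
from collections import deque

def build_link_index(links):
    """links: iterable of (source, target). Stores each link under a
    specialized word so targets can be found with an ordinary index lookup."""
    index = {}
    for source, target in links:
        index.setdefault(f"linkedfrom:{source}", set()).add(target)
    return index

def click_distances(index, root):
    """Assign each reachable document its minimum number of clicks from root."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        source = queue.popleft()
        # Query the inverted index with the specialized word for this source.
        for target in sorted(index.get(f"linkedfrom:{source}", ())):
            if target not in distance:
                distance[target] = distance[source] + 1
                queue.append(target)
    return distance

# Hypothetical link graph.
links = [("home", "a"), ("home", "b"), ("a", "c"), ("b", "c"), ("c", "home")]
index = build_link_index(links)
print(click_distances(index, "home"))  # {'home': 0, 'a': 1, 'b': 1, 'c': 2}
```

Keeping the link structure inside the same inverted index as the text terms means no separate graph store is needed to compute the distances.
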
Universal address parsing system and method

Patent number: 7818333

Abstract: A method and system for parsing of input addresses for further automated processing. A relevant locale for an input address is determined. Based on the locale, an applicable parsing tree is provided so that different address formats can be tested against the input address. The parsing tree is generated from a local address format specification that defines permissible formats for the locale. The local address format specification and the local address component rules are provided to a parsing engine to determine one or more potential parsed addresses based on compliance with specifications. The local address component rules specification is applied to the input address to determine one or more branches of the parsing tree for which the input address matches criteria of the component rules specification. Penalties may be assigned to branches of the tree when disfavored matches occur.

Type: Grant

Filed: June 6, 2007

Date of Patent: October 19, 2010

Assignee: Pitney Bowes Software Inc.

Inventors: John R. Biard, Freddie J. Bourland, II
System and Method for Automatic Matching of Contracts to Impression Opportunities Using Complex Predicates and an Inverted Index

Publication number: 20100262607

Abstract: A method for indexing advertising contracts for rapid retrieval and matching in order to match satisfying contracts to advertising slots. The descriptions of the advertising contracts include logical predicates indicating applicability to a particular demographic. Also, the descriptions of advertising slots contain logical predicates indicating applicability to a particular demographic, thus matches can be performed using at least matches on the basis of intersecting demographics. The disclosure contains structure and techniques for receiving a set of contracts with predicates, preparing a data structure index of the set of contracts, receiving an advertising slot with predicates, and structure and techniques for retrieving from the data structure contracts that satisfy a match to the advertising slot predicates.

Type: Application

Filed: April 10, 2009

Publication date: October 14, 2010

Inventors: Sergei Vassilvitskii, Ramana Yerneni, Jayavel Shanmugasundaram, Erik Vee, Chad Brower, Steven Whang
INDEX AGING AND MERGING

Publication number: 20100262608

Abstract: Systems and methods for processing an index are described. An index may be merged with another index of comparable age and size into a single index. Since older indexes are less likely to need updating, they are “set aside” to age based on certain adaptive criteria such as the age and size of the index, percentage of deletions, and how long it takes to update the index. An index that has been set aside may be compacted into a format that is optimized for fast searching.

Type: Application

Filed: May 28, 2010

Publication date: October 14, 2010

Inventor: John Martin Hornkvist
Methods and systems for compressing indices

Patent number: 7801898

Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.

Type: Grant

Filed: December 30, 2003

Date of Patent: September 21, 2010

Assignee: Google Inc.

Inventor: Adam J. Weissman
Two-level n-gram index structure and methods of index building, query processing and index derivation

Patent number: 7792840

Abstract: Disclosed are a structure of a two-level n-gram inverted index and methods of building the index, processing queries, and deriving the index that reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index using subsequences extracted from documents as terms and a front-end inverted index using n-grams extracted from the subsequences as terms. The back-end inverted index uses, as terms, subsequences of a specific length extracted from the documents so as to overlap with each other by n-1 (n: the length of the n-gram), and stores position information of the subsequences occurring in the documents in a posting list for the respective subsequences.

Type: Grant

Filed: August 9, 2006

Date of Patent: September 7, 2010

Assignee: Korea Advanced Institute of Science and Technology

Inventors: Kyu-Young Whang, Min-Soo Kim, Jae-Gil Lee, Min-Jae Lee
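
The two-level structure in the abstract above (patent 7792840) can be sketched in Python as follows. Character-level n-grams, the parameter name m for the subsequence length, and the helper names are assumptions for illustration.

```python
def build_two_level_index(docs, m, n):
    """Back-end: subsequence -> [(doc_id, offset in document)].
    Front-end: n-gram -> [(subsequence, offset in subsequence)].
    Consecutive subsequences of length m overlap by n - 1 characters."""
    if not 1 <= n <= m:
        raise ValueError("require 1 <= n <= m")
    step = m - (n - 1)
    back_end = {}
    for doc_id, text in docs.items():
        for start in range(0, len(text) - n + 1, step):
            back_end.setdefault(text[start:start + m], []).append((doc_id, start))
    front_end = {}
    for sub in back_end:  # each distinct subsequence is processed only once
        for off in range(len(sub) - n + 1):
            front_end.setdefault(sub[off:off + n], []).append((sub, off))
    return back_end, front_end

def ngram_positions(back_end, front_end, gram):
    """Resolve an n-gram to document positions by joining the two levels."""
    positions = set()
    for sub, off in front_end.get(gram, ()):
        for doc_id, start in back_end[sub]:
            positions.add((doc_id, start + off))
    return sorted(positions)

docs = {1: "abababab", 2: "babab"}
back_end, front_end = build_two_level_index(docs, m=4, n=2)
print(ngram_positions(back_end, front_end, "ab"))
```

The overlap of n - 1 characters guarantees that every n-gram of a document falls entirely inside one subsequence, so no occurrence is lost. Repeated subsequences store their n-gram offsets once in the front-end index, which is where the space saving comes from.
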
Efficient computation of ontology affinity matrices

Publication number: 20100211534

Abstract: In one embodiment, generating an ontology includes accessing an inverted index comprising a plurality of inverted index lists. An inverted index list may correspond to a term of a language. Each inverted index list may comprise a term identifier of the term and one or more document identifiers indicating one or more documents of a document set in which the term appears. The embodiment also includes generating a term identifier index according to the inverted index. The term identifier index comprises a plurality of sections and each section corresponds to a document. Each section may comprise one or more term identifiers of one or more terms that appear in the document.

Type: Application

Filed: February 10, 2010

Publication date: August 19, 2010

Applicant: Fujitsu Limited

Inventors: Stergios Stergiou, Yannis Labrou, Jawahar Jain
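
Deriving the term identifier index described in the abstract above (publication 20100211534) amounts to inverting the inverted index, as in this Python sketch. The dict-based representation and the function name are assumptions; how the resulting sections feed the affinity computation named in the title is not covered by the abstract and is not shown.

```python
def build_term_identifier_index(inverted_index):
    """inverted_index: term_id -> list of doc_ids in which the term appears.
    Returns doc_id -> sorted list of term_ids (one section per document)."""
    sections = {}
    for term_id in sorted(inverted_index):
        for doc_id in inverted_index[term_id]:
            sections.setdefault(doc_id, []).append(term_id)
    return sections

# Hypothetical inverted index over three documents.
inverted_index = {
    10: ["d1", "d2"],
    11: ["d2"],
    12: ["d1", "d2", "d3"],
}
term_identifier_index = build_term_identifier_index(inverted_index)
print(term_identifier_index["d2"])  # [10, 11, 12]
```

Visiting term identifiers in sorted order means each document's section comes out sorted without a separate sorting pass.
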
INDEXING AND SEARCHING JSON OBJECTS

Publication number: 20100211572

Abstract: Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of <value, path, type, jdewey> tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using <path, type, value> as index term, and jdewey as payload. A method is also described to search the inverted index.

Type: Application

Filed: February 13, 2009

Publication date: August 19, 2010

Applicant: International Business Machines Corporation

Inventors: Kevin Scott Beyer, Jun Rao, Eugene J. Shekita
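
The shredding step in the abstract above can be sketched roughly as follows. This is a simplification under stated assumptions: the payload here is just the tuple of enclosing array positions, which stands in for (and is not) the JDewey encoding, and array nodes contribute no label to the path. The function names are hypothetical.

```python
import json
from collections import defaultdict

def shred(node, path="", positions=()):
    """Yield (value, path, type, positions) for every atom in a parsed JSON value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from shred(child, f"{path}/{key}", positions)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            # Assumption: record the element position instead of a real JDewey code.
            yield from shred(child, path, positions + (i,))
    else:
        yield (node, path, type(node).__name__, positions)

def build_index(docs):
    """docs: dict doc_id -> JSON text. Index term is (path, type, value)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for value, path, typ, positions in shred(json.loads(text)):
            index[(path, typ, value)].append((doc_id, positions))
    return index
```

A lookup such as `index[("/tags", "str", "y")]` then returns the documents containing that atom together with the simplified payload.
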
Method and system for dynamically processing ambiguous, reduced text search queries and highlighting results thereof

Patent number: 7779011

Abstract: A method and system are provided of processing a search query entered by a user of a device having a text input interface with overloaded keys. The search query is directed at identifying an item from a set of items. Each of the items has a name including one or more words. The system receives from the user an ambiguous search query directed at identifying a desired item. The search query comprises a prefix substring of at least one word in the name of the desired item. The system dynamically identifies a group of one or more items from the set of items having one or more words in the names thereof matching the search query as the user enters each character of the search query. The system also orders the one or more items of the group in accordance with given criteria.

Type: Grant

Filed: December 20, 2005

Date of Patent: August 17, 2010

Assignee: Veveo, Inc.

Inventors: Sashikumar Venkataraman, Rakesh Barve, Murali Aravamudan, Ajit Rajasekharan
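
Prefix matching on a keypad with overloaded keys, as in the abstract above, might look like the sketch below. The keypad layout is the common 12-key phone layout and the alphabetical ordering is a placeholder for the unspecified "given criteria"; both are assumptions, as are the function names.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
CHAR_TO_KEY = {ch: key for key, chars in KEYPAD.items() for ch in chars}

def encode(word):
    """Map a word to the key sequence a user would press to type it."""
    return "".join(CHAR_TO_KEY.get(ch, ch) for ch in word.lower())

def match(items, query):
    """Names with at least one word whose key encoding starts with the query."""
    return sorted(
        name for name in items
        if any(encode(word).startswith(query) for word in name.split())
    )

items = ["Tom Cruise", "Toy Story", "Vanilla Sky"]
# Re-run the match after each key press to narrow the group dynamically.
for typed in ("8", "86", "869"):
    print(typed, match(items, typed))
```

The key "8" is ambiguous between t, u, and v, so the first press matches all three names; each further press narrows the group.
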
METHOD FOR USING DUAL INDICES TO SUPPORT QUERY EXPANSION, RELEVANCE/NON-RELEVANCE MODELS, BLIND/RELEVANCE FEEDBACK AND AN INTELLIGENT SEARCH INTERFACE

Publication number: 20100205172

Abstract: A method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface, comprising using a computing device (89) to: access (101) an inverted index (103) to obtain an initial retrieval of results in response to a query, and to generate a rank list of the results, the results referring to information units (IUs) where the query occurs; and determine (105) a number of “N” IUs in the results that are assumed to be relevant by accessing a forward index (104); wherein the forward index (104) and inverted index (103) have pointers to locations in the IUs where terms of the query occur, and the forward index (104) retrieves a term frequency vector of the IU or a set of contexts of the IU.

Type: Application

Filed: February 9, 2009

Publication date: August 12, 2010

Inventor: Robert Wing Pong LUK
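
The interplay of the two indices in the abstract above can be sketched as a blind-feedback query expansion. This is an illustrative simplification: information units are whole documents, ranking is by summed term frequency, and the function names and parameters are hypothetical.

```python
from collections import Counter, defaultdict

def build_indices(docs):
    """docs: dict doc_id -> text. Returns (inverted index, forward index)."""
    inverted, forward = defaultdict(set), {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        forward[doc_id] = Counter(tokens)  # term frequency vector of the IU
        for token in tokens:
            inverted[token].add(doc_id)
    return inverted, forward

def expand_query(query, inverted, forward, top_n=2, extra_terms=2):
    terms = query.lower().split()
    # Initial retrieval through the inverted index.
    candidates = set().union(*(inverted.get(t, set()) for t in terms))
    ranked = sorted(candidates, key=lambda d: (-sum(forward[d][t] for t in terms), d))
    # Assume the top N results are relevant and read their vectors from the forward index.
    feedback = Counter()
    for doc_id in ranked[:top_n]:
        feedback.update(forward[doc_id])
    for t in terms:
        feedback.pop(t, None)
    best = sorted(feedback, key=lambda t: (-feedback[t], t))
    return terms + best[:extra_terms]
```
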
Enhancing query performance of search engines using lexical affinities

Patent number: 7765214

Abstract: Provided are techniques for computer-based electronic Information Retrieval (IR). An extended inverted index structure is built by generating one or more lexical affinities (LA), wherein each of the one or more lexical affinities comprises two or more search items found in proximity in one or more documents in a pool of documents, and generating a posting list for each of the one or more lexical affinities, wherein each posting list is associated with a specific lexical affinity and contains document identifying information for each of the one or more documents in the pool that contains the specific lexical affinity and a location within the document where the specific lexical affinity occurs.

Type: Grant

Filed: January 18, 2006

Date of Patent: July 27, 2010

Assignee: International Business Machines Corporation

Inventors: Peter Altevogt, Marcus Felipe Fontoura, Silvio Wiedrich, Jason Yeong Zien
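
A posting list per lexical affinity, as in the abstract above, can be sketched for the two-term case. The window size, whitespace tokenization, and function name are assumptions made for illustration.

```python
from collections import defaultdict

def build_la_index(docs, window=2):
    """docs: dict doc_id -> text.

    Returns {(term_a, term_b): [(doc_id, position of the first term), ...]}
    for every pair of distinct terms at most `window` tokens apart.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for i, a in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                b = tokens[j]
                if a != b:
                    # Sort the pair so that word order does not matter.
                    index[tuple(sorted((a, b)))].append((doc_id, i))
    return index
```

A query for "new york" can then read the single posting list `index[("new", "york")]` instead of intersecting the lists of both terms and checking positions.
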
Multidimensional analysis tool for high dimensional data

Patent number: 7765216

Abstract: Described is a technology by which high dimensional data may be efficiently analyzed, including by filtering, grouping, aggregating and/or sorting operations to provide an analysis result. For efficiency in the analysis, an inverted index may be built (e.g., as part of filtering), and/or a hash structure (e.g., as part of grouping). Analysis parameters specify dimensions, on which union and/or intersection operations are performed to provide a final dataset. The analysis tool provides a user interface for inputting analysis parameters and outputting information corresponding to an analysis result. The analysis tool may sort the information corresponding to the analysis result, e.g., to output the topmost or bottommost results.

Type: Grant

Filed: June 15, 2007

Date of Patent: July 27, 2010

Assignee: Microsoft Corporation

Inventors: Yantao Li, Guowei Liu, Haidong Zhang, Adnan Azfar Mahmud, Bing Sun, Min Wang, Wenli Zhu, Jian Wang
System and method for providing a trustworthy inverted index to enable searching of records

Patent number: 7765215

Abstract: A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.

Type: Grant

Filed: August 22, 2006

Date of Patent: July 27, 2010

Assignee: International Business Machines Corporation

Inventors: Windsor Wee Sun Hsu, Soumyadeb Mitra
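
Mapping the dictionary's features onto a fixed number of posting lists, as in the abstract above, can be sketched like this. It is an assumption-laden illustration: the CRC32-based mapping and class name are hypothetical, and the storage cache and jump pointers are not modeled.

```python
import zlib

class MergedPostingLists:
    def __init__(self, num_lists):
        # A fixed number of append-only lists shared by all features.
        self.lists = [[] for _ in range(num_lists)]

    def _slot(self, feature):
        return zlib.crc32(feature.encode("utf-8")) % len(self.lists)

    def add(self, record_id, features):
        """Append one entry per distinct feature; record ids are assumed to increase."""
        for feature in set(features):
            self.lists[self._slot(feature)].append((record_id, feature))

    def search(self, feature):
        """Read the merged list for the feature and filter out other features."""
        return [rid for rid, f in self.lists[self._slot(feature)] if f == feature]
```

Fewer lists mean fewer list tails to keep cached during insertion, at the cost of scanning unrelated entries at query time, which is the trade-off the abstract refers to.
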
SUPERVISED SEMANTIC INDEXING AND ITS EXTENSIONS

Publication number: 20100179933

Abstract: A system and method for determining a similarity between a document and a query includes building a weight vector for each of a plurality of documents in a corpus of documents stored in memory and building a weight vector for a query input into a document retrieval system. A weight matrix is generated which distinguishes between relevant documents and lower ranked documents by comparing document/query tuples using a gradient step approach. A similarity score is determined between weight vectors of the query and documents in a corpus by determining a product of a document weight vector, a query weight vector and the weight matrix.

Type: Application

Filed: September 18, 2009

Publication date: July 15, 2010

Applicant: NEC Laboratories America, Inc.

Inventors: BING BAI, Jason Weston, Ronan Collobert, David Grangier
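
The similarity score in the abstract above is a product of the query vector, the weight matrix, and the document vector. The sketch below shows that score and one gradient step; the margin ranking loss is an assumption chosen for illustration, since the abstract only says a gradient step approach is used.

```python
def score(q, W, d):
    """Similarity f(q, d) = q^T W d for plain Python lists."""
    return sum(q[i] * W[i][j] * d[j] for i in range(len(q)) for j in range(len(d)))

def train_step(W, q, d_pos, d_neg, lr=0.1, margin=1.0):
    """One step on the hinge loss max(0, margin - f(q, d_pos) + f(q, d_neg))."""
    if margin - score(q, W, d_pos) + score(q, W, d_neg) > 0:
        for i in range(len(q)):
            for j in range(len(d_pos)):
                # Gradient of the loss with respect to W is -q (d_pos - d_neg)^T.
                W[i][j] += lr * q[i] * (d_pos[j] - d_neg[j])
    return W
```

Repeating the step over (query, relevant document, lower-ranked document) tuples pushes the relevant document's score above the other's by at least the margin.
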
Index compression

Patent number: 7756877

Abstract: Systems and methods for compressing an index are described. In one exemplary method, the results of a search are annotated and then encoded into one or more chunks of compressed data in accordance with the annotations of the results. The annotations include an indication of a best encoding method selected from a set of available encoding methods, and an indication of whether to switch to a new chunk during encoding or to continue encoding in the current chunk. Other methods are described and data processing systems and machine readable media are also described.

Type: Grant

Filed: August 4, 2006

Date of Patent: July 13, 2010

Assignee: Apple Inc.

Inventor: Wayne Loofbourrow
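
Selecting the best encoding per chunk, as in the abstract above, can be sketched with two candidate encoders. The encoders, the fixed chunk size, and the names are assumptions; the patent's annotation that decides when to switch chunks is not modeled.

```python
def varbyte(nums):
    """Variable-byte encoding: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    for n in nums:
        while n >= 128:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
    return bytes(out)

def fixed32(nums):
    """Fixed-width 4-byte little-endian encoding (values must fit in 32 bits)."""
    return b"".join(n.to_bytes(4, "little") for n in nums)

ENCODERS = {"varbyte": varbyte, "fixed32": fixed32}

def encode_chunks(doc_ids, chunk_size=4):
    """Delta-encode a sorted list of ids, then tag each chunk with its smallest encoding."""
    deltas = [b - a for a, b in zip([0] + doc_ids, doc_ids)]
    chunks = []
    for i in range(0, len(deltas), chunk_size):
        part = deltas[i:i + chunk_size]
        name = min(ENCODERS, key=lambda k: len(ENCODERS[k](part)))
        chunks.append((name, ENCODERS[name](part)))
    return chunks
```
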
System and method for building and retrieving a full text index

Patent number: 7752193

Abstract: An indexing engine generates a full text index of English and non-English files provided to the indexing engine. The indexing engine receives an input file for indexing, and normalizes the unique words contained in the input file. The normalizing includes stripping the words of any diacritical marks, taking into account different multilingual issues, case folding the words into lowercase, and the like. The normalized words are stored in a dictionary, and a word record is generated for each stored word. Each word record includes a flag that indicates whether one or more variations exist in the input file for the normalized word. One or more tables store information on the variations for the normalized words. When a query engine is invoked to search for an input query word, the variations are searched only if the user has set an option to consider such variations.

Type: Grant

Filed: September 6, 2007

Date of Patent: July 6, 2010

Assignee: Guidance Software, Inc.

Inventor: Dominik Weber
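
The normalization and variation flag in the abstract above can be sketched with the standard library. The record layout and function names are hypothetical; only diacritic stripping and case folding are covered, not the other multilingual issues the abstract mentions.

```python
import unicodedata

def normalize(word):
    """Strip diacritical marks, then case-fold."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def build_dictionary(words):
    """Return {normalized word: {"has_variations": bool, "variations": set}}."""
    records = {}
    for word in words:
        key = normalize(word)
        rec = records.setdefault(key, {"has_variations": False, "variations": set()})
        if word != key:
            rec["has_variations"] = True
            rec["variations"].add(word)
    return records

def lookup(records, query, consider_variations=False):
    """Return the forms to search; variations are included only when requested."""
    key = normalize(query)
    rec = records.get(key)
    if rec is None:
        return []
    forms = [key]
    if consider_variations and rec["has_variations"]:
        forms += sorted(rec["variations"])
    return forms
```

Indexing "Résumé", "resume", and "RESUME" produces one record under "resume" whose flag signals that variant spellings exist.
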
Method of sorting text and string searching

Patent number: 7734671

Abstract: A method of sorting text for memory-efficient searching is disclosed. An FM-index is created on received text, and a number of rows are marked. The locations of the marked rows are stored in data buckets, together with the last column of the FM-index, which is stored as a wavelet tree. Data blocks containing the data buckets are created, each also containing the number of times each character appears in the data block before each data bucket. A header block is created, comprising an array of the number of times each character appears in the last column of the FM-index before each data block, the location of the end of the data blocks, and the location of the end of the data, and is appended to the data block. The header and data blocks are stored. The search process loads data buckets into memory as needed to find the required text.

Type: Grant

Filed: October 9, 2007

Date of Patent: June 8, 2010

Assignee: The United States of America as represented by the Director, National Security Agency

Inventor: Michael P. Ferguson
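
For readers unfamiliar with the FM-index that the abstract above builds on, here is a minimal backward-search sketch. It keeps the last column as a plain string and counts characters naively; the wavelet tree, data buckets, and header blocks described in the abstract are not modeled, and the function names are hypothetical.

```python
def bwt(text):
    """Last column of the sorted rotations (Burrows-Wheeler transform)."""
    text += "\0"  # sentinel, assumed absent from the text, sorts before everything
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def count_occurrences(last_col, pattern):
    """Backward search: how many times `pattern` occurs in the original text."""
    first_col = sorted(last_col)
    lo, hi = 0, len(last_col)
    for ch in reversed(pattern):
        if ch not in first_col:
            return 0
        smaller = first_col.index(ch)  # number of characters smaller than ch
        lo = smaller + last_col[:lo].count(ch)
        hi = smaller + last_col[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

The per-block character counts in the abstract serve the same purpose as the `count` calls here, but let the search answer them without scanning the whole last column.
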
Data management system and data management method

Patent number: 7730071

Abstract: Provided are a file system transfer designation section for transferring a file system matching file system transfer rules from the first volume of the first storage apparatus to the second volume of the second storage apparatus based on the first file system transfer rules; a file system storage information manager for updating storage information of the file system in accordance with the transfer of the file system by the file system transfer designation section, and for transmitting the updated file system storage information; and a search information manager for updating search information, used for searching the files in response to a file search request from the client apparatus, using the file system storage information sent by the file system storage information manager.

Type: Grant

Filed: November 28, 2006

Date of Patent: June 1, 2010

Assignee: Hitachi, Ltd.

Inventors: Masaaki Iwasaki, Kiyotake Kumazawa
DOCUMENT SIMILARITY SCORING AND RANKING METHOD, DEVICE AND COMPUTER PROGRAM PRODUCT

Publication number: 20100131515

Abstract: A device, computer program product and a method for computing the similarity of a set of documents that avoids the large, wasted computational effort involved in calculating very small similarity scores by using thresholds to stop a similarity calculation between documents, thus ensuring that, with high probability, all document pairs with higher similarity than the thresholds have been found.

Type: Application

Filed: November 13, 2009

Publication date: May 27, 2010

Applicant: TELENOR ASA

Inventors: Geoffrey CANRIGHT, Kenth ENGO-MONSEN
System and method for facilitating full text searching utilizing inverted keyword indices

Patent number: 7716211

Abstract: A system and method for facilitating full text searching utilizing inverted keyword indices in shared memory are provided. An inverted keyword index and an inverted keyword attribute index are created from keyword tokens from a set of documents. The keyword indices are stored in a shared memory buffer and accessed by a query processing component. Shared memory pointers corresponding to the indices are dynamically adjusted according to the addressing schema of the query processing component. The query processing component then processes data queries from the keyword indices stored in the shared memory buffer.

Type: Grant

Filed: February 10, 2004

Date of Patent: May 11, 2010

Assignee: Microsoft Corporation

Inventors: Kyle G. Peltonen, Michael M. H. Cheng, David J. Lee
PRIME INDEXING AND/OR OTHER RELATED OPERATIONS

Publication number: 20100114845

Abstract: Embodiments of prime indexing and/or other related operations are disclosed.

Type: Application

Filed: November 5, 2009

Publication date: May 6, 2010

Applicant: Skyler Technology, Inc.

Inventors: Richard Crandall, Sam Noble
Method and apparatus for identifying related searches in a database search system

Publication number: 20100106706

Abstract: A method of generating a search result list also provides related searches for use by a searcher. Search listings which generate a match with a search request submitted by the searcher are identified in a pay-for-placement database which includes a plurality of search listings. Related search listings contained in a related search database generated from the pay-for-placement database are identified as relevant to the search request. A search result list is returned to the searcher including the identified search listings and one or more of the identified related search listings.

Type: Application

Filed: December 17, 2009

Publication date: April 29, 2010

Applicant: Yahoo! Inc.

Inventors: Phillip G. Rorex, Thomas A. Soulanille, Bradley R. Haugaard
ROUTING XML QUERIES

Publication number: 20100100552

Abstract: A vast amount of information currently accessible over the Web, and in corporate networks, is stored in a variety of databases, and is being exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic because this set of databases is dynamic, and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases, and receiving timely answers. Decentralized architectures are proposed, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes (which export their data as XML, and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data.

Type: Application

Filed: December 22, 2009

Publication date: April 22, 2010

Inventors: Nikolaos Koudas, Divesh Srivastava, Michael Rabinovich
Information retrieval from a collection of data

Patent number: 7702677

Abstract: A method of accessing information from a collection of data includes receiving a query, generating an inverse index of the collection of data and generating results to the query in conjunction with the inverse index.

Type: Grant

Filed: March 11, 2008

Date of Patent: April 20, 2010

Assignee: International Business Machines Corporation

Inventors: Jane Wen Chang, Raymond Lau, Michael Kyle McCandless
SYSTEM AND METHOD FOR DISTRIBUTED INDEX SEARCHING OF ELECTRONIC CONTENT

Publication number: 20100094877

Abstract: There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.

Type: Application

Filed: October 13, 2009

Publication date: April 15, 2010

Inventor: Wolf Garbe
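
The idea in the abstract above of storing a Bloom filter alongside each inverse index entry can be sketched as follows. The assumptions are loud ones: a single dictionary stands in for the peers, each posting carries a filter of all of its document's keywords, and the filter size, hash scheme, and names are hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

def build_node_index(docs):
    """term -> list of (doc_id, Bloom filter of that document's keywords)."""
    index = {}
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        bloom = BloomFilter()
        for word in words:
            bloom.add(word)
        for word in words:
            index.setdefault(word, []).append((doc_id, bloom))
    return index

def search(index, terms):
    """Answer a multi-term query from the first term's postings alone."""
    first, rest = terms[0], terms[1:]
    return [doc_id for doc_id, bloom in index.get(first, [])
            if all(term in bloom for term in rest)]
```

Only the posting list of the first term is read, so no posting lists need to be exchanged for the remaining terms; the price is that a Bloom filter can report false positives, though never false negatives.
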
