Inverted Index Patents (Class 707/742)

USING AN INVERTED INDEX TO PRODUCE AN ANSWER TO A QUERY

Publication number: 20130097126

Abstract: In response to a query having a search term, an inverted index that is defined on a set of attributes of a database structure is accessed, where the inverted index associates values of the set of attributes with corresponding references to rows of the database structure. It is determined whether any of the attributes in the set is in the search term. In response to determining that any of the attributes in the set is in the search term, the inverted index is used to produce an answer to the query.
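The attribute-index idea in the abstract above can be sketched in Python. This is a minimal illustration, not the application's method. It assumes rows are plain dictionaries and that the index covers one chosen attribute. `build_attribute_index` and `answer_query` are hypothetical names, and the full-scan fallback for non-indexed attributes is our own addition.

```python
from collections import defaultdict

def build_attribute_index(rows, attributes):
    """Map (attribute, value) pairs to the row numbers holding that value."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        for attr in attributes:
            if attr in row:
                index[(attr, row[attr])].append(row_id)
    return dict(index)

def answer_query(index, indexed_attrs, attr, value, rows):
    """Answer attr == value from the index when attr is covered by it."""
    if attr in indexed_attrs:
        return index.get((attr, value), [])
    # Hypothetical fallback: scan every row when the attribute is not indexed.
    return [i for i, row in enumerate(rows) if row.get(attr) == value]

rows = [
    {"city": "Oslo", "dept": "sales"},
    {"city": "Lima", "dept": "ops"},
    {"city": "Oslo", "dept": "ops"},
]
idx = build_attribute_index(rows, ["city"])
print(answer_query(idx, {"city"}, "city", "Oslo", rows))  # [0, 2]
print(answer_query(idx, {"city"}, "dept", "ops", rows))   # [1, 2]
```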

Type: Application

Filed: October 17, 2011

Publication date: April 18, 2013

Inventor: D. Blair ELZINGA
Segmenting text for searching

Patent number: 8423350

Abstract: Methods, systems, and apparatus, including computer program products, for segmenting text for searching are disclosed. In one implementation, a method is provided. The method includes receiving text; segmenting the text into one or more unigrams; filtering the one or more unigrams to identify one or more core unigrams; and generating a searchable resource, including: for each of the one or more core unigrams: identifying a stem, indexing the stem, and associating one or more second n-grams with the indexed stem. Each of the one or more second n-grams is derived from the text and includes a core unigram that is related to the indexed stem.
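The steps in this abstract can be made concrete with a toy Python version. None of the following choices come from the patent. Tokenization uses a word regex. A small hypothetical stopword list serves as the "core unigram" filter. A crude suffix stripper stands in for a real stemmer, and bigrams play the role of the "second n-grams".

```python
import re
from collections import defaultdict

# Hypothetical stopword list used as the "core unigram" filter.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in"}

def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_stem_index(text):
    """Map each core unigram's stem to the bigrams (from the text) that contain it."""
    unigrams = re.findall(r"\w+", text.lower())
    index = defaultdict(set)
    for i, word in enumerate(unigrams):
        if word in STOPWORDS:
            continue  # filtered out: not a core unigram
        bigrams = index[crude_stem(word)]  # index the stem even if no bigram exists
        if i > 0:
            bigrams.add(" ".join(unigrams[i - 1:i + 1]))
        if i + 1 < len(unigrams):
            bigrams.add(" ".join(unigrams[i:i + 2]))
    return dict(index)
```

For "Searching segmented text for searches", both "searching" and "searches" reduce to the stem "search". That stem is then associated with the bigrams "searching segmented" and "for searches".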

Type: Grant

Filed: May 21, 2009

Date of Patent: April 16, 2013

Assignee: Google Inc.

Inventors: Sunil Chandra, Harshit Chopra, Siddaarth Shanmugam
AUGMENTING SEARCH WITH ASSOCIATION INFORMATION

Publication number: 20130086071

Abstract: Techniques and tools are described for augmenting search using association information. Searches can be performed using a combination of index information and association information. In some examples, index information is stored in a first data store and association information is stored in a second data store. Search queries can be received and modified using association information. Modified search queries can be executed using a combination of index information and association information. Index information can be generated by indexing a set of documents. Association information can be generated by monitoring user activity occurring between users and a set of documents.

Type: Application

Filed: September 30, 2011

Publication date: April 4, 2013

Applicant: Jive Software, Inc.

Inventors: Lance Riedel, Georgios Mavromatis
Searching apparatus and searching method

Patent number: 8412697

Abstract: A searching apparatus includes a memory unit which stores transposed indexes representing appearing positions of all n-grams in plural pieces of document data subjected to searching and appearing frequencies, an n-gram extracting unit that extracts all n-grams extractable from a searching character string, a smallest-frequency deriving unit which refers to the appearing frequency of the n-gram represented by the transposed index, and derives an n-gram with the smallest appearing frequency among all of the extracted n-grams, a searching n-gram selecting unit that selects, from all extracted n-grams, a plurality of searching n-grams which form the searching character string and include the n-gram with the smallest appearing frequency, and a document specifying unit that specifies, based on the plurality of selected searching n-grams and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
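The rarest-n-gram strategy can be illustrated with a small positional bigram index in Python. This is a sketch under assumptions. It uses n = 2, and an in-memory dict stands in for the transposed index. The covering set is built by tiling the query with n-grams and always adding the rarest one. Candidate documents come only from the rarest n-gram's postings, and each candidate is then verified by position.

```python
from collections import defaultdict

N = 2  # n-gram length; an assumption for this sketch

def ngrams(text, n=N):
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]

def build_index(docs):
    """gram -> {doc_id: [positions]}, plus the total frequency of each gram."""
    postings = defaultdict(lambda: defaultdict(list))
    freq = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for pos, gram in ngrams(text):
            postings[gram][doc_id].append(pos)
            freq[gram] += 1
    return postings, freq

def search(query, postings, freq, n=N):
    grams = ngrams(query, n)
    if not grams:
        return set()  # query shorter than n: not handled in this sketch
    # The query n-gram with the smallest appearance frequency.
    rare_off, rare_gram = min(grams, key=lambda g: freq.get(g[1], 0))
    # Covering set: tile the query with n-grams and always include the rarest one.
    offsets = set(range(0, len(query) - n + 1, n)) | {len(query) - n, rare_off}
    selected = [(o, query[o:o + n]) for o in sorted(offsets)]
    hits = set()
    for doc_id, positions in postings.get(rare_gram, {}).items():
        for p in positions:
            start = p - rare_off  # where the whole query would begin
            if all(start + o in postings.get(g, {}).get(doc_id, ())
                   for o, g in selected):
                hits.add(doc_id)
                break
    return hits
```

Starting from the rarest n-gram keeps the candidate set as small as the index allows. The positional check then rejects documents that contain the n-grams without containing the contiguous string.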

Type: Grant

Filed: April 26, 2011

Date of Patent: April 2, 2013

Assignee: Casio Computer Co., Ltd.

Inventor: Katsuhiko Satoh
Methods for Indexing and Searching Based on Language Locale

Publication number: 20130073559

Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variants of the search term that have the same meaning as the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variants of the search term represented by the state machine.

Type: Application

Filed: September 13, 2012

Publication date: March 21, 2013

Inventors: John M. Hörnkvist, Eric R. Koebler
System and method for distributed index searching of electronic content

Patent number: 8359318

Abstract: There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.
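A single-process Python sketch shows how a Bloom filter stored with each posting lets a multi-term query be answered from one term's list. In the P2P setting each term's list would live on the peer responsible for that term. Here one dict stands in for all peers, which is an assumption. The filter size, hash count, and SHA-256-based hashing are also illustrative choices, not taken from the patent. Results can include false positives but never miss a matching document.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter over strings; a Python int serves as the bit array."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in self._positions(item))

def build_term_index(docs):
    """term -> [(doc_id, Bloom filter of that document's keywords)]."""
    index = {}
    for doc_id, words in docs.items():
        bloom = BloomFilter()
        for w in words:
            bloom.add(w)
        for w in set(words):
            index.setdefault(w, []).append((doc_id, bloom))
    return index

def multi_term_search(index, terms):
    """Answer a multi-term query from the first term's list alone (no list exchange)."""
    if not terms:
        return []
    first, rest = terms[0], terms[1:]
    return [doc_id for doc_id, bloom in index.get(first, [])
            if all(bloom.might_contain(t) for t in rest)]
```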

Type: Grant

Filed: October 13, 2009

Date of Patent: January 22, 2013

Inventor: Wolf Garbe
REAL-TIME SEARCH OF VERTICALLY PARTITIONED, INVERTED INDEXES

Publication number: 20130018891

Abstract: Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set.
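Because every posting list is ordered by document identifier, the run-time join amounts to intersecting iterators that all advance in ascending order. The Python sketch below is minimal and assumes each constraint's posting list is already available as a sorted sequence of integer ids. For example, one list might come from a hypothetical "title" partition and another from a "body" partition.

```python
def intersect_sorted(*posting_lists):
    """Run-time join of posting lists, each sorted by document id."""
    if not posting_lists:
        return []
    iters = [iter(p) for p in posting_lists]
    current = [next(it, None) for it in iters]
    result = []
    while all(c is not None for c in current):
        high = max(current)
        if all(c == high for c in current):
            result.append(high)  # id present in every list
            current = [next(it, None) for it in iters]
            continue
        # Advance every iterator that lags behind the largest current id.
        for i, it in enumerate(iters):
            while current[i] is not None and current[i] < high:
                current[i] = next(it, None)
    return result
```

For instance, `intersect_sorted([1, 3, 5, 7], [3, 4, 5, 8], [0, 3, 5, 9])` yields `[3, 5]`. No list is ever materialized beyond its current position, which suits lists that are produced lazily from separate index partitions.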

Type: Application

Filed: September 13, 2012

Publication date: January 17, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael BUSCH, Rajesh M. DESAI, Robert A. FOYLE, Magesh JAYAPANDIAN
REAL-TIME SEARCH OF VERTICALLY PARTITIONED, INVERTED INDEXES

Publication number: 20130018916

Abstract: Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set.

Type: Application

Filed: July 13, 2011

Publication date: January 17, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael Busch, Rajesh M. Desai, Robert A. Foyle, Magesh Jayapandian
Systems and Methods for Natural Language Searching of Structured Data

Publication number: 20130013616

Abstract: The invention relates to searching structured data using natural language searches. More specifically and preferably, the invention relates to the use of an inverted file index built from generated documents to make data, typically unsearchable using a natural language search, searchable.

Type: Application

Filed: July 8, 2011

Publication date: January 10, 2013

Inventors: Jochen Lothar Leidner, Frank Schilder, Thomas Robert Zielund, Isabelle Alice Yvonne Moulinier
METHOD AND APPARATUS FOR CREATING A SEARCH INDEX FOR A COMPOSITE DOCUMENT AND SEARCHING SAME

Publication number: 20130007004

Abstract: A tool for generating at least one search index for a composite document, wherein the composite document comprises multiple component documents. The search index is generated by extracting characters from the document, segregating the characters into tokens of one or more characters, and determining location information of the tokens. The location information can include the page number of the component document and X, Y page coordinates for the tokens. The tool also provides a user interface that allows for searching of the composite document using at least one of the generated indexes. The user interface allows the user to enter one or more search terms and to select the criteria that will be used during the search. Results are presented to the user via a list of document names that are also hyperlinks to the document. The results documents are listed in order of relevancy, and fragments of text that contain the searched terms are also available to the user, for each document.

Type: Application

Filed: June 30, 2011

Publication date: January 3, 2013

Applicant: Landon IP, Inc.

Inventors: Krishmin RAI, George V. SHRECK
Method and System for Inverted Indexing of a Dataset

Publication number: 20120323927

Abstract: Methods and systems for providing an inverted index for a dataset are disclosed. The inverted index includes a position vector, with fields that correspond to values in the indexed dataset. The fields include data to be used in determining where each value appears in the dataset. The position vector is populated differently for different value types. A 1:1 value appears once in the dataset; a 1:n value appears multiple times. For a 1:1 value, the position vector stores information for where that value appears. For a 1:n value, the position vector stores a pointer, e.g. a memory reference, that identifies a list of locations where the value appears. The list can be encoded or otherwise compressed. A set of indicators can be stored for the fields indicating whether the field has 1:n or 1:1 value information. The indicator is used to control interpretation of the information in a field.
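The following Python sketch illustrates the 1:1 versus 1:n distinction. It assumes the column is already dictionary-encoded into dense integer value ids, and it omits compression of the position lists. The layout is illustrative only: a list of fields, a parallel list of indicator flags, and a list of position lists that the 1:n fields reference.

```python
from collections import defaultdict

def build_position_vector(column):
    """One field per value id: a row position (1:1) or a list reference (1:n)."""
    positions = defaultdict(list)
    for row, value_id in enumerate(column):
        positions[value_id].append(row)
    num_values = max(column) + 1 if column else 0
    vector = [None] * num_values
    is_multi = [False] * num_values  # indicator: how to interpret each field
    lists = []
    for value_id, rows in positions.items():
        if len(rows) == 1:
            vector[value_id] = rows[0]     # 1:1 value: position stored inline
        else:
            is_multi[value_id] = True
            vector[value_id] = len(lists)  # 1:n value: reference to a list
            lists.append(rows)
    return vector, is_multi, lists

def lookup(value_id, vector, is_multi, lists):
    """Use the indicator to decide how to read the field."""
    field = vector[value_id]
    if field is None:
        return []
    return lists[field] if is_multi[value_id] else [field]
```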

Type: Application

Filed: March 29, 2012

Publication date: December 20, 2012

Applicant: SAP AG

Inventor: Alexander Froemmgen
COMPUTERIZED SEARCHABLE DOCUMENT REPOSITORY USING SEPARATE METADATA AND CONTENT STORES AND FULL TEXT INDEXES

Publication number: 20120303632

Abstract: A computerized searchable repository stores documents as structured metadata parts and unstructured content parts using single instancing. A full text index used for keyword searching includes a metadata index and a content index. A linking structure includes metadata-to-content (MD to CT) links and content-to-metadata (CT to MD) linking entries, with each MD to CT link linking a metadata part of a document to each content part of the document, and each CT to MD linking entry having one or more CT to MD links collectively linking a content part to the metadata parts of the documents that include the content part. Indexing includes metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure. Content indexing is performed only if the content part does not match a content part already stored and indexed. Index entries each associate a key word or key value with corresponding metadata or content parts containing the key word or key value.

Type: Application

Filed: May 26, 2011

Publication date: November 29, 2012

Applicant: MIMOSA SYSTEMS, INC.

Inventors: Rahul Kapoor, Sameer H. Ranade, Sherif M. Botros
Storage device having full-text search function

Patent number: 8321421

Abstract: According to one embodiment, a storage device includes an interface, first and second memory blocks, and a controller. The interface receives a content search request. The first memory block stores files and inverted files corresponding to contents included in the files. The second memory block stores a file search table. The controller creates the inverted file for each content included in the files and stores IDs of the files including the content in the inverted file. The controller obtains, by search of the content, a corresponding inverted file from the inverted files stored in the first memory block and stores, in the file search table, the IDs of the files included in the obtained inverted file. The controller outputs the IDs of the files stored in the file search table from the interface as a search result for the content search request.

Type: Grant

Filed: September 23, 2010

Date of Patent: November 27, 2012

Assignee: Kabushiki Kaisha Toshiba

Inventors: Kosuke Tatsumura, Atsuhiro Kinoshita
Device and method for constructing inverted indexes

Patent number: 8321485

Abstract: To achieve high-speed document search, an inverted index is compressed at high compressibility by an encoding method that can be decoded at high processing speed. In compressing the identification number of a document into a byte sequence by the variable byte method, w bits are used to represent the number of occurrences of the indexing term in the document, and x bits are used to represent additional information of the posting, where x and w are integers given as parameters. When the number of occurrences cannot be represented within w bits, a designated value indicating that the count does not fit in w bits is written to those w bits, and another byte sequence that represents the count by the variable byte method follows. Additionally provided is a means for reading a compressed posting from any position of a list of postings (an inverted list), allowing a binary search on an inverted list.
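One plausible reading of this scheme is sketched below in Python. The document-id gap, x bits of extra information, and a w-bit occurrence count are packed into one integer, which is then variable-byte encoded. The all-ones w-bit value is reserved as an escape meaning "the real count follows as its own variable-byte number". Several details are assumptions rather than the patent's specification: the packing order, the use of gaps instead of raw ids, the LEB128-style byte layout, and the defaults w = 2 and x = 1. Random access and binary search are not shown.

```python
def vbyte_encode(n):
    """7 data bits per byte; the high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def vbyte_decode(buf, pos):
    """Decode one value starting at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

def encode_posting(doc_gap, tf, extra, w=2, x=1):
    escape = (1 << w) - 1                  # reserved: count does not fit in w bits
    tf_field = tf if tf < escape else escape
    packed = (doc_gap << (w + x)) | (extra << w) | tf_field
    out = vbyte_encode(packed)
    if tf_field == escape:
        out += vbyte_encode(tf)            # the real count follows as its own vbyte
    return out

def decode_posting(buf, pos, w=2, x=1):
    packed, pos = vbyte_decode(buf, pos)
    tf = packed & ((1 << w) - 1)
    extra = (packed >> w) & ((1 << x) - 1)
    doc_gap = packed >> (w + x)
    if tf == (1 << w) - 1:
        tf, pos = vbyte_decode(buf, pos)
    return doc_gap, tf, extra, pos
```

Small counts cost no extra bytes because they ride along in the document-id byte sequence. Only the rare large count pays for a second variable-byte number.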

Type: Grant

Filed: November 7, 2007

Date of Patent: November 27, 2012

Assignee: Hitachi, Ltd.

Inventors: Tomohiro Yasuda, Makoto Iwayama, Osamu Imaichi
Automated forensic document signatures

Patent number: 8312023

Abstract: Methods and systems are provided for a proactive approach for computer forensic investigations. The invention allows organizations anticipating the need for forensic analysis to prepare in advance. Digital representations are generated proactively for a specified target. A digital representation is a digest of the content of the target. Digital representations of a collection of targets are indexed and organized in a data structure, such as an inverted index. The searching and comparison of digital representations of a collection of targets allows quick and accurate identification of targets having identical or similar content. Computational and storage costs are expended in advance, which allows more efficient computer forensic investigations. The present invention can be applied to numerous applications, such as computer forensic evidence gathering, misuse detection, network intrusion detection, and unauthorized network traffic detection and prevention.

Type: Grant

Filed: May 12, 2008

Date of Patent: November 13, 2012

Assignee: Georgetown University

Inventors: Thomas Clay Shields, Ophir Frieder, Marcus A. Maloof
System and method for semantic search

Patent number: 8301633

Abstract: Systems and methods for semantic search are provided. A corpus of information grouped into passages is indexed by semantic key terms generated from packed knowledge representations that document the semantic relationships of information within those passages. When a search is conducted, a query is similarly transformed into a packed knowledge representation that documents the semantic relationships from which semantic key terms are also generated. An inverted index relating the semantic key terms associated with the passages is searched using the semantic key terms generated from the query. A set of candidate passages is selected and refined by analysis of the semantic key terms and other information. The semantic representations associated with the set of candidate passages are then matched to the semantic representation of the query to determine a search result set.

Type: Grant

Filed: October 1, 2007

Date of Patent: October 30, 2012

Assignee: Palo Alto Research Center Incorporated

Inventor: Robert D. Cheslow
Method and apparatus for processing a query

Publication number: 20120259862

Abstract: Provided are a method and apparatus for processing a query. The method includes generating string sets comprising a plurality of partial strings from a query string, determining a subset of the string sets as a candidate set, and searching for a document comprising the query string from the candidate set.

Type: Application

Filed: October 14, 2011

Publication date: October 11, 2012

Inventors: Younghoon Kim, Hyoung Park, Kyuseok Shim, Kyoung-gu Woo
Method and apparatus for enhancing electronic reading by identifying relationships between sections of electronic text

Patent number: 8275785

Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.

Type: Grant

Filed: June 6, 2011

Date of Patent: September 25, 2012

Inventor: Philip R Krause
Method and apparatus for enhancing electronic reading by identifying relationships between sections of electronic text

Patent number: 8275776

Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.

Type: Grant

Filed: June 6, 2011

Date of Patent: September 25, 2012

Inventor: Philip R Krause
Searching documents for ranges of numeric values

Patent number: 8271498

Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.
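The Python sketch below shows how range posting lists might be built and queried. Each list covers a fixed-width range of consecutive values, and each document id is stored with the one value that placed it there. A range query reads only the lists whose ranges overlap the query and filters the edge lists by value. The bucket width of 10 and both function names are hypothetical.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # hypothetical size of each range of consecutive values

def build_range_postings(docs, width=BUCKET_WIDTH):
    """docs: {doc_id: [ints]} -> {bucket: [(doc_id, value), ...]} in doc-id order."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for value in docs[doc_id]:
            postings[value // width].append((doc_id, value))
    return dict(postings)

def range_query(postings, lo, hi, width=BUCKET_WIDTH):
    """Ids of documents holding at least one value in [lo, hi]."""
    hits = set()
    for bucket in range(lo // width, hi // width + 1):
        for doc_id, value in postings.get(bucket, ()):
            if lo <= value <= hi:  # edge buckets can hold out-of-range values
                hits.add(doc_id)
    return sorted(hits)
```

Wider buckets mean fewer lists to read per query but more filtering at the edges. The width is therefore a tuning trade-off rather than a fixed constant.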

Type: Grant

Filed: August 12, 2008

Date of Patent: September 18, 2012

Assignee: International Business Machines Corporation

Inventors: Marcus Felipe Fontoura, Ronny Lempel, Runping Qi, Jason Yeong Zien
Incremental maintenance of inverted indexes for approximate string matching

Patent number: 8271499

Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

Type: Grant

Filed: June 10, 2009

Date of Patent: September 18, 2012

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Marios Hadjieleftheriou, Nick Koudas, Divesh Srivastava
Indexing and searching JSON objects

Patent number: 8260784

Abstract: Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of <value, path, type, jdewey> tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using <path, type, value> as index term, and jdewey as payload. A method is also described to search the inverted index.
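The following Python sketch simplifies the shredding step and the index build. It makes three assumptions. `path` is the dot-joined object keys, and array indices are not part of the path. `jdewey` is approximated by the tuple of array positions on the way down, rather than the patent's exact partial Dewey code.

```python
import json
from collections import defaultdict

def json_type(v):
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if v is None:
        return "null"
    if isinstance(v, (int, float)):
        return "number"
    return "string"

def shred(node, path=(), jdewey=()):
    """Yield (value, path, type, jdewey) for every atomic node."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from shred(child, path + (key,), jdewey)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from shred(child, path, jdewey + (i,))
    else:
        yield node, ".".join(path), json_type(node), jdewey

def build_json_index(docs):
    """Index term = (path, type, value); payload = (doc_id, jdewey)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for value, path, typ, jdewey in shred(json.loads(text)):
            index[(path, typ, value)].append((doc_id, jdewey))
    return dict(index)
```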

Type: Grant

Filed: February 13, 2009

Date of Patent: September 4, 2012

Assignee: International Business Machines Corporation

Inventors: Kevin Scott Beyer, Jun Rao, Eugene J Shekita
CONTEXTUAL WEIGHTING AND EFFICIENT RE-RANKING FOR VOCABULARY TREE BASED IMAGE RETRIEVAL

Publication number: 20120221572

Abstract: Systems and methods are disclosed to search for a query image, by detecting local invariant features and local descriptors; retrieving best matching images by quantizing the local descriptors with a vocabulary tree; and reordering retrieved images with results from the vocabulary tree quantization.

Type: Application

Filed: December 28, 2011

Publication date: August 30, 2012

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Xiaoyu Wang, Ming Yang, Timothee Cour, Shenghuo Zhu, Kai Yu
System and method for generation of computer index files

Patent number: 8250075

Abstract: Methods and systems for the generation of computer readable indexes or other ordered lists are provided. A corpus of electronic documents or other electronic information is parsed into postings that include key and reference pairs. An inversion buffer in memory is explicitly or implicitly formatted to receive the postings in a predetermined order by key. Each key is assigned a space in the inversion buffer that is subsequently filled with references associated with the key during an inversion method. In an embodiment, an index file is generated directly from the inversion buffer, or in the case of large inversions, from a plurality of inversion buffer segments.
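The idea of preformatting a buffer by key can be shown with a minimal two-pass Python sketch. Pass 1 counts postings per key and reserves a contiguous slot range for each key in key order. Pass 2 drops each reference into its key's next free slot. Writing the buffer out as an index file and splitting large inversions into segments are omitted. `invert` and `refs_for` are hypothetical names.

```python
from collections import Counter

def invert(postings):
    """Invert (key, ref) postings into one flat buffer ordered by key."""
    counts = Counter(key for key, _ in postings)
    offsets, start = {}, 0
    for key in sorted(counts):  # pass 1: reserve space per key
        offsets[key] = start
        start += counts[key]
    buffer = [None] * start     # the preformatted inversion buffer
    cursor = dict(offsets)      # next free slot for each key
    for key, ref in postings:   # pass 2: fill the reserved slots
        buffer[cursor[key]] = ref
        cursor[key] += 1
    return offsets, counts, buffer

def refs_for(key, offsets, counts, buffer):
    """Read back the references stored in a key's reserved slots."""
    if key not in offsets:
        return []
    start = offsets[key]
    return buffer[start:start + counts[key]]
```

No sort of the postings themselves is needed. Each posting is written exactly once, directly into its final position.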

Type: Grant

Filed: December 22, 2006

Date of Patent: August 21, 2012

Assignee: Palo Alto Research Center Incorporated

Inventor: Robert D. Cheslow
Learning syntactic patterns for automatic discovery of causal relations from text

Patent number: 8244730

Abstract: The present invention provides a method for extracting relationships between words in textual data. Initially, training relationship data, such as word triplets describing a cause-effect relationship, is received and used to collect additional textual data including the training relationship data. Distributed data collection is used to receive the training data and collect the additional textual data, allowing a broad range of data to be acquired from multiple sources. Syntactic patterns are extracted from the additional textual data and a distributed data source is scanned to extract additional relationship data describing one or more causal relationships using the extracted syntactic patterns. The extracted additional relationship data is then stored, and can be validated by a supervised learning algorithm before storage and used to train a classifier for automatic validation of additional relationship data.

Type: Grant

Filed: May 29, 2007

Date of Patent: August 14, 2012

Assignee: Honda Motor Co., Ltd.

Inventor: Rakesh Gupta
Adaptive hybrid reasoning decision support system

Patent number: 8244733

Abstract: A method for indexing a plurality of nodes using a computer system is provided. The computer system includes data storage and a processor coupled to the data storage. The method includes acts of storing the plurality of nodes in the data storage, each of the plurality of nodes having a hit count, a link count and an outcome, creating a qualitative index ordering a plurality of nodes according to the hit count, the link count and the outcome of each node and storing the qualitative index in the data storage. The hit count of each node indicates a number of times a case attribute associated with the node is presented to a user. The link count of each node indicates a number of times the case attribute associated with the node is affirmed as useful. The outcome of each node indicates a desirability of the outcome.

Type: Grant

Filed: May 5, 2009

Date of Patent: August 14, 2012

Assignee: University of Massachusetts

Inventors: Paul J. Fortier, Theophano Mitsa, Nancy Dluhy
Method and apparatus for creating an index of network data for a set of messages

Patent number: 8239382

Abstract: A method for creating an index of network data for a set of message data, the index being arranged for searching the set of message data. A method in accordance with an embodiment of the invention includes: creating a set of dialogue records, where each dialogue record is the set of messages corresponding to a dialogue between a sender and recipient pair in a message corpus; logging each of the set of messages in each corresponding dialogue record; and creating an index of terms from the set of messages, the index being arranged to index each term to each dialogue record in which the message comprising the respective term is logged.

Type: Grant

Filed: June 24, 2008

Date of Patent: August 7, 2012

Assignee: International Business Machines Corporation

Inventor: Stephen A. Davies
DIRECTORY TREE SEARCH

Publication number: 20120179689

Abstract: Directory tree searching uses a path index to determine a set of documents for a directory path portion of a search query. The set of documents for the directory path portion is evaluated with a set of documents for an indexed term portion of the search query to determine common documents.

Type: Application

Filed: January 13, 2012

Publication date: July 12, 2012

Inventors: John M. Hornkvist, Eric R. Koebler
Indexing mechanism for efficient node-aware full-text search over XML

Patent number: 8219563

Abstract: Techniques are provided for searching within a collection of XML documents. A relational table in an XML index stores an entry for each node of a set of nodes in the collection. Each entry of the relational table stores an order key and a path identifier along with the atomized value of the node. An index on the atomized value provides a mechanism to perform a node-aware full-text search. Instead of storing the atomized value in the table, a virtual column may be created to represent, for each node, the atomized value of the node. Alternately, each entry of the relational table stores an order key and a path identifier along with, for simple nodes, the atomized value, and for complex nodes, a null value. For a complex node with a descendant text node, a separate entry is stored for the descendant text node in the relational table.

Type: Grant

Filed: December 30, 2008

Date of Patent: July 10, 2012

Assignee: Oracle International Corporation

Inventors: Thomas Baby, Zhen Hua Liu, Wesley Lin
METHOD, APPARATUS AND COMPUTER READABLE MEDIUM FOR INDEXING ADVERTISEMENTS TO COMBINE RELEVANCE WITH CONSUMER CLICK FEEDBACK

Publication number: 20120166445

Abstract: A method and apparatus are provided for better web ad matching by combining relevance with consumer click feedback. In one example, the method includes receiving a query page, extracting features from the query page, re-weighting the query page, evaluating the query page in light of each ad in order to score each ad and pick substantially best ad matches of the indexed ads, and returning the substantially best ad matches to the consumer computer.

Type: Application

Filed: March 7, 2012

Publication date: June 28, 2012

Inventors: Deepayan Chakrabarti, Deepak K. Agrawal, Vanja Josifovski
CLUSTERING A COLLECTION USING AN INVERTED INDEX OF FEATURES

Publication number: 20120150867

Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query.
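The greedy clustering loop can be sketched in Python as follows. The ranking rule is a hypothetical choice, not taken from the application: most widespread feature first, with ties broken by name. Feature vectors are simplified to sets of feature names. Each element ends up in the cluster of the first ranked feature it contains.

```python
from collections import defaultdict

def cluster_by_features(elements):
    """elements: {element_id: set of features} -> [(feature, sorted member ids)]."""
    index = defaultdict(set)  # inverted index: feature -> element ids
    for eid, feats in elements.items():
        for f in feats:
            index[f].add(eid)
    # Hypothetical ranking: most widespread feature first, ties broken by name.
    ranked = sorted(index, key=lambda f: (-len(index[f]), f))
    selected, clusters = [], []
    for feature in ranked:
        # Elements with this feature and none of the previously selected ones.
        members = {e for e in index[feature]
                   if not any(e in index[s] for s in selected)}
        if members:
            clusters.append((feature, sorted(members)))
            selected.append(feature)
    return clusters
```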

Type: Application

Filed: December 13, 2010

Publication date: June 14, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng
Index compression

Patent number: 8190614

Abstract: Systems and methods for compressing an index are described. In one exemplary method, the results of a search are annotated and then encoded into one or more chunks of compressed data in accordance with the annotations of the results. The annotations include an indication of a best encoding method selected from a set of available encoding methods, and an indication of whether to switch to a new chunk during encoding or to continue encoding in the current chunk. Other methods are described and data processing systems and machine readable media are also described.

Type: Grant

Filed: July 12, 2010

Date of Patent: May 29, 2012

Assignee: Apple Inc.

Inventor: Wayne Loofbourrow
Analyzing software usage with instrumentation data

Patent number: 8176476

Abstract: Described is a technology by which software instrumentation data collected from user program sessions are analyzed to output an analysis report or the like via example methods and an architecture configured for efficient operation. A client component queries a service for analysis related information. To process the query, the service works with a data manager, and via a high dimensional analysis component may use information processed from the software instrumentation data, such as in the form of one or more inverted indexes and/or raw value files. The service may include a usage analysis component, a feature recognition component that locates features from command sequences, a user recognition component and/or a program reliability component. One or more counterpart components at the client may generate analysis reports or the like based on the query results. The client also may maintain user libraries and feature libraries to facilitate analyses.

Type: Grant

Filed: June 15, 2007

Date of Patent: May 8, 2012

Assignee: Microsoft Corporation

Inventors: Yantao Li, Adnan Azfar Mahmud, Wenli Zhu, Haidong Zhang, Shuguang Ye, Bing Sun, Qiang Wang, Yingnong Dang, Guowei Liu, Min Wang, Jian Wang
METHODS FOR INDEXING AND SEARCHING BASED ON LANGUAGE LOCALE

Publication number: 20120109970

Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variants of the search term that have the same meaning as the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variants of the search term represented by the state machine.

Type: Application

Filed: October 27, 2010

Publication date: May 3, 2012

Applicant: APPLE INC.

Inventors: John M. Hörnkvist, Eric R. Koebler
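
The locale-driven expansion in the abstract above can be illustrated with a minimal sketch. It assumes a simplified "state machine" made of one node per character, where each node lists the characters the locale treats as equivalent; the table `LOCALE_EQUIVALENTS` and the functions `build_state_machine` and `search` are hypothetical and are not Apple's implementation. Every path through the nodes is looked up in the inverted index and the postings lists are unioned.

```python
from itertools import product

# Hypothetical per-locale tables of characters treated as equivalent.
LOCALE_EQUIVALENTS = {
    "fr": {"e": "eéèê", "a": "aàâ", "c": "cç"},
    "en": {},
}

def build_state_machine(term, locale):
    """Return one node per character; each node lists the characters it accepts."""
    table = LOCALE_EQUIVALENTS.get(locale, {})
    return [table.get(ch, ch) for ch in term.lower()]

def search(term, locale, index):
    """Traverse every path through the nodes and union the matching postings lists."""
    item_ids = set()
    for variant in map("".join, product(*build_state_machine(term, locale))):
        item_ids.update(index.get(variant, ()))
    return sorted(item_ids)

index = {"cafe": [1, 3], "café": [2], "tea": [4]}
print(search("cafe", "fr", index))  # [1, 2, 3]
print(search("cafe", "en", index))  # [1, 3]
```

Enumerating every path grows exponentially with the number of ambiguous characters; a real system would more likely walk the state machine against a sorted term dictionary instead.
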
Index optimization for ranking using a linear model

Patent number: 8171031

Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.

Type: Grant

Filed: January 19, 2010

Date of Patent: May 1, 2012

Assignee: Microsoft Corporation

Inventors: Vladimir Tankovich, Dmitriy Meyerzon, Mihai Petriuc
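
The index-time split described above can be sketched as follows, assuming the rank is supplied as a plain function of term and document and that a fixed `top_k` decides the cut between the two lists. The names `split_postings`, `threshold`, and `top_k` are hypothetical; only frequent terms are split, and only the top list keeps precomputed rank values.

```python
def split_postings(postings, rank, threshold, top_k):
    """postings: term -> list of doc ids; rank: (term, doc_id) -> float score."""
    optimized = {}
    for term, docs in postings.items():
        if len(docs) <= threshold:
            # Rare term: leave the postings list unchanged.
            optimized[term] = {"all": list(docs)}
            continue
        ranked = sorted(docs, key=lambda d: rank(term, d), reverse=True)
        optimized[term] = {
            # Top list stores the precomputed rank value with each document.
            "top": [(d, rank(term, d)) for d in ranked[:top_k]],
            # Bottom list stores document ids only, with no rank data.
            "bottom": sorted(ranked[top_k:]),
        }
    return optimized
```

At query time an engine built this way could score the top list from the stored values and consult the bottom list only when more results are needed.
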
Search index format optimizations

Patent number: 8166041

Abstract: A search index structure extends a typical composite index by incorporating an index that is optimized for fast retrieval from storage and that eliminates data specific to phrase searching. Other data is represented in a manner that allows it to be calculated rather than stored. Associating variable-length entries with logical categories allows their length to be inferred from the category rather than stored. Using delta values between document IDs rather than the IDs themselves generates a compact, dense symbol set that is efficiently compressed by Huffman encoding or a similar compression method. Using an upper threshold to remove large, and thus rare, delta values from the symbol set prior to encoding further improves encoding performance.

Type: Grant

Filed: June 13, 2008

Date of Patent: April 24, 2012

Assignee: Microsoft Corporation

Inventors: Chadd Creighton Merrigan, Mihai Petriuc, Raif Khassanov, Artsiom Ivanovich Kokhan
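
The delta-plus-threshold idea in this abstract can be shown with a short sketch. It assumes that gaps above the threshold are replaced by a hypothetical `ESCAPE` symbol and stored raw in a separate overflow list, and it builds a textbook Huffman code over the remaining gap symbols. The helper names are illustrative and do not describe Microsoft's on-disk format.

```python
import heapq
from collections import Counter
from itertools import count

ESCAPE = -1  # hypothetical marker for gaps above the threshold

def delta_symbols(doc_ids, threshold):
    """Turn sorted doc IDs into gap symbols; large gaps become ESCAPE plus a raw value."""
    symbols, overflow, prev = [], [], 0
    for doc_id in sorted(doc_ids):
        gap, prev = doc_id - prev, doc_id
        if gap > threshold:
            symbols.append(ESCAPE)
            overflow.append(gap)  # kept outside the Huffman-coded stream
        else:
            symbols.append(gap)
    return symbols, overflow

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable without comparing dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def decode(symbols, overflow):
    """Rebuild doc IDs by summing gaps, pulling escaped gaps from the overflow list."""
    doc_ids, prev, extra = [], 0, iter(overflow)
    for s in symbols:
        prev += next(extra) if s == ESCAPE else s
        doc_ids.append(prev)
    return doc_ids

syms, overflow = delta_symbols([3, 4, 5, 7, 8, 1000], threshold=16)
code = huffman_code(syms)
bits = "".join(code[s] for s in syms)
```

Keeping the rare large gap out of the symbol set means the Huffman alphabet stays small and dense, which is the effect the abstract attributes to the upper threshold.
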
METHOD OF UPDATING AN INVERTED INDEX, AND A SERVER IMPLEMENTING THE METHOD

Publication number: 20120089611

Abstract: This method of updating an inverted index from at least one electronic document, in which each electronic document is constituted by at least one ordered set of objects, comprises, for each of said objects: a step of identifying a descriptor of said object, the descriptor being represented in the form of a tree; a step of determining a terminal leaf of said tree; and a step of updating a packet of information pointed to by said leaf, said packet of information including at least the list of said documents that include said object.

Type: Application

Filed: October 4, 2011

Publication date: April 12, 2012

Inventor: Pierre Brochard
Method and Apparatus for Searching a Hierarchical Database and an Unstructured Database with a Single Search Query

Publication number: 20120084296

Abstract: Techniques for searching a hierarchical database and an unstructured database with a single search query are described herein.

Type: Application

Filed: October 12, 2011

Publication date: April 5, 2012

Applicant: CITRIX ONLINE LLC

Inventor: Christopher Waters
Processor for fast contextual matching

Patent number: 8135717

Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries map words, and marks for words having the selected characteristics, to locations within documents in the corpus. Entries in the index structure represent words, and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of the respective marked words or of adjacent words. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.

Type: Grant

Filed: March 30, 2009

Date of Patent: March 13, 2012

Assignee: SAP America, Inc.

Inventors: Ramana B. Rao, Swapnil Hajela, Nareshkumar Rajkumar
Methods and Systems for Compressing Indices

Publication number: 20120059828

Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.

Type: Application

Filed: November 14, 2011

Publication date: March 8, 2012

Applicant: GOOGLE INC.

Inventor: Adam J. Weissman
Processor for fast phrase searching

Patent number: 8131730

Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words, and other entries represent stopwords found in the corpus coalesced with prefixes of the respective words adjacent to those stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Index structures including coalesced stopwords are also created and maintained.

Type: Grant

Filed: March 30, 2009

Date of Patent: March 6, 2012

Assignee: SAP America, Inc.

Inventors: Swapnil Hajela, Nareshkumar Rajkumar
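
A minimal sketch of stopword coalescing in a positional index follows, assuming a hypothetical stopword list, a two-character prefix, and an underscore separator; none of these values come from the patent. Each stopword is replaced in place by a token such as `the_qu`, so positions are preserved and the same rewrite is applied to the query before a positional phrase match.

```python
from collections import defaultdict

STOPWORDS = {"the", "of", "a", "to", "in"}  # hypothetical stopword list
PREFIX_LEN = 2  # hypothetical number of leading characters kept from the next word

def tokens(words):
    """Replace each stopword with a token coalesced with the next word's prefix."""
    out = []
    for i, w in enumerate(words):
        if w in STOPWORDS and i + 1 < len(words):
            out.append(f"{w}_{words[i + 1][:PREFIX_LEN]}")
        else:
            out.append(w)
    return out

def build_index(docs):
    """Positional index: token -> doc_id -> list of positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokens(text.lower().split())):
            index[tok][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Rewrite the query the same way, then require consecutive positions."""
    query = tokens(phrase.lower().split())
    if not query:
        return []
    results = []
    for doc_id, starts in index.get(query[0], {}).items():
        if any(all(start + k in index.get(tok, {}).get(doc_id, ())
                   for k, tok in enumerate(query)) for start in starts):
            results.append(doc_id)
    return sorted(results)

idx = build_index({1: "the quick brown fox", 2: "a quiet night", 3: "the end of the quest"})
```

The coalesced token `the_qu` is far more selective than a bare `the` postings list, although "the quick" and "the quest" still share it, so the positional check on the following word does the final filtering. A stopword at the very end of a query has no next word in this sketch and would not match the coalesced index entries.
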
Unified inverted index for video passage retrieval

Patent number: 8126897

Abstract: A method for information retrieval includes extracting from a video document visual data items and textual data items that occur in the document at respective occurrence times. Indexing records, which index both the visual and the textual data items by their respective occurrence times, are constructed and stored in a memory.

Type: Grant

Filed: June 10, 2009

Date of Patent: February 28, 2012

Assignee: International Business Machines Corporation

Inventors: Benjamin Sznajder, Jonathan Mamou
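
One way to picture a unified index over visual and textual items is sketched below. It assumes each extracted item arrives as a hypothetical `(video_id, time_sec, kind, token)` tuple and that a "passage" is any spoken term occurring within `window` seconds of a visual concept; both the record layout and the window rule are illustrative assumptions, not IBM's method.

```python
from bisect import bisect_left
from collections import defaultdict

def build_index(items):
    """items: iterable of (video_id, time_sec, kind, token), kind 'text' or 'visual'."""
    index = defaultdict(list)
    for video_id, t, kind, token in items:
        index[(kind, token)].append((video_id, t))
    for postings in index.values():
        postings.sort()  # sorted by (video_id, time) for binary search
    return index

def passages(index, text_term, visual_concept, window):
    """Return (video_id, t) where text_term occurs within `window` s of visual_concept."""
    visual = index.get(("visual", visual_concept), [])
    hits = []
    for video_id, t in index.get(("text", text_term), []):
        # First visual occurrence at or after t - window in the same video, if any.
        i = bisect_left(visual, (video_id, t - window))
        if i < len(visual) and visual[i][0] == video_id and visual[i][1] <= t + window:
            hits.append((video_id, t))
    return hits
```

Because both kinds of item share the same time-keyed records, a single sorted postings list per token is enough to join them with a binary search.
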
Updating an inverted index

Patent number: 8122029

Abstract: Systems and methods for processing an index are described. To ensure that the most up-to-date index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result.

Type: Grant

Filed: March 28, 2011

Date of Patent: February 21, 2012

Assignee: Apple Inc.

Inventors: Wayne Loofbourrow, John Martin Hornkvist, Eric Richard Koebler, Yun-chih S. Li
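
The retrieval-time supplement described above can be approximated with a small overlay, assuming the "specially marked" entries are modeled as a per-item record of that item's current terms. The class `OverlayIndex` and its methods are hypothetical. Stale base postings for a changed item are masked and replaced by the overlay, so the base index never has to be rewritten.

```python
class OverlayIndex:
    """Base inverted index plus marked entries for items changed since indexing."""

    def __init__(self, base):
        self.base = {term: set(ids) for term, ids in base.items()}
        self.changed = {}  # item id -> current set of terms for that item

    def update_item(self, item_id, new_terms):
        # Record the change instead of rewriting every affected postings list.
        self.changed[item_id] = set(new_terms)

    def lookup(self, term):
        # Drop base hits for changed items, then add them back only if still valid.
        hits = self.base.get(term, set()) - set(self.changed)
        hits |= {i for i, terms in self.changed.items() if term in terms}
        return sorted(hits)
```

A periodic merge could fold the overlay back into the base index once it grows large.
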
ORDERED INDEX

Publication number: 20120005214

Abstract: Systems and methods for processing an index are described. The items in a postings list for a particular term are ordered in a desired retrieval order, e.g., most recent first. The ordered items are inserted into an inverted index in the desired retrieval order, resulting in an ordered inverted index from which items may be efficiently retrieved in the desired retrieval order. During retrieval, items may first be retrieved from a live index, and the retrieved items from the live and ordered indexes may be merged. The retrieved items may also be filtered in accordance with the items' file grouping parameters.

Type: Application

Filed: September 13, 2011

Publication date: January 5, 2012

Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yan Arrouye
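
The merge step in this abstract can be sketched with `heapq.merge`, assuming both the live and the ordered postings are hypothetical lists of `(timestamp, item_id, group)` tuples already sorted newest first. The first copy of an item seen is its newest version, and the group filter is applied to that version.

```python
import heapq

def retrieve(live, ordered, limit, allowed_groups=None):
    """Merge two newest-first postings lists, dedupe by item, and filter by group."""
    seen, out = set(), []
    for ts, item_id, group in heapq.merge(live, ordered, key=lambda e: e[0], reverse=True):
        if len(out) >= limit:
            break
        if item_id in seen:
            continue  # an older copy of an item already handled
        seen.add(item_id)
        if allowed_groups is not None and group not in allowed_groups:
            continue
        out.append(item_id)
    return out
```

Because both inputs are already in retrieval order, the merge is lazy and can stop as soon as `limit` items have been produced.
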
Searching related documents

Patent number: 8090722

Abstract: Systems, methods, and other embodiments associated with logically expanding a document and determining the relevance of the logically expanded document to a query are described. One method embodiment includes searching an index to locate a document identifier for a document in which a query term appears. The method includes determining whether the index entry includes an expansion identifier, and, if so, producing a logically expanded document. The logically expanded document may include both a document associated with the document identifier and a document associated with the expansion identifier. The method may then determine a relevance value of the logically expanded document with respect to the query and may provide a signal corresponding to the relevance value.

Type: Grant

Filed: March 21, 2007

Date of Patent: January 3, 2012

Assignee: Oracle International Corporation

Inventors: Muralidhar Krishnaprasad, Meeten Bhavsar
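
A minimal sketch of the "logically expanded document" follows. It assumes index entries are hypothetical `(doc_id, expansion_id)` pairs and substitutes simple term frequency for whatever relevance measure the patent uses. When an entry carries an expansion identifier, the two documents are scored together as one unit.

```python
def search_expanded(index, docs, term):
    """index: term -> list of (doc_id, expansion_id or None); docs: doc_id -> text."""
    results = []
    for doc_id, expansion_id in index.get(term, []):
        text = docs[doc_id]
        if expansion_id is not None:
            # Logically expanded document: the hit plus its related document.
            text = text + " " + docs[expansion_id]
        words = text.lower().split()
        # Assumed relevance: share of words equal to the query term.
        relevance = words.count(term) / max(len(words), 1)
        results.append((doc_id, relevance))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A real engine would likely weight the expansion document differently from the primary hit; treating them equally keeps the sketch short.
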
Updating an inverted index in a real time fashion

Patent number: 8082258

Abstract: Systems and methods for regularly updating portions of a merged index are provided. Initially, upon receiving an indication that modifications have occurred to content of web-based documents, dynamic update of index (DUI) objects that identify the documents and expose the modified content are composed by ascertaining relative positions of the modified content within the documents, and packaging identifiers of the documents, the relative positions, and metadata underlying the modified content into a message. The DUI objects are applied to an overloading index that maintains structured records of recent modifications. In particular, portions of the overloading index are targeted utilizing the document identifiers and the relative positions specified by the DUI object, thereby updating the targeted portions within the overloading index corresponding to the modified content without rewriting the entire overloading index.

Type: Grant

Filed: February 10, 2009

Date of Patent: December 20, 2011

Assignee: Microsoft Corporation

Inventors: Abhas Kumar, Pratibha Permandla, Gaurav Sareen, Anna Timasheva, Deepak Shankar
UPDATING AN INVERTED INDEX

Publication number: 20110289093

Abstract: Systems and methods for processing an index are described. To ensure that the most up-to-date index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result.

Type: Application

Filed: March 28, 2011

Publication date: November 24, 2011

Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yun-chih S. Li
Self-compacting pattern indexer: storing, indexing and accessing information in a graph-like data structure

Patent number: 8065293

Abstract: An indexing system uses a graph-like data structure that clusters feature indexes together. The minimum atomic value in the data structure is represented as a leaf node, which is either a single feature index or, when a minimum sequence length is imposed, a sequence of two or more feature indexes. Root nodes are formed as clustered collections of leaf nodes and/or other root nodes. Context nodes are formed from root nodes that are associated with content that is being indexed. Links between a root node and other nodes each include a sequence order value that is used to maintain the sequencing order for feature indexes relative to the root node. The collection of nodes forms a graph-like data structure, where each context node is indexed according to the sequenced pattern of feature indexes. Clusters can be split, merged, and promoted to increase the efficiency of searching the data structure.

Type: Grant

Filed: October 24, 2007

Date of Patent: November 22, 2011

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, R. Donald Thompson, III, Jeffrey Cole, Brendan Meeder
Methods and systems for compressing indices

Patent number: 8060516

Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.

Type: Grant

Filed: September 20, 2010

Date of Patent: November 15, 2011

Assignee: Google Inc.

Inventor: Adam J. Weissman
System and method for classifying tags of content using a hyperlinked corpus of classified web pages

Patent number: 8046361

Abstract: An improved system and method for classifying tags of content using a hyperlinked corpus of classified web pages is provided. An anchor text index may be searched to find anchor texts that may match text of the tag, documents referenced by the matching anchor texts may be found, and the documents referenced by the matching anchor texts may be grouped to disambiguate multiple classifications that result from matching the anchor texts with the categories of the referenced documents. To resolve ambiguity between multiple classifications, weighted classifications may be used, where each document may be assigned a positive weight for a mapping to a category to indicate the confidence of the classification of the document to the category. The classification with the greatest frequency for the grouping of the documents referenced by the matching anchor texts may be selected and output as the classification for the tag.

Type: Grant

Filed: April 18, 2008

Date of Patent: October 25, 2011

Assignee: Yahoo! Inc.

Inventors: Börkur Sigurbjörnsson, Roelof van Zwol, Simon E. Overell
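
The weighted disambiguation step can be sketched as a vote, assuming exact case-insensitive matching between tag and anchor text and summing per-category confidence weights across the referenced documents. The dictionaries `anchor_index` and `doc_categories` are hypothetical stand-ins for the anchor text index and the classified corpus.

```python
from collections import Counter

def classify_tag(tag, anchor_index, doc_categories):
    """anchor_index: anchor text -> doc ids;
    doc_categories: doc id -> {category: positive confidence weight}."""
    docs = set()
    for anchor, targets in anchor_index.items():
        if anchor.lower() == tag.lower():  # assumed matching rule
            docs.update(targets)
    votes = Counter()
    for doc in docs:
        for category, weight in doc_categories.get(doc, {}).items():
            votes[category] += weight
    # Category with the largest accumulated weight, or None if nothing matched.
    return votes.most_common(1)[0][0] if votes else None
```

Summing weights rather than counting documents lets a few confidently classified pages outvote many weakly classified ones.
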