Inverted Index Patents (Class 707/742)
  • Publication number: 20130097126
    Abstract: In response to a query having a search term, an inverted index that is defined on a set of attributes of a database structure is accessed, where the inverted index associates values of the set of attributes with corresponding references to rows of the database structure. It is determined whether any of the attributes in the set is in the search term. In response to determining that any of the attributes in the set is in the search term, the inverted index is used to produce an answer to the query.
    Type: Application
    Filed: October 17, 2011
    Publication date: April 18, 2013
    Inventor: D. Blair ELZINGA
  • Patent number: 8423350
    Abstract: Methods, systems, and apparatus, including computer program products, for segmenting text for searching are disclosed. In one implementation, a method is provided. The method includes receiving text; segmenting the text into one or more unigrams; filtering the one or more unigrams to identify one or more core unigrams; and generating a searchable resource, including: for each of the one or more core unigrams: identifying a stem, indexing the stem, and associating one or more second n-grams with the indexed stem. Each of the one or more second n-grams is derived from the text and includes a core unigram that is related to the indexed stem.
    Type: Grant
    Filed: May 21, 2009
    Date of Patent: April 16, 2013
    Assignee: Google Inc.
    Inventors: Sunil Chandra, Harshit Chopra, Siddaarth Shanmugam
  • Publication number: 20130086071
    Abstract: Techniques and tools are described for augmenting search using association information. Searches can be performed using a combination of index information and association information. In some examples, index information is stored in a first data store and association information is stored in a second data store. Search queries can be received and modified using association information. Modified search queries can be executed using a combination of index information and association information. Index information can be generated by indexing a set of documents. Association information can be generated by monitoring user activity occurring between users and a set of documents.
    Type: Application
    Filed: September 30, 2011
    Publication date: April 4, 2013
    Applicant: Jive Software, Inc.
    Inventors: Lance Riedel, Georgios Mavromatis
  • Patent number: 8412697
    Abstract: A searching apparatus includes a memory unit which stores transposed indexes representing appearing positions of all n-grams in plural pieces of document data subjected to searching and appearing frequencies, an n-gram extracting unit that extracts all n-grams extractable from a searching character string, a smallest-frequency deriving unit which refers to the appearing frequency of the n-gram represented by the transposed index, and derives an n-gram with the smallest appearing frequency among all of the extracted n-grams, a searching n-gram selecting unit that selects, from all extracted n-grams, a plurality of searching n-grams which form the searching character string and include the n-gram with the smallest appearing frequency, and a document specifying unit that specifies, based on the plurality of selected searching n-grams and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
    Type: Grant
    Filed: April 26, 2011
    Date of Patent: April 2, 2013
    Assignee: Casio Computer Co., Ltd.
    Inventor: Katsuhiko Satoh
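The n-gram strategy in the abstract above (start from the n-gram with the smallest appearing frequency, then confirm the other n-grams at aligned positions) can be sketched in Python. This is a minimal illustration, not Casio's claimed apparatus: it assumes bigrams (N = 2), an in-memory dict of position sets as the index, and hypothetical function names `build_ngram_index` and `search`.

```python
from collections import defaultdict

N = 2  # assumed n-gram length (bigrams)

def build_ngram_index(docs):
    """Map each n-gram to a set of (doc_id, position) postings."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for pos in range(len(text) - N + 1):
            index[text[pos:pos + N]].add((doc_id, pos))
    return index

def search(index, query):
    """Return sorted ids of documents containing `query` (len(query) >= N)."""
    if len(query) < N:
        raise ValueError("query shorter than the n-gram length")
    grams = [(i, query[i:i + N]) for i in range(len(query) - N + 1)]
    # Pick the n-gram with the fewest postings (smallest appearing frequency).
    offset, rarest = min(grams, key=lambda g: len(index.get(g[1], ())))
    hits = set()
    for doc_id, pos in index.get(rarest, ()):
        start = pos - offset  # where the whole query would begin in this document
        if start < 0:
            continue
        # Confirm every query n-gram occurs at its aligned position.
        if all((doc_id, start + i) in index.get(g, ()) for i, g in grams):
            hits.add(doc_id)
    return sorted(hits)
```

Starting from the rarest n-gram keeps the candidate set small; each candidate is then confirmed with set lookups rather than by rescanning the document text.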
  • Publication number: 20130073559
    Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variants of the search term that have the same meaning as the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variants of the search term represented by the state machine.
    Type: Application
    Filed: September 13, 2012
    Publication date: March 21, 2013
    Inventors: John M. Hörnkvist, Eric R. Koebler
  • Patent number: 8359318
    Abstract: There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 22, 2013
    Inventor: Wolf Garbe
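The abstract above does not spell out how the Bloom filters are laid out. One possible reading, sketched here as an assumption, attaches to each posting of a term a Bloom filter of that document's keywords, so the node holding the term's list can test the remaining query terms locally without fetching other posting lists. The sizes (m = 1024 bits, k = 3 hashes), the salted SHA-256 hashing, and the names `BloomFilter` and `local_multi_term_search` are illustrative choices, not the patented design.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

def local_multi_term_search(postings, other_terms):
    """postings: list of (doc_id, BloomFilter of that document's keywords)
    for one query term; keep documents whose filter matches every other term."""
    return [doc_id for doc_id, bf in postings
            if all(bf.might_contain(t) for t in other_terms)]
```

Because a Bloom filter can report false positives but never false negatives, the result may contain a few documents that lack one of the terms, but no matching document is dropped.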
  • Publication number: 20130018891
    Abstract: Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 17, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael BUSCH, Rajesh M. DESAI, Robert A. FOYLE, Magesh JAYAPANDIAN
  • Publication number: 20130018916
    Abstract: Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set.
    Type: Application
    Filed: July 13, 2011
    Publication date: January 17, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Busch, Rajesh M. Desai, Robert A. Foyle, Magesh Jayapandian
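The run-time join in the two IBM abstracts above relies on every posting list being sorted by document identifier, so the lists can be intersected by advancing cursors. A minimal sketch, assuming in-memory Python lists of integer ids; `DocIdIterator` and `conjunctive_join` are hypothetical names, and skipping ahead with `bisect_left` is one common way to implement such iterators, not necessarily IBM's.

```python
from bisect import bisect_left

class DocIdIterator:
    """Cursor over a posting list sorted by ascending document id."""

    def __init__(self, postings):
        self.postings = postings
        self.i = 0

    def current(self):
        return self.postings[self.i] if self.i < len(self.postings) else None

    def advance_to(self, target):
        # Jump to the first doc id >= target using binary search.
        self.i = bisect_left(self.postings, target, self.i)
        return self.current()

def conjunctive_join(iterators):
    """Return doc ids present in every posting list (one list per constraint)."""
    results = []
    if not iterators:
        return results
    candidate = iterators[0].current()
    while candidate is not None:
        matched = True
        for it in iterators:
            doc = it.advance_to(candidate)
            if doc is None:          # one list is exhausted: no more matches
                return results
            if doc != candidate:     # this list skipped past; adopt the larger id
                candidate = doc
                matched = False
                break
        if matched:
            results.append(candidate)
            candidate = iterators[0].advance_to(candidate + 1)
    return results
```

The candidate id only ever increases, so each list is scanned at most once and the join stays linear in the total posting-list length.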
  • Publication number: 20130013616
    Abstract: The invention relates to searching structured data using natural language searches. More specifically and preferably, the invention relates to the use of an inverted file index built from generated documents to make data, typically unsearchable using a natural language search, searchable.
    Type: Application
    Filed: July 8, 2011
    Publication date: January 10, 2013
    Inventors: Jochen Lothar Leidner, Frank Schilder, Thomas Robert Zielund, Isabelle Alice Yvonne Moulinier
  • Publication number: 20130007004
    Abstract: A tool for generating at least one search index for a composite document, wherein the composite document comprises multiple component documents. The search index is generated by extracting characters from the document, segregating the characters into tokens of one or more characters, and determining location information of the tokens. The location information can include the page number of the component document and X, Y page coordinates for the tokens. The tool also provides a user interface that allows for searching of the composite document using at least one of the generated indexes. The user interface allows the user to enter one or more search terms and to select the criteria that will be used during the search. Results are presented to the user via a list of document names that are also hyperlinks to the document. The results documents are listed in order of relevancy, and fragments of text that contain the searched terms are also available to the user, for each document.
    Type: Application
    Filed: June 30, 2011
    Publication date: January 3, 2013
    Applicant: Landon IP, Inc.
    Inventors: Krishmin RAI, George V. SHRECK
  • Publication number: 20120323927
    Abstract: Methods and systems for providing an inverted index for a dataset are disclosed. The inverted index includes a position vector, with fields that correspond to values in the indexed dataset. The fields include data to be used in determining where each value appears in the dataset. The position vector is populated differently for different value types. A 1:1 value appears once in the dataset; a 1:n value appears multiple times. For a 1:1 value, the position vector stores information for where that value appears. For a 1:n value, the position vector stores a pointer, e.g. a memory reference, that identifies a list of locations where the value appears. The list can be encoded or otherwise compressed. A set of indicators can be stored for the fields indicating whether the field has 1:n or 1:1 value information. The indicator is used to control interpretation of the information in a field.
    Type: Application
    Filed: March 29, 2012
    Publication date: December 20, 2012
    Applicant: SAP AG
    Inventor: Alexander Froemmgen
  • Publication number: 20120303632
    Abstract: A computerized searchable repository stores documents as structured metadata parts and unstructured content parts using single instancing. A full text index used for keyword searching includes a metadata index and a content index. A linking structure includes metadata-to-content (MD to CT) links and content-to-metadata (CT to MD) linking entries, with each MD to CT link linking a metadata part of a document to each content part of the document, and each CT to MD linking entry having one or more CT to MD links collectively linking a content part to the metadata parts of the documents that include the content part. Indexing includes metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure. Content indexing is performed only if the content part does not match a content part already stored and indexed. Index entries each associate a key word or key value with corresponding metadata or content parts containing the key word or key value.
    Type: Application
    Filed: May 26, 2011
    Publication date: November 29, 2012
    Applicant: MIMOSA SYSTEMS, INC.
    Inventors: Rahul Kapoor, Sameer H. Ranade, Sherif M. Botros
  • Patent number: 8321421
    Abstract: According to one embodiment, a storage device includes an interface, first and second memory blocks, and a controller. The interface receives a content search request. The first memory block stores files and inverted files corresponding to contents included in the files. The second memory block stores a file search table. The controller creates the inverted file for each content included in the files and stores IDs of the files including the content in the inverted file. The controller obtains, by search of the content, a corresponding inverted file from the inverted files stored in the first memory block and stores, in the file search table, the IDs of the files included in the obtained inverted file. The controller outputs the IDs of the files stored in the file search table from the interface as a search result for the content search request.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: November 27, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kosuke Tatsumura, Atsuhiro Kinoshita
  • Patent number: 8321485
    Abstract: To achieve high-speed document search, an inverted index is compressed at a high compression ratio using an encoding method that can be decoded at high speed. When the identification number of a document is compressed into a byte sequence by the variable byte method, w bits are used to represent the number of occurrences of the indexing term in the document, and x bits are used to represent additional information of the posting, where x and w are integers given as parameters. When the number of occurrences cannot be represented within w bits, a reserved value indicating that the count does not fit in w bits is written to those w bits, and another byte sequence that represents the count by the variable byte method follows. Also provided is a means for reading a compressed posting from any position in a list of postings (an inverted list), allowing a binary search on the inverted list.
    Type: Grant
    Filed: November 7, 2007
    Date of Patent: November 27, 2012
    Assignee: Hitachi, Ltd.
    Inventors: Tomohiro Yasuda, Makoto Iwayama, Osamu Imaichi
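The variable byte method mentioned in the abstract above stores an integer in 7-bit groups, one group per byte, with a flag bit marking the final byte. The sketch below shows that classic scheme applied to the gaps between sorted document ids; it is a generic illustration and does not implement the patent's w-bit occurrence-count and x-bit side-information fields. All function names are hypothetical.

```python
def vbyte_encode(numbers):
    """Encode non-negative ints; the high bit (0x80) marks each value's last byte."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)  # low 7 bits
            n >>= 7
            if n == 0:
                break
        groups.reverse()             # most significant group first
        groups[-1] |= 0x80           # flag the terminating byte
        out.extend(groups)
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:              # last byte of the current value
            numbers.append(n)
            n = 0
    return numbers

def compress_postings(doc_ids):
    """Store gaps between sorted doc ids so most values fit in a single byte."""
    gaps = [b - a for a, b in zip([0] + doc_ids[:-1], doc_ids)]
    return vbyte_encode(gaps)

def decompress_postings(data):
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        ids.append(total)
    return ids
```

For example, the ids `[3, 7, 130, 20000]` become gaps `[3, 4, 123, 19870]`, which encode to six bytes instead of the sixteen needed for four 32-bit integers.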
  • Patent number: 8312023
    Abstract: Methods and systems are provided for a proactive approach to computer forensic investigations. The invention allows organizations anticipating the need for forensic analysis to prepare in advance. Digital representations are generated proactively for a specified target. A digital representation is a digest of the content of the target. Digital representations of a collection of targets are indexed and organized in a data structure, such as an inverted index. The searching and comparison of digital representations of a collection of targets allows quick and accurate identification of targets having identical or similar content. Computational and storage costs are expended in advance, which allows more efficient computer forensic investigations. The present invention can be applied to numerous applications, such as computer forensic evidence gathering, misuse detection, network intrusion detection, and unauthorized network traffic detection and prevention.
    Type: Grant
    Filed: May 12, 2008
    Date of Patent: November 13, 2012
    Assignee: Georgetown University
    Inventors: Thomas Clay Shields, Ophir Frieder, Marcus A. Maloof
  • Patent number: 8301633
    Abstract: Systems and methods for semantic search are provided. A corpus of information grouped into passages is indexed by semantic key terms generated from packed knowledge representations that document the semantic relationships of information within those passages. When a search is conducted, a query is similarly transformed into a packed knowledge representation that documents the semantic relationships from which semantic key terms are also generated. An inverted index relating the semantic key terms to their associated passages is searched using the semantic key terms generated from the query. A set of candidate passages is selected and refined by analysis of the semantic key terms and other information. The semantic representations associated with the set of candidate passages are then matched to the semantic representation of the query to determine a search result set.
    Type: Grant
    Filed: October 1, 2007
    Date of Patent: October 30, 2012
    Assignee: Palo Alto Research Center Incorporated
    Inventor: Robert D. Cheslow
  • Publication number: 20120259862
    Abstract: Provided are a method and apparatus for processing a query. The method includes generating string sets comprising a plurality of partial strings from a query string, determining a subset of the string sets as a candidate set, and searching for a document comprising the query string from the candidate set.
    Type: Application
    Filed: October 14, 2011
    Publication date: October 11, 2012
    Inventors: Younghoon Kim, Hyoung Park, Kyuseok Shim, Kyoung-gu Woo
  • Patent number: 8275785
    Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
    Type: Grant
    Filed: June 6, 2011
    Date of Patent: September 25, 2012
    Inventor: Philip R Krause
  • Patent number: 8275776
    Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
    Type: Grant
    Filed: June 6, 2011
    Date of Patent: September 25, 2012
    Inventor: Philip R Krause
  • Patent number: 8271498
    Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.
    Type: Grant
    Filed: August 12, 2008
    Date of Patent: September 18, 2012
    Assignee: International Business Machines Corporation
    Inventors: Marcus Felipe Fontoura, Ronny Lempel, Runping Qi, Jason Yeong Zien
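The range-bucketed posting lists described in the abstract above can be sketched as follows, assuming a fixed bucket width of 10 consecutive values and an in-memory dict; the width, the data layout, and the function names are illustrative assumptions. Each entry keeps its value so that buckets only partly covered by a query can be filtered.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # assumed number of consecutive values per posting list

def build_range_postings(docs):
    """docs: {doc_id: iterable of numeric values}.
    Returns {bucket_start: [(doc_id, value), ...]} with doc ids in ascending order."""
    lists = defaultdict(list)
    for doc_id in sorted(docs):
        for value in docs[doc_id]:
            lists[value - value % BUCKET_WIDTH].append((doc_id, value))
    return lists

def range_query(lists, lo, hi):
    """Doc ids having a value v with lo <= v <= hi, reading only overlapping buckets."""
    hits = set()
    start = lo - lo % BUCKET_WIDTH
    for bucket in range(start, hi + 1, BUCKET_WIDTH):
        for doc_id, value in lists.get(bucket, ()):
            if lo <= value <= hi:  # boundary buckets may hold out-of-range values
                hits.add(doc_id)
    return sorted(hits)
```

A wider bucket means fewer lists to read per query but more filtering in the boundary buckets; the bucket width is the tuning knob.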
  • Patent number: 8271499
    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: September 18, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Marios Hadjieleftheriou, Nick Koudas, Divesh Srivastava
  • Patent number: 8260784
    Abstract: Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of <value, path, type, jdewey> tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using <path, type, value> as index term, and jdewey as payload. A method is also described to search the inverted index.
    Type: Grant
    Filed: February 13, 2009
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Jun Rao, Eugene J Shekita
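The shredding step in the abstract above can be sketched in Python. Each atomic value becomes a (value, path, type) triple, and the index term is (path, type, value). As a stated simplification, the payload here is the document id plus the indices of enclosing array elements, a rough stand-in for the patent's jdewey code rather than that encoding; the function names are hypothetical.

```python
import json
from collections import defaultdict

def shred(node, path="", positions=()):
    """Yield (value, path, type, positions) for every atomic value in a JSON tree.

    `positions` holds the indices of enclosing array elements; it is a
    simplified stand-in for the jdewey payload, not that encoding.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from shred(child, f"{path}.{key}", positions)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from shred(child, path, positions + (i,))
    else:
        yield node, path, type(node).__name__, positions

def build_json_index(docs):
    """Index term: (path, type, value); payload: (doc_id, positions)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for value, path, typ, positions in shred(json.loads(text)):
            index[(path, typ, value)].append((doc_id, positions))
    return index
```

Including the type in the index term keeps, for example, the string `"40"` and the number `40` under different keys.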
  • Publication number: 20120221572
    Abstract: Systems and methods are disclosed to search for a query image, by detecting local invariant features and local descriptors; retrieving best matching images by quantizing the local descriptors with a vocabulary tree; and reordering retrieved images with results from the vocabulary tree quantization.
    Type: Application
    Filed: December 28, 2011
    Publication date: August 30, 2012
    Applicant: NEC LABORATORIES AMERICA, INC.
    Inventors: Xiaoyu Wang, Ming Yang, Timothee Cour, Shenghuo Zhu, Kai Yu
  • Patent number: 8250075
    Abstract: Methods and systems for the generation of computer readable indexes or other ordered lists are provided. A corpus of electronic documents or other electronic information is parsed into postings that include key and reference pairs. An inversion buffer in memory is explicitly or implicitly formatted to receive the postings in a predetermined order by key. Each key is assigned a space in the inversion buffer that is subsequently filled with references associated with the key during an inversion method. In an embodiment, an index file is generated directly from the inversion buffer, or in the case of large inversions, from a plurality of inversion buffer segments.
    Type: Grant
    Filed: December 22, 2006
    Date of Patent: August 21, 2012
    Assignee: Palo Alto Research Center Incorporated
    Inventor: Robert D. Cheslow
  • Patent number: 8244730
    Abstract: The present invention provides a method for extracting relationships between words in textual data. Initially, training relationship data, such as word triplets describing a cause-effect relationship, is received and used to collect additional textual data including the training relationship data. Distributed data collection is used to receive the training data and collect the additional textual data, allowing a broad range of data to be acquired from multiple sources. Syntactic patterns are extracted from the additional textual data and a distributed data source is scanned to extract additional relationship data describing one or more causal relationships using the extracted syntactic patterns. The extracted additional relationship data is then stored, and can be validated by a supervised learning algorithm before storage and used to train a classifier for automatic validation of additional relationship data.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: August 14, 2012
    Assignee: Honda Motor Co., Ltd.
    Inventor: Rakesh Gupta
  • Patent number: 8244733
    Abstract: A method for indexing a plurality of nodes using a computer system is provided. The computer system includes data storage and a processor coupled to the data storage. The method includes acts of storing the plurality of nodes in the data storage, each of the plurality of nodes having a hit count, a link count and an outcome, creating a qualitative index ordering a plurality of nodes according to the hit count, the link count and the outcome of each node and storing the qualitative index in the data storage. The hit count of each node indicates a number of times a case attribute associated with the node is presented to a user. The link count of each node indicates a number of times the case attribute associated with the node is affirmed as useful. The outcome of each node indicates a desirability of the outcome.
    Type: Grant
    Filed: May 5, 2009
    Date of Patent: August 14, 2012
    Assignee: University of Massachusetts
    Inventors: Paul J. Fortier, Theophano Mitsa, Nancy Dluhy
  • Patent number: 8239382
    Abstract: A method for creating an index of network data for a set of message data, the index being arranged for searching the set of message data. A method in accordance with an embodiment of the invention includes: creating a set of dialogue records, where each dialogue record is the set of messages corresponding to a dialogue between a sender and recipient pair in a message corpus; logging each of the set of messages in each corresponding dialogue record; and creating an index of terms from the set of messages, the index being arranged to index each term to each dialogue record in which the message comprising the respective term is logged.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: August 7, 2012
    Assignee: International Business Machines Corporation
    Inventor: Stephen A. Davies
  • Publication number: 20120179689
    Abstract: Directory tree searching uses a path index to determine a set of documents for a directory path portion of a search query. The set of documents for the directory path portion is evaluated with a set of documents for an indexed term portion of the search query to determine common documents.
    Type: Application
    Filed: January 13, 2012
    Publication date: July 12, 2012
    Inventors: John M. Hornkvist, Eric R. Koebler
  • Patent number: 8219563
    Abstract: Techniques are provided for searching within a collection of XML documents. A relational table in an XML index stores an entry for each node of a set of nodes in the collection. Each entry of the relational table stores an order key and a path identifier along with the atomized value of the node. An index on the atomized value provides a mechanism to perform a node-aware full-text search. Instead of storing the atomized value in the table, a virtual column may be created to represent, for each node, the atomized value of the node. Alternately, each entry of the relational table stores an order key and a path identifier along with, for simple nodes, the atomized value, and for complex nodes, a null value. For a complex node with a descendant text node, a separate entry is stored for the descendant text node in the relational table.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: July 10, 2012
    Assignee: Oracle International Corporation
    Inventors: Thomas Baby, Zhen Hua Liu, Wesley Lin
  • Publication number: 20120166445
    Abstract: A method and apparatus are provided for better web ad matching by combining relevance with consumer click feedback. In one example, the method includes receiving a query page, extracting features from the query page, re-weighting the query page, evaluating the query page in light of each ad in order to score each ad and pick substantially best ad matches of the indexed ads, and returning the substantially best ad matches to the consumer computer.
    Type: Application
    Filed: March 7, 2012
    Publication date: June 28, 2012
    Inventors: Deepayan Chakrabarti, Deepak K. Agrawal, Vanja Josifovski
  • Publication number: 20120150867
    Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query.
    Type: Application
    Filed: December 13, 2010
    Publication date: June 14, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng
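The clustering loop in the abstract above can be sketched as a greedy pass over ranked features. The abstract does not say how features are ranked; this sketch assumes descending document frequency with alphabetical tie-breaking and uses an in-memory dict as the inverted index. `cluster_by_features` is a hypothetical name.

```python
from collections import defaultdict

def cluster_by_features(vectors):
    """vectors: {element_id: set of features}.
    Returns a list of (feature, sorted element ids), one entry per cluster."""
    index = defaultdict(set)  # inverted index: feature -> element ids
    for elem, feats in vectors.items():
        for f in feats:
            index[f].add(elem)
    # Assumed ranking: most frequent feature first, ties broken alphabetically.
    ranked = sorted(index, key=lambda f: (-len(index[f]), f))
    clusters, covered = [], set()
    for feature in ranked:
        # Elements with this feature and none of the earlier-ranked features.
        members = index[feature] - covered
        if members:
            clusters.append((feature, sorted(members)))
            covered |= members
    return clusters
```

Because every element with an earlier-ranked feature has already been assigned, subtracting `covered` is equivalent to querying for elements that lack all previously selected features, and each element lands in exactly one cluster.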
  • Patent number: 8190614
    Abstract: Systems and methods for compressing an index are described. In one exemplary method, the results of a search are annotated and then encoded into one or more chunks of compressed data in accordance with the annotations of the results. The annotations include an indication of a best encoding method selected from a set of available encoding methods, and an indication of whether to switch to a new chunk during encoding or to continue encoding in the current chunk. Other methods are described and data processing systems and machine readable media are also described.
    Type: Grant
    Filed: July 12, 2010
    Date of Patent: May 29, 2012
    Assignee: Apple Inc.
    Inventor: Wayne Loofbourrow
  • Patent number: 8176476
    Abstract: Described is a technology by which software instrumentation data collected from user program sessions are analyzed to output an analysis report or the like via example methods and an architecture configured for efficient operation. A client component queries a service for analysis related information. To process the query, the service works with a data manager, and via a high dimensional analysis component may use information processed from the software instrumentation data, such as in the form of one or more inverted indexes and/or raw value files. The service may include a usage analysis component, a feature recognition component that locates features from command sequences, a user recognition component and/or a program reliability component. One or more counterpart components at the client may generate analysis reports or the like based on the query results. The client also may maintain user libraries and feature libraries to facilitate analyses.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: May 8, 2012
    Assignee: Microsoft Corporation
    Inventors: Yantao Li, Adnan Azfar Mahmud, Wenli Zhu, Haidong Zhang, Shuguang Ye, Bing Sun, Qiang Wang, Yingnong Dang, Guowei Liu, Min Wang, Jian Wang
  • Publication number: 20120109970
    Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variants of the search term that have the same meaning as the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variants of the search term represented by the state machine.
    Type: Application
    Filed: October 27, 2010
    Publication date: May 3, 2012
    Applicant: APPLE INC.
    Inventors: John M. Hörnkvist, Eric R. Koebler
  • Patent number: 8171031
    Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.
    Type: Grant
    Filed: January 19, 2010
    Date of Patent: May 1, 2012
    Assignee: Microsoft Corporation
    Inventors: Vladimir Tankovich, Dmitriy Meyerzon, Mihai Petriuc
  • Patent number: 8166041
    Abstract: A search index structure which extends a typical composite index by incorporating an index which is optimized for fast retrieval from storage and which eliminates data which is specific to phrase searching. Other data is represented in a manner which allows it to be calculated rather than stored. Associating variable length entries with logical categories allows their length to be inferred from the category rather than stored. Using delta values between document IDs rather than the ID itself generates a compact, dense symbol set which is efficiently compressed by Huffman encoding or a similar compression method. Using an upper threshold to remove large, and thus rare, delta values from the symbol set prior to encoding further improves the encoding performance.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: April 24, 2012
    Assignee: Microsoft Corporation
    Inventors: Chadd Creighton Merrigan, Mihai Petriuc, Raif Khassanov, Artsiom Ivanovich Kokhan
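The 8166041 abstract combines two ideas: encoding document-ID deltas, and removing large deltas before Huffman coding. The Python sketch below shows both. The abstract does not say how the removed deltas are stored, so the `ESCAPE` symbol and the side list of raw gaps are assumptions, as are the names `delta_symbols` and `huffman_code`.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple

ESCAPE = -1  # hypothetical symbol standing in for any delta above the threshold


def delta_symbols(doc_ids: List[int], max_delta: int) -> Tuple[List[int], List[int]]:
    """Turn sorted doc IDs into gaps. Gaps above max_delta become ESCAPE symbols,
    and their raw values go to a side list so the symbol set stays small and dense."""
    symbols: List[int] = []
    overflow: List[int] = []
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        if gap > max_delta:
            symbols.append(ESCAPE)
            overflow.append(gap)
        else:
            symbols.append(gap)
    return symbols, overflow


def huffman_code(symbols: List[int]) -> Dict[int, str]:
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code}); the tie breaker
    # keeps heapq from ever comparing the dictionaries.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


symbols, overflow = delta_symbols([3, 4, 5, 7, 1000], max_delta=16)
print(symbols, overflow)  # [3, 1, 1, 2, -1] [993]
codes = huffman_code(symbols)
```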
  • Publication number: 20120089611
    Abstract: This method of updating an inverted index from at least one electronic document in which each electronic document is constituted by at least one ordered set of objects comprises, for each of said objects: a step of identifying a descriptor of said object, the descriptor being represented in the form of a tree; a step of determining a terminal leaf of said tree; and a step for updating a packet of information pointed to by said leaf, said packet of information including at least the list of said documents including said object.
    Type: Application
    Filed: October 4, 2011
    Publication date: April 12, 2012
    Inventor: Pierre Brochard
  • Publication number: 20120084296
    Abstract: Techniques for searching a hierarchical database and an unstructured database with a single search query are described herein.
    Type: Application
    Filed: October 12, 2011
    Publication date: April 5, 2012
    Applicant: CITRIX ONLINE LLC
    Inventor: Christopher Waters
  • Patent number: 8135717
    Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries in the index structure map words and marks for words having the selected characteristics to locations within documents in the corpus. Entries in the index structure represent words and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of respective marked words or adjacent. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.
    Type: Grant
    Filed: March 30, 2009
    Date of Patent: March 13, 2012
    Assignee: SAP America, Inc.
    Inventors: Ramana B. Rao, Swapnil Hajela, Nareshkumar Rajkumar
  • Publication number: 20120059828
    Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
    Type: Application
    Filed: November 14, 2011
    Publication date: March 8, 2012
    Applicant: GOOGLE INC.
    Inventor: Adam J. Weissman
  • Patent number: 8131730
    Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words and other entries represent stopwords found in the corpus coalesced with prefixes of respective adjacent words adjacent to the stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Also, index structures including coalesced stopwords are created and maintained.
    Type: Grant
    Filed: March 30, 2009
    Date of Patent: March 6, 2012
    Assignee: SAP America, Inc.
    Inventors: Swapnil Hajela, Nareshkumar Rajkumar
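The stopword coalescing in the 8131730 abstract can be illustrated with a small positional index. The Python sketch below is minimal and rests on several assumptions: the stopword list, `PREFIX_LEN`, the `stopword_prefix` token format, and the handling of a query-final stopword are all illustrative choices, not details taken from the patent.

```python
from typing import Dict, List, Set

STOPWORDS: Set[str] = {"the", "of", "a", "to", "in"}  # illustrative stopword list
PREFIX_LEN = 2  # hypothetical: leading characters of the next word to keep

Index = Dict[str, Dict[int, List[int]]]  # term -> {doc_id: [positions]}


def coalesce(tokens: List[str]) -> List[str]:
    """Replace each stopword with a token that joins it to the prefix of the next
    word, e.g. ["state", "of", "the", "art"] -> ["state", "of_th", "the_ar", "art"].
    A stopword with no following word is left bare."""
    out = []
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS and i + 1 < len(tokens):
            out.append(f"{tok}_{tokens[i + 1][:PREFIX_LEN]}")
        else:
            out.append(tok)
    return out


def build_index(docs: Dict[int, str]) -> Index:
    index: Index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(coalesce(text.lower().split())):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def _positions(index: Index, term: str, doc_id: int, is_last: bool) -> Set[int]:
    if is_last and term in STOPWORDS:
        # A query-final stopword has no next word, so accept any coalesced
        # form of it (a vocabulary scan, acceptable only in a sketch).
        return {p for key, docs in index.items()
                if key == term or key.startswith(term + "_")
                for p in docs.get(doc_id, [])}
    return set(index.get(term, {}).get(doc_id, []))


def phrase_match(index: Index, phrase: str) -> List[int]:
    """Doc IDs containing the phrase, after rewriting the query the same way."""
    terms = coalesce(phrase.lower().split())
    if not terms:
        return []
    all_docs = sorted({d for docs in index.values() for d in docs})
    hits = []
    for doc_id in all_docs:
        pos = [_positions(index, t, doc_id, k == len(terms) - 1) for k, t in enumerate(terms)]
        if any(all(s + k in pos[k] for k in range(len(terms))) for s in pos[0]):
            hits.append(doc_id)
    return hits


idx = build_index({1: "state of the art", 2: "the art of war", 3: "state of affairs"})
print(phrase_match(idx, "of the art"))  # [1]
print(phrase_match(idx, "state of"))    # [1, 3]
```

The coalesced tokens such as `of_th` have much shorter postings lists than a bare `of` would. That is the efficiency the abstract is after for phrase queries.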
  • Patent number: 8126897
    Abstract: A method for information retrieval includes extracting from a video document visual data items and textual data items that occur in the document at respective occurrence times. Indexing records, which index both the visual and the textual data items by their respective occurrence times, are constructed and stored in a memory.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: February 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Benjamin Sznajder, Jonathan Mamou
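The 8126897 abstract only states that visual and textual items from a video are indexed together by occurrence time. The following Python sketch shows one way such records could be built, together with a simple time-window query over them. The record layout, the `"visual"`/`"text"` labels and the `co_occurrences` query are hypothetical illustrations, not part of the patent text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, float, str]  # hypothetical: (video_id, time_s, kind)


def build_index(videos: Dict[str, List[Tuple[str, str, float]]]) -> Dict[str, List[Record]]:
    """videos maps video_id -> [(kind, token, time_s), ...], where kind is "visual"
    or "text". Both kinds share one index keyed by token, each posting carrying
    its occurrence time."""
    index: Dict[str, List[Record]] = defaultdict(list)
    for vid, items in videos.items():
        for kind, token, t in items:
            index[token].append((vid, t, kind))
    for postings in index.values():
        postings.sort()
    return dict(index)


def co_occurrences(index: Dict[str, List[Record]], a: str, b: str,
                   window: float) -> List[Tuple[str, float, float]]:
    """(video, time_a, time_b) for every pair where tokens a and b occur in the
    same video within `window` seconds of each other."""
    hits = []
    for vid_a, t_a, _ in index.get(a, []):
        for vid_b, t_b, _ in index.get(b, []):
            if vid_a == vid_b and abs(t_a - t_b) <= window:
                hits.append((vid_a, t_a, t_b))
    return hits


videos = {
    "v1": [("visual", "car", 12.0), ("text", "engine", 14.5), ("text", "car", 90.0)],
    "v2": [("visual", "car", 5.0)],
}
index = build_index(videos)
print(co_occurrences(index, "car", "engine", window=5.0))  # [('v1', 12.0, 14.5)]
```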
  • Patent number: 8122029
    Abstract: Systems and methods for processing an index are described. To ensure that the most up-to-date index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result.
    Type: Grant
    Filed: March 28, 2011
    Date of Patent: February 21, 2012
    Assignee: Apple Inc.
    Inventors: Wayne Loofbourrow, John Martin Hornkvist, Eric Richard Koebler, Yun-chih S. Li
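The 8122029 abstract supplements an existing inverted index at retrieval time instead of rewriting it after each change. The Python sketch below illustrates that behavior. It assumes the "specially marked" postings record, per term, whether a changed item now contains that term, with `False` acting as a removal marker. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SupplementedIndex:
    base: Dict[str, Set[int]] = field(default_factory=dict)            # bulk inverted index
    marked: Dict[str, Dict[int, bool]] = field(default_factory=dict)   # term -> {item: present?}

    def record_change(self, item: int, old_terms: Set[str], new_terms: Set[str]) -> None:
        """Log a changed item as marked postings; the base index is not rewritten."""
        for term in old_terms - new_terms:
            self.marked.setdefault(term, {})[item] = False  # item no longer has this term
        for term in new_terms:
            self.marked.setdefault(term, {})[item] = True

    def lookup(self, term: str) -> Set[int]:
        """Base postings for the term, corrected by any marked postings."""
        result = set(self.base.get(term, set()))
        for item, present in self.marked.get(term, {}).items():
            if present:
                result.add(item)
            else:
                result.discard(item)
        return result


idx = SupplementedIndex(base={"draft": {1, 2}, "final": {3}})
idx.record_change(2, old_terms={"draft"}, new_terms={"final"})
print(idx.lookup("draft"), idx.lookup("final"))  # {1} {2, 3}
```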
  • Publication number: 20120005214
    Abstract: Systems and methods for processing an index are described. The items in a postings list for a particular term are ordered in a desired retrieval order, e.g., most recent first. The ordered items are inserted into an inverted index in the desired retrieval order, resulting in an ordered inverted index from which items may be efficiently retrieved in the desired retrieval order. During retrieval, items may first be retrieved from a live index, and the retrieved items from the live and ordered indexes may be merged. The retrieved items may also be filtered in accordance with the items' file grouping parameters.
    Type: Application
    Filed: September 13, 2011
    Publication date: January 5, 2012
    Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yan Arrouye
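The retrieval path in the 20120005214 abstract can be sketched as a merge of two recency-ordered streams. The Python below is a minimal version of that idea. Several details are assumptions: the posting tuple layout `(modified_time, item_id, file_group)`, the choice to let the newest copy of an item win, and the function names.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Item = Tuple[float, int, str]  # hypothetical posting: (modified_time, item_id, file_group)


def build_ordered(postings: Dict[str, List[Item]]) -> Dict[str, List[Item]]:
    """Store each term's postings in the desired retrieval order: most recent first."""
    return {term: sorted(items, key=lambda it: it[0], reverse=True)
            for term, items in postings.items()}


def retrieve(term: str, ordered: Dict[str, List[Item]], live: Dict[str, List[Item]],
             group: Optional[str] = None, limit: int = 10) -> List[int]:
    """Merge the small live index with the pre-ordered index, newest first."""
    live_items = sorted(live.get(term, []), key=lambda it: it[0], reverse=True)
    merged = heapq.merge(live_items, ordered.get(term, []),
                         key=lambda it: it[0], reverse=True)
    results: List[int] = []
    seen = set()
    for _, item_id, file_group in merged:
        if item_id in seen:
            continue  # a newer copy of this item was already seen
        seen.add(item_id)
        if group is not None and file_group != group:
            continue  # filter on the item's file grouping
        results.append(item_id)
        if len(results) == limit:
            break
    return results


ordered = build_ordered({"report": [(10.0, 1, "docs"), (30.0, 2, "mail"), (20.0, 3, "docs")]})
live = {"report": [(40.0, 3, "docs"), (25.0, 4, "docs")]}
print(retrieve("report", ordered, live))                # [3, 2, 4, 1]
print(retrieve("report", ordered, live, group="docs"))  # [3, 4, 1]
```

The ordered index is never re-sorted at query time. Only the small live index is sorted, and `heapq.merge` yields items lazily, so `limit` can stop the scan early.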
  • Patent number: 8090722
    Abstract: Systems, methods, and other embodiments associated with logically expanding a document and determining the relevance of the logically expanded document to a query are described. One method embodiment includes searching an index to locate a document identifier for a document in which a query term appears. The method includes determining whether the index entry includes an expansion identifier, and, if so, producing a logically expanded document. The logically expanded document may include both a document associated with the document identifier and a document associated with the expansion identifier. The method may then determine a relevance value of the logically expanded document with respect to the query and may provide a signal corresponding to the relevance value.
    Type: Grant
    Filed: March 21, 2007
    Date of Patent: January 3, 2012
    Assignee: Oracle International Corporation
    Inventors: Muralidhar Krishnaprasad, Meeten Bhavsar
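The 8090722 abstract scores a "logically expanded" document: the matched document plus the document named by an expansion identifier. The Python sketch below illustrates that flow under stated assumptions. The posting layout is hypothetical, and the relevance value is a toy query-term density, not the patent's scoring method.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Posting = Tuple[int, Optional[int]]  # hypothetical: (document_id, expansion_id or None)


def expanded_relevance(query: List[str], index: Dict[str, List[Posting]],
                       docs: Dict[int, List[str]]) -> Dict[int, float]:
    """Score each matching document after logically expanding it with the
    document named by its expansion identifier, if any."""
    candidates: Dict[int, Optional[int]] = {}
    for term in query:
        for doc_id, expansion_id in index.get(term, []):
            if candidates.get(doc_id) is None:
                candidates[doc_id] = expansion_id
    scores: Dict[int, float] = {}
    for doc_id, expansion_id in candidates.items():
        tokens = list(docs[doc_id])
        if expansion_id is not None:
            tokens += docs[expansion_id]  # the logically expanded document
        counts = Counter(tokens)
        # Toy relevance: fraction of tokens that are query terms.
        scores[doc_id] = sum(counts[q] for q in query) / max(len(tokens), 1)
    return scores


docs = {1: ["contract", "terms"], 2: ["payment", "schedule", "terms"], 3: ["unrelated"]}
index = {"contract": [(1, 2)], "payment": [(2, None)]}
scores = expanded_relevance(["contract", "payment"], index, docs)
```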
  • Patent number: 8082258
    Abstract: Systems and methods for regularly updating portions of a merged index are provided. Initially, upon receiving an indication that modifications have occurred to content of web-based documents, dynamic update of index (DUI) objects that identify the documents and expose the modified content are composed by ascertaining relative positions of the modified content within the documents, and packaging identifiers of the documents, the relative positions, and metadata underlying the modified content into a message. The DUI objects are applied to an overloading index that maintains structured records of recent modifications. In particular, portions of the overloading index are targeted utilizing the document identifiers and the relative positions specified by the DUI object, thereby updating the targeted portions within the overloading index corresponding to the modified content without rewriting the entire overloading index.
    Type: Grant
    Filed: February 10, 2009
    Date of Patent: December 20, 2011
    Assignee: Microsoft Corporation
    Inventors: Abhas Kumar, Pratibha Permandla, Gaurav Sareen, Anna Timasheva, Deepak Shankar
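The 8082258 abstract applies dynamic update of index (DUI) objects to targeted positions in an overloading index instead of rewriting the whole index. The Python sketch below shows that targeted update. The DUI fields and the two-map layout of the overloading index are assumptions made for illustration. The abstract's metadata is carried but not used.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class DUI:
    """Hypothetical dynamic-update-of-index object for one modified document."""
    doc_id: int
    changes: Dict[int, str]                                  # relative position -> new token
    metadata: Dict[str, str] = field(default_factory=dict)   # carried along, unused here


class OverloadingIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)  # term -> {(doc, pos)}
        self.forward: Dict[Tuple[int, int], str] = {}                     # (doc, pos) -> term

    def apply(self, dui: DUI) -> None:
        """Touch only the (doc, position) slots named in the DUI object."""
        for pos, token in dui.changes.items():
            key = (dui.doc_id, pos)
            old = self.forward.get(key)
            if old is not None:
                self.postings[old].discard(key)  # retract the superseded token
            self.forward[key] = token
            self.postings[token].add(key)


idx = OverloadingIndex()
idx.apply(DUI(7, {0: "breaking", 1: "news"}))
idx.apply(DUI(7, {1: "update"}))  # only position 1 of document 7 is rewritten
```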
  • Publication number: 20110289093
    Abstract: Systems and methods for processing an index are described. To ensure that the most up-to-date index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result.
    Type: Application
    Filed: March 28, 2011
    Publication date: November 24, 2011
    Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yun-chih S. Li
  • Patent number: 8065293
    Abstract: An indexing system uses a graph-like data structure that clusters feature indexes together. The minimum atomic value in the data structure is represented as a leaf node which is either a single feature index or a sequence of two or more feature indexes when a minimum sequence length is imposed. Root nodes are formed as clustered collections of leaf nodes and/or other root nodes. Context nodes are formed from root nodes that are associated with content that is being indexed. Links between a root node and other nodes each include a sequence order value that is used to maintain the sequencing order for feature indexes relative to the root node. The collection of nodes forms a graph-like data structure, where each context node is indexed according to the sequenced pattern of feature indexes. Clusters can be split, merged, and promoted to increase the efficiency in searching the data structure.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: November 22, 2011
    Assignee: Microsoft Corporation
    Inventors: Kunal Mukerjee, R. Donald Thompson, III, Jeffrey Cole, Brendan Meeder
  • Patent number: 8060516
    Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: November 15, 2011
    Assignee: Google Inc.
    Inventor: Adam J. Weissman
  • Patent number: 8046361
    Abstract: An improved system and method for classifying tags of content using a hyperlinked corpus of classified web pages is provided. An anchor text index may be searched to find anchor texts that may match text of the tag, documents referenced by the matching anchor texts may be found, and the documents referenced by the matching anchor texts may be grouped to disambiguate multiple classifications that result from matching the anchor texts with the categories of the reference documents. To resolve ambiguity between multiple classifications, weighted classifications may be used where each document may be assigned a positive weight for a mapping to a category to indicate the confidence of the classification of the document to the category. The classification for the grouping of the documents referenced by the matching anchor texts with greatest frequency may be selected and output as the classification for the tag.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 25, 2011
    Assignee: Yahoo! Inc.
    Inventors: Börkur Sigurbjörnsson, Roelof van Zwol, Simon E. Overell
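The tag classification in the 8046361 abstract amounts to a weighted vote over the categories of documents linked by matching anchor text. The Python sketch below illustrates it. It assumes that "matching" means an exact, case-insensitive anchor-text match and that per-category weights are already known for each referenced document. The function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def classify_tag(tag: str, anchor_index: Dict[str, List[int]],
                 doc_categories: Dict[int, Dict[str, float]]) -> Optional[str]:
    """Sum the category weights of every document referenced by anchor text
    matching the tag, and return the category with the largest total."""
    votes: Counter = Counter()
    for doc_id in anchor_index.get(tag.lower(), []):
        for category, weight in doc_categories.get(doc_id, {}).items():
            votes[category] += weight
    return votes.most_common(1)[0][0] if votes else None


anchor_index = {"jaguar": [1, 2, 3]}
doc_categories = {1: {"Animals": 0.9}, 2: {"Cars": 0.6, "Animals": 0.2}, 3: {"Cars": 0.7}}
print(classify_tag("Jaguar", anchor_index, doc_categories))  # Cars
```

The weights let an ambiguous tag resolve toward the category with more confident support, rather than the one with the most raw links.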