Inverted Index Patents (Class 707/742)
  • Patent number: 9773054
    Abstract: According to an aspect, storing and querying conceptual indices (CIs) includes creating a conceptual inverted index (CII) from the CIs. The CII includes CII entries, each of which corresponds to a concept in a concept graph. Creating the CII includes populating each entry with pointers to documents selected from the CIs having likelihoods of being related to the concept that are greater than a threshold value, and the corresponding likelihoods. An aspect also includes receiving a query that includes a concept in the concept graph, and generating query results from a search that include at least a subset of the pointers to documents. Each of the CIs is associated with a corresponding document and includes a CI entry for each concept in the concept graph, and each of the CI entries specifies a value indicating a likelihood that the document is related to the concept in the concept graph.
    Type: Grant
    Filed: March 11, 2015
    Date of Patent: September 26, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michele M. Franceschini, Luis A. Lastras-Montano, Livio B. Soares, Mark N. Wegman
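The threshold-based construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `build_cii` and `query_cii`, the dictionary layout, and the descending-likelihood ordering are all assumptions made for the example.

```python
from collections import defaultdict

def build_cii(conceptual_indices, threshold):
    """Build a conceptual inverted index (CII) from per-document conceptual indices (CIs).

    conceptual_indices: dict mapping doc_id -> {concept: likelihood}
    Returns a dict mapping concept -> list of (doc_id, likelihood) pairs.
    """
    cii = defaultdict(list)
    for doc_id, ci in conceptual_indices.items():
        for concept, likelihood in ci.items():
            # Keep only pointers whose likelihood exceeds the threshold,
            # and store the likelihood alongside the pointer.
            if likelihood > threshold:
                cii[concept].append((doc_id, likelihood))
    for postings in cii.values():
        # Ordering by descending likelihood is an assumption of this sketch.
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(cii)

def query_cii(cii, concept, limit=None):
    """Return all stored pointers for a concept, or only the first `limit` of them."""
    postings = cii.get(concept, [])
    return postings if limit is None else postings[:limit]

# Hypothetical per-document CIs over a two-concept graph.
cis = {
    "doc1": {"physics": 0.9, "cooking": 0.1},
    "doc2": {"physics": 0.6, "cooking": 0.7},
    "doc3": {"physics": 0.2, "cooking": 0.95},
}
cii = build_cii(cis, threshold=0.5)
```

Calling `query_cii(cii, "physics")` returns only the documents whose likelihood for that concept cleared the threshold, together with those likelihoods.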
  • Patent number: 9703858
    Abstract: According to an aspect, storing and querying conceptual indices (CIs) includes creating a conceptual inverted index (CII) from the CIs. The CII includes CII entries, each of which corresponds to a concept in a concept graph. Creating the CII includes populating each entry with pointers to documents selected from the CIs having likelihoods of being related to the concept that are greater than a threshold value, and the corresponding likelihoods. An aspect also includes receiving a query that includes a concept in the concept graph, and generating query results from a search that include at least a subset of the pointers to documents. Each of the CIs is associated with a corresponding document and includes a CI entry for each concept in the concept graph, and each of the CI entries specifies a value indicating a likelihood that the document is related to the concept in the concept graph.
    Type: Grant
    Filed: July 14, 2014
    Date of Patent: July 11, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michele M. Franceschini, Luis A. Lastras-Montano, Livio B. Soares, Mark N. Wegman
  • Patent number: 9665605
    Abstract: Methods and apparatus for building a search index for a database are disclosed. When an incremental build trigger is detected (e.g., a threshold number of documents are added to the database), the system determines which sub-indexes need to be updated and which sub-indexes do not need to be updated. Rather than update the affected sub-indexes directly, the system builds new sub-indexes to replace the affected sub-indexes. Database queries that occur during the generation of the replacement sub-indexes use the old sub-indexes. When the new sub-indexes are ready, the system moves pointers from the old sub-indexes to the new sub-indexes so that subsequent database queries use the new sub-indexes.
    Type: Grant
    Filed: September 9, 2014
    Date of Patent: May 30, 2017
    Assignee: KCURA LLC
    Inventors: Mikhail Kogan, Michael B. Goldstein, Vidhyapriya Govindarajan, Keith L. Kaminski, Mason D. May, Fatima Z. Mecci, Nikita Solilov, Kyle A. Stachowiak
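The build-then-swap idea in this abstract can be sketched as below. The class name `SegmentedIndex`, the modulo partitioning of documents across sub-indexes, and the single-threaded setting are assumptions for illustration only; the abstract does not specify how documents map to sub-indexes.

```python
class SegmentedIndex:
    """Toy index split into sub-indexes that are replaced rather than updated in place."""

    def __init__(self, num_subindexes):
        # Each slot points at a sub-index: term -> frozenset of doc ids.
        self.subindexes = [dict() for _ in range(num_subindexes)]
        # Source documents per slot, kept so a replacement can be rebuilt.
        self.docs = [dict() for _ in range(num_subindexes)]

    def _slot(self, doc_id):
        # Assumption: documents are assigned to sub-indexes by doc id modulo.
        return doc_id % len(self.subindexes)

    def search(self, term):
        result = set()
        for sub in self.subindexes:
            result |= sub.get(term, frozenset())
        return result

    def incremental_build(self, new_docs):
        # 1. Determine which sub-indexes are affected by the new documents.
        affected = {}
        for doc_id, text in new_docs.items():
            affected.setdefault(self._slot(doc_id), {})[doc_id] = text
        # 2. Build replacement sub-indexes on the side; self.subindexes is not
        #    touched here, so searches would still see the old sub-indexes.
        replacements = {}
        for slot, added in affected.items():
            docs = {**self.docs[slot], **added}
            postings = {}
            for doc_id, text in docs.items():
                for term in text.lower().split():
                    postings.setdefault(term, set()).add(doc_id)
            replacements[slot] = (docs, {t: frozenset(ids) for t, ids in postings.items()})
        # 3. Move the pointers from the old sub-indexes to the new ones.
        for slot, (docs, sub) in replacements.items():
            self.docs[slot] = docs
            self.subindexes[slot] = sub
        return sorted(affected)

idx = SegmentedIndex(2)
idx.incremental_build({0: "alpha beta", 1: "beta gamma"})
untouched = idx.subindexes[1]
rebuilt = idx.incremental_build({2: "gamma delta"})
```

Only slot 0 is rebuilt by the second call; the object held in slot 1 is left exactly as it was.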
  • Patent number: 9665568
    Abstract: Methods, apparatus and systems, including computer program products, for creating subject matter synonyms from definitions extracted from a subject matter glossary. Confidence scores, each representing a likelihood that two terms defined in the subject matter glossary are synonyms, are determined by applying natural language processing (e.g., passage term matching, lexical matching, and syntactic matching) to the extracted definitions. A subject matter thesaurus is built based on the confidence scores. In one embodiment, a statement containing a first term is created based on an extracted definition of the first term, a modified statement is created by substituting a second term in the statement in lieu of the first term, a corpus is searched, and a confidence score is determined based on evidence in the corpus that the modified statement is accurate. The first and second terms are marked as synonyms if the confidence score is greater than a threshold.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: May 30, 2017
    Assignee: International Business Machines Corporation
    Inventors: Scott N. Gerard, Mark G. Megerian
  • Patent number: 9589277
    Abstract: Methods, computer systems, and computer storage media are provided for evaluating information retrieval (IR) such as search query results (including advertisements) by a machine learning scorer. In an embodiment, a set of features is derived from a query and a machine learning algorithm is applied to construct a linear model of (query, ads) for scoring by maximizing a relevance metric. In an embodiment, the machine learned scorer is adapted for use with WAND algorithm based ad selection.
    Type: Grant
    Filed: December 31, 2013
    Date of Patent: March 7, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bruce Zhang, Jianchang Mao, Yuan Shen
  • Patent number: 9535983
    Abstract: Storing text samples in a manner that the text samples may be quickly searched. The text samples are assigned a text sample identifier and are each parsed to thereby extract text components from the text samples. Text components that have the same content are assigned the same text component identifier. For each parsed text component, a text component entry is created that includes the assigned text component identifier as well as the text sample identifier for the text sample from which the text component was parsed. A text sample entry group is created for each text sample that contains the text component entries in sequence for the text components found within the text sample. The text sample entry groups are stored so as to be scannable during a future search.
    Type: Grant
    Filed: October 29, 2013
    Date of Patent: January 3, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cristian Petculescu, Marius Dumitru, Vasile Paraschiv, Amir Netz, Paul Jonathon Sanders
  • Patent number: 9367621
    Abstract: A method and system for automatically updating searches are described. In one embodiment, a first search result may be compared with a second search result to automatically identify at least one data item within the first search result that is changed relative to the second search result. The at least one data item may comprise a transaction term. A notification of the at least one data item may be transmitted to a user device.
    Type: Grant
    Filed: January 20, 2014
    Date of Patent: June 14, 2016
    Assignee: eBay Inc.
    Inventors: Wen Wen, Patricia Ng
  • Patent number: 9311300
    Abstract: Methods, apparatus and systems, including computer program products, for creating subject matter synonyms from definitions extracted from a subject matter glossary. Confidence scores, each representing a likelihood that two terms defined in the subject matter glossary are synonyms, are determined by applying natural language processing (e.g., passage term matching, lexical matching, and syntactic matching) to the extracted definitions. A subject matter thesaurus is built based on the confidence scores. In one embodiment, a statement containing a first term is created based on an extracted definition of the first term, a modified statement is created by substituting a second term in the statement in lieu of the first term, a corpus is searched, and a confidence score is determined based on evidence in the corpus that the modified statement is accurate. The first and second terms are marked as synonyms if the confidence score is greater than a threshold.
    Type: Grant
    Filed: September 13, 2013
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Scott N. Gerard, Mark G. Megerian
  • Patent number: 9244931
    Abstract: Techniques provide time-aware ranking, such as ranking of information, files or URL (uniform resource locator) links. For example, time-aware modeling assists in determining user intent of a query to a search engine. In response to the query, results are ranked in a time-aware manner to better match the user intent. The ranking may model query, URL and query-URL pair behavior over time to create time-aware query, URL and query-URL pair models, respectively. Such models may predict behavior of a query-URL pair, such as frequency and timing of clicks to the URL of the pair when the query of the pair is posed to the search engine. Results of a query may be ranked by predicted query-URL behavior. Once ranked, the results may be sent to the user in response to the query.
    Type: Grant
    Filed: October 11, 2011
    Date of Patent: January 26, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kira Radinsky, Susan T. Dumais, Krysta M Svore, Jaime Brooks Teevan, Eric J. Horvitz
  • Patent number: 9218414
    Abstract: A method for searching multiple documents on a computer system includes steps for sending a query to a system core where the query is passed to a search component for searching the documents. The system core in turn receives results from the search component indicating related documents to the query and passes to a summarization component a specified number of the results. The summarization component processes related documents corresponding to the specified number of results to produce a multi-document summary. The system core receives the summary from the summarization component. The multi-document summary is received from the system core.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 22, 2015
    Inventor: Dmitri Soubbotin
  • Patent number: 9202079
    Abstract: A method, system, and computer-readable memory containing instructions include employing a tokenizing authority to obtain a tokenized query term that represents a query term, using the tokenized query term to perform a lookup against a tokenized term database, determining whether the tokenized query term exists in the database. The method, system, and computer-readable memory may further include returning an encryption or decryption key corresponding to an encrypted record of information associated with the query term and corresponding to the tokenized query term.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: December 1, 2015
    Assignee: VERISIGN, INC.
    Inventor: Burton S. Kaliski, Jr.
  • Patent number: 9122748
    Abstract: Techniques and tools are described for matching documents against monitors. An index can be generated from a plurality of monitors, where the index represents the query logic of the plurality of monitors. The index can be searched using the documents as search queries. The searching can comprise matching the documents against the monitors using the query logic represented in the index. An index can be distributed to a plurality of computing devices to be searched at the plurality of computing devices, where each computing device searches a subset of a plurality of documents against the full index. Searching at the plurality of computing devices can be performed in parallel, and results can be aggregated at a central location.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: September 1, 2015
    Assignee: Jive Software, Inc.
    Inventor: Lance Riedel
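Indexing the monitors and then using each document as the query can be sketched as follows. The sketch assumes, purely for illustration, that every monitor is a conjunction of required terms; the names `build_monitor_index` and `match_document` are hypothetical, and real monitor query logic would likely be richer.

```python
from collections import defaultdict

def build_monitor_index(monitors):
    """Index the monitors themselves.

    monitors: dict mapping monitor_id -> iterable of required terms
    (AND semantics, an assumption of this sketch).
    """
    term_to_monitors = defaultdict(set)
    required = {}
    for monitor_id, terms in monitors.items():
        terms = {t.lower() for t in terms}
        required[monitor_id] = len(terms)
        for term in terms:
            term_to_monitors[term].add(monitor_id)
    return term_to_monitors, required

def match_document(index, text):
    """Use the document as the query: return ids of monitors whose terms all occur in it."""
    term_to_monitors, required = index
    hits = defaultdict(int)
    for term in set(text.lower().split()):
        for monitor_id in term_to_monitors.get(term, ()):
            hits[monitor_id] += 1
    # A monitor matches when every one of its required terms was seen.
    return sorted(m for m, n in hits.items() if n == required[m])

monitors = {
    "outage": ["server", "down"],
    "billing": ["invoice"],
    "login": ["password", "reset"],
}
index = build_monitor_index(monitors)
matched = match_document(index, "The server went down after the invoice run")
```

In a distributed setting like the one the abstract describes, each worker would hold the full `index` and call `match_document` on its own subset of documents.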
  • Patent number: 9122733
    Abstract: A pedigree data processing system receives a first item from an upstream partner and generates a receive native event for the first item. The mechanism receives pedigree data for the first item from the upstream partner, generates at least one synthetic event based on the pedigree data and stores the receive native event and the at least one synthetic event in a pedigree data repository. The pedigree data processing system determines whether to send electronic pedigree information for the first item to downstream partners using push data exchange or pull data exchange. The pedigree data processing system generates an electronic pedigree for the first item using pull data exchange based on the receive native event and the at least one synthetic event and provides the electronic pedigree to a first downstream partner pedigree system.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: September 1, 2015
    Assignee: International Business Machines Corporation
    Inventors: Victor Dogaru, Arthur F. Kaufmann, Martin A. Siegenthaler
  • Patent number: 9116969
    Abstract: A pedigree data processing system receives a first item from an upstream partner and generates a receive native event for the first item. The mechanism receives pedigree data for the first item from the upstream partner, generates at least one synthetic event based on the pedigree data and stores the receive native event and the at least one synthetic event in a pedigree data repository. The pedigree data processing system determines whether to send electronic pedigree information for the first item to downstream partners using push data exchange or pull data exchange. The pedigree data processing system generates an electronic pedigree for the first item using pull data exchange based on the receive native event and the at least one synthetic event and provides the electronic pedigree to a first downstream partner pedigree system.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: August 25, 2015
    Assignee: International Business Machines Corporation
    Inventors: Victor Dogaru, Arthur F. Kaufmann, Martin A. Siegenthaler
  • Patent number: 9058377
    Abstract: This specification describes technologies relating to fixed width encoding/decoding of document posting lists. In general, one aspect of the subject matter described in this specification can be embodied in apparatuses that include a server obtaining a list of one or more of document identification numbers, each of the document identification numbers uniquely identifying a document; an encoding device operatively connected to the server, the encoding device generating a sequence of deltas from the sequential list of one or more of the document identification numbers, and encoding each delta in the sequence of deltas using a fixed-width encoding scheme.
    Type: Grant
    Filed: June 3, 2011
    Date of Patent: June 16, 2015
    Assignee: Google Inc.
    Inventors: Priyendra Deshwal, Srdjan Petrovic, Asim Shankar
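Delta encoding followed by fixed-width packing can be illustrated with the short sketch below. The choice of a single bit width per list (the width of the largest delta), the big-endian packing, and the function names are assumptions made for the example rather than details taken from the patent.

```python
def encode_postings(doc_ids):
    """Delta-encode an ascending list of doc ids, then pack every delta in the same bit width.

    Returns (width, count, payload_bytes).
    """
    deltas = []
    prev = 0
    for doc_id in doc_ids:
        if doc_id < prev:
            raise ValueError("doc ids must be non-negative and sorted in ascending order")
        deltas.append(doc_id - prev)
        prev = doc_id
    # Fixed width: enough bits for the largest delta (at least 1 bit).
    width = max((d.bit_length() for d in deltas), default=0) or 1
    packed = 0
    for d in deltas:
        packed = (packed << width) | d
    total_bits = width * len(deltas)
    payload = packed.to_bytes((total_bits + 7) // 8, "big")
    return width, len(deltas), payload

def decode_postings(width, count, payload):
    """Unpack fixed-width deltas and rebuild the doc ids with a running sum."""
    packed = int.from_bytes(payload, "big")
    mask = (1 << width) - 1
    deltas = [(packed >> (width * (count - 1 - i))) & mask for i in range(count)]
    doc_ids, running = [], 0
    for d in deltas:
        running += d
        doc_ids.append(running)
    return doc_ids

ids = [3, 7, 8, 20, 21]
width, count, payload = encode_postings(ids)
```

For this list the deltas are 3, 4, 1, 12, 1, so each fits in 4 bits and the five deltas pack into 3 bytes.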
  • Publication number: 20150142821
    Abstract: A database system performs analytics on longitudinal data, such as medical histories with events occurring to patients over time. Input data is processed into streams of events. A set of indexes of event characteristics is generated. A set of patient event histories, partitioned by patient, is generated. Several copies of event data are stored, each copy being structured to support a specific analytical task. Data is partitioned and distributed over several hardware nodes to allow parallel queries. Definitions of sets of candidate patients are translated into sets of filters applied to the set of indexes. Data for these candidates are input to analytical modules. Reports from analysis are automatically generated to be compatible with standard guidelines for reporting. Workflows support one task or a set of closely related tasks by offering the user a defined sequence of query options and analytic choices specifically arranged for the task.
    Type: Application
    Filed: November 18, 2013
    Publication date: May 21, 2015
    Inventors: Jeremy Rassen, Allon Rauer, Sebastian Schneeweiss
  • Publication number: 20150127648
    Abstract: A method for generating image descriptors for media content of images represented by a set of key-points, fn, is recommended which determines for each key-point of the image, designated as a central key-point, a neighbourhood of other key-points, fml, whose features are expressed relative to those of the central key-point. A sparse photo-geometric descriptor, SPGD, of each key-point in the image being a representation of the geometry and intensity content of a feature and its neighbourhood is provided to perform an efficient image querying for efficient searches. The approach demonstrates that incorporating geometrical constraints in image registration applications does not need to be a computationally demanding operation carried out to refine a query response short-list.
    Type: Application
    Filed: June 7, 2012
    Publication date: May 7, 2015
    Applicant: THOMSON LICENSING
    Inventors: Patrick Perez, Joaquin Salvatierra Zepeda
  • Patent number: 9026538
    Abstract: The present invention provides a method for performing transactions on data entities in a database and a transactional database. The database comprises an ordered set of data stores with at least one static data store, wherein said static data store uses an index structure based on a non-updatable representation of an ordered set of integers according to the principle of compressed inverted indices. The method allows to generate a modifiable data store when the performed transaction comprises an insert, update or delete operation, to execute operations of the transaction on the ordered set being present at the time when the transaction has been started and, if present, on the modifiable data store and to convert data stores to a new static data store. The insert, update or delete operations are executed on the modifiable data store which is the only data store modifiable for the transaction.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: May 5, 2015
    Assignee: Open Text S.A.
    Inventors: Gary J. Promhouse, Matthew David George Timmermans, Karl-Heinz Krachenfels
  • Patent number: 9020950
    Abstract: A system and method for generating tag glossaries and use thereof is provided. A set of tags is accessed. Each tag is associated with a glossary that includes one or more terms and definitions for the terms. A new tag is generated and a new glossary is generated for the new tag based on the glossaries associated with the set of tags. The tag glossaries can be used to provide context for documents associated with the tags, to determine appropriate tags for untagged documents, to help in search for other documents, and to build indices for documents or collections of documents.
    Type: Grant
    Filed: December 19, 2011
    Date of Patent: April 28, 2015
    Assignee: Palo Alto Research Center Incorporated
    Inventors: William C. Janssen, Jr., Lauri J. Karttunen
  • Patent number: 9020951
    Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variance of the search term having identical meaning of the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings list are returned to the client, where the item identifiers identify one or more files that contain the variance of the search term represented by the state machine.
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: April 28, 2015
    Assignee: Apple Inc.
    Inventors: John M. Hörnkvist, Eric R. Koebler
  • Patent number: 8996531
    Abstract: A process is disclosed for the computer management of inverted lists and inverted indices, in which the standard representation and processing of inverted lists is changed in order to achieve a simpler, more compact and more efficient architecture.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: March 31, 2015
    Inventor: Giovanni M Sacco
  • Publication number: 20150088901
    Abstract: In one embodiment, a method includes receiving, from a user, a search query requesting objects of a first object type. The search query includes an inner query requesting objects of a second object type. The method includes identifying objects of the second object type requested by the inner query using an inverted index of a data store corresponding to the second object type; identifying objects of the first object type requested by the search query using the identified objects of the second object type and a forward index of the data store corresponding to the second object type; and sending search results to the user responsive to the search query, each search result corresponding to an identified object of the first object type.
    Type: Application
    Filed: December 4, 2014
    Publication date: March 26, 2015
    Inventors: Soren Bogh Lassen, Sandhya Kunnatur, Michael Curtiss
  • Patent number: 8990200
    Abstract: A topical search computer system identifies topics from various definitional (i.e., data) sources. The system generates a catalog of different topics from the data sources. Topics with similar names are differentiated by the system based on the context in which each topic is used. The context for a topic is represented by a context vector, which describes the co-occurrence relationships between the topic and other topics derived from the data sources. Because the system has computed a context for each topic, the system can provide improved search results responsive to user queries for information.
    Type: Grant
    Filed: October 1, 2010
    Date of Patent: March 24, 2015
    Assignee: Flipboard, Inc.
    Inventors: Jens Bagger Christensen, Arthur Anthonie Van Hoff
  • Patent number: 8983947
    Abstract: Techniques and tools are described for augmenting search using association information. Searches can be performed using a combination of index information and association information. In some examples, index information is stored in a first data store and association information is stored in a second data store. Search queries can be received and modified using association information. Modified search queries can be executed using a combination of index information and association information. Index information can be generated by indexing a set of documents. Association information can be generated by monitoring user activity occurring between users and a set of documents.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: March 17, 2015
    Assignee: Jive Software, Inc.
    Inventors: Lance Riedel, Georgios Mavromatis
  • Publication number: 20150066947
    Abstract: An indexing apparatus and method for search of security monitoring data are provided. The indexing apparatus includes a data collection unit and a data index generation unit. The data collection unit collects data, that is, a basis of search of monitoring information, from a database in which security monitoring data has been stored. The data index generation unit generates file structure-based data in which indices have been assigned to multiple search elements of the data collected by the data collection unit.
    Type: Application
    Filed: July 21, 2014
    Publication date: March 5, 2015
    Inventors: Taek kyu LEE, Geun Yong KIM, Suk won LEE, Kyu Cheol JUNG, SoonJwa HONG, In seog SEO
  • Patent number: 8972403
    Abstract: Embodiments of the invention relate to organizing data records in a relational database. An aspect of the invention includes creating index items for a plurality of data records. Each index item includes a counter and the creating results in a plurality of counters. The numerical values of counters in corresponding index items are updated for data records in the plurality of data records that are subjected to random access. The plurality of data records are reorganized based upon the numerical values of the plurality of counters.
    Type: Grant
    Filed: February 3, 2012
    Date of Patent: March 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: You-Chin Fuh, Ke Wei Wei, Xin Ying Yang, Jian Wei Zhang, Jing Zhou, Xiang Zhou
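A rough sketch of per-record access counters driving a reorganization is shown below. The class name `CountedStore` and the policy of placing the most frequently accessed records first are assumptions; the abstract says only that records are reorganized based on the counter values.

```python
class CountedStore:
    """Toy record store whose index items each carry an access counter."""

    def __init__(self, records):
        # records: list of (key, value) pairs in their current physical order.
        self.records = list(records)
        self.index = {
            key: {"pos": i, "count": 0} for i, (key, _) in enumerate(self.records)
        }

    def get(self, key):
        # Random access through the index bumps the counter in that index item.
        item = self.index[key]
        item["count"] += 1
        return self.records[item["pos"]][1]

    def reorganize(self):
        # Assumed policy: most frequently accessed records first (stable for ties).
        self.records.sort(key=lambda kv: -self.index[kv[0]]["count"])
        for i, (key, _) in enumerate(self.records):
            self.index[key]["pos"] = i

store = CountedStore([("a", "row-a"), ("b", "row-b"), ("c", "row-c")])
for key in ["c", "c", "b", "c"]:
    store.get(key)
store.reorganize()
```

After the reorganization the hottest record, `c`, sits at the front, and lookups through the index still resolve to the right rows.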
  • Patent number: 8954470
    Abstract: Systems and methods are disclosed that allow for indexing, processing, or both of information from physical media or electronic media, which may be received from a plurality of sources. In embodiments, a document file may be matched using pattern matching methods and may include comparisons with a comparison reference database to improve or accelerate the indexing process. In embodiments, information may be presented to a user as potential matches thereby improving manual indexing processes. In embodiments, one or more additional actions may occur as part of the processing, including without limitation, associating additional data with a document file, making observations from the document file, notifying individuals, creating composite messages, and billing events. In an embodiment, data from a document file may be associated with a key word, key phrase, or word frequency value that enables adaptive learning so that unindexed data may be automatically indexed based on user interaction history.
    Type: Grant
    Filed: December 18, 2012
    Date of Patent: February 10, 2015
    Assignee: Indxit Systems, Inc.
    Inventors: Michael J. Ebaugh, Matthew J. Morvant
  • Patent number: 8949247
    Abstract: In a method for a dynamic updating of an index of a search engine, wherein the index is an inverted index comprising a dictionary, a posting file with a posting list for each keyword of the index and a database log, the documents are inserted in the index in small batches called update generations, a list of all occurrences of keywords in the documents of each update generation is generated, the occurrence list is inserted in the database log, and for each keyword entered in the database a reference to a previous entry of the same keyword is created. This previous entry has a reference stored in the mass storage device as the last added entry of all recently added keywords. A search engine performing the method may be implemented on one or more servers with a mass storage device, and comprises a core search engine with a search subsystem and an indexing subsystem for creating a keyword index stored on the mass storage device and with the index realized as a dynamically updateable index.
    Type: Grant
    Filed: December 18, 2008
    Date of Patent: February 3, 2015
    Assignee: Microsoft International Holdings B.V.
    Inventor: Øystein Torbjørnsen
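The log-with-back-references scheme in this abstract can be sketched as follows. This is an in-memory illustration under stated assumptions: the class name `LogIndex`, the tuple layout of a log entry, and the whitespace tokenizer are hypothetical, and nothing here models the mass storage device.

```python
class LogIndex:
    """Toy dynamically updatable index backed by an append-only log."""

    def __init__(self):
        # Append-only log of (keyword, doc_ids, position_of_previous_entry_or_None).
        self.log = []
        # keyword -> position of its most recently added log entry.
        self.last_entry = {}

    def add_generation(self, docs):
        """Insert one small batch of documents (an update generation)."""
        occurrences = {}
        for doc_id, text in docs.items():
            for kw in set(text.lower().split()):
                occurrences.setdefault(kw, []).append(doc_id)
        for kw in sorted(occurrences):
            # Each new entry refers back to the previous entry for the same keyword.
            self.log.append((kw, sorted(occurrences[kw]), self.last_entry.get(kw)))
            self.last_entry[kw] = len(self.log) - 1

    def postings(self, keyword):
        """Collect a keyword's postings by following the chain of back references."""
        result = []
        pos = self.last_entry.get(keyword)
        while pos is not None:
            _, doc_ids, pos = self.log[pos]
            result.extend(doc_ids)
        return sorted(result)

log_index = LogIndex()
log_index.add_generation({1: "red apple", 2: "green apple"})
log_index.add_generation({3: "red pear"})
```

Looking up `red` starts at the entry written by the second generation and follows its back reference to the entry written by the first.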
  • Patent number: 8938410
    Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.
    Type: Grant
    Filed: December 16, 2010
    Date of Patent: January 20, 2015
    Assignee: University of Washington through its Center for Commercialization
    Inventors: Michael J. Cafarella, Michele Banko, Oren Etzioni
  • Patent number: 8935271
    Abstract: In one embodiment, a method includes receiving a search query requesting first search results of a first object type. The search query includes an inner query requesting second search results of a second object type. The method includes accessing an inverted index of a data store corresponding to the second object type; retrieving the second search results requested by the inner query using the inverted index of the data store corresponding to the second object type; accessing a forward index of the data store corresponding to the second object type; retrieving the first search results requested by the search query using the second search results and the forward index of the data store corresponding to the second object type.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: January 13, 2015
    Assignee: Facebook, Inc.
    Inventors: Soren Bogh Lassen, Sandhya Kunnatur, Michael Curtiss
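The two-step lookup described here, an inverted index for the inner query followed by a forward index to reach objects of the outer type, might look like the sketch below. The function name `nested_search`, the photo/user example data, and the single-term inner query are assumptions for illustration.

```python
def nested_search(inverted_index, forward_index, inner_term):
    """Resolve a nested query against the data store of the second (inner) object type.

    inverted_index: term -> ids of second-type objects matching the term
    forward_index:  second-type object id -> ids of related first-type objects
    """
    # Step 1: answer the inner query with the inverted index.
    inner_hits = inverted_index.get(inner_term, [])
    # Step 2: map each inner hit to first-type objects through the forward index.
    results, seen = [], set()
    for obj_id in inner_hits:
        for outer_id in forward_index.get(obj_id, []):
            if outer_id not in seen:
                seen.add(outer_id)
                results.append(outer_id)
    return results

# Hypothetical data store for the second object type (photos).
photo_inverted = {"beach": ["photo1", "photo3"], "city": ["photo2"]}
photo_forward = {"photo1": ["alice"], "photo2": ["bob"], "photo3": ["alice", "carol"]}

# "Users (first type) tagged in photos (second type) matching 'beach'."
users = nested_search(photo_inverted, photo_forward, "beach")
```

Both indexes belong to the data store of the second object type, which matches the abstract's description; the first-type results are reached only through the forward index.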
  • Patent number: 8935263
    Abstract: The disclosed embodiments provide a reputation system. The reputation system includes a scoring apparatus that provides a matrix of reputation scores for a set of items and a set of dimensions of the items in the reputation system, wherein the matrix comprises unknown values for a subset of the reputation scores. The reputation system also includes an inference apparatus that calculates a factorization of the matrix and uses the factorization to update the matrix with a set of inferred values for the set of reputation scores. Finally, the reputation system includes a ranking apparatus that uses the updated matrix to obtain a ranking of the items by one or more of the dimensions.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: January 13, 2015
    Assignee: LinkedIn Corporation
    Inventors: Mario S. Rodriguez, Viet Thuc Ha, Jessica V. Zuniga, Mathieu Bastian, Michael Conover
  • Publication number: 20140379728
    Abstract: A query is received that includes two or more facets of a multidimensional inverted index for a collection of documents. Each document is associated with at least one facet. Generation of the multidimensional inverted index includes creating one or more entries. Each entry includes a combination of two or more facets and a posting list of indications for the documents associated with respective facets of each entry. Each indication identifies a document. Generation of the index also includes determining documents associated with respective facets of the combination of each entry. The multidimensional inverted index is searched for an entry having the combination of two or more facets included in the query and a search result is returned. An indication for a document may be included in a posting list if it is determined that the document is associated with each facet of the combination of facets of the entry.
    Type: Application
    Filed: June 21, 2013
    Publication date: December 25, 2014
    Inventors: Rohan A. Ambasta, Bharath Ganesh, Parag S. Gokhale, Chandrashekhar Jain
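An index keyed by facet combinations rather than single facets can be sketched as below. The names `build_multidim_index` and `search_facets`, the `max_combo` cap on combination size, and the facet strings are assumptions for the example; the abstract does not say which combinations are materialized.

```python
from itertools import combinations

def build_multidim_index(doc_facets, max_combo=2):
    """Create one entry per combination of two or more facets.

    doc_facets: dict mapping doc_id -> iterable of facets
    A document is added to an entry's posting list only if it carries
    every facet in that entry's combination.
    """
    index = {}
    for doc_id, facets in doc_facets.items():
        unique = sorted(set(facets))
        for size in range(2, max_combo + 1):
            for combo in combinations(unique, size):
                index.setdefault(frozenset(combo), []).append(doc_id)
    return index

def search_facets(index, facets):
    """Look up the single entry whose key is exactly the queried facet combination."""
    return index.get(frozenset(facets), [])

docs = {
    "d1": ["lang:en", "year:2013", "type:pdf"],
    "d2": ["lang:en", "year:2013"],
    "d3": ["lang:fr", "type:pdf"],
}
facet_index = build_multidim_index(docs)
```

A two-facet query is answered with one dictionary lookup instead of intersecting two single-facet posting lists, at the cost of storing an entry per combination.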
  • Publication number: 20140372450
    Abstract: The invention relates generally to a method for interactive viewing and analysis of high content data in a biological pathway context. The high content data may be related to the expression of biomarkers within a tissue, cellular, or cellular compartment of individual cell such that the data may reveal patterns of expression to identify a biological process, a clinical diagnosis or prognosis.
    Type: Application
    Filed: June 14, 2013
    Publication date: December 18, 2014
    Inventors: John Frederick Graf, Brion Daryl Sarachan, Maria Ildiko Zavodszky, Lee Aaron Newberg, Chinnappa Dilip Kodira
  • Patent number: 8914376
    Abstract: An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue.
    Type: Grant
    Filed: July 2, 2013
    Date of Patent: December 16, 2014
    Assignee: Equivio Ltd.
    Inventor: Yiftach Ravid
  • Patent number: 8914356
    Abstract: Techniques for indexing file paths of items in a content repository may include taking turns in querying each different item type or folder type in a round robin schedule to visit select nodes of the folder tree of that type to update and maintain the file path indexes. Item types or folder types may be associated with a count of instances or children of instances that are missing indexes. For each item type or folder type, a query may be performed for instances of the item type or folder type having children that are missing indexes, the instances or children of the instances returned may be associated with file path indexes, and the count of instances or children of instances may be adjusted based on the associating.
    Type: Grant
    Filed: November 1, 2012
    Date of Patent: December 16, 2014
    Assignee: International Business Machines Corporation
    Inventor: David B. Victor
  • Patent number: 8914380
    Abstract: A search index structure which extends a typical composite index by incorporating an index which is optimized for fast retrieval from storage and which eliminates data which is specific to phrase searching. Other data is represented in a manner which allows it to be calculated rather than stored. Associating variable length entries with logical categories allows their length to be inferred from the category rather than stored. Using delta values between document IDs rather than the ID itself generates a compact, dense symbol set which is efficiently compressed by Huffman encoding or a similar compression method. Using an upper threshold to remove large, and thus rare, delta values from the symbol set prior to encoding further improves the encoding performance.
    Type: Grant
    Filed: March 19, 2012
    Date of Patent: December 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Chadd Creighton Merrigan, Mihai Petriuc, Raif Khassanov, Artsiom Ivanovich Kokhan
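    The delta-and-threshold idea in this abstract can be illustrated with a short sketch. It assumes that deltas above the threshold are replaced by a single escape symbol and kept in a side list; the abstract does not say how removed deltas are stored, so `ESCAPE`, `to_symbols`, and `from_symbols` are hypothetical names for this example only. The Huffman step computes code lengths rather than a full bitstream.

```python
import heapq
from collections import Counter

ESCAPE = "ESC"  # assumed stand-in symbol for any delta above the threshold


def to_symbols(doc_ids, threshold):
    """Turn ascending document IDs into deltas; escape deltas above threshold."""
    symbols, escaped, previous = [], [], 0
    for doc_id in doc_ids:
        delta = doc_id - previous
        if delta > threshold:
            symbols.append(ESCAPE)
            escaped.append(delta)  # rare, large values are kept out of the symbol set
        else:
            symbols.append(delta)
        previous = doc_id
    return symbols, escaped


def from_symbols(symbols, escaped):
    """Rebuild the document IDs from the symbols and the escaped deltas."""
    doc_ids, previous, rare = [], 0, iter(escaped)
    for symbol in symbols:
        previous += next(rare) if symbol == ESCAPE else symbol
        doc_ids.append(previous)
    return doc_ids


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


doc_ids = [3, 4, 6, 7, 9, 1000, 1001, 1003]
symbols, escaped = to_symbols(doc_ids, threshold=16)
lengths = huffman_code_lengths(symbols)
total_bits = sum(lengths[s] for s in symbols)
```

    With the threshold in place the symbol set stays small (here the deltas 1, 2, 3 plus the escape symbol), which is what lets the frequent small deltas receive short codes.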
  • Patent number: 8903828
    Abstract: A method for configuring a multi-path index includes receiving and storing, in a database management system configured to store a structured document in its native format, a multi-path index definition associated with a data model corresponding to the structured document. In an embodiment, the multi-path index definition includes a sub-path definition that covers a plurality of descendant elements of a root element of the data model and includes at least one index property. Each of a plurality of descendant elements covered by the sub-path definition is automatically indexed according to the at least one index property. The multi-path index definition is stored in a data structure associated with a multi-path index configured to store indexed data from the structured document.
    Type: Grant
    Filed: June 16, 2011
    Date of Patent: December 2, 2014
    Assignee: EMC Corporation
    Inventors: Edward C. Bueche, Francisco Borges, Petr Pleshachkov, Shanshan Quan, Marc Brette, Venkatesan Chandrasekaran
  • Patent number: 8903829
    Abstract: A method for indexing a structured document includes providing a multi-path index definition associated with a data model corresponding to a structured document. The multi-path index definition includes a sub-path definition that covers a root element's descendant elements and includes at least one index property. When a first path expression representing a first descendant element from a first structured document is received, the method includes determining that the first descendant element is covered by the sub-path definition based on the first path expression, indexing the first descendant element according to the index property to generate a path-value pair, and storing the path-value pair and a reference to the first structured document in an inverted multi-path index.
    Type: Grant
    Filed: June 16, 2011
    Date of Patent: December 2, 2014
    Assignee: EMC Corporation
    Inventors: Edward C. Bueche, Francisco Borges, Petr Pleshachkov, Shanshan Quan, Marc Brette, Venkatesan Chandrasekaran
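    The path-value indexing described in the two EMC abstracts above might look roughly like this. It is a sketch under stated assumptions: the sub-path definition is modeled as a plain path prefix, the single index property is a hypothetical `lowercase` flag, and the inverted multi-path index is an in-memory dictionary.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_value_pairs(xml_text):
    """Yield (path, text) for every element that carries non-empty text."""
    root = ET.fromstring(xml_text)
    stack = [("/" + root.tag, root)]
    while stack:
        path, element = stack.pop()
        text = (element.text or "").strip()
        if text:
            yield path, text
        for child in element:
            stack.append((path + "/" + child.tag, child))


def index_document(index, doc_ref, xml_text, sub_path, lowercase=True):
    """Index every descendant covered by sub_path as a path-value pair."""
    prefix = sub_path.rstrip("/") + "/"
    for path, value in path_value_pairs(xml_text):
        if path.startswith(prefix):  # element is a descendant of the sub-path
            key = (path, value.lower() if lowercase else value)
            index[key].add(doc_ref)


# Hypothetical documents, for illustration only.
index = defaultdict(set)
index_document(
    index, "doc1",
    "<book><meta><title>Indexing</title><author>Ann</author></meta>"
    "<body>Text</body></book>",
    sub_path="/book/meta",
)
index_document(
    index, "doc2",
    "<book><meta><title>indexing</title></meta></book>",
    sub_path="/book/meta",
)
title_hits = sorted(index[("/book/meta/title", "indexing")])
```

    One definition covers every descendant of `/book/meta`, so new child elements are indexed without adding a per-path definition; elements outside the sub-path, such as `/book/body`, are ignored.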
  • Publication number: 20140351244
    Abstract: Systems and methods for processing an index are described. A postings list of items containing a particular term is ordered in a desired retrieval order, e.g., most recent first. The ordered items are inserted into an inverted index in the desired retrieval order, resulting in an ordered inverted index from which items may be efficiently retrieved in the desired retrieval order. During retrieval, items may first be retrieved from a live index, and the retrieved items from the live and ordered indexes may be merged. The retrieved items may also be filtered in accordance with the items' file grouping parameters.
    Type: Application
    Filed: June 3, 2014
    Publication date: November 27, 2014
    Applicant: Apple Inc.
    Inventors: Wayne Loofbourrow, John Martin Hornkvist, Eric Richard Koebler, Yan Arrouye
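    A rough sketch of retrieval from an ordered index merged with a live index follows. The posting format `(timestamp, item_id)`, the function names, and the `group_of` mapping used for group filtering are assumptions made for the example, not details taken from the application.

```python
import heapq
from itertools import islice


def build_ordered_postings(items_by_term):
    """Store each term's postings most recent first."""
    return {term: sorted(items, reverse=True) for term, items in items_by_term.items()}


def retrieve(term, ordered_index, live_index, limit,
             allowed_groups=None, group_of=None):
    """Merge live and ordered postings and return the newest `limit` item IDs."""
    live = sorted(live_index.get(term, []), reverse=True)
    ordered = ordered_index.get(term, [])
    # Both inputs are already in descending order, so the merge is lazy.
    merged = heapq.merge(live, ordered, reverse=True)
    if allowed_groups is not None:
        merged = (p for p in merged if group_of[p[1]] in allowed_groups)
    return [item_id for _, item_id in islice(merged, limit)]


# Hypothetical postings: (timestamp, item_id).
ordered_index = build_ordered_postings({"report": [(100, "a"), (300, "c"), (200, "b")]})
live_index = {"report": [(400, "d"), (250, "e")]}
group_of = {"a": "mail", "b": "docs", "c": "mail", "d": "docs", "e": "docs"}

newest = retrieve("report", ordered_index, live_index, limit=3)
newest_docs = retrieve("report", ordered_index, live_index, limit=3,
                       allowed_groups={"docs"}, group_of=group_of)
```

    Since the stored postings are already in retrieval order, the first `limit` results can be produced without scanning or sorting the whole list.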
  • Patent number: 8898107
    Abstract: In one aspect, in general, a method for managing data in a data storage system comprises receiving data to be stored in the data storage system, computing values corresponding to different respective portions of the received data, generating identifiers corresponding to different respective portions of the received data, with an identifier corresponding to a particular portion of data including the computed value corresponding to the particular portion of data and metadata indicating a location where the particular portion of data is being stored in the data storage system, and storing at least some of the identifiers in an index until the index reaches a predetermined size.
    Type: Grant
    Filed: May 6, 2013
    Date of Patent: November 25, 2014
    Assignee: Permabit Technology Corp.
    Inventors: Jered J. Floyd, Michael Fortson, Assar Westerlund, Jonathan Coburn
  • Patent number: 8898172
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhanced parallel latent Dirichlet allocation (PLDA+). A PLDA+ system is a system of multiple processors that are configured to generate topics from multiple documents. The multiple processors are designated as two types: document processors and matrix processors. The documents are distributed among the document processors. Generated topics are distributed among the matrix processors. Tasks performed on the document processors and matrix processors are segregated into two types of tasks: computation-bound tasks and communication-bound tasks. Computation-bound tasks are CPU intensive tasks; communication-bound tasks are network intensive tasks. Data placement and pipeline strategies are employed such that the computation-bound tasks and the communication-bound tasks are distributed to the processors in a balanced manner, and performed in parallel.
    Type: Grant
    Filed: May 11, 2011
    Date of Patent: November 25, 2014
    Assignee: Google Inc.
    Inventors: Zhiyuan Liu, Yuzhou Zhang, Edward Y. Chang
  • Publication number: 20140337355
    Abstract: A method and computer program product for implementing indexed natural language processing are disclosed. Source document features including but not limited to terms, punctuation, parts-of-speech, phrases (including the syntactic types of the phrases), dependent clauses (including the syntactic types of the dependent clauses), independent clauses (including the syntactic types of the independent clauses), sentences, paragraphs, labeled document sections and document type and cognitive grammar constraints on the scope of influence and binding for the same are entered into an index by their begin and end byte offsets (or some alternative indexing method).
    Type: Application
    Filed: March 31, 2014
    Publication date: November 13, 2014
    Applicant: GNOETICS, INC.
    Inventor: Daniel Heinze
  • Publication number: 20140324882
    Abstract: The present invention relates to systems and methods for storing, navigating and retrieving information. In particular, the present invention is concerned with systems and methods for storing data in, for retrieving data from, and for navigating large and/or complex datasets. The systems and methods of the present invention in particular are concerned with the materialization/denormalization of complex data sets comprising a plurality of large, interconnected but distinct data record collections. The materialization/denormalization of such data sets can be performed in a precomputation phase, prior to a browsing/searching operation.
    Type: Application
    Filed: April 29, 2014
    Publication date: October 30, 2014
    Inventors: Giovanni Tummarello, Renaud Delbru
  • Patent number: 8874611
    Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: October 28, 2014
    Inventor: Philip R Krause
  • Patent number: 8856137
    Abstract: An information re-organization system includes a plurality of counters coordinated to meaning attributes, and a re-organization incentive notification unit that, in case the information stored in a preset storage unit has been updated, updates the value of the counter, out of the multiple counters, that has the meaning attribute associated with the contents updated. The information re-organization system also includes an information re-organization processor that executes, in case the value of the counter section updated has met one of a number of predetermined conditions for information re-organization, a processing for information re-organization corresponding to the condition for information re-organization on the information stored in the preset storage unit.
    Type: Grant
    Filed: June 3, 2013
    Date of Patent: October 7, 2014
    Assignee: NEC Corporation
    Inventor: Masaki Kan
  • Publication number: 20140289224
    Abstract: A method, apparatus and system of directory sharing and management in a group communication environment is disclosed. In one embodiment, a method of a fast-search server includes processing a character of a query of music data, referencing the character with a reverse index of a music database, determining that the character matches a data record of the music database using the reverse index and returning the data record of the music database prior to receiving all characters of the query of music data from a user. The reverse index may be created from a combination of letters appearing as a string in a data field of the music database. The method may include preforking the character of the query of music data along with other processes in the fast-search server to minimize concurrency issues and to minimize threading locks.
    Type: Application
    Filed: March 28, 2014
    Publication date: September 25, 2014
    Applicant: BEATS MUSIC, LLC
    Inventor: Lucas S. Carlson
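    The search-as-you-type lookup in this abstract could be sketched as below. The sketch assumes the reverse index is keyed by word prefixes of a single `title` field; the abstract only says the index is built from letter combinations appearing in a data field, so the prefix choice, the function names, and the sample records are illustrative.

```python
from collections import defaultdict


def build_reverse_index(records, field="title"):
    """Map every word prefix found in `field` to the IDs of matching records."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for word in record[field].lower().split():
            for end in range(1, len(word) + 1):
                index[word[:end]].add(record_id)
    return index


def incremental_search(index, typed_so_far):
    """Return matching record IDs for the characters received so far."""
    return sorted(index.get(typed_so_far.lower(), ()))


# Hypothetical music records, for illustration only.
records = {
    1: {"title": "Blue Train"},
    2: {"title": "Kind of Blue"},
    3: {"title": "Giant Steps"},
}
index = build_reverse_index(records)

# Results are available after each keystroke, before the query is complete.
after_b = incremental_search(index, "b")
after_st = incremental_search(index, "st")
after_x = incremental_search(index, "x")
```

    Each keystroke is one dictionary lookup, which is what allows records to be returned before the user has finished typing the query.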
  • Patent number: 8843486
    Abstract: A set of index keys is included in an index search system that are associated with the scope of the search rather than the content of the documents that are the target of the search. These scope-related index keys, or scope keys, allow the scope of the search to be selected, reducing the number of documents that a search is required to sift through to obtain results. Furthermore, compound scopes are recognized and stored such that an index of complex search scopes is provided to eliminate rehashing of the searches based on these complex search scopes.
    Type: Grant
    Filed: September 29, 2009
    Date of Patent: September 23, 2014
    Assignee: Microsoft Corporation
    Inventors: Chadd Creighton Merrigan, Kyle G. Peltonen, Dmitriy Meyerzon, David J. Lee
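    One way to picture scope keys and stored compound scopes is the sketch below. It assumes scope keys live in the same posting map as content terms under a `scope:` prefix and that a compound scope is the union of its member scopes; the class, its methods, and those choices are hypothetical and not taken from the patent.

```python
class ScopedIndex:
    """Toy index whose keys are content terms and 'scope:' prefixed scope keys."""

    def __init__(self):
        self.postings = {}         # term or scope key -> set of document IDs
        self.compound_scopes = {}  # frozenset of scopes -> precomputed document IDs

    def add(self, doc_id, terms, scopes):
        for key in list(terms) + ["scope:" + s for s in scopes]:
            self.postings.setdefault(key, set()).add(doc_id)
        self.compound_scopes.clear()  # stored compound scopes are stale after an update

    def scope_docs(self, scopes):
        """Return documents in any of the scopes, computing the union only once."""
        key = frozenset(scopes)
        if key not in self.compound_scopes:
            sets = [self.postings.get("scope:" + s, set()) for s in key]
            self.compound_scopes[key] = set.union(*sets) if sets else set()
        return self.compound_scopes[key]

    def search(self, term, scopes):
        """Intersect the term's postings with the selected scope."""
        return sorted(self.postings.get(term, set()) & self.scope_docs(scopes))


# Hypothetical documents and scopes, for illustration only.
idx = ScopedIndex()
idx.add(1, ["index"], ["hr"])
idx.add(2, ["index"], ["eng"])
idx.add(3, ["index", "huffman"], ["eng", "wiki"])

eng_hits = idx.search("index", ["eng"])
compound_hits = idx.search("index", ["hr", "wiki"])
empty_hits = idx.search("huffman", ["hr"])
```

    The second search with the same compound scope reuses the stored document set instead of recomputing it.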
  • Patent number: 8837818
    Abstract: A feature section including a feature of a candidate region but not including a feature of a related large region is set as for a style type different in feature from the related large region among a plurality of style types, with respect to each index candidate region. At least one or both of the large regions and the candidate regions having the feature included in the set feature section are grouped. An index evaluation degree is calculated, based on the grouped result, with respect to each candidate region. It is determined whether or not a logical element of each candidate region is an index, based on the calculated index evaluation degree.
    Type: Grant
    Filed: March 11, 2010
    Date of Patent: September 16, 2014
    Assignee: Konica Minolta Business Technologies, Inc.
    Inventor: Yoshio Komaki
  • Patent number: 8825665
    Abstract: Certain example embodiments relate to a database index for indexing one or more text documents in a database. The text documents include one or more hierarchical nodes, and each node includes one or more words. The database index includes at least one entry, with each entry including a key. The key, in turn, includes a subset of words occurring in one of the hierarchical nodes of the text documents and the name of the respective hierarchical node. Associated with each key is a value including one or more references to the text documents in which the subset of words occurs.
    Type: Grant
    Filed: December 15, 2008
    Date of Patent: September 2, 2014
    Assignee: Software AG
    Inventors: Jürgen Harbarth, Juliane Harbarth
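    The key layout described here, a node name combined with a subset of the words in that node, can be sketched as follows. The sketch flattens the hierarchy into `(node_name, text)` pairs and fixes the subset size at two words; both are simplifying assumptions, as are the function names.

```python
import re
from collections import defaultdict
from itertools import combinations


def index_nodes(index, doc_ref, nodes, subset_size=2):
    """Add keys of the form (node name, word subset) that point at doc_ref."""
    for node_name, text in nodes:
        words = sorted(set(re.findall(r"\w+", text.lower())))
        for subset in combinations(words, subset_size):
            index[(node_name, subset)].add(doc_ref)


def lookup(index, node_name, words):
    """Find documents where all the given words occur in the named node."""
    key = (node_name, tuple(sorted(w.lower() for w in words)))
    return sorted(index.get(key, ()))


# Hypothetical documents, for illustration only.
index = defaultdict(set)
index_nodes(index, "doc1", [("title", "Inverted index basics"),
                            ("body", "Posting lists")])
index_nodes(index, "doc2", [("title", "Index compression basics")])

both_titles = lookup(index, "title", ["index", "basics"])
one_title = lookup(index, "title", ["inverted", "index"])
wrong_node = lookup(index, "body", ["index", "basics"])
```

    Putting the node name in the key restricts a multi-word match to a single node without a separate structural filter.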
  • Patent number: 8818971
    Abstract: Systems and methods for deleting non-key values from an index distributed over a plurality of computing devices maintain a non-key master list that includes the non-key values that are stored on each of the plurality of computing devices and receive a list of non-key values to delete. The systems and methods further intersect the list of non-key values to delete with the non-key master list, creating a first delete list for a first one of the plurality of computing devices that includes non-key values to be deleted that are stored on the first computing device. The systems and methods further transmit the first delete list to the first computing device and update the non-key master list based on the list of non-key values to delete.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: August 26, 2014
    Assignee: Google Inc.
    Inventors: Marcus Fontoura, Jan Hendrik Pieper, Krishna Tatavarthi, Bjoern Carlin, Hsiang-ling Lin
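    The intersection step in this abstract can be sketched in a few lines. The master list is modeled as a dictionary from device name to a set of non-key values, and transmission to the devices is left out; `plan_deletes` and the device names are hypothetical.

```python
def plan_deletes(master, to_delete):
    """Build one delete list per device and update the master list in place.

    master maps a device name to the set of non-key values stored on it.
    """
    to_delete = set(to_delete)
    delete_lists = {}
    for device, values in master.items():
        hits = values & to_delete  # values to delete that this device actually holds
        if hits:
            delete_lists[device] = sorted(hits)
            values -= hits         # in-place update of the master list
    return delete_lists


# Hypothetical devices and values, for illustration only.
master = {
    "node-a": {"v1", "v2"},
    "node-b": {"v2", "v3"},
    "node-c": {"v4"},
}
delete_lists = plan_deletes(master, ["v2", "v3", "v9"])
```

    Devices that hold none of the values, such as `node-c` here, receive no delete list at all, so only affected devices are contacted.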