Inverted Index Patents (Class 707/742)
  • Patent number: 11157477
    Abstract: A method, computer system, and computer program product for segment differential-based document text-index modeling are provided. The embodiment may include receiving, by a processor, a document with a valid document ID and version ID tuple. The embodiment may also include determining the received document is a new version of a previously stored document and consequently multiplexing versions of the document into a single indexed document. The embodiment may further include segmenting the received document and building a token vector. The embodiment may also include calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation. The embodiment may further include in response to the calculated difference being below a pre-configured threshold value, discarding the received new version.
    Type: Grant
    Filed: November 28, 2018
    Date of Patent: October 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Roger C. Raphael, Rajesh M. Desai, Fumihiko Terui, Justo L. Perez, Thomas Hampp
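The flow in the abstract above (segment the document, build a token vector, compute a difference, discard the new version if the difference is below a threshold) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the line-based segmentation, the bag-of-words difference metric, the `store` dictionary, and the 0.05 default threshold are all assumptions made for the example.

```python
from collections import Counter


def segment(text):
    """Split a document into non-empty lines (an assumed segmentation rule)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def token_vector(segments):
    """Bag-of-words token counts over all segments."""
    return Counter(tok.lower() for seg in segments for tok in seg.split())


def difference(old_text, new_text):
    """Fraction of token occurrences that differ between versions, in [0, 1]."""
    old_vec = token_vector(segment(old_text))
    new_vec = token_vector(segment(new_text))
    # Counter subtraction keeps only positive counts, so this sums the
    # tokens removed from the old version and the tokens added in the new one.
    changed = sum(((old_vec - new_vec) + (new_vec - old_vec)).values())
    total = sum(old_vec.values()) + sum(new_vec.values())
    return changed / total if total else 0.0


def accept_version(store, doc_id, version_id, text, threshold=0.05):
    """Keep the new version unless it differs too little from the latest one.

    `store` is a hypothetical dict mapping doc_id -> list of (version_id, text).
    """
    versions = store.setdefault(doc_id, [])
    if versions and difference(versions[-1][1], text) < threshold:
        return False  # difference below threshold: discard the new version
    versions.append((version_id, text))
    return True
```

All versions of a document sit under a single `doc_id` key here, which loosely mirrors the idea of multiplexing versions into one indexed document.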
  • Patent number: 11157678
    Abstract: The presently disclosed subject matter includes a computer-implemented system and method for receiving content from another computer device and dynamically adapting display of the received content within a container of a formatted document, the container defining a restricted area within the formatted document designated for displaying the content. Sub-elements within at least one content item are identified and tagged; the tagging makes it possible to acquire display parameters of the tagged sub-elements and to calculate from them a required adaptation of the content item so that it fits within the respective container.
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: October 26, 2021
    Assignee: TABOOLA.COM LTD.
    Inventor: Efraim Nadiv
  • Patent number: 11126632
    Abstract: Systems and methods are disclosed for executing a query that includes an indication to process data managed by an external data system. The system identifies the external data system that manages the data to be processed, and obtains search configuration data from the external system. The system uses the search configuration data to generate a subquery for the external data system. The system also generates instructions for one or more worker nodes to receive and process results of the subquery from the external data system.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: September 21, 2021
    Assignee: Splunk Inc.
    Inventors: Sourav Pal, Arindam Bhattacharjee
  • Patent number: 11106663
    Abstract: A search for a regular expression in a tree hierarchy includes, in part, searching for a match to the regular expression in a first subtree defined by a first node name, recording information about the first subtree if there is no match, determining whether a second subtree defined by a second node name is identical to the first node, and skipping search of the second subtree if the second subtree is determined to be identical and prefix equivalent, with respect to the regular expression, to the first subtree. The second subtree is determined to be prefix equivalent to the first subtree when for any string s, a first prefix defined by a concatenation of the first node name and the string s results in a match if and only if a second prefix defined by a concatenation of the second node name and the string s results in a match.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: August 31, 2021
    Assignee: Synopsys, Inc.
    Inventors: Ilya Kudryavtsev, Daniel Geist, Boris Gommershtadt
  • Patent number: 11087765
    Abstract: In one example, a method includes: receiving audio data generated by a microphone of a current computing device; identifying, based on the audio data, one or more computing devices that each emitted a respective audio signal in response to speech reception being activated at the current computing device; and selecting either the current computing device or a particular computing device from the identified one or more computing devices to satisfy a spoken utterance determined based on the audio data.
    Type: Grant
    Filed: March 15, 2021
    Date of Patent: August 10, 2021
    Assignee: GOOGLE LLC
    Inventor: Jian Wei Leong
  • Patent number: 11049505
    Abstract: In one example, a method includes: receiving audio data generated by a microphone of a current computing device; identifying, based on the audio data, one or more computing devices that each emitted a respective audio signal in response to speech reception being activated at the current computing device; and selecting either the current computing device or a particular computing device from the identified one or more computing devices to satisfy a spoken utterance determined based on the audio data.
    Type: Grant
    Filed: March 15, 2021
    Date of Patent: June 29, 2021
    Assignee: GOOGLE LLC
    Inventor: Jian Wei Leong
  • Patent number: 11030242
    Abstract: A search system processes queries for accessing information stored in documents. A document comprises fields. The search system stores a plurality of indexes in a key-value store. Each index comprises key-value pairs. A key of a key-value pair is obtained by combining field data describing a field of a document. The value of each field is stored as an individual key-value in the key-value store. The search system receives a query requesting information stored in documents and specifying a search criteria. The search system builds a key-expression based on the search criteria and uses one or more indexes to find key-value pairs matching the key-expression. The search system finds the requested information based on the matching key-value pairs and provides the requested information to the query source.
    Type: Grant
    Filed: October 15, 2018
    Date of Patent: June 8, 2021
    Assignee: Rockset, Inc.
    Inventors: Dhruba Borthakur, Venkat Venkataramani, Igor Canadi, Tudor Bosman
  • Patent number: 11003692
    Abstract: Systems, methods, and non-transitory computer-readable media can obtain a first batch of content items to be clustered. A set of clusters can be generated by clustering respective binary hash codes for each content item in the first batch, wherein content items included in a cluster are visually similar to one another. A next batch of content items to be clustered can be obtained. One or more respective binary hash codes for the content items in the next batch can be assigned to a cluster in the set of clusters.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: May 11, 2021
    Assignee: Facebook, Inc.
    Inventors: Yunchao Gong, Marcin Pawlowski, Fei Yang, Lubomir Bourdev, Louis Dominic Brandy, Robert D. Fergus
  • Patent number: 10997138
    Abstract: Embodiments are directed towards a method for searching data. The method comprises providing an inverted index that comprises at least one record, wherein the at least one record comprises at least one field name and a corresponding at least one field value. The at least one field name and corresponding value are extracted from time-stamped searchable events that are stored in a field searchable datastore and comprise portions of raw data. The at least one record further comprises a posting value that identifies a location in the field searchable datastore where an event associated with the at least one record is stored. The method further comprises receiving an incoming search query that references a field name and evaluating the incoming search query. Furthermore, responsive to the evaluating, the method comprises determining results for the incoming search query using both of the field searchable datastore and the inverted index.
    Type: Grant
    Filed: May 28, 2019
    Date of Patent: May 4, 2021
    Assignee: Splunk, Inc.
    Inventors: David Ryan Marquardt, Mitchell Neuman Blank, Jr., Stephen Phillip Sorkin
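The record structure described in the abstract above (a field name, a field value, and a posting value locating the event in the datastore) can be illustrated with a small sketch. It is only an assumed toy model: the event dictionaries, the `fields` key, and the use of list positions as posting values are hypothetical choices for the example, not details from the patent.

```python
from collections import defaultdict


def build_index(events):
    """Map (field name, field value) to posting values.

    A posting value here is simply the event's position in the `events`
    list, which stands in for the field searchable datastore.
    """
    index = defaultdict(list)
    for location, event in enumerate(events):
        for name, value in event["fields"].items():
            index[(name, value)].append(location)
    return index


def search(events, index, field, value):
    """Look up postings in the index, then fetch the events from the store."""
    return [events[loc] for loc in index.get((field, value), [])]


# Hypothetical time-stamped events with extracted fields and raw data.
events = [
    {"time": 1, "raw": "GET /a 200", "fields": {"status": "200", "path": "/a"}},
    {"time": 2, "raw": "GET /b 500", "fields": {"status": "500", "path": "/b"}},
    {"time": 3, "raw": "GET /a 500", "fields": {"status": "500", "path": "/a"}},
]
index = build_index(events)
errors = search(events, index, "status", "500")
```

The query is answered using both structures: the index narrows the search to candidate locations, and the datastore supplies the full events.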
  • Patent number: 10970337
    Abstract: A method for outputting a result of one or more operations using data sources of different types is provided. The method includes steps of: (a) when a user query is acquired, a device (i) acquiring data elements respectively from the data sources of different types by referring to the user query, and (ii) performing main joint operations on the data elements, to thereby generate a data set; and (b) the device performing data processing operations and output operations on the data set, to thereby generate an answer for the user query. It has an effect of providing the method for outputting the result of the operations using the data sources of the different types by referring to each of languages corresponding to each of the data sources.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: April 6, 2021
    Assignee: Seculayer Co., LTD.
    Inventor: Jin Sang You
  • Patent number: 10922346
    Abstract: In some examples, a set of sentences is extracted from a digital document, and each sentence is scored using a respective informativeness measure and readability measure. Sentences in the set of sentences are selected based on the readability measures and informativeness measures. A low readability, high informativeness sentence is identified from the set of sentences. A concatenated sentence is generated by concatenating at least one contextual sentence with the low readability, high informativeness sentence, where the concatenated sentence has a higher readability than the low readability, high informativeness sentence.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: February 16, 2021
    Assignee: Micro Focus LLC
    Inventor: Vinay Deolalikar
  • Patent number: 10838994
    Abstract: Natural Language Processing (NLP) is performed on a corpus using a processor and a memory to extract a set of facets corresponding to a dimension in a set of dimensions. Using a score threshold, a subset of the set of facets is selected where each facet in the set of facets has a corresponding score relative to the corpus. A subsequent query is formed by increasing a complexity of a previous query using a facet in the subset of facets. The subsequent query is executed on at least a portion of the corpus. The documents in a new result set are ranked, the new result set being in response to executing the subsequent query. An output is produced from the new result set, which includes a ranking of that subset of documents whose ranks have changed by more than a threshold rank distance from the corresponding ranks in the corpus.
    Type: Grant
    Filed: August 31, 2017
    Date of Patent: November 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Takashi Fukuda, Hiroaki Kikuchi
  • Patent number: 10733211
    Abstract: In an approach to faceted classification, a computer receives a search query. The computer creates a first table of facet value ranges, based on the search query. The computer fetches a first search result corresponding to the search query. The computer retrieves a first facet value associated with the first search result. The computer maps the first facet value to a first facet value range. The computer determines whether the first facet value range is in the first table of facet value ranges. The computer inserts the first facet value range into the first table of facet value ranges. The computer determines whether a number of facet value ranges in the first table of facet value ranges is below a pre-defined threshold. The computer creates a second table of facet value ranges. The computer identifies a second facet value range that includes the first facet value range.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: August 4, 2020
    Assignee: International Business Machines Corporation
    Inventors: Marta Breno, Roberto Ragusa
  • Patent number: 10642831
    Abstract: Techniques are described herein to generate and to execute a query execution plan using static data buffering. After receiving a query with a clause that requires multiple iterations to execute, a database management system (DBMS) generates a plurality of plans that vary the order in which the database operations are executed. Within each plan, the DBMS identifies sets of rows within that plan that contain static data during execution of the query. Then, an additional step is added to each plan that includes loading the static set of rows in a database buffer cache. One or more database operations, from an iteration other than the first iteration, may be performed against the cached static set of rows. For each plan generated in this manner, a cost analysis model is applied, and the plan with the lowest estimated computational cost is selected for use as the query execution plan.
    Type: Grant
    Filed: September 16, 2016
    Date of Patent: May 5, 2020
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Mohamed Ziauddin, Yali Zhu
  • Patent number: 10621246
    Abstract: A method and apparatus of a device that indexes donatable content from a network site is described. In an exemplary embodiment, the device receives a requested document, where the requested document includes a plurality of tags. In addition, the device detects a donatable tag in the plurality of tags that indicates the network site includes donatable content. In response to the detecting, the device sends a request for the donatable content to the network site. Furthermore, the device receives the donatable content from the network site. The device additionally indexes the donatable content into an on-device search index, where at least some of the indexed donatable content is further returned as a search result for an on-device search.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: April 14, 2020
    Assignee: Apple Inc.
    Inventors: Anubhav Malhotra, John M. Hörnkvist
  • Patent number: 10614031
    Abstract: The present disclosure relates to systems and methods for indexing and mapping data sets by feature matrices, comprising at least a processor and a non-transitory memory storing instructions that cause the processor to perform operations including receiving data sets of the same type, applying autoencoders to generate feature matrices, and generating a neural network model trained to generate synthetic data corresponding to the type of data files. Further, the processor performs operations to apply more autoencoders to part of the hidden layer of the neural network model to generate more corresponding feature matrices and to index the data set using the feature matrices such that the data sets are searchable using an index wherein a search query is received and a third feature matrix is generated so that a data set can be retrieved and compared to the feature matrices using the index.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: April 7, 2020
    Assignee: Capital One Services, LLC
    Inventors: Austin Walters, Jeremy Goodsitt, Vincent Pham, Galen Rafferty, Anh Truong, Reza Farivar
  • Patent number: 10545936
    Abstract: Linear run length encoding is described. A system and method include storing a table of time series data in a database of a data platform, the table of time series data representing a set of time series blocks. Each time series block of the set of time series blocks has a time series of equally-incremented time intervals and a run length. Each time interval of the time series is associated with one or more values. The run length has a starting position with at least one starting value and an ending position with at least one ending value. The starting position and the at least one starting value are stored for each time series block in a column store of the database. Then, a compressed index is generated in the column store of the database for each time series block, the compressed index comprising the starting position and the at least one starting value.
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: January 28, 2020
    Assignee: SAP SE
    Inventors: Gordon Gaumnitz, Robert Schulze, Lars Dannecker, Ivan Bowman, Dan Farrar
  • Patent number: 10540332
    Abstract: Technologies are described herein for denormalizing data instances. Schemas for data instances are embedded with annotations indicating how the denormalization is to be performed. Based on the annotations, one or more sub per object indexes (“sub POIs”) can be generated for each data instance and stored. The sub POIs can include a target sub POI containing data from the data instance, and at least one source sub POI containing data from another data instance, if the data instance depends on the other data instance. Data instance updates can be performed by identifying sub POIs that are related to the updated data instance in storage, and updating the related sub POIs according to the update to the data instance. The sub POIs can be sent to an indexing engine to generate an index for a search engine to facilitate searches on the data instances.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: January 21, 2020
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Christopher Clayton McConnell, Weipeng Liu, Shahin Shayandeh, Robert Lovejoy Goodwin
  • Patent number: 10521408
    Abstract: In general, embodiments of the technology relate to a method for servicing requests. The method includes receiving a search request from a client, determining a main path and a conditional subpath associated with the search request, determining a subpath index associated with the main path and the conditional subpath, obtaining, using at least a portion of the search request, a set of subpath index entries from the subpath index, wherein each of the subpath index entries specifies a facet subpath and content associated with the facet subpath, generating a final result using at least a portion of the contents in the set of subpath index entries, and providing the final result to the client.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: December 31, 2019
    Assignee: OPEN TEXT CORPORATION
    Inventors: Caroline Spruit, Petr Olegovich Pleshachkov
  • Patent number: 10445650
    Abstract: A processing unit can successively operate layers of a multilayer computational graph (MCG) according to a forward computational order to determine a topic value associated with a document based at least in part on content values associated with the document. The processing unit can successively determine, according to a reverse computational order, layer-specific deviation values associated with the layers based at least in part on the topic value, the content values, and a characteristic value associated with the document. The processing unit can determine a model adjustment value based at least in part on the layer-specific deviation values. The processing unit can modify at least one parameter associated with the MCG based at least in part on the model adjustment value. The MCG can be operated to provide a result characteristic value associated with test content values of a test document.
    Type: Grant
    Filed: November 23, 2015
    Date of Patent: October 15, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jianfeng Gao, Li Deng, Xiaodong He, Lin Xiao, Xinying Song, Yelong Shen, Ji He, Jianshu Chen
  • Patent number: 10346511
    Abstract: The presently disclosed subject matter includes a computer-implemented system and method for receiving content from another computer device and dynamically adapting display of the received content within a container of a formatted document, the container defining a restricted area within the formatted document designated for displaying the content. Sub-elements within at least one content item are identified and tagged; the tagging makes it possible to acquire display parameters of the tagged sub-elements and to calculate from them a required adaptation of the content item so that it fits within the respective container.
    Type: Grant
    Filed: June 1, 2017
    Date of Patent: July 9, 2019
    Assignee: TABOOLA.COM LTD.
    Inventor: Efraim Nadiv
  • Patent number: 10248681
    Abstract: A system and method for faster access for compressed time series data. A set of blocks are generated based on a table stored in a database of the data platform. The table stores data associated with multiple sources of data provided as consecutive values, each block containing index vectors having a range of the consecutive values. A block index is generated for each block having a field start vector representing a starting position of the block relative to the range of consecutive values, and a starting value vector representing a value of the block at the starting position. The field start vector of the block index is accessed to obtain the starting position of a field corresponding to a first block and to the range of the consecutive values of the first block. The starting value vector is then determined from the block index to determine an end and a length of the field of the first block.
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: April 2, 2019
    Assignee: SAP SE
    Inventors: Gordon Gaumnitz, Robert Schulze, Lars Dannecker, Ivan Bowman, Dan Farrar
  • Patent number: 10235377
    Abstract: Innovations for adaptive compression and decompression for dictionaries of a column-store database can reduce the amount of memory used for columns of the database, allowing a system to keep column data in memory for more columns, while delays for access operations remain acceptable. For example, dictionary compression variants use different compression techniques and implementation options. Some dictionary compression variants provide more aggressive compression (reduced memory consumption) but result in slower run-time performance. Other dictionary compression variants provide less aggressive compression (higher memory consumption) but support faster run-time performance. As another example, a compression manager can automatically select a dictionary compression variant for a given column in a column-store database.
    Type: Grant
    Filed: December 23, 2013
    Date of Patent: March 19, 2019
    Assignee: SAP SE
    Inventors: Ingo Mueller, Cornelius Ratsch, Peter Sanders, Franz Faerber
  • Patent number: 10083082
    Abstract: A method to efficiently checkpoint and reconstruct an in-memory index associated with a log-structured object store includes enabling asynchronous write operations to occur to a log-structured object store. The log-structured object store utilizes an in-memory index to access objects therein. The method further enables checkpoint operations to occur to the log-structured object store without pausing the asynchronous write operations. When initiating checkpoint operations, the method establishes a “begin checkpoint” marker on the log-structured object store. This “begin checkpoint” marker is configured to point to an oldest known log location recorded in the in-memory index. In the event the in-memory index is lost, the method reconstructs the in-memory index by analyzing the log-structured object store starting from the oldest known log location. A corresponding system and computer program product are also disclosed and claimed herein.
    Type: Grant
    Filed: September 7, 2015
    Date of Patent: September 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lawrence Y. Chiu, Paul H. Muench, Sangeetha Seshadri
  • Patent number: 10083089
    Abstract: A method to efficiently checkpoint and reconstruct an in-memory index associated with a log-structured object store includes enabling asynchronous write operations to occur to a log-structured object store. The log-structured object store utilizes an in-memory index to access objects therein. The method further enables checkpoint operations to occur to the log-structured object store without pausing the asynchronous write operations. When initiating checkpoint operations, the method establishes a “begin checkpoint” marker on the log-structured object store. This “begin checkpoint” marker is configured to point to an earliest address in the log-structured object store that is uncommitted to the in-memory index. In the event the in-memory index is lost, the method reconstructs the in-memory index by analyzing the log-structured object store starting from the earliest address uncommitted to the in-memory index. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: September 7, 2015
    Date of Patent: September 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lawrence Y. Chiu, Paul H. Muench, Sangeetha Seshadri
  • Patent number: 10061808
    Abstract: Embodiments relate to view caching techniques that cache for a limited time some of the (intermediate) results of a previous query execution, in order to avoid expensive re-computation of query results. Particular embodiments may utilize a cache manager to determine whether information relevant to a subsequent user request can be satisfied by an existing cache instance or view, or whether creation of an additional cache instance is appropriate. At design time, cache defining columns of a view are defined, with user input parameters automatically being cache defining. Cache instances are created for each tuple of literals for the cache defining columns, and for each explicit or implicit group by clause. Certain embodiments may feature enhanced reuse between cache instances, in order to limit memory footprint. Over time, cache instances may be evicted from memory based upon implementation of a policy such as a Least Recently Used (LRU) strategy.
    Type: Grant
    Filed: June 3, 2014
    Date of Patent: August 28, 2018
    Assignee: SAP SE
    Inventors: Ki Hong Kim, Norman May, Alexander Boehm, Sung Heun Wi, Jeong Ae Han, Sang Il Song, Yongsik Yoon
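The two ideas in the abstract above, one cache instance per tuple of literals for the cache defining columns and LRU eviction, can be sketched as follows. This is an illustrative toy under stated assumptions: the `ViewCache` class, its `capacity` and `compute` parameters, and the `misses` counter are hypothetical names, and the time-limited expiry and group-by handling mentioned in the abstract are not modeled.

```python
from collections import OrderedDict


class ViewCache:
    """Toy view cache keyed by the literals of the cache defining columns."""

    def __init__(self, capacity, compute):
        self.capacity = capacity        # maximum number of cache instances
        self.compute = compute          # stand-in for expensive view evaluation
        self.instances = OrderedDict()  # least recently used entry comes first
        self.misses = 0

    def get(self, *literals):
        key = tuple(literals)
        if key in self.instances:
            # Cache hit: mark this instance as most recently used.
            self.instances.move_to_end(key)
            return self.instances[key]
        # Cache miss: evaluate the view and create a new cache instance.
        self.misses += 1
        result = self.compute(*key)
        self.instances[key] = result
        if len(self.instances) > self.capacity:
            # Evict the least recently used cache instance.
            self.instances.popitem(last=False)
        return result


cache = ViewCache(capacity=2, compute=lambda region, year: f"{region}-{year}")
cache.get("EU", 2020)    # miss
cache.get("US", 2020)    # miss
cache.get("EU", 2020)    # hit, EU becomes most recently used
cache.get("APAC", 2021)  # miss, evicts the ("US", 2020) instance
```

`OrderedDict` is used because `move_to_end` and `popitem(last=False)` give constant-time recency updates and eviction.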
  • Patent number: 10019518
    Abstract: Methods and systems are disclosed that relate to ranking functions for multiple different domains. By way of example but not limitation, ranking functions for multiple different domains may be trained based on inter-domain loss, and such ranking functions may be used to rank search results from multiple different domains so that they may be blended without normalizing relevancy scores.
    Type: Grant
    Filed: October 9, 2009
    Date of Patent: July 10, 2018
    Assignee: Excalibur IP, LLC
    Inventors: Jiang Chen, Wei Chu, Zhenzhen Kou, Zhaohui Zheng
  • Patent number: 9990362
    Abstract: Profiling data includes processing an accessed collection of records, including: generating, for a first set of distinct values appearing in a first set of one or more fields, corresponding location information; generating, for the first set of fields, a corresponding list of entries identifying a distinct value from the first set of distinct values and the location information for the distinct value; generating, for a second set of one or more fields, a corresponding list of entries, with each entry identifying a distinct value from a second set of distinct values appearing in the second set of fields; and generating result information, based at least in part on: locating at least one record of the collection using the location information for at least one value appearing in the first set of fields, and determining at least one value appearing in the second set of fields of the located record.
    Type: Grant
    Filed: September 21, 2015
    Date of Patent: June 5, 2018
    Assignee: Ab Initio Technology LLC
    Inventor: Arlen Anderson
  • Patent number: 9952778
    Abstract: A data processing technology is provided, and is applied to a partition management device. The partition management device stores a partition view, the partition view records a correspondence between an ID of a current partition and an address of a storage disk, and a total quantity of current partitions may be less than a total quantity of final partitions. By using the technology, data forwarding may be performed on key-value data by using a current partition, thereby reducing complexity of a partition view.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: April 24, 2018
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventor: Xiong Luo
  • Patent number: 9921945
    Abstract: Aspects provide for automatic verification of JavaScript Object Notation (JSON) data by making a JSON call via an Extensible Markup Language (XML) Hypertext Transfer Protocol (HTTP) object against a data warehouse data item stored in a back end server. JSON response data returned from the back end server in response to the JSON call is converted into actual XML result data that includes a first plurality of XML statements. A Structured Query Language (SQL) query is executed against the data warehouse data item, and expected XML result data generated in response thereto that include a different (second) plurality of XML statements. The JSON response data returned from the back end server is thereby verified in response to matching the actual XML result data to the expected XML result data.
    Type: Grant
    Filed: April 6, 2015
    Date of Patent: March 20, 2018
    Assignee: ADP, LLC
    Inventors: Tista Das, Sachin V. Havaldar, Laiyuan Liu
  • Patent number: 9916314
    Abstract: An AND operation is performed for an integrated appearance map of a compression code of character data “”, an integrated appearance map of a compression code of character data “”, and an integrated deletion map for a segment. The AND result is “1100” and it is found that the character data “” and “” are likely to be present in the segments (sg1(1)) and (sg1(2)). Since the segments are specified from the AND result, the AND operations are performed. As a result, the segments are specified and the AND operations are performed. As a result, a file number 3 is specified from the segment (sg0(1)) and a file number 19 is specified from the segment (sg0(5)). Therefore, it is found that both of the character data “” and “” are present in compression files (f3) and (f19).
    Type: Grant
    Filed: March 10, 2014
    Date of Patent: March 13, 2018
    Assignee: FUJITSU LIMITED
    Inventors: Masahiro Kataoka, Ryo Matsumura
  • Patent number: 9846688
    Abstract: Techniques for use with electronic book readers include coordinating or translating position information between different versions of an electronic book. Positions within different versions can be translated for various purposes, such as transferring annotations between versions or synchronizing positions within different versions.
    Type: Grant
    Filed: December 28, 2010
    Date of Patent: December 19, 2017
    Assignee: Amazon Technologies, Inc.
    Inventors: Christopher F. Weight, Janna Hamaker, Tom Killalea, Bruno A. Posokhow, Daniel B. Rausch
  • Patent number: 9773054
    Abstract: According to an aspect, storing and querying conceptual indices (CIs) includes creating a conceptual inverted index (CII) from the CIs. The CII includes CII entries, each of which corresponds to a concept in a concept graph. Creating the CII includes populating each entry with pointers to documents selected from the CIs having likelihoods of being related to the concept that are greater than a threshold value, and the corresponding likelihoods. An aspect also includes receiving a query that includes a concept in the concept graph, and generating query results from a search that include at least a subset of the pointers to documents. Each of the CIs is associated with a corresponding document and includes a CI entry for each concept in the concept graph, and each of the CI entries specifies a value indicating a likelihood that the document is related to the concept in the concept graph.
    Type: Grant
    Filed: March 11, 2015
    Date of Patent: September 26, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michele M. Franceschini, Luis A. Lastras-Montano, Livio B. Soares, Mark N. Wegman
  • Patent number: 9703858
    Abstract: According to an aspect, storing and querying conceptual indices (CIs) includes creating a conceptual inverted index (CII) from the CIs. The CII includes CII entries, each of which corresponds to a concept in a concept graph. Creating the CII includes populating each entry with pointers to documents selected from the CIs having likelihoods of being related to the concept that are greater than a threshold value, and the corresponding likelihoods. An aspect also includes receiving a query that includes a concept in the concept graph, and generating query results from a search that include at least a subset of the pointers to documents. Each of the CIs is associated with a corresponding document and includes a CI entry for each concept in the concept graph, and each of the CI entries specifies a value indicating a likelihood that the document is related to the concept in the concept graph.
    Type: Grant
    Filed: July 14, 2014
    Date of Patent: July 11, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michele M. Franceschini, Luis A. Lastras-Montano, Livio B. Soares, Mark N. Wegman
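The inversion from per-document conceptual indices to a conceptual inverted index might look like the following sketch. The data layout (plain dictionaries), the names `build_cii` and `query_cii`, the sample likelihoods, and the choice to rank results by likelihood are illustrative assumptions rather than details taken from the patent.

```python
from collections import defaultdict


def build_cii(cis: dict[str, dict[str, float]], threshold: float) -> dict[str, list[tuple[str, float]]]:
    """Invert per-document CIs into concept -> [(doc_id, likelihood), ...].

    Only likelihoods strictly greater than the threshold are kept.
    """
    cii = defaultdict(list)
    for doc_id, ci in cis.items():
        for concept, likelihood in ci.items():
            if likelihood > threshold:
                cii[concept].append((doc_id, likelihood))
    return dict(cii)


def query_cii(cii, concept: str, limit=None):
    """Return pointers for a concept, most likely first (ranking is an assumption)."""
    entries = sorted(cii.get(concept, []), key=lambda e: e[1], reverse=True)
    return entries if limit is None else entries[:limit]


# Hypothetical CIs: one likelihood per concept for every document.
cis = {
    "d1": {"jazz": 0.9, "physics": 0.1},
    "d2": {"jazz": 0.4, "physics": 0.8},
    "d3": {"jazz": 0.7, "physics": 0.2},
}
cii = build_cii(cis, threshold=0.5)
top_jazz = query_cii(cii, "jazz", limit=1)
```

Passing `limit` returns only a subset of the stored pointers, which mirrors the "at least a subset" wording of the abstract.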
  • Patent number: 9665605
    Abstract: Methods and apparatus for building a search index for a database are disclosed. When an incremental build trigger is detected (e.g., a threshold number of documents are added to database), the system determines which sub-indexes need to be updated and which sub-indexes do not need to be updated. Rather than update the affected sub-indexes directly, the system builds new sub-indexes to replace the affected sub-indexes. Database queries that occur during the generation of the replacement sub-indexes use the old sub-indexes. When the new sub-indexes are ready, the system moves pointers from the old sub-indexes to the new sub-indexes so that subsequent database queries use the new sub-indexes.
    Type: Grant
    Filed: September 9, 2014
    Date of Patent: May 30, 2017
    Assignee: KCURA LLC
    Inventors: Mikhail Kogan, Michael B. Goldstein, Vidhyapriya Govindarajan, Keith L. Kaminski, Mason D. May, Fatima Z. Mecci, Nikita Solilov, Kyle A. Stachowiak
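The build-then-swap behaviour in this abstract can be illustrated with a small in-memory sketch. The class `SearchIndex`, its method names, and the dictionary-based sub-index layout are hypothetical; the only point shown is that queries keep reading the old sub-indexes until a single reference swap makes the replacements visible.

```python
class SearchIndex:
    """Toy index made of named sub-indexes, each mapping term -> set of doc ids."""

    def __init__(self, sub_indexes: dict[str, dict[str, set[int]]]):
        self._live = dict(sub_indexes)

    def search(self, term: str) -> set[int]:
        live = self._live  # take one reference so a concurrent swap cannot mix old and new
        hits: set[int] = set()
        for sub in live.values():
            hits |= sub.get(term, set())
        return hits

    def swap(self, replacements: dict[str, dict[str, set[int]]]) -> None:
        """Point the affected names at the newly built sub-indexes in one step."""
        new_live = dict(self._live)
        new_live.update(replacements)
        self._live = new_live  # unaffected sub-indexes are reused untouched


index = SearchIndex({"a": {"cat": {1}}, "b": {"dog": {2}}})

# Build a replacement for the affected sub-index "a" off to the side.
new_a = {"cat": {1, 3}}
during_build = index.search("cat")  # still served from the old sub-index

index.swap({"a": new_a})
after_swap = index.search("cat")
```

Sub-index "b" is never rebuilt here, which corresponds to the system deciding that it does not need an update.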
  • Patent number: 9665568
    Abstract: Methods, apparatus and systems, including computer program products, for creating subject matter synonyms from definitions extracted from a subject matter glossary. Confidence scores, each representing a likelihood that two terms defined in the subject matter glossary are synonyms, are determined by applying natural language processing (e.g., passage term matching, lexical matching, and syntactic matching) to the extracted definitions. A subject matter thesaurus is built based on the confidence scores. In one embodiment, a statement containing a first term is created based on an extracted definition of the first term, a modified statement is created by substituting a second term in the statement in lieu of the first term, a corpus is searched, and a confidence score is determined based on evidence in the corpus that the modified statement is accurate. The first and second terms are marked as synonyms if the confidence score is greater than a threshold.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: May 30, 2017
    Assignee: International Business Machines Corporation
    Inventors: Scott N. Gerard, Mark G. Megerian
  • Patent number: 9589277
    Abstract: Methods, computer systems, and computer storage media are provided for evaluating information retrieval (IR) such as search query results (including advertisements) by a machine learning scorer. In an embodiment, a set of features is derived from a query and a machine learning algorithm is applied to construct a linear model of (query, ads) for scoring by maximizing a relevance metric. In an embodiment, the machine learned scorer is adapted for use with WAND algorithm based ad selection.
    Type: Grant
    Filed: December 31, 2013
    Date of Patent: March 7, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bruce Zhang, Jianchang Mao, Yuan Shen
  • Patent number: 9535983
    Abstract: Storing text samples in a manner that the text samples may be quickly searched. The text samples are assigned a text sample identifier and are each parsed to thereby extract text components from the text samples. Text components that have the same content are assigned the same text component identifier. For each parsed text component, a text component entry is created that includes the assigned text component identifier as well as the text sample identifier for the text sample from which the text component was parsed. A text sample entry group is created for each text sample that contains the text component entries in sequence for the text components found within the text sample. The text sample entry groups are stored so as to be scannable during a future search.
    Type: Grant
    Filed: October 29, 2013
    Date of Patent: January 3, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cristian Petculescu, Marius Dumitru, Vasile Paraschiv, Amir Netz, Paul Jonathon Sanders
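A rough sketch of the storage layout described above follows. Splitting on whitespace, lowercasing, using list positions as text sample identifiers, and the names `build_entry_groups` and `scan` are all assumptions made for illustration; the abstract does not specify how text is parsed or how identifiers are assigned.

```python
def build_entry_groups(samples: list[str]):
    """Return (component_ids, groups).

    component_ids maps component text -> component identifier, so equal
    components share one identifier. groups[i] lists the
    (component_id, sample_id) entries of sample i in order of appearance.
    """
    component_ids: dict[str, int] = {}
    groups: list[list[tuple[int, int]]] = []
    for sample_id, text in enumerate(samples):
        group = []
        for component in text.lower().split():
            cid = component_ids.setdefault(component, len(component_ids))
            group.append((cid, sample_id))
        groups.append(group)
    return component_ids, groups


def scan(component_ids, groups, word: str) -> list[int]:
    """Scan the stored entry groups for samples that contain the given word."""
    cid = component_ids.get(word.lower())
    if cid is None:
        return []
    return sorted({sid for group in groups for c, sid in group if c == cid})


samples = ["red apple", "green apple pie", "red wine"]
component_ids, groups = build_entry_groups(samples)
apple_samples = scan(component_ids, groups, "apple")
```

Because every entry carries its text sample identifier, a scan can compare small integers instead of strings.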
  • Patent number: 9367621
    Abstract: A method and system for automatically updating searches are described. In one embodiment, a first search result may be compared with a second search result to automatically identify at least one data item within the first search result that is changed relative to the second search result. The at least one data item may comprise a transaction term. A notification of the at least one data item may be transmitted to a user device.
    Type: Grant
    Filed: January 20, 2014
    Date of Patent: June 14, 2016
    Assignee: eBay Inc.
    Inventors: Wen Wen, Patricia Ng
  • Patent number: 9311300
    Abstract: Methods, apparatus and systems, including computer program products, for creating subject matter synonyms from definitions extracted from a subject matter glossary. Confidence scores, each representing a likelihood that two terms defined in the subject matter glossary are synonyms, are determined by applying natural language processing (e.g., passage term matching, lexical matching, and syntactic matching) to the extracted definitions. A subject matter thesaurus is built based on the confidence scores. In one embodiment, a statement containing a first term is created based on an extracted definition of the first term, a modified statement is created by substituting a second term in the statement in lieu of the first term, a corpus is searched, and a confidence score is determined based on evidence in the corpus that the modified statement is accurate. The first and second terms are marked as synonyms if the confidence score is greater than a threshold.
    Type: Grant
    Filed: September 13, 2013
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Scott N. Gerard, Mark G. Megerian
  • Patent number: 9244931
    Abstract: Techniques provide time-aware ranking, such as ranking of information, files or URL (uniform resource locator) links. For example, time-aware modeling assists in determining user intent of a query to a search engine. In response to the query, results are ranked in a time-aware manner to better match the user intent. The ranking may model query, URL and query-URL pair behavior over time to create time-aware query, URL and query-URL pair models, respectively. Such models may predict behavior of a query-URL pair, such as frequency and timing of clicks to the URL of the pair when the query of the pair is posed to the search engine. Results of a query may be ranked by predicted query-URL behavior. Once ranked, the results may be sent to the user in response to the query.
    Type: Grant
    Filed: October 11, 2011
    Date of Patent: January 26, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kira Radinsky, Susan T. Dumais, Krysta M. Svore, Jaime Brooks Teevan, Eric J. Horvitz
  • Patent number: 9218414
    Abstract: A method for searching multiple documents on a computer system includes steps for sending a query to a system core where the query is passed to a search component for searching the documents. The system core in turn receives results from the search component indicating related documents to the query and passes to a summarization component a specified number of the results. The summarization component processes related documents corresponding to the specified number of results to produce a multi-document summary. The system core receives the summary from the summarization component. The multi-document summary is received from the system core.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 22, 2015
    Inventor: Dmitri Soubbotin
  • Patent number: 9202079
    Abstract: A method, system, and computer-readable memory containing instructions include employing a tokenizing authority to obtain a tokenized query term that represents a query term, using the tokenized query term to perform a lookup against a tokenized term database, and determining whether the tokenized query term exists in the database. The method, system, and computer-readable memory may further include returning an encryption or decryption key corresponding to an encrypted record of information associated with the query term and corresponding to the tokenized query term.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: December 1, 2015
    Assignee: VERISIGN, INC.
    Inventor: Burton S. Kaliski, Jr.
  • Patent number: 9122748
    Abstract: Techniques and tools are described for matching documents against monitors. An index can be generated from a plurality of monitors, where the index represents the query logic of the plurality of monitors. The index can be searched using the documents as search queries. The searching can comprise matching the documents against the monitors using the query logic represented in the index. An index can be distributed to a plurality of computing devices to be searched at the plurality of computing devices, where each computing device searches a subset of a plurality of documents against the full index. Searching at the plurality of computing devices can be performed in parallel, and results can be aggregated at a central location.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: September 1, 2015
    Assignee: Jive Software, Inc.
    Inventor: Lance Riedel
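The reversed search described here, where stored monitors are indexed and each document acts as the query, can be sketched as follows. The sketch assumes every monitor is a simple conjunction of required terms and that documents are tokenized on whitespace; real monitor query logic would be richer. The names `build_monitor_index` and `match_document` are hypothetical.

```python
from collections import defaultdict


def build_monitor_index(monitors: dict[str, set[str]]) -> dict[str, set[str]]:
    """Index the monitors: term -> ids of the monitors that require that term."""
    index = defaultdict(set)
    for monitor_id, terms in monitors.items():
        for term in terms:
            index[term].add(monitor_id)
    return dict(index)


def match_document(document: str, monitors, index) -> list[str]:
    """Use the document as the query and return the monitors it satisfies."""
    tokens = set(document.lower().split())
    matched_terms = defaultdict(int)
    for token in tokens:
        for monitor_id in index.get(token, ()):
            matched_terms[monitor_id] += 1
    # A conjunctive monitor matches only when all of its terms were seen.
    return sorted(m for m, n in matched_terms.items() if n == len(monitors[m]))


# Hypothetical monitors, each a set of terms that must all be present.
monitors = {
    "m1": {"outage", "database"},
    "m2": {"release"},
    "m3": {"outage", "network"},
}
index = build_monitor_index(monitors)
hits = match_document("Database outage reported after release", monitors, index)
```

Since `match_document` only reads the index, copies of the full index could be handed to several workers, each matching its own subset of documents, with the per-document results merged afterwards.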
  • Patent number: 9122733
    Abstract: A pedigree data processing system receives a first item from an upstream partner and generates a receive native event for the first item. The mechanism receives pedigree data for the first item from the upstream partner, generates at least one synthetic event based on the pedigree data and stores the receive native event and the at least one synthetic event in a pedigree data repository. The pedigree data processing system determines whether to send electronic pedigree information for the first item to downstream partners using push data exchange or pull data exchange. The pedigree data processing system generates an electronic pedigree for the first item using pull data exchange based on the receive native event and the at least one synthetic event and provides the electronic pedigree to a first downstream partner pedigree system.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: September 1, 2015
    Assignee: International Business Machines Corporation
    Inventors: Victor Dogaru, Arthur F. Kaufmann, Martin A. Siegenthaler
  • Patent number: 9116969
    Abstract: A pedigree data processing system receives a first item from an upstream partner and generates a receive native event for the first item. The mechanism receives pedigree data for the first item from the upstream partner, generates at least one synthetic event based on the pedigree data and stores the receive native event and the at least one synthetic event in a pedigree data repository. The pedigree data processing system determines whether to send electronic pedigree information for the first item to downstream partners using push data exchange or pull data exchange. The pedigree data processing system generates an electronic pedigree for the first item using pull data exchange based on the receive native event and the at least one synthetic event and provides the electronic pedigree to a first downstream partner pedigree system.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: August 25, 2015
    Assignee: International Business Machines Corporation
    Inventors: Victor Dogaru, Arthur F. Kaufmann, Martin A. Siegenthaler
  • Patent number: 9058377
    Abstract: This specification describes technologies relating to fixed width encoding/decoding of document posting lists. In general, one aspect of the subject matter described in this specification can be embodied in apparatuses that include a server obtaining a list of one or more of document identification numbers, each of the document identification numbers uniquely identifying a document; an encoding device operatively connected to the server, the encoding device generating a sequence of deltas from the sequential list of one or more of the document identification numbers, and encoding each delta in the sequence of deltas using a fixed-width encoding scheme.
    Type: Grant
    Filed: June 3, 2011
    Date of Patent: June 16, 2015
    Assignee: Google Inc.
    Inventors: Priyendra Deshwal, Srdjan Petrovic, Asim Shankar
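Delta encoding of a posting list followed by fixed-width packing can be sketched as below. The abstract does not state how the width is chosen; picking the smallest whole number of bytes that holds the largest delta is an assumption of this sketch, as are the function names. Document identifiers are assumed to be sorted in ascending order.

```python
def encode_postings(doc_ids: list[int]) -> tuple[int, bytes]:
    """Delta-encode ascending doc ids and pack every delta into `width` bytes."""
    if not doc_ids:
        return 1, b""
    # The first delta is the first id itself; the rest are gaps between neighbours.
    deltas = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    # Assumption: width is the fewest whole bytes that fit the largest delta.
    width = max(1, (max(deltas).bit_length() + 7) // 8)
    return width, b"".join(d.to_bytes(width, "big") for d in deltas)


def decode_postings(width: int, data: bytes) -> list[int]:
    """Invert encode_postings by accumulating the fixed-width deltas."""
    doc_ids, current = [], 0
    for offset in range(0, len(data), width):
        current += int.from_bytes(data[offset:offset + width], "big")
        doc_ids.append(current)
    return doc_ids


doc_ids = [3, 7, 8, 300, 301]
width, packed = encode_postings(doc_ids)
restored = decode_postings(width, packed)
```

With a fixed width, the k-th delta starts at byte offset `k * width`, so a decoder can jump to any position without parsing the preceding entries, which variable-length schemes do not allow.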
  • Publication number: 20150142821
    Abstract: A database system performs analytics on longitudinal data, such as medical histories with events occurring to patients over time. Input data is processed into streams of events. A set of indexes of event characteristics is generated. A set of patient event histories, partitioned by patient, is generated. Several copies of event data are stored, each copy being structured to support a specific analytical task. Data is partitioned and distributed over several hardware nodes to allow parallel queries. Definitions of sets of candidate patients are translated into sets of filters applied to the set of indexes. Data for these candidates are input to analytical modules. Reports from analysis are automatically generated to be compatible with standard guidelines for reporting. Workflows support one task or a set of closely related tasks by offering the user a defined sequence of query options and analytic choices specifically arranged for the task.
    Type: Application
    Filed: November 18, 2013
    Publication date: May 21, 2015
    Inventors: Jeremy Rassen, Allon Rauer, Sebastian Schneeweiss
  • Publication number: 20150127648
    Abstract: A method for generating image descriptors for media content of images represented by a set of key-points, fn, is recommended which determines for each key-point of the image, designated as a central key-point, a neighbourhood of other key-points, fml, whose features are expressed relative to those of the central key-point. A sparse photo-geometric descriptor, SPGD, of each key-point in the image being a representation of the geometry and intensity content of a feature and its neighbourhood is provided to perform an efficient image querying for efficient searches. The approach demonstrates that incorporating geometrical constraints in image registration applications does not need to be a computationally demanding operation carried out to refine a query response short-list.
    Type: Application
    Filed: June 7, 2012
    Publication date: May 7, 2015
    Applicant: THOMSON LICENSING
    Inventors: Patrick Perez, Joaquin Salvatierra Zepeda
  • Patent number: 9026538
    Abstract: The present invention provides a method for performing transactions on data entities in a database and a transactional database. The database comprises an ordered set of data stores with at least one static data store, wherein said static data store uses an index structure based on a non-updatable representation of an ordered set of integers according to the principle of compressed inverted indices. The method allows generating a modifiable data store when the performed transaction comprises an insert, update or delete operation, executing operations of the transaction on the ordered set being present at the time when the transaction has been started and, if present, on the modifiable data store, and converting data stores to a new static data store. The insert, update or delete operations are executed on the modifiable data store, which is the only data store modifiable for the transaction.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: May 5, 2015
    Assignee: Open Text S.A.
    Inventors: Gary J. Promhouse, Matthew David George Timmermans, Karl-Heinz Krachenfels
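The split between a non-updatable static store and a modifiable store can be sketched in a heavily simplified form. The class `LayeredStore`, its method names, and the use of a sorted tuple in place of a compressed inverted index are assumptions for illustration; transaction isolation and the ordered set of several data stores are not modelled.

```python
class LayeredStore:
    """A static, never-mutated set of ids plus a modifiable overlay."""

    def __init__(self, static_ids):
        self.static = tuple(sorted(static_ids))  # stands in for the non-updatable index
        self.added: set[int] = set()    # modifiable part: inserted ids
        self.deleted: set[int] = set()  # modifiable part: deleted ids

    def insert(self, doc_id: int) -> None:
        self.deleted.discard(doc_id)
        self.added.add(doc_id)

    def delete(self, doc_id: int) -> None:
        self.added.discard(doc_id)
        self.deleted.add(doc_id)

    def ids(self) -> list[int]:
        """Read through both layers without touching the static one."""
        return sorted((set(self.static) | self.added) - self.deleted)

    def convert(self) -> "LayeredStore":
        """Fold the modifiable layer into a new static store."""
        return LayeredStore(self.ids())


store = LayeredStore([1, 5, 9])
store.insert(7)
store.delete(5)
visible = store.ids()
converted = store.convert()
```

All writes land in the overlay, so the static representation can stay compressed and read-only until a conversion produces its replacement.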