Inverted Index Patents (Class 707/742)
  • Patent number: 11947512
    Abstract: The disclosed technology is generally directed to the compression of inverted indexes. In one example of the technology, an inverted index that includes a plurality of posting lists and metadata is provided. The inverted index indicates compression settings that are associated with the plurality of posting lists. At periodic scheduled times, a regeneration is performed on the inverted index. The regeneration includes decompressing the inverted index. The decompressing uses the compression settings indicated by the inverted index. The regeneration further includes determining compression settings to use during a next periodic scheduled time of the plurality of periodic scheduled times, such that at least a first posting list of the plurality of posting lists uses a different compression setting than a second posting list of the plurality of posting lists.
    Type: Grant
    Filed: February 22, 2022
    Date of Patent: April 2, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Torsten Amundsen, Pavel Sukhov
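The regeneration cycle described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the two codecs (`fixed32` and `varint_delta`), the "smallest encoding wins" selection rule, and all function names are assumptions introduced here.

```python
from typing import Dict, List, Tuple


def encode_fixed32(doc_ids: List[int]) -> bytes:
    # Hypothetical codec 1: every document ID stored as 4 little-endian bytes.
    return b"".join(i.to_bytes(4, "little") for i in doc_ids)


def decode_fixed32(data: bytes) -> List[int]:
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def encode_varint_delta(doc_ids: List[int]) -> bytes:
    # Hypothetical codec 2: gaps between sorted IDs, variable-byte encoded.
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits plus continuation flag
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_varint_delta(data: bytes) -> List[int]:
    doc_ids, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            doc_ids.append(current)
            gap, shift = 0, 0
    return doc_ids


CODECS = {
    "fixed32": (encode_fixed32, decode_fixed32),
    "varint_delta": (encode_varint_delta, decode_varint_delta),
}

# The index maps each term to (compression setting, compressed posting list).
Index = Dict[str, Tuple[str, bytes]]


def regenerate(index: Index) -> Index:
    """Decompress each list with its recorded setting, then pick a setting per list."""
    regenerated = {}
    for term, (setting, blob) in index.items():
        doc_ids = CODECS[setting][1](blob)
        # Assumed selection rule: keep whichever codec yields the fewest bytes.
        best = min(CODECS, key=lambda name: len(CODECS[name][0](doc_ids)))
        regenerated[term] = (best, CODECS[best][0](doc_ids))
    return regenerated


index = {
    "dense": ("fixed32", encode_fixed32([1, 2, 3])),
    "sparse": ("varint_delta", encode_varint_delta([2**30, 2**31])),
}
new_index = regenerate(index)
```

After one regeneration the two posting lists end up with different settings, because small gaps favor the variable-byte codec while very large gaps favor the fixed-width one.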
  • Patent number: 11934370
    Abstract: Systems and methods are disclosed to implement an indexing engine that maintains an index in an index store for a storage object in a data store. In embodiments, the index store may be implemented using an in-memory storage cluster separate from the data store. The storage object may have multiple indexes, which may have different filtering or sorting criteria for the data. In embodiments, updates to the storage object are received as an update stream by the indexing engine. Based on configurable indexing rules, the indexing engine applies the updates to the appropriate indexes. To service a query to the data store, a query engine first retrieves a set of keys satisfying the query from the index store, and then data corresponding to the keys from the data store or another index. In embodiments, the index may be refreshed via touch updates of selected data in the storage object.
    Type: Grant
    Filed: December 11, 2017
    Date of Patent: March 19, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Long Nguyen, Dominic Corona, Fletcher Liverance
  • Patent number: 11914583
    Abstract: Various embodiments are directed to a system that utilizes regular expression (regex) to recognize at least portions of characters, words, text, numbers, etc. in a structured or unstructured dataset, any patterns associated therewith, and/or similarities between the determined patterns. In examples, a regex-based pattern recognition platform may receive a dataset and determine whether at least a first regex pattern and a second regex pattern can be identified. The occurrences of the first and second regex patterns and the frequency of those occurrences may reveal something about the dataset itself or any patterns contained therein.
    Type: Grant
    Filed: September 16, 2020
    Date of Patent: February 27, 2024
    Assignee: Capital One Services, LLC
    Inventors: Jeremy Edward Goodsitt, Austin Grant Walters, Reza Farivar, Mark Louis Watson, Anh Truong, Galen Rafferty, Vincent Pham
  • Patent number: 11876851
    Abstract: A messaging channel is embedded directly into a media stream. Messages delivered via the embedded messaging channel are extracted at a client media player. According to a variant embodiment, and in lieu of embedding all of the message data in the media stream, only a coordination index is injected, and the message data is sent separately and merged into the media stream downstream (at the client media player) based on the coordination index. In one example embodiment, multiple data streams (each potentially with different content intended for a particular “type” or class of user) are transmitted alongside the video stream in which the coordination index (e.g., a sequence number) has been injected into a video frame. Based on a user's service level, a particular one of the multiple data streams is released when the sequence number appears in the video frame, and the data in that stream is associated with the media.
    Type: Grant
    Filed: March 22, 2022
    Date of Patent: January 16, 2024
    Assignee: Akamai Technologies, Inc.
    Inventors: Mark M. Ingerman, Michael Archer
  • Patent number: 11841879
    Abstract: Described herein is a computer implemented method for identifying one or more documents of potential relevance to an input query. The method comprises receiving the input query; processing input text from the query to generate an input query vector; accessing document records from a record database, each document record including a document vector; generating a document similarity score in respect of each accessed document, the document similarity score for a given document record being generated using the document vector for the given document record and the input query vector, the document similarity score for a given document record indicating the similarity of the input text to a document that the given document record is in respect of; and identifying one or more potentially relevant document records based on their document similarity scores.
    Type: Grant
    Filed: August 20, 2022
    Date of Patent: December 12, 2023
    Assignees: ATLASSIAN PTY LTD., ATLASSIAN US, INC.
    Inventors: Geoff Sims, Michael Fulthorp, Mike Ortman, Jeff Nelson, Matthew Hunter
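The scoring step in this abstract can be illustrated with a small sketch. The abstract does not say how vectors are built or compared; the bag-of-words vectors, the cosine similarity measure, and the names `text_to_vector`, `cosine`, and `rank` are assumptions chosen for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def text_to_vector(text: str) -> Counter:
    # Assumed vectorization: lowercase term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, records: Dict[str, Counter], top_k: int = 2) -> List[Tuple[str, float]]:
    """Score every document record against the query vector and keep the best."""
    query_vector = text_to_vector(query)
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Each document record stores a precomputed document vector.
records = {
    "d1": text_to_vector("inverted index compression"),
    "d2": text_to_vector("video stream messaging"),
    "d3": text_to_vector("index store query engine"),
}
results = rank("inverted index", records)
```

Storing the vector in the record means only the query has to be vectorized at search time.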
  • Patent number: 11829755
    Abstract: Systems, apparatuses, and methods related to acceleration circuitry for posit operations are described. Signaling indicative of performance of an operation to write a first bit string to a first buffer resident on acceleration circuitry and a second bit string resident on the acceleration circuitry can be received at a DMA controller couplable to the acceleration circuitry. The acceleration circuitry can be configured to perform arithmetic operations, logical operations, or both on bit strings formatted in a unum or posit format. Signaling indicative of an arithmetic operation, a logical operation, or both, to be performed using the first and second bit strings can be transmitted to the acceleration circuitry. The arithmetic operation, the logical operation, or both can be performed via the acceleration circuitry and according to the signaling. Signaling indicative of a result of the arithmetic operation, the logical operation, or both can be transmitted to the DMA controller.
    Type: Grant
    Filed: August 1, 2022
    Date of Patent: November 28, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Vijay S. Ramesh, Phillip G. Hays, Craig M. Cutler, Andrew J. Rees
  • Patent number: 11789950
    Abstract: Systems and methods are described for a streaming data processing system that defers processing of some data based on a determined importance of the data. A streaming data processing system can ingest a data stream that contains multiple events, and can extract data field values from individual events and process the data field values to determine event importance. The streaming data processing system can then do further processing and indexing of high importance events, and can generate a storage prefix for each low importance event that determines where to store the low importance event in a data storage system. The streaming data processing system can then process queries by retrieving the indexed high importance events, and can extract the data field values from a high importance event to determine the storage prefix for retrieving corresponding low importance events from the data storage system.
    Type: Grant
    Filed: October 19, 2020
    Date of Patent: October 17, 2023
    Assignee: Splunk Inc.
    Inventors: Paul Jean André Bernier, Poornima Devaraj, Ivneet Kaur, Zhimin Liang, Min Zhang
  • Patent number: 11775755
    Abstract: Described herein are improved systems and methods for overcoming technical problems associated with processing and visualization of textual data and natural language processing. In some examples, a method is provided for determining sentiment associated with big data analysis of database information. In some examples, textual news data (e.g., NEWS API, RSS, etc.) is received via a communications network from a plurality of data platforms. The textual news data is parsed, and syntactic dependency trees are generated therefrom. A sentiment score is derived for the parsed textual data corresponding to a word or phrase associated with the textual data, and an image is generated reflecting scored sentiment for the parsed textual data.
    Type: Grant
    Filed: March 2, 2023
    Date of Patent: October 3, 2023
    Assignee: TLDR LLC
    Inventors: Benjamin Olander, Jedediah Carty, Matthew Van Dusen, Philip Stockton
  • Patent number: 11698889
    Abstract: Embodiments of the present disclosure relate to processing data. An example method includes acquiring data related to a first moment in streaming data of an object to be processed. The method further includes storing the data in a first entry of a data table based on an identification of the object to be processed, wherein the data table further includes a second entry before the first entry, and the second entry stores data related to a second moment before the first moment in the streaming data. The method further includes updating an index related to the object to be processed based on the first entry. Thus, a solution to the problem of performing search in data at different moments is provided, and it is unnecessary for a user to participate in the solution, thus improving the user experience and reducing the use of storage resources.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: July 11, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Pengfei Su, Lu Lei, Julius Jian Zhu
  • Patent number: 11693839
    Abstract: A method includes obtaining a query containing at least one field from which data is being queried, obtaining a dataset having a schema-free data exchange format having multiple fields of data at different physical positions in the dataset, and parsing the dataset by obtaining a structural index that maps logical locations of fields to physical locations of the fields of the dataset, accessing the structural index with logical locations of the fields that index to the physical locations, and providing data from the fields based on the physical locations responsive to the query.
    Type: Grant
    Filed: September 10, 2020
    Date of Patent: July 4, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yinan Li, Nikolaos Romanos Katsipoulakis, Badrish Chandramouli, Jonathan D Goldstein, Donald Kossmann
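One way to picture a structural index is a map from field names (logical locations) to character offsets (physical locations) in the raw text. The sketch below assumes JSON as the schema-free format and handles only the top-level fields of a single object; `build_structural_index` and `read_field` are hypothetical names, and the scanning approach is an illustration rather than the technique claimed in the patent.

```python
import json
from typing import Any, Dict, Tuple

_decoder = json.JSONDecoder()


def _skip_ws(text: str, pos: int) -> int:
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def build_structural_index(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each top-level field name to the (start, end) offsets of its value."""
    index = {}
    pos = _skip_ws(text, 0)
    if text[pos] != "{":
        raise ValueError("expected a JSON object")
    pos = _skip_ws(text, pos + 1)
    while text[pos] != "}":
        key, pos = _decoder.raw_decode(text, pos)      # field name
        pos = _skip_ws(text, pos)
        if text[pos] != ":":
            raise ValueError("expected ':' after field name")
        start = _skip_ws(text, pos + 1)
        _, end = _decoder.raw_decode(text, start)      # locate the end of the value
        index[key] = (start, end)
        pos = _skip_ws(text, end)
        if text[pos] == ",":
            pos = _skip_ws(text, pos + 1)
    return index


def read_field(text: str, index: Dict[str, Tuple[int, int]], field: str) -> Any:
    # Only the slice holding the requested field is parsed at query time.
    start, end = index[field]
    return json.loads(text[start:end])


doc = '{"id": 7, "user": {"name": "ann"}, "tags": ["a", "b"]}'
structural_index = build_structural_index(doc)
tags = read_field(doc, structural_index, "tags")
```

The index is built once per dataset; each later query touches only the byte ranges of the fields it names.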
  • Patent number: 11687514
    Abstract: Multimodal table encoding, including: Receiving an electronic document that contains a table. The table includes multiple rows, multiple columns, and a schema comprising column labels or row labels. The electronic document includes a description of the table which is located externally to the table. Next, operating separate machine learning encoders to separately encode the description, schema, each of the rows, and each of the columns of the table, respectively. The schema, the rows, and the columns are encoded together with end-of-column tokens and end-of-row tokens that mark an end of each column and row, respectively. Then, applying a machine learning gating mechanism to the encoded description, encoded schema, encoded rows, and encoded columns, to produce a fused encoding of the table, wherein the fused encoding is representative of both a structure of the table and a content of the table.
    Type: Grant
    Filed: July 15, 2020
    Date of Patent: June 27, 2023
    Assignee: International Business Machines Corporation
    Inventors: Roee Shraga, Haggai Roitman, Guy Feigenblat, Mustafa Canim
  • Patent number: 11671413
    Abstract: A technique to cache content securely within edge network environments, even within portions of that network that might be considered less secure than what a customer desires, while still providing the acceleration and off-loading benefits of the edge network. The approach ensures that customer confidential data (whether content, keys, etc.) are not exposed either in transit or at rest. In this approach, only encrypted copies of the customer's content objects are maintained within the portion of the edge network, but without any need to manage the encryption keys. To take full advantage of the secure content caching technique, preferably the encrypted content (or portions thereof) are pre-positioned within the edge network portion to improve performance of secure content delivery from the environment.
    Type: Grant
    Filed: January 26, 2021
    Date of Patent: June 6, 2023
    Assignee: Akamai Technologies, Inc.
    Inventor: Tong Chen
  • Patent number: 11659033
    Abstract: A technique to cache content securely within edge network environments, even within portions of that network that might be considered less secure than what a customer desires, while still providing the acceleration and off-loading benefits of the edge network. The approach ensures that customer confidential data (whether content, keys, etc.) are not exposed either in transit or at rest. In this approach, only encrypted copies of the customer's content objects are maintained within the portion of the edge network, but without any need to manage the encryption keys. To take full advantage of the secure content caching technique, preferably the encrypted content (or portions thereof) are pre-positioned within the edge network portion to improve performance of secure content delivery from the environment.
    Type: Grant
    Filed: January 25, 2021
    Date of Patent: May 23, 2023
    Assignee: Akamai Technologies, Inc.
    Inventor: Tong Chen
  • Patent number: 11593388
    Abstract: A method and a computer program product are used for generating an index of a scoring payload dataset. Correlation coefficients for correlations between input data values and output data values of the machine learning model provided by the scoring payload datasets as well as performance data values of the processes provided by process datasets are calculated. Features of which feature values are used as input data values are ranked according to their importance using the correlation coefficients. For the features of a set of highest-ranking features feature value sets with feature values of the respective features are selected from the scoring payload datasets and a database index of the selected feature value sets is generated.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: February 28, 2023
    Assignee: International Business Machines Corporation
    Inventors: Rafal Bigaj, Lukasz G. Cmielowski, Wojciech Sobala, Maksymilian Erazmus
  • Patent number: 11544569
    Abstract: A method includes receiving an image by a deep neural network (DNN) and obtaining a first feature map based on the image while the DNN is in a trained state, wherein the DNN is configured to perform a task based on the image, and is trained with a training image by using a feature sparsification with smoothness regularization process and a back propagation and weight update process that updates the DNN based on an output of the feature sparsification with smoothness regularization process.
    Type: Grant
    Filed: October 5, 2020
    Date of Patent: January 3, 2023
    Assignee: TENCENT AMERICA LLC
    Inventors: Wei Jiang, Wei Wang, Shan Liu
  • Patent number: 11397715
    Abstract: Indexing and matching records in a data management system by defining entity indexing attributes associated with system records, receiving an incoming data entity, selecting a set of entity candidates according to the entity indexing attributes, matching the incoming entity to an entity candidate, generating an analysis of the entity candidate selection according to entity attribute effectiveness, and revising the entity indexing attributes according to the analysis.
    Type: Grant
    Filed: July 31, 2019
    Date of Patent: July 26, 2022
    Assignee: International Business Machines Corporation
    Inventors: Shettigar Parkala Srinivas, Soma Shekar Naganna, Neeraj Ramkrishna Singh, Abhishek Seth, Prabhakaran Ramalingam
  • Patent number: 11334723
    Abstract: A method for processing untagged data includes: similarity comparison is performed on a semantic vector of untagged data and a semantic vector of each piece of tagged data to obtain similarities corresponding to respective pieces of tagged data; a preset number of similarities are selected according to a preset selection rule; the untagged data is predicted with a tagging model obtained by training through the tagged data, to obtain a prediction result of the untagged data; and the untagged data is divided into untagged data that can be tagged by a device or untagged data that cannot be tagged by the device according to the preset number of similarities and the prediction result.
    Type: Grant
    Filed: November 16, 2019
    Date of Patent: May 17, 2022
    Assignee: Beijing Xiaomi Intelligent Technology Co., Ltd.
    Inventors: Xiaotong Pan, Zuopeng Liu
  • Patent number: 11204926
    Abstract: A tuple manager of a database system processes partial tuples from a streaming application and stores them in a database. The partial tuples may include a large object (LOB) that arrives at the database at a different time than the rest of the corresponding tuple. A tuple manager stores partial tuples and uses a partial tuples index to track the partial tuples and coordinate recombination of corresponding partial tuples. The database allows queries to be run on the partial data before the tuples are reconstructed allowing faster access to potentially important data before the arrival and processing of a partial tuple such as an LOB.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: December 21, 2021
    Assignee: International Business Machines Corporation
    Inventors: Rafal P. Konik, Jessica R. Eidem, Jingdong Sun, Roger A. Mittelstadt
  • Patent number: 11157678
    Abstract: The presently disclosed subject matter includes a computer-implemented system and method for receiving content from another computer device and dynamically adapting display of the received content within a container of a formatted document, the container defining a restricted area within the formatted document designated for displaying the content. Sub-elements within at least one content item are identified and tagged; the tagging makes it possible to acquire display parameters of the tagged sub-elements and to calculate from them a required adaptation of the content item such that it can be fitted within the respective container.
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: October 26, 2021
    Assignee: TABOOLA.COM LTD
    Inventor: Efraim Nadiv
  • Patent number: 11157477
    Abstract: A method, computer system, and computer program product for segment differential-based document text-index modeling are provided. The embodiment may include receiving, by a processor, a document with a valid document ID and version ID tuple. The embodiment may also include determining the received document is a new version of a previously stored document and consequently multiplexing versions of the document into a single indexed document. The embodiment may further include segmenting the received document and building a token vector. The embodiment may also include calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation. The embodiment may further include in response to the calculated difference being below a pre-configured threshold value, discarding the received new version.
    Type: Grant
    Filed: November 28, 2018
    Date of Patent: October 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Roger C. Raphael, Rajesh M. Desai, Fumihiko Terui, Justo L. Perez, Thomas Hampp
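The threshold test at the end of this abstract can be sketched as below. The line-based segmentation, the use of `difflib.SequenceMatcher` as the difference measure, the 0.1 threshold, and the function names are all assumptions made for illustration; the abstract does not specify any of them.

```python
from difflib import SequenceMatcher
from typing import List


def segment(text: str) -> List[str]:
    # Assumed segmentation: one segment per non-empty line.
    return [line.strip() for line in text.splitlines() if line.strip()]


def difference(stored: str, received: str) -> float:
    """Return 0.0 for identical segment sequences and 1.0 for fully different ones."""
    matcher = SequenceMatcher(None, segment(stored), segment(received))
    return 1.0 - matcher.ratio()


def should_index(stored: str, received: str, threshold: float = 0.1) -> bool:
    # A new version whose difference falls below the threshold is discarded.
    return difference(stored, received) >= threshold


stored_version = "intro\nmethods\nresults\nsummary"
minor_revision = "intro\nmethods\nresults\nsummary"
major_revision = "intro\nmethods\nnew results\nnew summary"
```

Here `should_index(stored_version, minor_revision)` is false, so the unchanged version would be discarded, while the major revision passes the threshold and would be indexed.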
  • Patent number: 11126632
    Abstract: Systems and methods are disclosed for executing a query that includes an indication to process data managed by an external data system. The system identifies the external data system that manages the data to be processed, and obtains search configuration data from the external system. The system uses the search configuration data to generate a subquery for the external data system. The system also generates instructions for one or more worker nodes to receive and process results of the subquery from the external data system.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: September 21, 2021
    Assignee: Splunk Inc.
    Inventors: Sourav Pal, Arindam Bhattacharjee
  • Patent number: 11106663
    Abstract: A search for a regular expression in a tree hierarchy, includes, in part, searching for a match to the regular expression in a first subtree defined by a first node name, recording information about the first subtree if there is no match, determining whether a second subtree defined by a second node name is identical to the first node, skipping search of the second subtree if the second subtree is determined to be identical and prefix equivalent, with respect to the regular expression, to the first subtree. The second subtree is determined to be prefix equivalent to the first subtree when for any string s, a first prefix defined by a concatenation of the first node name and the string s results in a match if and only if a second prefix defined by a concatenation of the second node name and the string s results in a match.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: August 31, 2021
    Assignee: Synopsys, Inc.
    Inventors: Ilya Kudryavtsev, Daniel Geist, Boris Gommershtadt
  • Patent number: 11087765
    Abstract: In one example, a method includes: receiving audio data generated by a microphone of a current computing device; identifying, based on the audio data, one or more computing devices that each emitted a respective audio signal in response to speech reception being activated at the current computing device; and selecting either the current computing device or a particular computing device from the identified one or more computing devices to satisfy a spoken utterance determined based on the audio data.
    Type: Grant
    Filed: March 15, 2021
    Date of Patent: August 10, 2021
    Assignee: GOOGLE LLC
    Inventor: Jian Wei Leong
  • Patent number: 11049505
    Abstract: In one example, a method includes: receiving audio data generated by a microphone of a current computing device; identifying, based on the audio data, one or more computing devices that each emitted a respective audio signal in response to speech reception being activated at the current computing device; and selecting either the current computing device or a particular computing device from the identified one or more computing devices to satisfy a spoken utterance determined based on the audio data.
    Type: Grant
    Filed: March 15, 2021
    Date of Patent: June 29, 2021
    Assignee: GOOGLE LLC
    Inventor: Jian Wei Leong
  • Patent number: 11030242
    Abstract: A search system processes queries for accessing information stored in documents. A document comprises fields. The search system stores a plurality of indexes in a key-value store. Each index comprises key-value pairs. A key of a key-value pair is obtained by combining field data describing a field of a document. The value of each field is stored as an individual key-value in the key-value store. The search system receives a query requesting information stored in documents and specifying a search criteria. The search system builds a key-expression based on the search criteria and uses one or more indexes to find key-value pairs matching the key-expression. The search system finds the requested information based on the matching key-value pairs and provides the requested information to the query source.
    Type: Grant
    Filed: October 15, 2018
    Date of Patent: June 8, 2021
    Assignee: Rockset, Inc.
    Inventors: Dhruba Borthakur, Venkat Venkataramani, Igor Canadi, Tudor Bosman
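A rough sketch of an index held as key-value pairs, where each key combines field data and a query becomes a key prefix to scan for. The tuple key layout `(field, value, doc_id)`, the in-memory sorted list standing in for the key-value store, and the class name `KeyValueIndex` are assumptions for illustration, not details taken from the patent.

```python
import bisect
from typing import Dict, List, Tuple


class KeyValueIndex:
    def __init__(self) -> None:
        # Sorted keys stand in for an ordered key-value store.
        self._keys: List[Tuple[str, str, str]] = []

    def add_document(self, doc_id: str, document: Dict[str, object]) -> None:
        # Each field of the document is stored as its own key.
        for field, value in document.items():
            bisect.insort(self._keys, (field, str(value), doc_id))

    def find(self, field: str, value: object) -> List[str]:
        """Build a key prefix from the search criteria and scan matching keys."""
        prefix = (field, str(value))
        pos = bisect.bisect_left(self._keys, prefix)
        matches = []
        while pos < len(self._keys) and self._keys[pos][:2] == prefix:
            matches.append(self._keys[pos][2])
            pos += 1
        return matches


kv_index = KeyValueIndex()
kv_index.add_document("d1", {"city": "oslo", "lang": "no"})
kv_index.add_document("d2", {"city": "oslo", "lang": "en"})
kv_index.add_document("d3", {"city": "rome"})
oslo_docs = kv_index.find("city", "oslo")
```

Because the keys are ordered, all entries for one field and value sit next to each other, so a lookup is a binary search followed by a short scan.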
  • Patent number: 11003692
    Abstract: Systems, methods, and non-transitory computer-readable media can obtain a first batch of content items to be clustered. A set of clusters can be generated by clustering respective binary hash codes for each content item in the first batch, wherein content items included in a cluster are visually similar to one another. A next batch of content items to be clustered can be obtained. One or more respective binary hash codes for the content items in the next batch can be assigned to a cluster in the set of clusters.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: May 11, 2021
    Assignee: Facebook, Inc.
    Inventors: Yunchao Gong, Marcin Pawlowski, Fei Yang, Lubomir Bourdev, Louis Dominic Brandy, Robert D. Fergus
  • Patent number: 10997138
    Abstract: Embodiments are directed towards a method for searching data. The method comprises providing an inverted index that comprises at least one record, wherein the at least one record comprises at least one field name and a corresponding at least one field value. The at least one field name and corresponding value are extracted from time-stamped searchable events that are stored in a field searchable datastore and comprise portions of raw data. The at least one record further comprises a posting value that identifies a location in the field searchable datastore where an event associated with the at least one record is stored. The method further comprises receiving an incoming search query that references a field name and evaluating the incoming search query. Furthermore, responsive to the evaluating, the method comprises determining results for the incoming search query using both of the field searchable datastore and the inverted index.
    Type: Grant
    Filed: May 28, 2019
    Date of Patent: May 4, 2021
    Assignee: Splunk, Inc.
    Inventors: David Ryan Marquardt, Mitchell Neuman Blank, Jr., Stephen Phillip Sorkin
  • Patent number: 10970337
    Abstract: A method for outputting a result of one or more operations using data sources of different types is provided. The method includes steps of: (a) when a user query is acquired, a device (i) acquiring data elements respectively from the data sources of different types by referring to the user query, and (ii) performing main joint operations on the data elements, to thereby generate a data set; and (b) the device performing data processing operations and output operations on the data set, to thereby generate an answer for the user query. The method thereby outputs the result of the operations using the data sources of the different types by referring to the language corresponding to each of the data sources.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: April 6, 2021
    Assignee: Seculayer Co., LTD.
    Inventor: Jin Sang You
  • Patent number: 10922346
    Abstract: In some examples, a set of sentences is extracted from a digital document, and each sentence is scored using a respective informativeness measure and readability measure. Sentences in the set of sentences are selected based on the readability measures and informativeness measures. A low readability, high informativeness sentence is identified from the set of sentences. A concatenated sentence is generated by concatenating at least one contextual sentence with the low readability, high informativeness sentence, where the concatenated sentence has a higher readability than the low readability, high informativeness sentence.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: February 16, 2021
    Assignee: Micro Focus LLC
    Inventor: Vinay Deolalikar
  • Patent number: 10838994
    Abstract: Natural Language Processing (NLP) is performed on a corpus using a processor and a memory to extract a set of facets corresponding to a dimension in a set of dimensions. Using a score threshold, a subset of the set of facets is selected where each facet in the set of facets has a corresponding score relative to the corpus. A subsequent query is formed by increasing a complexity of a previous query using a facet in the subset of facets. The subsequent query is executed on at least a portion of the corpus. The documents in a new result set are ranked, the new result set being in response to executing the subsequent query. An output is produced from the new result set, which includes a ranking of that subset of documents whose ranks have changed by more than a threshold rank distance from the corresponding ranks in the corpus.
    Type: Grant
    Filed: August 31, 2017
    Date of Patent: November 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Takashi Fukuda, Hiroaki Kikuchi
  • Patent number: 10733211
    Abstract: In an approach to faceted classification, a computer receives a search query. The computer creates a first table of facet value ranges, based on the search query. The computer fetches a first search result corresponding to the search query. The computer retrieves a first facet value associated with the first search result. The computer maps the first facet value to a first facet value range. The computer determines whether the first facet value range is in the first table of facet value ranges. The computer inserts the first facet value range into the first table of facet value ranges. The computer determines whether a number of facet value ranges in the first table of facet value ranges is below a pre-defined threshold. The computer creates a second table of facet value ranges. The computer identifies a second facet value range that includes the first facet value range.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: August 4, 2020
    Assignee: International Business Machines Corporation
    Inventors: Marta Breno, Roberto Ragusa
  • Patent number: 10642831
    Abstract: Techniques are described herein to generate and to execute a query execution plan using static data buffering. After receiving a query with a clause that requires multiple iterations to execute, a database management system (DBMS) generates a plurality of plans that vary the order in which the database operations are executed. Within each plan, the DBMS identifies sets of rows within that plan that contain static data during execution of the query. Then, an additional step is added to each plan that includes loading the static set of rows in a database buffer cache. One or more database operations, from an iteration other than the first iteration, may be performed against the cached static set of rows. For each plan generated in this manner, a cost analysis model is applied, and the plan with the lowest estimated computational cost is selected for use as the query execution plan.
    Type: Grant
    Filed: September 16, 2016
    Date of Patent: May 5, 2020
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Mohamed Ziauddin, Yali Zhu
  • Patent number: 10621246
    Abstract: A method and apparatus of a device that indexes donatable content from a network site is described. In an exemplary embodiment, the device receives a requested document, where the requested document includes a plurality of tags. In addition, the device detects a donatable tag in the plurality of tags that indicates the network site includes donatable content. In response to the detecting, the device sends a request for the donatable content to the network site. Furthermore, the device receives the donatable content from the network site. The device additionally indexes the donatable content into an on-device search index, where at least some of the indexed donatable content is further returned as a search result for an on-device search.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: April 14, 2020
    Assignee: Apple Inc.
    Inventors: Anubhav Malhotra, John M. Hörnkvist
  • Patent number: 10614031
    Abstract: The present disclosure relates to systems and methods for indexing and mapping data sets by feature matrices, comprising at least a processor and a non-transitory memory storing instructions that cause the processor to perform operations including receiving data sets of the same type, applying autoencoders to generate feature matrices, and generating a neural network model trained to generate synthetic data corresponding to the type of data files. Further, the processor performs operations including applying more autoencoders to part of the hidden layer of the neural network model to generate more corresponding feature matrices, and indexing the data set using the feature matrices such that the data sets are searchable using an index, wherein a search query is received and a third feature matrix is generated so that a data set can be retrieved and compared to the feature matrices using the index.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: April 7, 2020
    Assignee: Capital One Services, LLC
    Inventors: Austin Walters, Jeremy Goodsitt, Vincent Pham, Galen Rafferty, Anh Truong, Reza Farivar
  • Patent number: 10545936
    Abstract: Linear run length encoding is described. A system and method include storing a table of time series data in a database of a data platform, the table of time series data representing a set of time series blocks. Each time series block of the set of time series blocks has a time series of equally-incremented time intervals and a run length. Each time interval of the time series is associated with one or more values. The run length has a starting position with at least one starting value and an ending position with at least one ending value. The starting position and the at least one starting value is stored for each time series block in a column store of the database. Then, a compressed index is generated in the column store of the database for each time series block, the compressed index comprising the starting position and the at least one starting value.
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: January 28, 2020
    Assignee: SAP SE
    Inventors: Gordon Gaumnitz, Robert Schulze, Lars Dannecker, Ivan Bowman, Dan Farrar
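The linear run-length idea in the abstract above (patent 10545936) can be illustrated with a minimal Python sketch. It is not the patented method: the `Run` record, its `step` field, and the function names are assumptions made for illustration. The sketch stores only a starting position and starting value per run of equally incremented values, and reconstructs any value from them.

```python
import bisect
from typing import List, NamedTuple


class Run(NamedTuple):
    start_pos: int    # position in the series where the run begins
    start_value: int  # value at that position
    step: int         # constant increment inside the run (assumed field)
    length: int       # number of values covered by the run


def encode_linear_runs(values: List[int]) -> List[Run]:
    """Split a series into maximal runs with a constant increment."""
    runs: List[Run] = []
    i, n = 0, len(values)
    while i < n:
        if i + 1 == n:
            # A single trailing value forms a run of length 1.
            runs.append(Run(i, values[i], 0, 1))
            break
        step = values[i + 1] - values[i]
        j = i + 1
        # Extend the run while the increment stays the same.
        while j + 1 < n and values[j + 1] - values[j] == step:
            j += 1
        runs.append(Run(i, values[i], step, j - i + 1))
        i = j + 1
    return runs


def value_at(runs: List[Run], pos: int) -> int:
    """Recover the value at a position from the compressed runs."""
    starts = [r.start_pos for r in runs]
    k = bisect.bisect_right(starts, pos) - 1
    if k < 0 or pos >= runs[k].start_pos + runs[k].length:
        raise IndexError(pos)
    run = runs[k]
    return run.start_value + (pos - run.start_pos) * run.step


series = [0, 10, 20, 30, 100, 105, 110, 7]
runs = encode_linear_runs(series)
decoded = [value_at(runs, p) for p in range(len(series))]
```

Looking up a position is a binary search over the run start positions followed by one multiplication, so the series never has to be fully decompressed to read a single value.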
  • Patent number: 10540332
    Abstract: Technologies are described herein for denormalizing data instances. Schemas for data instances are embedded with annotations indicating how the denormalization is to be performed. Based on the annotations, one or more sub per object indexes (“sub POIs”) can be generated for each data instance and stored. The sub POIs can include a target sub POI containing data from the data instance, and at least one source sub POI containing data from another data instance, if the data instance depends on the other data instance. Data instance updates can be performed by identifying sub POIs that are related to the updated data instance in storage, and updating the related sub POIs according to the update to the data instance. The sub POIs can be sent to an indexing engine to generate an index for a search engine to facilitate searches on the data instances.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: January 21, 2020
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Christopher Clayton McConnell, Weipeng Liu, Shahin Shayandeh, Robert Lovejoy Goodwin
  • Patent number: 10521408
    Abstract: In general, embodiments of the technology relate to a method for servicing requests. The method includes receiving a search request from a client, determining a main path and a conditional subpath associated with the search request, determining a subpath index associated with the main path and the conditional subpath, obtaining, using at least a portion of the search request, a set of subpath index entries from the subpath index, wherein each of the subpath index entries specifies a facet subpath and content associated with the facet subpath, generating a final result using at least a portion of the contents in the set of subpath index entries, and providing the final result to the client.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: December 31, 2019
    Assignee: OPEN TEXT CORPORATION
    Inventors: Caroline Spruit, Petr Olegovich Pleshachkov
  • Patent number: 10445650
    Abstract: A processing unit can successively operate layers of a multilayer computational graph (MCG) according to a forward computational order to determine a topic value associated with a document based at least in part on content values associated with the document. The processing unit can successively determine, according to a reverse computational order, layer-specific deviation values associated with the layers based at least in part on the topic value, the content values, and a characteristic value associated with the document. The processing unit can determine a model adjustment value based at least in part on the layer-specific deviation values. The processing unit can modify at least one parameter associated with the MCG based at least in part on the model adjustment value. The MCG can be operated to provide a result characteristic value associated with test content values of a test document.
    Type: Grant
    Filed: November 23, 2015
    Date of Patent: October 15, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jianfeng Gao, Li Deng, Xiaodong He, Lin Xiao, Xinying Song, Yelong Shen, Ji He, Jianshu Chen
  • Patent number: 10346511
    Abstract: The presently disclosed subject matter includes a computer-implemented system and method for receiving content from another computer device and dynamically adapting display of the received content within a container of a formatted document, the container defining a restricted area within the formatted document designated for displaying the content. Sub-elements within at least one content item are identified and tagged; the tagging makes it possible to acquire display parameters of the tagged sub-elements and to calculate from them a required adaptation of the content item such that it can be fitted within the respective container.
    Type: Grant
    Filed: June 1, 2017
    Date of Patent: July 9, 2019
    Assignee: TABOOLA.COM LTD.
    Inventor: Efraim Nadiv
  • Patent number: 10248681
    Abstract: A system and method for faster access for compressed time series data. A set of blocks are generated based on a table stored in a database of the data platform. The table stores data associated with multiple sources of data provided as consecutive values, each block containing index vectors having a range of the consecutive values. A block index is generated for each block having a field start vector representing a starting position of the block relative to the range of consecutive values, and a starting value vector representing a value of the block at the starting position. The field start vector of the block index is accessed to obtain the starting position of a field corresponding to a first block and to the range of the consecutive values of the first block. The starting value vector is then determined from the block index to determine an end and a length of the field of the first block.
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: April 2, 2019
    Assignee: SAP SE
    Inventors: Gordon Gaumnitz, Robert Schulze, Lars Dannecker, Ivan Bowman, Dan Farrar
  • Patent number: 10235377
    Abstract: Innovations for adaptive compression and decompression for dictionaries of a column-store database can reduce the amount of memory used for columns of the database, allowing a system to keep column data in memory for more columns, while delays for access operations remain acceptable. For example, dictionary compression variants use different compression techniques and implementation options. Some dictionary compression variants provide more aggressive compression (reduced memory consumption) but result in slower run-time performance. Other dictionary compression variants provide less aggressive compression (higher memory consumption) but support faster run-time performance. As another example, a compression manager can automatically select a dictionary compression variant for a given column in a column-store database.
    Type: Grant
    Filed: December 23, 2013
    Date of Patent: March 19, 2019
    Assignee: SAP SE
    Inventors: Ingo Mueller, Cornelius Ratsch, Peter Sanders, Franz Faerber
  • Patent number: 10083089
    Abstract: A method to efficiently checkpoint and reconstruct an in-memory index associated with a log-structured object store includes enabling asynchronous write operations to occur to a log-structured object store. The log-structured object store utilizes an in-memory index to access objects therein. The method further enables checkpoint operations to occur to the log-structured object store without pausing the asynchronous write operations. When initiating checkpoint operations, the method establishes a “begin checkpoint” marker on the log-structured object store. This “begin checkpoint” marker is configured to point to an earliest address in the log-structured object store that is uncommitted to the in-memory index. In the event the in-memory index is lost, the method reconstructs the in-memory index by analyzing the log-structured object store starting from the earliest address uncommitted to the in-memory index. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: September 7, 2015
    Date of Patent: September 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lawrence Y. Chiu, Paul H. Muench, Sangeetha Seshadri
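The checkpoint-and-reconstruct scheme described in the abstract above (patent 10083089) can be sketched in a few lines of Python. This is a simplified, single-threaded illustration rather than the patented method: the `LogStore` class and its method names are hypothetical, the log is an in-memory list, and asynchronous writes are not modelled. The "begin checkpoint" marker records the first log address not yet reflected in the saved index, so recovery only replays the log from that address onward.

```python
from typing import Dict, List, Tuple


class LogStore:
    """Toy log-structured store with an in-memory index (illustrative only)."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []   # append-only (key, value) records
        self.index: Dict[str, int] = {}        # key -> address of latest record
        # Saved checkpoint: (index snapshot, begin-checkpoint marker).
        self.checkpoint: Tuple[Dict[str, int], int] = ({}, 0)

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key: str) -> str:
        return self.log[self.index[key]][1]

    def take_checkpoint(self) -> None:
        # The marker is the earliest log address NOT covered by the snapshot.
        marker = len(self.log)
        self.checkpoint = (dict(self.index), marker)

    def recover(self) -> None:
        # Rebuild the index from the snapshot, then replay the log tail.
        snapshot, marker = self.checkpoint
        index = dict(snapshot)
        for addr in range(marker, len(self.log)):
            key, _ = self.log[addr]
            index[key] = addr
        self.index = index


store = LogStore()
store.put("a", "1")
store.put("b", "2")
store.take_checkpoint()
store.put("a", "3")      # written after the checkpoint
store.put("c", "4")
expected = dict(store.index)
store.index = {}         # simulate losing the in-memory index
store.recover()
```

Only the two records written after the checkpoint are scanned during recovery; everything earlier comes from the snapshot.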
  • Patent number: 10083082
    Abstract: A method to efficiently checkpoint and reconstruct an in-memory index associated with a log-structured object store includes enabling asynchronous write operations to occur to a log-structured object store. The log-structured object store utilizes an in-memory index to access objects therein. The method further enables checkpoint operations to occur to the log-structured object store without pausing the asynchronous write operations. When initiating checkpoint operations, the method establishes a “begin checkpoint” marker on the log-structured object store. This “begin checkpoint” marker is configured to point to an oldest known log location recorded in the in-memory index. In the event the in-memory index is lost, the method reconstructs the in-memory index by analyzing the log-structured object store starting from the oldest known log location. A corresponding system and computer program product are also disclosed and claimed herein.
    Type: Grant
    Filed: September 7, 2015
    Date of Patent: September 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lawrence Y. Chiu, Paul H. Muench, Sangeetha Seshadri
  • Patent number: 10061808
    Abstract: Embodiments relate to view caching techniques that cache, for a limited time, some of the (intermediate) results of a previous query execution in order to avoid expensive re-computation of query results. Particular embodiments may utilize a cache manager to determine whether information relevant to a subsequent user request can be satisfied by an existing cache instance or view, or whether creation of an additional cache instance is appropriate. At design time, cache defining columns of a view are defined, with user input parameters automatically being cache defining. Cache instances are created for each tuple of literals for the cache defining columns, and for each explicit or implicit group by clause. Certain embodiments may feature enhanced reuse between cache instances, in order to limit memory footprint. Over time, cache instances may be evicted from memory based upon implementation of a policy such as a Least Recently Used (LRU) strategy.
    Type: Grant
    Filed: June 3, 2014
    Date of Patent: August 28, 2018
    Assignee: SAP SE
    Inventors: Ki Hong Kim, Norman May, Alexander Boehm, Sung Heun Wi, Jeong Ae Han, Sang Il Song, Yongsik Yoon
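As a rough illustration of the view caching described in the abstract above (patent 10061808), the following Python sketch keeps one cache instance per tuple of literals and evicts the least recently used instance when a capacity limit is reached. The `ViewCache` class, its `capacity` parameter, and the example keys are assumptions for illustration; the abstract does not specify an API.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Tuple


class ViewCache:
    """LRU cache of view results keyed by cache-defining literals (sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._instances: "OrderedDict[Tuple[Hashable, ...], Any]" = OrderedDict()

    def get_or_compute(self, key: Tuple[Hashable, ...],
                       compute: Callable[[], Any]) -> Any:
        if key in self._instances:
            # Cache hit: mark the instance as most recently used.
            self._instances.move_to_end(key)
            return self._instances[key]
        # Cache miss: run the (expensive) query and store a new instance.
        result = compute()
        self._instances[key] = result
        if len(self._instances) > self.capacity:
            # Evict the least recently used instance.
            self._instances.popitem(last=False)
        return result

    def keys(self):
        return list(self._instances.keys())


calls = []

def run_query(country: str, year: int) -> str:
    calls.append((country, year))       # record each real execution
    return f"result:{country}:{year}"

cache = ViewCache(capacity=2)
cache.get_or_compute(("DE", 2014), lambda: run_query("DE", 2014))
cache.get_or_compute(("US", 2014), lambda: run_query("US", 2014))
cache.get_or_compute(("DE", 2014), lambda: run_query("DE", 2014))  # hit
cache.get_or_compute(("FR", 2014), lambda: run_query("FR", 2014))  # evicts US
```

The key here is a plain tuple of literal values, matching the abstract's "tuple of literals for the cache defining columns"; grouping clauses and reuse between instances are left out of the sketch.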
  • Patent number: 10019518
    Abstract: Methods and systems are disclosed that relate to ranking functions for multiple different domains. By way of example but not limitation, ranking functions for multiple different domains may be trained based on inter-domain loss, and such ranking functions may be used to rank search results from multiple different domains so that they may be blended without normalizing relevancy scores.
    Type: Grant
    Filed: October 9, 2009
    Date of Patent: July 10, 2018
    Assignee: Excalibur IP, LLC
    Inventors: Jiang Chen, Wei Chu, Zhenzhen Kou, Zhaohui Zheng
  • Patent number: 9990362
    Abstract: Profiling data includes processing an accessed collection of records, including: generating, for a first set of distinct values appearing in a first set of one or more fields, corresponding location information; generating, for the first set of fields, a corresponding list of entries identifying a distinct value from the first set of distinct values and the location information for the distinct value; generating, for a second set of one or more fields, a corresponding list of entries, with each entry identifying a distinct value from a second set of distinct values appearing in the second set of fields; and generating result information, based at least in part on: locating at least one record of the collection using the location information for at least one value appearing in the first set of fields, and determining at least one value appearing in the second set of fields of the located record.
    Type: Grant
    Filed: September 21, 2015
    Date of Patent: June 5, 2018
    Assignee: Ab Initio Technology LLC
    Inventor: Arlen Anderson
  • Patent number: 9952778
    Abstract: A data processing technology is provided, and is applied to a partition management device. The partition management device stores a partition view, the partition view records a correspondence between an ID of a current partition and an address of a storage disk, and a total quantity of current partitions may be less than a total quantity of final partitions. By using the technology, data forwarding may be performed on key-value data by using a current partition, thereby reducing complexity of a partition view.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: April 24, 2018
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventor: Xiong Luo
  • Patent number: 9921945
    Abstract: Aspects provide for automatic verification of JavaScript Object Notation (JSON) data by making a JSON call via an Extensible Markup Language (XML) Hypertext Transfer Protocol (HTTP) object against a data warehouse data item stored in a back end server. JSON response data returned from the back end server in response to the JSON call is converted into actual XML result data that includes a first plurality of XML statements. A Structured Query Language (SQL) query is executed against the data warehouse data item, and expected XML result data is generated in response thereto that includes a different (second) plurality of XML statements. The JSON response data returned from the back end server is thereby verified in response to matching the actual XML result data to the expected XML result data.
    Type: Grant
    Filed: April 6, 2015
    Date of Patent: March 20, 2018
    Assignee: ADP, LLC
    Inventors: Tista Das, Sachin V. Havaldar, Laiyuan Liu
  • Patent number: 9916314
    Abstract: An AND operation is performed for an integrated appearance map of a compression code of character data “”, an integrated appearance map of a compression code of character data “”, and an integrated deletion map for a segment. The AND result is “1100” and it is found that the character data “” and “” are likely to be present in the segments (sg1(1)) and (sg1(2)). Since the segments are specified from the AND result, the AND operations are performed. As a result, the segments are specified and the AND operations are performed. As a result, a file number 3 is specified from the segment (sg0(1)) and a file number 19 is specified from the segment (sg0(5)). Therefore, it is found that both of the character data “” and “” are present in compression files (f3) and (f19).
    Type: Grant
    Filed: March 10, 2014
    Date of Patent: March 13, 2018
    Assignee: FUJITSU LIMITED
    Inventors: Masahiro Kataoka, Ryo Matsumura
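The bitmap narrowing step in the abstract above (patent 9916314) amounts to a bitwise AND over appearance maps and a deletion map. The Python sketch below shows that single step under stated assumptions: the input bit strings are invented for illustration (chosen so the result is the "1100" quoted in the abstract), the leftmost bit is taken to stand for segment 1, a set bit in the deletion map is taken to mean "not deleted", and the function names are hypothetical.

```python
from typing import List


def and_maps(*maps: str) -> str:
    """Bitwise AND of equal-width bit strings such as '1101'."""
    if not maps:
        raise ValueError("at least one map is required")
    width = len(maps[0])
    if any(len(m) != width for m in maps):
        raise ValueError("all maps must have the same width")
    acc = (1 << width) - 1          # start with all bits set
    for m in maps:
        acc &= int(m, 2)
    return format(acc, f"0{width}b")


def candidate_segments(result: str) -> List[int]:
    """1-based segment numbers whose bit is set (leftmost bit = segment 1)."""
    return [i + 1 for i, bit in enumerate(result) if bit == "1"]


# Illustrative inputs only; not taken from the patent.
appearance_first = "1101"    # segments where the first character may appear
appearance_second = "1110"   # segments where the second character may appear
deletion_map = "1100"        # assumed: 1 means the segment is still live

result = and_maps(appearance_first, appearance_second, deletion_map)
segments = candidate_segments(result)
```

In the hierarchical scheme the abstract describes, the same AND would be repeated on the lower-level maps of each surviving segment until individual file numbers are reached.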
  • Patent number: 9846688
    Abstract: Techniques for use with electronic book readers include coordinating or translating position information between different versions of an electronic book. Positions within different versions can be translated for various purposes, such as transferring annotations between versions or synchronizing positions within different versions.
    Type: Grant
    Filed: December 28, 2010
    Date of Patent: December 19, 2017
    Assignee: Amazon Technologies, Inc.
    Inventors: Christopher F. Weight, Janna Hamaker, Tom Killalea, Bruno A. Posokhow, Daniel B. Rausch