Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 11055339
    Abstract: Displaying contact-related information is disclosed. An association between a contact address not specific to a source of contact-related information and an identity of an entity at the source of contact-related information may be determined. Information representing the association between the contact address and the identity of the entity at the source of contact-related information is stored. The information representing the association is stored at a node associated with a service configured to use the information representing the association to retrieve from the source of contact-related information a response data associated with the entity in response to an expression of interest in a contact with which the contact address is associated.
    Type: Grant
    Filed: February 5, 2019
    Date of Patent: July 6, 2021
    Assignee: SUGARCRM INC.
    Inventors: Somrat Niyogi, Jason McDowall, Pushkar Singh, Andreas Sandberg, Wiebke Poerschke
  • Patent number: 11055127
    Abstract: A method, computer program product, and a system where a processor(s), in a computing environment comprised of multiple containers comprising modules, includes a processor(s) parsing a module originating from a given container in the computing environment by copying various identifying aspects of a module file comprising the module and calculating, based on contents of the module file, a digest value as a unique identifier for the module file. The processor(s) stores the various identifying aspects of the module file and the digest value in one or more memory objects, wherein the one or more memory objects comprise a module content map to correlate the unique identifier for the module file with the contents of the module, images in the module file with the unique identifier for the module file, and layers with the unique identifier for the module file.
    Type: Grant
    Filed: July 25, 2018
    Date of Patent: July 6, 2021
    Assignee: International Business Machines Corporation
    Inventors: Qin Yue Chen, Shu Han Weng, Yong Xin Qi, Zhi Hong Li, Xi Xue Jia
  • Patent number: 11048426
    Abstract: A technique for performing deduplication identifies representative sub-blocks within candidate blocks and performs sub-block matching to entries in a digest database. When a representative sub-block is matched to a differently-aligned target sub-block that belongs to a target block, the technique effectuates storage of the candidate block using the target block and a block adjacent to the target block.
    Type: Grant
    Filed: October 30, 2019
    Date of Patent: June 29, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Uri Shabi, Ronen Gazit
  • Patent number: 11042447
    Abstract: One or more processors scan to identify component resources of a record retention system and determine relationships among the component resources and data stored on the component resources. Rules corresponding to retention of record data stored on the component resources are received, and a deletion action is determined in response to receiving a request by a user for deletion of record data from the record retention system and the rules corresponding to the retention of data. The one or more processors perform the deletion action on the user's record data associated with the request and compliant with the rules corresponding to the retention of the data among the component resources of the record retention system, and the one or more processors record the deletion action and information associated with the deletion action in a deletion log of the record retention system.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: June 22, 2021
    Assignee: International Business Machines Corporation
    Inventors: Sharif Tarequr Rahman, Long Wang, Anca Sailer
  • Patent number: 11038960
    Abstract: A client host may be used to provide access to a shared storage. The client host may receive a read request from a local client for particular data of the shared storage. In response to the read request, the client host may obtain the particular data from a local storage device. The client host may receive a write request from the local client for the shared storage. In response to the write request, the client host may send data to a network-based stream service as one or more stream events for the shared storage. After sending the one or more stream events to the network-based stream service, the client host may receive, from the network-based stream service, an ordered stream event for the shared storage. Based at least in part on the ordered stream event, the client host may update the data stored at the local storage device.
    Type: Grant
    Filed: October 20, 2015
    Date of Patent: June 15, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Michael Joseph Ruiz, David Ricardo Rocamora
  • Patent number: 11029871
    Abstract: Disclosed are techniques for data deduplication, which include methods, systems, or computer products for reducing data redundancy in a data storage system comprising searching a cluster of nearest neighbors, wherein the cluster has been created using a locality sensitive hashing algorithm, to determine if a data block has been stored in the data storage system prior to writing the data block. In alternate embodiments, the nearest neighbor clusters could be created using one or more of the following algorithms: k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm.
    Type: Grant
    Filed: May 15, 2019
    Date of Patent: June 8, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Jonathan Krasner, Sweetesh Singh, Steven Chalmer
  • Patent number: 11010258
    Abstract: Illustrative storage manager and media agent are enhanced to interoperate with deduplication appliances. Advantages are realized when making secondary and tertiary copies and also when restoring from a deduplication appliance. Tiered indexing minimizes how much data is retained and stored at media agents. Tiered indexing enables media agents to efficiently extract needed information from deduplication appliances to make tertiary copies and to restore backed up copies. Interoperability techniques include media agents generating separate data streams to the deduplication appliance. Each data stream carries a different kind of data, e.g., payload data, metadata content, or high-level index information. On initial backup, the media agent instructs the deduplication appliance to deduplicate the payload data stream but not the other data streams, thus intelligently applying resources to data most likely to benefit from deduplication.
    Type: Grant
    Filed: November 27, 2018
    Date of Patent: May 18, 2021
    Assignee: Commvault Systems, Inc.
    Inventors: Ganesh Haridas, Manoj Kumar Vijayan
  • Patent number: 11010077
    Abstract: A machine and method of reducing duplicate transmission data employs one or more digests to track field/value pairs that have previously been distributed. Each digest contains a record table and a segment table. The record table includes anonymous identifier records, each of which contains an anonymous identifier and one or more indexes into the segment table. The segment table comprises an array of every existing data field/value pair. Before distribution of update data, each record is matched to an anonymous identifier record in the record table. The segment values in the prospective distribution record are compared to the digest's anonymous identifier record, used to determine which data has already been distributed, and thus will be suppressed in the distribution.
    Type: Grant
    Filed: February 25, 2019
    Date of Patent: May 18, 2021
    Assignee: LiveRamp, Inc.
    Inventors: James Arnold, Joshua Lang
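    Illustration: the record-table/segment-table arrangement described in this abstract can be sketched roughly as follows. This is a minimal sketch, not the patented implementation; the class name `Digest`, its attribute names, and the `suppress` method are hypothetical, and the sketch assumes field values are hashable.

```python
from dataclasses import dataclass, field


@dataclass
class Digest:
    """Hypothetical digest: a segment table plus per-identifier records."""
    segments: list = field(default_factory=list)       # segment table: every field/value pair seen
    segment_index: dict = field(default_factory=dict)  # (field, value) -> index into segments
    records: dict = field(default_factory=dict)        # anonymous id -> set of segment indexes

    def suppress(self, anon_id, update):
        """Return only the pairs not yet distributed for anon_id, and record them."""
        sent = self.records.setdefault(anon_id, set())
        fresh = {}
        for pair in update.items():
            idx = self.segment_index.get(pair)
            if idx is None:
                # First time this field/value pair appears anywhere: add it to the table.
                idx = len(self.segments)
                self.segments.append(pair)
                self.segment_index[pair] = idx
            if idx not in sent:
                # Not yet distributed for this identifier: keep it in the outgoing update.
                sent.add(idx)
                fresh[pair[0]] = pair[1]
        return fresh
```

    Calling `suppress` twice with overlapping updates for the same identifier returns only the changed pairs the second time.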
  • Patent number: 11003629
    Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The at least one processing device is configured to obtain a file and to determine a type of the file. A structure of the file may be determined based at least in part on the determined type of the file and at least one location in the file may be identified based at least in part on the determined structure. The file may be divided at the identified at least one location into a plurality of chunks and the plurality of chunks may be provided to a block deduplication module of a storage system where the block deduplication module is configured to perform a deduplication process based at least in part on the plurality of chunks.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: May 11, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Amitai Alkalay, Zvi Schneider, Assaf Natanzon
  • Patent number: 10990310
    Abstract: Techniques for data processing may include: determining one or more sub-blocks of a target block that match one or more sub-blocks of a candidate block; creating a shared sub-block mapping (SSM) structure having a plurality of entries, wherein each of the plurality of entries corresponds to a different one of the sub-blocks in the candidate block and wherein a value stored in said each entry, corresponding to one of the sub-blocks of the candidate block, identifies a sub-block of the target block matching said one sub-block of the candidate block; and storing the candidate block as a deduplicated block sharing at least one sub-block with the target block. The SSM structure may be stored as a metadata structure of the candidate block to identify deduplicated sub-blocks of the candidate block and to identify sub-blocks of the target block providing content for the deduplicated sub-blocks of the candidate block.
    Type: Grant
    Filed: April 24, 2019
    Date of Patent: April 27, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Istvan Gonczi, Ivan Bassov, Sorin Faibish, Philippe Armangau
  • Patent number: 10990518
    Abstract: Embodiments relating to garbage collection for a deduplicated and compressed storage device are described. One embodiment provides for a method comprising creating a first set of temporary files associated with a range of fingerprints for data within data files associated with a directory tree structure; creating a second set of temporary files associated with a range of fingerprints of storage segments stored on one or more deduplicated storage containers; sorting the fingerprints in each temporary file using distributed out of core sorting across each node in the set of multiple computing device nodes to generate a first set of sorted files and a second set of sorted files; determining an intersection of the fingerprints in the first set of sorted files and the second set of sorted files; and generating a garbage collection recipe for each of the one or more deduplicated storage containers.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: April 27, 2021
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Grant Wallace
  • Patent number: 10983958
    Abstract: Apparatus and associated methods relate to generating energy blocks on a blockchain corresponding to generation, transmission, and consumption of predetermined quanta of energy represented by corresponding records in an associated Merkle trie. In an illustrative example, individual energy data records may be hashed. Each hash may be stored in a leaf node of a Merkle trie. The individual energy data records may be aggregated to correspond to represent a predetermined quantum of energy. The individual energy data records may include energy generation records. The energy blocks may be associated with scheduling, delivery, and consumption data for the energy quantum. Various embodiments may advantageously provide secure, verifiable, and immutable tracking and processing of energy generation, transmission, and consumption of physical energy quanta across one or more distributed energy networks.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: April 20, 2021
    Assignee: ClearTrace Technologies, Inc.
    Inventors: Eric Miller, Evan Caron, Troy Martin
  • Patent number: 10983962
    Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a deduplication estimate for that dataset, to designate a subset inclusion characteristic to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes computing a polynomial-based signature for the page, determining whether or not the polynomial-based signature satisfies the designated subset inclusion characteristic, and responsive to the polynomial-based signature satisfying the designated subset inclusion characteristic, computing a content-based signature for the page and updating a corresponding entry of a deduplication estimate table for the dataset based at least in part on the content-based signature.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: April 20, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
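    Illustration: the two-stage scan in this abstract (a polynomial-based signature gates the content-based signature) might look like the sketch below. The polynomial parameters, the choice of "signature divisible by `sample_mod`" as the subset inclusion characteristic, and SHA-256 as the content-based signature are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def poly_signature(page: bytes, base: int = 257, mod: int = (1 << 61) - 1) -> int:
    """Cheap polynomial hash of a page (Horner's rule, modulo a large prime)."""
    h = 0
    for byte in page:
        h = (h * base + byte) % mod
    return h


def estimate_table(pages, sample_mod: int = 4) -> Counter:
    """Count content signatures only for pages whose polynomial signature is sampled."""
    table = Counter()
    for page in pages:
        # Assumed inclusion characteristic: signature divisible by sample_mod.
        if poly_signature(page) % sample_mod == 0:
            # Only sampled pages pay for the content-based signature.
            table[hashlib.sha256(page).hexdigest()] += 1
    return table
```

    Because identical pages always produce the same polynomial signature, duplicates are either all sampled or all skipped, which keeps the counts in the table consistent.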
  • Patent number: 10983732
    Abstract: A method for accessing a file in a storage system is provided. The method includes determining, for each file chunk of the file, an authority among differing storage nodes of the storage system and receiving from the authority having ownership of the file chunk, location information for the file chunk. The method includes accessing file chunks of the file as directed by each of the determined authorities.
    Type: Grant
    Filed: July 13, 2015
    Date of Patent: April 20, 2021
    Assignee: Pure Storage, Inc.
    Inventors: John Hayes, Robert Lee, Igor Ostrovsky, Peter Vajgel
  • Patent number: 10970328
    Abstract: Techniques are described that exclude use of “stop-fingerprints” from media database formation and search query to an automatic content recognition (ACR) system based on media content fingerprints updated by stop-fingerprint analysis. A classification process is presented which takes in fingerprints from reference media files as an input and produces a modified set of fingerprints as an output by applying a novel stop-fingerprint classification algorithm. Architecture for the distributed stop-fingerprint generation is presented. Various cases, such as stop-fingerprints generation for the entire reference database, stop-fingerprints generation for the individual reference fingerprint files, and temporal fingerprint classification obtained through intermediate steps of the temporal fingerprint classification algorithm are presented. A hash-based signature classification algorithm is also described.
    Type: Grant
    Filed: September 24, 2018
    Date of Patent: April 6, 2021
    Assignee: Gracenote, Inc.
    Inventors: Sunil Suresh Kulkarni, Pradipkumar Dineshbhai Gajjar, Jose Pio Pereira, Prashant Ramanathan, Mihailo M. Stojancic, Shashank Merchant
  • Patent number: 10969982
    Abstract: A data deduplication process for storage based on collision resistant hash digests is disclosed. The process accesses a first data message from a data storage appliance and accesses a second data message from the data storage appliance. The process then compares the hash digests of the first and second data messages. If the hash digests match, the process determines if the first and second data messages are the same message or if there is a collision between the compared hash digests by forming additional hash digests based on the first and second data messages by hashing the first and second data messages differently. If this new set of hash digests do not result in a collision, then the first and second data messages are different. If this new set of hash digests result in a collision, the first and second data messages are the same message.
    Type: Grant
    Filed: May 25, 2020
    Date of Patent: April 6, 2021
    Inventor: Tyson York Winarski
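    Illustration: the collision check described in this abstract can be sketched as below. The abstract only says the messages are hashed "differently" on the second pass; using SHA-256 first and BLAKE2b second is an assumption, as is the function name `same_message`.

```python
import hashlib


def same_message(first: bytes, second: bytes) -> bool:
    """Decide whether two messages are duplicates using two independent digests."""
    # First pass: compare the primary hash digests.
    if hashlib.sha256(first).digest() != hashlib.sha256(second).digest():
        return False
    # Digests match: hash both messages differently (assumed: BLAKE2b).
    # A mismatch here means the first match was a collision, not a duplicate.
    return hashlib.blake2b(first).digest() == hashlib.blake2b(second).digest()
```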
  • Patent number: 10970259
    Abstract: Improved techniques of managing a data storage system involve selectively inserting block virtualization structures (BVS) in access paths between data blocks of a file system and block pointers pointing to the data blocks. A BVS provides metadata for supporting deduplication of data in that data block. In some arrangements, a file system may support selective insertion of such a BVS between a block pointer and data block according to a specified criterion. For example, such a file system might support insertion of BVS's between block pointers and those data blocks storing cold data for which access latency is not important to overall performance of the data storage system.
    Type: Grant
    Filed: December 19, 2014
    Date of Patent: April 6, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Jean-Pierre Bono, Philippe Armangau
  • Patent number: 10972581
    Abstract: A method includes acquiring a media content directory on at least one media server, and identifying media description information in at least two media resource objects in the media content directory and integrating the at least two media resource objects when media resources respectively corresponding to the at least two media resource objects have same media content, so the integrated at least two media resource objects are represented by one media identifier. The method also includes sending, according to capability information of a media playback device selected by a user, a media resource address corresponding to a first media resource object to the media playback device, so the media playback device acquires and plays a media resource corresponding to the media resource address, where the first media resource object is one of the at least two media resource objects that are suitable to be played on the media playback device.
    Type: Grant
    Filed: May 29, 2015
    Date of Patent: April 6, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Yunsheng Kuang, Yu Zhu
  • Patent number: 10956484
    Abstract: Techniques are described that exclude use of “stop-fingerprints” from media database formation and search query to an automatic content recognition (ACR) system based on media content fingerprints updated by stop-fingerprint analysis. A classification process is presented which takes in fingerprints from reference media files as an input and produces a modified set of fingerprints as an output by applying a novel stop-fingerprint classification algorithm. Architecture for the distributed stop-fingerprint generation is presented. Various cases, such as stop-fingerprints generation for the entire reference database, stop-fingerprints generation for the individual reference fingerprint files, and temporal fingerprint classification obtained through intermediate steps of the temporal fingerprint classification algorithm are presented. A hash-based signature classification algorithm is also described.
    Type: Grant
    Filed: March 13, 2017
    Date of Patent: March 23, 2021
    Assignee: Gracenote, Inc.
    Inventors: Sunil Suresh Kulkarni, Pradipkumar Dineshbhai Gajjar, Jose Pio Pereira, Prashant Ramanathan, Mihailo M. Stojancic, Shashank Merchant
  • Patent number: 10949405
    Abstract: A data deduplication device reduces a processing load in deduplication. Storage target data includes a content including a plurality of blocks having a structure in which transaction data and a hash value of a preceding block are associated with each other. A storage includes a storage device and a processor, which (1) acquires a hash value associated with one or more blocks of a chunk including the block in the content, and specifies a fingerprint corresponding to the chunk based on the acquired one or more hash values of the block, (2) determines whether the fingerprint corresponding to the chunk is the same as a fingerprint of a chunk stored in the storage device, and (3) does not store the chunk in the storage device when it is determined to be the same, and stores the chunk in the storage device when it is determined to not be the same.
    Type: Grant
    Filed: February 27, 2019
    Date of Patent: March 16, 2021
    Assignee: HITACHI, LTD.
    Inventors: Shimpei Nomura, Mitsuo Hayasaka, Jun Nemoto
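    Illustration: the load reduction in this abstract comes from deriving the chunk fingerprint from hash values the blocks already carry, rather than hashing the whole chunk. A minimal sketch, assuming the fingerprint is a SHA-256 over the concatenated block hashes and the storage device is a plain dictionary (both hypothetical choices):

```python
import hashlib


def chunk_fingerprint(block_hashes) -> bytes:
    """Fingerprint a chunk from the hash values already stored with its blocks."""
    return hashlib.sha256(b"".join(block_hashes)).digest()


def store_chunk(store: dict, block_hashes, chunk: bytes) -> bool:
    """Store the chunk unless an identical fingerprint exists; return True if stored."""
    fp = chunk_fingerprint(block_hashes)
    if fp in store:
        return False  # same fingerprint already present: skip the write
    store[fp] = chunk
    return True
```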
  • Patent number: 10936233
    Abstract: A method, computer program product, and computer system for preparing, by a computing device, for migration of data from a source to a target. Hash values of the data may be sorted at the source. The data may be migrated from the source to the target according to how the data was sorted at the source.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: March 2, 2021
    Assignee: EMC IP Holding Company, LLC
    Inventors: Anton Kucherov, David Meiri
  • Patent number: 10938961
    Abstract: A method for data reduction may comprise computing (i) a first sketch of a first segment and (ii) a second sketch of a second segment. The first sketch and the second sketch may each comprise a set of features that are representative of or unique to the corresponding first and second segments. The method may also comprise processing the first sketch and the second sketch to generate a similarity metric indicative of whether the second segment is similar to the first segment. The method may further comprise (1) performing a differencing operation on the second segment relative to the first segment when the similarity metric is greater than or equal to a similarity threshold, or (2) storing the first segment and the second segment in a database without performing the differencing operation when the similarity metric is less than the similarity threshold.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: March 2, 2021
    Inventors: Santhosh Rahul Ponnala, Tarang Vaish
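    Illustration: one way to realize the sketch-and-threshold decision in this abstract is shown below. The abstract does not specify the features or the metric; a min-hash style sketch over byte windows and Jaccard similarity are assumptions, and `sketch`, `similarity`, and `plan` are hypothetical names.

```python
import hashlib


def sketch(segment: bytes, width: int = 4, k: int = 8) -> frozenset:
    """Feature set: the k smallest hashes over all width-byte windows of the segment."""
    windows = {segment[i:i + width] for i in range(max(1, len(segment) - width + 1))}
    hashes = sorted(int.from_bytes(hashlib.sha1(w).digest()[:8], "big") for w in windows)
    return frozenset(hashes[:k])


def similarity(first: frozenset, second: frozenset) -> float:
    """Jaccard similarity of two sketches (1.0 when both are empty)."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0


def plan(first: bytes, second: bytes, threshold: float = 0.5) -> str:
    """Choose between differencing and plain storage based on the similarity metric."""
    score = similarity(sketch(first), sketch(second))
    return "difference" if score >= threshold else "store_both"
```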
  • Patent number: 10936665
    Abstract: Disclosed herein are system, method, and computer program product embodiments for providing a graphical match policy for identifying duplicative data. An embodiment operates by receiving a selection of a match rule for identifying duplicate records within a database, the match rule comprising a candidate filter and a comparison filter. One or more candidate attributes of the candidate filter and one or more comparison attributes of the comparison filter are determined. A first subset of the records within the database that satisfy the candidate filter are identified. A second subset of the records from the first subset of records that satisfy the comparison filter are identified. The second subset of records that satisfy both the candidate filter and the comparison filter are returned.
    Type: Grant
    Filed: August 9, 2018
    Date of Patent: March 2, 2021
    Assignee: SAP SE
    Inventor: Ronald Dupey
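    Illustration: the two-stage match rule in this abstract (candidate filter, then comparison filter) can be sketched as follows. The `probe` record, the exact-match candidate test, and the case-insensitive comparison test are assumptions made for illustration; the abstract does not say how either filter compares attributes.

```python
def normalize(value) -> str:
    """Case- and whitespace-insensitive form used by the comparison filter (assumed)."""
    return str(value).strip().lower()


def find_duplicates(records, probe, candidate_attrs, comparison_attrs):
    """Return records that pass the candidate filter and then the comparison filter."""
    # Candidate filter: a cheap exact match narrows the database to a first subset.
    first = [r for r in records
             if all(r.get(a) == probe.get(a) for a in candidate_attrs)]
    # Comparison filter: a looser match is applied only to the first subset.
    second = [r for r in first
              if all(normalize(r.get(a)) == normalize(probe.get(a))
                     for a in comparison_attrs)]
    return second
```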
  • Patent number: 10936560
    Abstract: Embodiments of the present disclosure disclose methods and devices of data de-duplication. The method of data de-duplication performed at a client comprises: in response to receiving data to be backed up at a client, sampling the data to be backed up to obtain the sampled data; generating a signature for the sampled data; transmitting the signature to a master storage node in a storage cluster including a plurality of storage nodes, to allow the master storage node to select one storage node from the plurality of storage nodes; receiving an indication of the selected storage node from the master storage node; and transmitting, based on the indication, the data to be backed up to the selected storage node. Embodiments of the present disclosure also provide methods of data de-duplication performed at the master storage node and the slave storage node, and corresponding devices.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: March 2, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: James Fei Wu, Colin Zou, Lin Xiao, Sean Cheng Ye, Peng Wu
  • Patent number: 10929246
    Abstract: A method, computer program product, and computer system for generating a backup of a primary object in an object store system. The object store system comprising: a proxy layer comprising: a plurality of proxy nodes; a backup module; and a ring; a storage layer in communication with the plurality of proxy nodes and the backup module through the ring, the storage layer comprising a plurality of storage nodes, with each storage node having a plurality of servers for managing accounts, a plurality of containers, at least one backup container, and objects stored within the containers and the at least one backup container; and a backup database in communication with the backup module for storing associations between versions of backup copies of the primary objects, the primary copies of objects, the containers, and the at least one backup container.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: February 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Ranganath Gorur Krishna Iyengar, Madhusudan K. Satyanarayana
  • Patent number: 10929031
    Abstract: A method of data reduction in a partially encrypted volume includes receiving data to be stored on a storage array, decrypting the data using a first encryption key to generate first decrypted data, and decrypting the data using a second encryption key to generate second decrypted data. The method further includes comparing, by a storage array controller, a first compressibility value of the first decrypted data to a second compressibility value of the second decrypted data. The method further includes storing the first decrypted data if the first compressibility value is greater than or equal to the second compressibility value. The method further includes storing the second decrypted data if the second compressibility value is greater than the first compressibility value.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: February 23, 2021
    Assignee: Pure Storage, Inc.
    Inventors: Constantine P. Sapuntzakis, Timothy W. Brennan, Yuval Frandzel
  • Patent number: 10922280
    Abstract: A data storage site receives data from different data producer sites. Each of the data producer sites has a particular relationship to the data storage site, and each particular relationship carries corresponding data storage policies, constraints and commitments. When a data storage site receives a data storage request from a data producer, and that particular data is already present from a prior storage operation at the data storage site, the characteristics of the policies, constraints and commitments that were applied when that data was saved by the prior storage operation are reconciled with the policies, constraints and commitments of the requesting data producer. Deduplication logic reconciles different sets of policies, constraints and commitments such that the data can be effectively deduplicated by saving data-producer-specific metadata. Alternatively, the data can be effectively deduplicated by promoting the storage of the data so it covers a broader set of policies, constraints and commitments.
    Type: Grant
    Filed: April 10, 2018
    Date of Patent: February 16, 2021
    Assignee: Nutanix, Inc.
    Inventors: Amit Jain, Hinal Gala, Karan Gupta, Kilol Surjan, Parthasarathy Ramachandran, Timothy Sujay Isaacs
  • Patent number: 10917766
    Abstract: A method includes communicating, to a first mobile user equipment device, a first user application configured to at least cause the first mobile user equipment device to generate configuration data representing a configuration of the first mobile user equipment device. The method also includes making an assessment, based at least in part on the configuration data, of at least: (1) a compatibility between a communication service supported by a mobile network operator and an operational characteristic of the first mobile user equipment device, and (2) a compatibility between a second mobile user equipment device and a content characteristic of the first mobile user equipment device. The method also includes getting a plan (that is based at least in part on the assessment) for associating the first mobile user equipment device or the second mobile user equipment device with a service subscription. The method also includes communicating the plan to the first mobile user equipment device.
    Type: Grant
    Filed: March 10, 2020
    Date of Patent: February 9, 2021
    Assignee: Sprint Communications Company L.P.
    Inventors: Michael A. Gailloux, Kenneth W. Samson
  • Patent number: 10915260
    Abstract: Disclosed herein are methods, systems, and processes to perform dual-mode deduplication based on backup history. A fingerprint of a data segment of a data stream is calculated and a determination is made as to whether the fingerprint of the data segment matches a corresponding fingerprint in a cache. If the fingerprint matches the corresponding fingerprint, another fingerprint of a subsequent data segment of the data stream is calculated. If the fingerprint does not match the corresponding fingerprint, a segment boundary of the data stream is calculated based on a hash value, a determination is made that a new fingerprint calculated based on the segment boundary does not match the corresponding fingerprint, segment boundaries and new fingerprints are calculated, and a determination is made that a first fingerprint matches another corresponding fingerprint in the cache.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: February 9, 2021
    Assignee: Veritas Technologies LLC
    Inventors: Chao Lei, Hui Yuan, Qing Fu Dong
  • Patent number: 10909079
    Abstract: Techniques are provided for data-driven reduction of log message data. An exemplary method comprises: obtaining log files and user-specified configuration parameters, wherein the log files each comprise one or more log messages; generating an event count matrix indicating a number of times each of a plurality of unique messages appeared in a given log file of the log files; generating a correlation graph by inserting similar messages with a mutual undirected edge, wherein similar messages are identified based on a predefined similarity measure; extracting redundant messages from the correlation graph by selecting log messages for inclusion in an uninformative log message filter from sub-graphs of the correlation graph in which any two nodes are connected together, except those log messages satisfying a predefined message frequency criteria; and identifying one or more redundant messages using the uninformative log message filter.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: February 2, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Omer Sagi, Maor Sade, Avitan Gefen, Alon Shitrit
  • Patent number: 10901996
    Abstract: Some embodiments of the present invention include a method for identifying duplicate records from a group of records in a database system.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: January 26, 2021
    Assignee: salesforce.com, inc.
    Inventors: Dai Duong Doan, Arun Kumar Jagota, Chenghung Ker, Parth Vaishnav, Danil Dvinov, Dmytro Kudriavtsev
  • Patent number: 10901641
    Abstract: A method for storing data includes receiving, by a data cluster, a request to store data from a host, deduplicating, by the data cluster, the data to obtain deduplicated data on a first data node, replicating the deduplicated data to generate a plurality of replicas, and storing a first replica of the plurality of replicas on a second data node and a second replica of the plurality of replicas on a third data node, wherein the first data node, the second data node and the third data node are in the data cluster.
    Type: Grant
    Filed: January 29, 2019
    Date of Patent: January 26, 2021
    Assignee: Dell Products L.P.
    Inventors: Dharmesh M. Patel, Rizwan Ali, Ravikanth Chaganti
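The flow in the abstract above (deduplicate on a first data node, then replicate the deduplicated data to two further nodes) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the nodes are modeled as plain dictionaries, and the chunking, the SHA-256 fingerprint, and the names `deduplicate` and `store_with_replicas` are all assumptions made for the example.

```python
import hashlib


def deduplicate(chunks, node):
    """Store each unique chunk once on `node`; return the ordered list of digests."""
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()  # assumed fingerprint function
        node.setdefault(digest, chunk)              # keep only the first copy
        recipe.append(digest)
    return recipe


def store_with_replicas(chunks, first_node, replica_nodes):
    """Deduplicate on the first node, then copy the unique chunks to each replica node."""
    recipe = deduplicate(chunks, first_node)
    for node in replica_nodes:
        for digest in set(recipe):
            node[digest] = first_node[digest]
    return recipe


# Hypothetical three-node cluster, each node modeled as a dict.
node1, node2, node3 = {}, {}, {}
recipe = store_with_replicas([b"alpha", b"beta", b"alpha"], node1, [node2, node3])
print(len(recipe), len(node1), len(node2), len(node3))
```

Because deduplication happens before replication, each replica node receives only the unique chunks, while the digest list is enough to reassemble the original data from any node.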
  • Patent number: 10891342
    Abstract: Implementations relate to content data determination, transmission, and storage for local devices. In some implementations, a computer-implemented method includes determining an event for a user based on user data associated with the user, and includes programmatically analyzing the user data having one or more references to at least one of persons, locations, and scheduled activities. The method determines a set of content items to be accessed at the event, where one or more content items of the set are determined based on the user data, and the set of content items includes content data related to the event. Prior to a time of the event, the set of content items are transmitted over a communication network from network storage to local device(s) associated with the user, where the content items are stored in local storage of the local device(s).
    Type: Grant
    Filed: January 12, 2017
    Date of Patent: January 12, 2021
    Assignee: Google LLC
    Inventor: Bernadette Alexia Carter
  • Patent number: 10891309
    Abstract: Embodiments of the invention provide a method, system and computer program product for data duplication detection in an in memory data grid (IMDG). A method for data duplication detection in an IMDG includes computing a hash value for each binary data value in a key value pair of a partition in an IMDG. The method also includes generating a map including an entry for each unique computed hash value and one or more keys corresponding to binary data values of respective key value pairs from which the hash value had been uniquely computed. Thereafter, only those hash values in the map with multiple keys associated therewith are identified and binary data corresponding to the multiple keys of the identified hash values are reported as potential duplicate data in the IMDG.
    Type: Grant
    Filed: March 15, 2015
    Date of Patent: January 12, 2021
    Assignee: International Business Machines Corporation
    Inventors: Douglas Berg, Nitin Gaur, Christopher D. Johnson, Brian K. Martin
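The hash-map approach described in the abstract above can be illustrated with a short sketch. It assumes a partition is a dictionary of key to binary value and uses SHA-256 as the hash; the function name `find_potential_duplicates` is hypothetical and not taken from the patent.

```python
import hashlib
from collections import defaultdict


def find_potential_duplicates(partition):
    """Return {hash: [keys]} for every hash value shared by more than one key."""
    keys_by_hash = defaultdict(list)
    for key, value in partition.items():
        digest = hashlib.sha256(value).hexdigest()  # assumed hash function
        keys_by_hash[digest].append(key)
    # Only hashes associated with multiple keys are reported.
    return {h: keys for h, keys in keys_by_hash.items() if len(keys) > 1}


# Hypothetical partition: keys "a" and "c" hold identical binary data.
partition = {"a": b"\x01\x02", "b": b"\x03", "c": b"\x01\x02"}
duplicates = find_potential_duplicates(partition)
for digest, keys in duplicates.items():
    print(digest[:12], keys)
```

The result is reported as potential duplicates only: matching hashes strongly suggest identical values, but a byte-for-byte comparison would be needed to confirm them.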
  • Patent number: 10884643
    Abstract: A computer-implemented method for providing tenant aware, variable length, deduplication of data stored on a non-transitory computer readable storage medium. The method is performed at least in part by circuitry and the data comprises a plurality of data items. Each of the plurality of data items is associated with a particular tenant of a group of tenants that store data on the storage medium.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: January 5, 2021
    Assignee: Bottomline Technologies Limited
    Inventors: Zenon Buratta, Andy Dobbels
  • Patent number: 10884994
    Abstract: Various techniques are disclosed herein for storing and managing master data in hierarchical data systems. Several related concepts, embodiments, and examples are disclosed, including techniques for incremental rationalization in a hierarchical data model, techniques for implementing governance pools in a hierarchical data model, techniques for application materialization in a hierarchical data model, techniques for data intersection mastering in a hierarchical data model, techniques for change request visualization in a hierarchical data model, and techniques for hierarchy preparation in a hierarchical data model.
    Type: Grant
    Filed: July 12, 2017
    Date of Patent: January 5, 2021
    Assignee: Oracle International Corporation
    Inventors: Rahul R. Kamath, Anurag Garg, Mark Allen Brieden
  • Patent number: 10877936
    Abstract: The system, devices, and methods disclosed herein relate to data ratio reduction technology adapted to reduce storage costs by weeding out duplicative data write operations. The techniques and systems disclosed achieve deduplication benefits by reducing the size of hash values stored in hash tables used to compare unwritten data blocks to data that has already been written and stored somewhere in physical storage. The data deduplication systems, methods, and products facilitate deduplication at the block level as well as for misaligned data chunks within data blocks, that is an unwritten data block that has been stored sequentially in two different physical locations. The deduplication teachings herein are amenable to varying data block sizes as well as data chunk sizes within blocks. Our embodiments enhance computer performance by substantially reducing computational speeds and storage requirements attendant to deduplication systems using larger hash table data sizes.
    Type: Grant
    Filed: May 2, 2018
    Date of Patent: December 29, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Jeremy J. O'Hare, Rong Yu, Peng Wu, Michael J. Scharland
  • Patent number: 10871478
    Abstract: The present application relates to a method for tracking a user's exposure to air pollutants, comprising receiving pollutant information from a plurality of air quality data sources at one or more user locations, determining a weighting for at least one of the plurality of data sources, the weighting representing quality of the pollutant information from the respective data source, selecting data sources from the plurality of data sources based on the weighting and aggregating pollutant information from the selected data sources to determine the user's exposure over a predetermined period of time.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: December 22, 2020
    Assignee: KONINKLIJKE PHILIPS N.V.
    Inventors: Declan Patrick Kelly, Michael Martin Scheja, Wei Chen, Rim Helaoui
  • Patent number: 10860527
    Abstract: A method, computer program product, and computing system for storing a plurality of identifiers on a local data storage system. The plurality of identifiers locate a plurality of archived files at a plurality of defined remote addresses on a remote data storage system. The deletion of at least one of the plurality of identifiers is sensed, thus defining at least one deleted identifier. Temporal information of the at least one deleted identifier is compared to temporal information for a data protection operation performed on at least a portion of the local data storage system.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: December 8, 2020
    Assignee: EMC IP Holding Company, LLC
    Inventors: Jean-Pierre Bono, Sudhir Srinivasan, Marc A. De Souter
  • Patent number: 10853130
    Abstract: Embodiments in the disclosure are directed to the use of distributed computing to align reads against multiple portions of a reference dataset. Aligned portions of the reference dataset that correspond with an above-threshold alignment score can be assessed for the presence of sparse indicators that can be categorized and used to influence a determination of a state transition likelihood. Various tasks associated with the processing of reads (e.g., alignment, sparse indicator detection, and/or determination of a state transition likelihood) may be able to take advantage of parallel processing and can be distributed among the machines while considering the resource utilization of those machines. Different load-balancing mechanisms can be employed in order to achieve even resource utilization across the machines, and in some cases may involve assessing various processing characteristics that reflect a predicted resource expenditure and/or time profile for each task to be processed by a machine.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: December 1, 2020
    Assignee: Color Genomics, Inc.
    Inventors: Ryan Barrett, Taylor Sittler, Krishna Pant, Zhenghua Li, Katsuya Noguchi, Nishant Bhat, Othman Laraki, Jeroen Van den Akker, Kurt Smith
  • Patent number: 10853033
    Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subject to pairwise comparisons and then clustered to create a merged database table.
    Type: Grant
    Filed: October 11, 2017
    Date of Patent: December 1, 2020
    Assignee: AMPERITY, INC.
    Inventors: Stephen Meyles, Yan Yan, Carlos Sakoda, Ian Wesley-Smith, Dan Suciu
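The "pairwise comparison, then clustering" step in the abstract above can be sketched with a union-find structure. This is an illustration under stated assumptions rather than the patented method: the `related` rule (same normalized email or same normalized phone) and the record fields are hypothetical, and an exhaustive pairwise scan like this is only practical for small tables.

```python
from itertools import combinations


def normalize(value):
    """Lowercase and strip whitespace so trivially different spellings compare equal."""
    return "".join(value.lower().split())


def related(a, b):
    """Hypothetical pairwise rule: same normalized email or same normalized phone."""
    return (normalize(a["email"]) == normalize(b["email"])
            or normalize(a["phone"]) == normalize(b["phone"]))


def cluster_records(records):
    """Group record indices into clusters by linking every related pair."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if related(records[i], records[j]):
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


records = [
    {"email": "Ann@Example.com", "phone": "555-0100"},
    {"email": "ann@example.com", "phone": "555-0199"},
    {"email": "a.lee@example.com", "phone": "555-0199"},
    {"email": "bob@example.com", "phone": "555-0142"},
]
clusters = cluster_records(records)
print(clusters)
```

Clustering is transitive here: the first and third records share neither email nor phone, yet they land in the same cluster because each is related to the second record.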
  • Patent number: 10853058
    Abstract: A method of determining whether a program corresponds to a new version of an application is disclosed. A key value corresponding to the program is determined. A program descriptor corresponding to the program is determined. The program descriptor comprises fields extracted from a program file associated with the program. One or more versions of an application having the same key value are identified. A program descriptor corresponding to each of the one or more versions of the application is identified. The program descriptor corresponding to each of the one or more versions of the application comprises fields extracted from a program file associated with the version of the application. The determination of whether the program corresponds to a new version of the identified application is based on comparing the program descriptor corresponding to the program against the program descriptors corresponding to the one or more versions of the identified application.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: December 1, 2020
    Assignee: Nyotron (USA), Inc.
    Inventors: Freddy Ouzan, Tom Gonda, Eran Bida, Lior Moalem, Shachar Schidorsky, Rene Kolga
  • Patent number: 10846275
    Abstract: A method for deleting a set of keys from a storage server is provided. The method includes generating a probabilistic data structure for a first set of keys and for each key in a second set of keys, determining whether a key of the second set of keys is found in the probabilistic data structure. The method includes identifying the key as a candidate for deletion if the key is not found in the probabilistic data structure. A system is also provided.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: November 24, 2020
    Assignee: Pure Storage, Inc.
    Inventors: John Hayes, Ethan Miller, John Colgrove
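A Bloom filter is one common probabilistic data structure that fits the description in the abstract above. The sketch below is an assumption-laden illustration, not the patented design: the filter size, the number of hashes, the SHA-256-based hashing, and the interpretation of the first set as keys still in use are all choices made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def deletion_candidates(first_set, second_set):
    """Keys of second_set that are not found in a filter built from first_set."""
    bloom = BloomFilter()
    for key in first_set:
        bloom.add(key)
    return [key for key in second_set if key not in bloom]


live = ["k1", "k2"]                 # assumed: keys still referenced
stored = ["k1", "k2", "k3", "k4"]   # assumed: keys present on the storage server
candidates = deletion_candidates(live, stored)
print(candidates)
```

Because a Bloom filter never reports a false negative, a key from the first set is never marked for deletion. A false positive only means that a deletable key is occasionally kept, which is the safe direction for this use.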
  • Patent number: 10846611
    Abstract: A data processing system is disclosed for machine learning. The system comprises a sampling module (13) and a computational module (15) interconnected by a data communications link (17). The computational module is configured to store a parameter vector representing an energy function of a network having a plurality of visible units connected using links to a plurality of hidden units, each link being a relationship between two units. The sampling module is configured to receive the parameter vector from the first processing module and to sample from the probability distribution defined by the parameter vector to produce state vectors for the network. The computational module is further configured to receive the state vectors from the second processing module and to apply an algorithm to produce new data. The sampling and computational modules are configured to operate independently from one another.
    Type: Grant
    Filed: June 16, 2014
    Date of Patent: November 24, 2020
    Assignee: Nokia Technologies Oy
    Inventors: Joachim Wabnig, Antti Niskanen
  • Patent number: 10848585
    Abstract: Systems and methods of operating a distributed cache in a fast producer, slow consumer environment are disclosed. A system implements a distributed cache including a plurality of shards. Each shard includes a set of item containers selected from a plurality of containers. A first event related to a first item container in the set of item containers is received and the first item container is updated to include the first event. The first item container is positioned in at least one consumption queue. A second event related to the first item container in the set of item containers is received and the first item container is updated without changing the position of the first item container in the at least one consumption queue.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: November 24, 2020
    Assignee: Walmart Apollo, LLC
    Inventor: Andrew Torson
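The queueing behavior in the abstract above (a later event updates an item container without moving it in the consumption queue) can be sketched with an ordered mapping. This simplified illustration ignores sharding and concurrency; the class name `ConsumptionQueue` and its methods are hypothetical.

```python
from collections import OrderedDict


class ConsumptionQueue:
    """Queue of item containers; repeated events coalesce into the queued container."""

    def __init__(self):
        self._containers = OrderedDict()  # item id -> list of pending events

    def publish(self, item_id, event):
        # A new item is appended at the tail; an already queued item is
        # updated in place and keeps its current position.
        self._containers.setdefault(item_id, []).append(event)

    def consume(self):
        # Pop the container at the head of the queue (FIFO order).
        return self._containers.popitem(last=False)

    def __len__(self):
        return len(self._containers)


queue = ConsumptionQueue()
queue.publish("sku-1", "price-change")
queue.publish("sku-2", "stock-change")
queue.publish("sku-1", "stock-change")  # updates sku-1 without requeueing it
first = queue.consume()
print(first, len(queue))
```

With a fast producer, the queue length is bounded by the number of distinct items rather than the number of events, so a slow consumer sees each item once together with all of its accumulated updates.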
  • Patent number: 10845994
    Abstract: A technique accesses a non-resident segment and a resident segment of a segmented de-duplication index, the resident segment being currently loaded into primary memory from secondary storage for data block de-duplication, and the non-resident segment not being currently loaded into the primary memory from the secondary storage for de-duplication. The technique further discovers that a digest of a non-resident digest entry of the non-resident segment and a digest of a resident digest entry of the resident segment are duplicates. The non-resident digest entry includes a first reference to a first location of the secondary storage that holds a first data block copy, and the resident digest entry includes a second reference to a second location of the secondary storage that holds a second data block copy. The technique further performs reconciliation that conforms the non-resident segment and the resident segment of the index to reference only data block copy.
    Type: Grant
    Filed: July 31, 2017
    Date of Patent: November 24, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Ilya Usvyatsky, Nickolay Alexandrovich Dalmatov
  • Patent number: 10846301
    Abstract: Disclosed herein are methods, systems, and processes to perform container reclamation using probabilistic data structures. A hash value associated with a data segment and stored in a data container is received. Elements in a probabilistic data structure are identified using one or more portions of the hash value and element values are determined for each element. In response to a determination that the element values indicate that the segment object should be maintained, the segment object is maintained during compaction of the data container.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: November 24, 2020
    Assignee: Veritas Technologies LLC
    Inventors: Yingsong Jia, Xin Wang, Guangbin Zhang
  • Patent number: 10838990
    Abstract: Techniques for improving data compression of a storage system using coarse and fine grained similarity are described herein. According to one embodiment, region sketches for a plurality of regions of the set of data are generated, each region storing a plurality of data chunks. A region sketch index having a plurality of entries is maintained, each corresponding to one of the region sketches of the regions. The entries of the region sketch index are sorted based on the sketches of the regions, such that regions with an identical region sketch are positioned adjacent to each other within the region sketch index, representing similar regions. The data chunks of the similar regions that are identified based on the sorted entries of the region sketch index are reorganized to improve data compression of the data chunks of the similar regions.
    Type: Grant
    Filed: September 26, 2013
    Date of Patent: November 17, 2020
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Philip Shilane, Grant Wallace, Frederick Douglis, Guanlin Lu
  • Patent number: 10839087
    Abstract: Disclosed herein are system, method, and computer program product embodiments for secure data aggregation in databases. An embodiment operates by identifying a value column and a group column of a plurality of columns of a dataset. Two distinct group values of the group column are identified. A first group value is replaced with a first substitute value, and a second group value is replaced with a second substitute value. A value of the value column of each of the plurality of records and the substitute values are encrypted. The plurality of encrypted records are uploaded to a server.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: November 17, 2020
    Assignee: SAP SE
    Inventors: Timon Hackenjos, Florian Hahn, Florian Kerschbaum
  • Patent number: 10831372
    Abstract: An embodiment of the present invention is directed to implementing an automated repository monitoring tool. The system comprises: a plurality of repositories that are accessed by one or more applications; an interactive interface that receives one or more user inputs and displays repository monitor data; and a processor coupled to the interactive interface and configured to perform the steps comprising: identifying a storage limit for the plurality of repositories; upon exceeding the storage limit, monitoring a set of repositories for storage consumption; determining a variance amount for each of the set of repositories for a predetermined time period that exceeds a predetermined parameter; identifying at least one culprit repository based on the variance amount; automatically modifying a first state of the at least one repository to a safeguard state; and generating a notification to one or more recipients responsible for the at least one repository.
    Type: Grant
    Filed: June 6, 2018
    Date of Patent: November 10, 2020
    Assignee: JPMorgan Chase Bank, N.A.
    Inventors: James Todd Barnes, Farhan Ahmed, Brian J. Gordon, Stephen W. Terry