Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 11429634
    Abstract: In some embodiments, an interface of a content management system manages synchronized content on storage systems. For example, the interface stores, on a metadata storage structure, records of metadata associated with blocks of data stored on a storage, the records including block identifiers that uniquely identify the blocks and timestamps associated with the blocks. The interface identifies a batch of storage operations associated with the blocks, including one or more delete operations. For each delete operation, the interface queries the metadata storage structure for a timestamp corresponding to a block of data associated with the delete operation, determines whether the delete operation creates a race condition between the delete operation and an add operation associated with the block of data, and rejects the delete operation when the delete operation creates the race condition or the timestamp corresponding to the block of data is newer than a predetermined period of time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: August 30, 2022
    Assignee: Dropbox, Inc.
    Inventors: Nipunn Koorapati, Daniel Horn, Elmer Charles Jubb, IV
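    Illustrative sketch (not taken from the patent): the delete-rejection rule in the abstract above might look roughly like the following. The function name, the batch layout, the "race" test (an add for the same block in the same batch) and the handling of blocks with no recorded timestamp are all assumptions made for illustration.

```python
def filter_deletes(batch, timestamps, now, min_age):
    """Return the block ids whose delete operations would be accepted.

    batch      -- list of (operation, block_id) pairs; hypothetical layout
    timestamps -- {block_id: last-modified time in seconds}; stands in for
                  the metadata storage structure
    now        -- current time in seconds
    min_age    -- the "predetermined period of time" in seconds
    """
    # Assumption: a delete races with an add when the same batch also
    # contains an add operation for the same block.
    added = {block_id for op, block_id in batch if op == "add"}

    accepted = []
    for op, block_id in batch:
        if op != "delete":
            continue
        # Assumption: a block with no recorded timestamp is treated as
        # brand new, so its delete is rejected.
        age = now - timestamps.get(block_id, now)
        if block_id in added or age < min_age:
            continue  # rejected: race condition or block too recent
        accepted.append(block_id)
    return accepted
```

    In this sketch a rejected delete is simply skipped; a real system would presumably report the rejection back to the caller.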
  • Patent number: 11429575
    Abstract: Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to deduplicate common devices across multiple data sources are disclosed. An example system includes a comparison controller to identify a first device in a first data source and a second device in a second data source as a possible common device.
    Type: Grant
    Filed: July 10, 2020
    Date of Patent: August 30, 2022
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventors: Rachel Worth Olson, Michael Evan Anderson, Rishi Sriram, Margaret M. Orton, Fatemehossadat Miri, Samantha M. Mowrer, David J. Kurzynski, Molly Poppie
  • Patent number: 11429573
    Abstract: A data deduplication system includes a data deduplication subsystem coupled to each of a host system and a storage system. The data deduplication system receives data from the host system, generates a data deduplication identifier for the data, and determines whether the data deduplication identifier for the data is stored in a data deduplication database. In response to determining that the data deduplication identifier is not stored in the data deduplication database, the data deduplication system stores the data deduplication identifier for the data in the data deduplication database in association with a data counter for the data, and transmits the data to the storage system for storage. In response to determining that the data deduplication identifier is stored in the data deduplication database, the data deduplication system increments a data counter that is associated with the data deduplication identifier in the data deduplication database, and discards the data.
    Type: Grant
    Filed: October 16, 2019
    Date of Patent: August 30, 2022
    Assignee: Dell Products L.P.
    Inventors: Dharmesh M. Patel, Ravikanth Chaganti, Rizwan Ali
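    Illustrative sketch (not taken from the patent): the identifier-and-counter flow described in the abstract above can be approximated as follows. The class name, the use of SHA-256 as the deduplication identifier, and the initial counter value of 1 are assumptions.

```python
import hashlib


class DedupIndex:
    """Toy stand-in for the 'data deduplication database' in the abstract."""

    def __init__(self):
        self.counters = {}  # identifier -> data counter
        self.storage = {}   # identifier -> data; stands in for the storage system

    def write(self, data: bytes) -> bool:
        """Return True if the data was sent to storage, False if discarded."""
        # Assumption: the identifier is a SHA-256 digest of the data.
        identifier = hashlib.sha256(data).hexdigest()
        if identifier in self.counters:
            # Identifier already known: bump the counter and discard the data.
            self.counters[identifier] += 1
            return False
        # New identifier: record it with a counter and forward the data.
        self.counters[identifier] = 1  # assumed starting value
        self.storage[identifier] = data
        return True
```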
  • Patent number: 11423027
    Abstract: A system and method for a text search of a database, including converting a text search expression to a query plan and implementing the text search as the query plan on the database. The implementing of the text search includes a one-pass indexing as a single scan of an inverse index table associated with the database.
    Type: Grant
    Filed: January 29, 2016
    Date of Patent: August 23, 2022
    Assignee: MICRO FOCUS LLC
    Inventors: Qiming Chen, Meichun Hsu, Malu G. Castellanos
  • Patent number: 11416316
    Abstract: A first-to-second correlation engine determines correlations between first objects from a first object feed, and second objects from a second object storage, and generates first correlation messages indicative of the correlations for a first-to-second object direction and a second-to-first object direction. A second-to-first correlation engine determines respective correlations between the second objects from a second object feed and the first objects from a first object storage, and generates second correlation messages indicative of the respective correlations for the second-to-first object direction and the first-to-second object direction. A first-to-second correlation storage engine receives the first and second correlation messages for the first-to-second object direction and updates first-to-second correlation storage based on the received messages.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: August 16, 2022
    Assignee: AMADEUS S.A.S.
    Inventors: Serge Beuzit, Jean-Samuel Pasquali
  • Patent number: 11409766
    Abstract: Disclosed herein is the creation of probabilistic data structures for container reclamation. One method involves retrieving a segment object list of a data container and creating a probabilistic data structure. The segment object list comprises a plurality of segment objects, the data container comprises the plurality of segment objects and a plurality of data objects, and each segment object of the plurality of segment objects comprises a hash value determined by performing a hashing function on a corresponding data object of the plurality of data objects. The creating includes, for each segment object in the segment object list, identifying an element of a plurality of elements of the probabilistic data structure using a hash value of the each segment object and setting the element to indicate the segment object references a corresponding data object of the plurality of data objects.
    Type: Grant
    Filed: October 26, 2020
    Date of Patent: August 9, 2022
    Assignee: Veritas Technologies LLC
    Inventors: Yingsong Jia, Xin Wang, Guangbin Zhang
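    Illustrative sketch (not taken from the patent): one way to build a probabilistic structure from segment-object hashes, as the abstract above describes, is a single-hash bit array. The function names, the array size, and the hex-string hash format are assumptions; a production filter would likely use several hash positions per object.

```python
import hashlib


def build_filter(segment_hashes, num_elements=1024):
    """Set one element per segment object, chosen from its hash value."""
    elements = bytearray(num_elements)  # one byte per element, for clarity
    for hex_hash in segment_hashes:
        elements[int(hex_hash, 16) % num_elements] = 1
    return elements


def may_be_referenced(elements, data: bytes) -> bool:
    """False means definitely unreferenced; True means possibly referenced."""
    hex_hash = hashlib.sha256(data).hexdigest()  # assumed hashing function
    return elements[int(hex_hash, 16) % len(elements)] == 1
```

    A data object for which `may_be_referenced` returns False is not referenced by any listed segment object and is therefore a candidate for reclamation; a True result can be a false positive.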
  • Patent number: 11403266
    Abstract: A method for deleting a row from a table in a database system comprises logically deleting the row in the first table in the database system by inserting a key of the row into a corresponding row of a dedicated table in the database system; querying the dedicated table during a query against the first table to identify the corresponding row in the dedicated table; and in response to identifying the corresponding row in the dedicated table, deleting the row from the first table and the corresponding row from the dedicated table as part of query processing during a subsequent query.
    Type: Grant
    Filed: June 4, 2019
    Date of Patent: August 2, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andreas Brodt, Oliver Koeth, Daniel Martin, Knut Stolze
  • Patent number: 11403019
    Abstract: A method includes receiving a request to write a data block to a volume resident on a multi-tenant storage array, wherein the request is associated with a first tenant of the multi-tenant storage array, and determining whether the data block matches an existing data block on the multi-tenant storage array, wherein the existing block corresponds to a second tenant. In response to determining that the decrypted data block matches the existing data block: encrypting the existing data block with a shared volume encryption key; encrypting the shared volume encryption key with a first tenant encryption key and providing the shared volume encryption key encrypted with the first tenant encryption key to the first tenant; and encrypting the shared volume encryption key with a second tenant encryption key and providing the shared volume encryption key encrypted with the second tenant encryption key to the second tenant.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: August 2, 2022
    Assignee: Pure Storage, Inc.
    Inventors: Swapnil Chandrashekhar Nagle, Virendra Prakashaiah, Ronald Karr
  • Patent number: 11360690
    Abstract: There is provided a storage device that is connected to a computer and receives an UNMAP command to cancel a relationship between a logical address and a physical address provided to the computer, in response to data deletion on the computer. The storage device includes a control unit configured to make data stored in a physical address specified by the UNMAP command irreversible.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: June 14, 2022
    Assignee: HITACHI, LTD.
    Inventors: Hirotaka Nakagawa, Akihiro Hara
  • Patent number: 11360954
    Abstract: A method, computer program product, and computing system for receiving a candidate data portion; calculating a distance-preserving hash for the candidate data portion; and performing an entropy analysis on the distance-preserving hash to generate a hash entropy for the candidate data portion.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: June 14, 2022
    Assignee: EMC IP HOLDING COMPANY, LLC
    Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi
  • Patent number: 11354200
    Abstract: One embodiment provides a system which facilitates organization of data. During operation, the system receives data associated with a logical block address (LBA) to be written to a non-volatile memory. The system stores, in a data structure, a mapping of a first physical block address (PBA) corresponding to the LBA to a first status for the data, wherein the first status indicates data validity and recovery being enabled for the data. Responsive to receiving a command to delete the data, the system modifies the first status to indicate data invalidity and recovery being enabled for the data. Responsive to receiving a command to recover the previously deleted data, the system modifies the first status to indicate data validity and recovery being enabled for the data.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: June 7, 2022
    Assignee: Alibaba Group Holding Limited
    Inventor: Shu Li
  • Patent number: 11347423
    Abstract: A method, computer program product, and computer system for identifying a plurality of blocks. At least one heuristic associated with at least a portion of the plurality of blocks may be determined. It may be determined whether at least the portion of the plurality of blocks is a candidate for deduplication based upon, at least in part, the at least one heuristic. At least the portion of the plurality of blocks may be deduplicated based upon, at least in part, the at least one heuristic.
    Type: Grant
    Filed: July 29, 2019
    Date of Patent: May 31, 2022
    Assignee: EMC IP HOLDING COMPANY, LLC
    Inventors: Ivan Basov, Sorin Faibish, Istvan Gonczi
  • Patent number: 11347690
    Abstract: A method includes retrieving, with a masker controller job, an object and an associated object ID from a masking bucket that is defined in storage, making a copy of the object, with a masker worker microservice, masking the copy of the object to create a masked object, transmitting the masked object to an object access microservice, with the object access microservice, transmitting the masked object to a deduplication microservice, with the deduplication microservice, deduplicating the masked object, and storing the masked object in the storage.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: May 31, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Kimberly R. Lu, Joseph S. Brandt, Philip N. Shilane
  • Patent number: 11341106
    Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to update the deduplication database and remove records corresponding to data blocks that have been or will be erased from the secondary copies, without using or tracking reference counting values. Some embodiments described herein use a secondary table (for tracking archive file contents) and a bitmap to mark which primary records are present in the secondary table. In another embodiment, once the marking phase is completed, the deduplication system uses the marked-up bitmap to identify the corresponding records from the primary table that can be moved to another table for storing “zero-reference” data blocks. In other embodiments, the system will then traverse the “zero-reference” table and remove those primary data blocks from secondary storage devices.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: May 24, 2022
    Assignee: Commvault Systems, Inc.
    Inventors: Deepak Raghunath Attarde, Manoj Kumar Vijayan
  • Patent number: 11327935
    Abstract: Examples of an intelligent data quality application are defined. In an example, the system receives a data quality requirement from a user. The system obtains target data from a plurality of data sources. The system implements an artificial intelligence component to sort the target data into a data cascade. The data cascade may include a plurality of attributes associated with the data quality requirement. The system may evaluate the data cascade to identify a data pattern model for each of the attributes. The system may implement a first cognitive learning operation to determine a mapping context from the data cascade and a conversion rule from the data pattern model. The system may establish a data harmonization model corresponding to the data quality requirement by performing a second cognitive learning operation. The system may generate a data cleansing result corresponding to the data quality requirement.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: May 10, 2022
    Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
    Inventors: Sabrina Yamashita, Armando Martines Neto, Vivek Likhar, Acyr Da Luz
  • Patent number: 11321165
    Abstract: A method for log data sampling is disclosed. The method includes receiving logs of a computer system. A log comprises information regarding an operation of the computer system. The method also includes determining a sample of the logs by applying a set of sampling methods to the logs. The method further includes providing the sample of the logs as an input to an anomaly detection model for the computer system. The anomaly detection model identifies a fault in the operation of the computer system based on the input.
    Type: Grant
    Filed: September 22, 2020
    Date of Patent: May 3, 2022
    Assignee: International Business Machines Corporation
    Inventors: Xiaotong Liu, Jiayun Zhao, Anbang Xu, Rama Kalyani T. Akkiraju
  • Patent number: 11314693
    Abstract: A computer implemented system and method for automated estimation of relationships among a plurality of data elements. The approach includes processing elements of one or more data sets to establish linkage relations among the data records, and then extending the linkage relations based on one or more equivalence relations, stored as linkage data structures. The generated data structures are used for computationally simplifying the data sets by consolidating data records or removing redundancies, such as duplicates, and may be used to yield a compressed data representation or data structure.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: April 26, 2022
    Assignee: ROYAL BANK OF CANADA
    Inventors: Hisham Abu-Abed, Xiuzhan Guo, Joel Ian Tousignant-Barnes
  • Patent number: 11314705
    Abstract: A technique for managing deduplication performs partial-block matching opportunistically by leveraging information acquired during times when a storage system has available resources. The information identifies anchor blocks that are likely targets for partial-block matches, based on discovering that the anchor blocks belong to populations of blocks that have high similarity. When processing write requests, inline activities access anchor blocks that closely match newly arriving candidate blocks and perform partial-block deduplication against those anchor blocks.
    Type: Grant
    Filed: October 30, 2019
    Date of Patent: April 26, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Ronen Gazit, Uri Shabi
  • Patent number: 11308127
    Abstract: For a given cross-data-store transaction request at a storage service, a coordinator transmits respective voting transition requests to a plurality of log-based transaction managers (LTMs) configured for the respective data stores to which writes are directed in the transaction. The LTMs transmit responses to the coordinator based on data-store-specific conflict detection performed using contents of the voting transition requests and respective data-store-specific state transition logs. The coordinator determines a termination status of the cross-data-store transaction based on the LTMs' responses, and provides an indication of the termination status to the LTMs.
    Type: Grant
    Filed: February 26, 2018
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Uphendra Bhalchandra Shevade, Gregory Rustin Rogers, Christopher Ian Hendrie
  • Patent number: 11301169
    Abstract: A multi-platform data storage system that facilitates sharing of containers including one or more virtual storage resources. The multi-platform data storage system can, for example, include a storage interface configured to enable access to a plurality of storage platforms that use different storage access and/or management protocols, the plurality of storage platforms storing data objects in physical data storage; and a storage mobility and management layer providing virtual management of virtual storage resources corresponding to one or more data objects stored in the plurality of storage platforms, the storage mobility and management layer including at least a transfer module coupled to at least one network and configured to transfer at least one of the data objects. The transfer module can transfer the at least one of the data objects between the multi-platform data storage system and another data storage system.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: April 12, 2022
    Assignee: Arrikto Inc.
    Inventors: Konstantinos Venetsanopoulos, Evangelos Koukis, Christos Stavrakakis, Ilias Tsitsimpis, Dimitrios Aragiorgis, Alexios Pyrgiotis
  • Patent number: 11301427
    Abstract: Deduplication, including inline deduplication, of data for a file system can be implemented and managed. A data management component (DMC) can control inline and post-process deduplication of data during write and read operations associated with memory. DMC can determine whether inline data deduplication is to be performed to remove a data chunk from a write operation to prevent the data chunk from being written to a data store based on whether a hash associated with the data chunk matches a stored hash stored in a memory index and associated with a stored data chunk stored in a shadow store. If there is a match, DMC can perform a byte-by-byte comparison of the data chunk and stored data chunk to determine whether they match. If they match, DMC can perform inline data deduplication to remove the data chunk from the write operation.
    Type: Grant
    Filed: October 15, 2019
    Date of Patent: April 12, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Lachlan McIlroy, Robert Shelton
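    Illustrative sketch (not taken from the patent): the two-stage check in the abstract above, a hash lookup followed by a byte-by-byte comparison, could be written as below. The function names, the dictionary-based index and shadow store, and the default SHA-256 hash are assumptions.

```python
import hashlib


def sha256_digest(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def can_drop_from_write(chunk, index, shadow_store, hash_fn=sha256_digest):
    """Return True if the chunk duplicates a stored chunk and can be removed
    from the write operation.

    index        -- {hash: shadow-store key}; stands in for the memory index
    shadow_store -- {key: stored chunk bytes}
    """
    stored_key = index.get(hash_fn(chunk))
    if stored_key is None:
        return False  # no hash match, so the chunk must be written
    # A hash match alone is not trusted: confirm with a full byte comparison.
    return shadow_store[stored_key] == chunk
```

    The byte comparison guards against hash collisions, which is why the sketch accepts a pluggable `hash_fn`: a deliberately weak hash makes the collision path easy to exercise.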
  • Patent number: 11295049
    Abstract: A method implemented by a data processing system for processing data items of a stream of data items, including: accessing a specification that represents the executable logic, wherein a state of the specification for a particular value of the key specifies one or more portions of the executable logic that are executable in that state; receiving, over an input device or port, data items of a stream of data; for a first one of the data items of the stream, identifying a first state of the specification for a value of the key associated with that first one of the data items; processing, by the data processing system, the first one of the data items according to one or more portions of executable logic that are represented in the specification as being associated with the first state.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: April 5, 2022
    Assignee: Ab Initio Technology LLC
    Inventors: Joel Gould, Scott Studer, Craig W. Stanfill
  • Patent number: 11288132
    Abstract: Described is a system for distributing multiple phases of a deduplication processing amongst a set of nodes. The system may perform a load-balancing in configurations where multiple generations of backup data are redirected to the same host node, and thus, require the host node to perform certain storage processes such as writing new backup data to its associated physical storage. Accordingly, the system may perform an initial (or first phase) processing on a first node that is selected based on resource usage or classification (e.g. metadata storing node). The system may then perform a subsequent (or second phase) processing on a second, or host node, that is selected based on the node already storing previous generations of the backup data. Accordingly, the system still redirects processing to a host node, but provides the ability to delegate certain deduplication operations to additional nodes.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: March 29, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Abhishek Rajimwale, George Mathew
  • Patent number: 11269755
    Abstract: Systems and methods for monitoring one or more social media accounts of one or more users to process potentially relevant or important activity. The system can employ automated filtering methods to select from all social media activity the data that is most likely to be relevant for review. The systems and methods can be employed with user accounts or services not associated with social media.
    Type: Grant
    Filed: March 19, 2019
    Date of Patent: March 8, 2022
    Assignee: Humanity X Technologies
    Inventors: Jordan T. Bates, Bin Hong Lee, Kacie McCollum, Pat Pataranutaporn, Ram N. Polur
  • Patent number: 11263087
    Abstract: Methods and systems for serverless data deduplication are disclosed. A blob of data is received at a cloud services platform, where the blob of data includes incremental data. The blob of data is used to create an object in a first object store included in the cloud services platform. A function as a service (FaaS) function is triggered when the object is created. The FaaS function deduplicates the object to generate a deduplicated object. The deduplicated object is stored in a second object store included in the cloud services platform.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: March 1, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Assaf Natanzon, Saar Cohen
  • Patent number: 11256746
    Abstract: A method and apparatus for a graph database instance (GDI) maintaining a secondary index, that indexes data from a sparse data map storing graph application data, within a sparse data map dedicated to the secondary index. The GDI formulates row-keys, for the secondary index map, by hashing the values of key/value pairs stored in rows of a map storing application data. The GDI stores for each formulated row-key, in the row of the secondary index that is indexed by the formulated row-key, references to rows of the map storing application data that match the key/value pair on which formulation of the row-key was based. The row-keys into the secondary index map may incorporate bucket identifiers, which, for each key/value pair, allows the GDI to spread the references to graph elements that match the key/value pair among a set number of “buckets” for the key/value pair within the secondary index map.
    Type: Grant
    Filed: April 21, 2017
    Date of Patent: February 22, 2022
    Assignee: Oracle International Corporation
    Inventors: Zhe Wu, Gabriela Montiel Moreno, Jiao Tao, Jayanta Banerjee
  • Patent number: 11243915
    Abstract: A current file is obtained in the data. It is determined whether a similar historical file exists based on a sampled data block from at least one predetermined location in the current file. In response to non-existence of the similar historical file, the current file and corresponding metadata are stored on a file basis. In response to existence of the similar historical file, a deduplication operation is applied on the current file on a block basis.
    Type: Grant
    Filed: August 28, 2018
    Date of Patent: February 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Min Fang, JiaYang Zheng, GuoFeng Zhu
  • Patent number: 11243702
    Abstract: In some aspects, devices, systems, and methods are provided that relate to data deduplication performed in data storage devices, such as solid-state drives (SSD) or drives of any other type. In some aspects, devices, systems, and methods are provided that relate to hierarchical data deduplication at a local and system level, such as in a storage system built with one or more SSDs having built-in data deduplication functionality. The hierarchical data deduplication utilizes the IDs in the data storage devices to decide if the incoming data has to be stored or if a copy of the incoming data is already stored. In hierarchical data deduplication, no IDs (or signatures) are required to be stored at a system level. In some aspects, data steering is provided that enables data storing coordination in a system that consists of a set of data storage devices (e.g., SSDs) having built-in data deduplication.
    Type: Grant
    Filed: March 19, 2020
    Date of Patent: February 8, 2022
    Assignee: SMART IOPS, INC.
    Inventors: Manuel Antonio d'Abreu, Ashutosh Kumar Das
  • Patent number: 11237919
    Abstract: In certain systems disclosed herein, a distributed data monitoring and management system is provided that can replicate a distributed storage environment. The distributed data monitoring and management system can intelligently and automatically configure data access nodes to form a structure that matches the distributed storage environment. By matching the structure of the distributed storage environment, the distributed structure of the data may be maintained, enabling the data to be backed up from and/or restored to the distributed storage environment and/or migrated to another distributed storage environment without altering the distribution of the data. Further, embodiments herein enable the transfer of data from a non-distributed environment to a distributed storage environment. Thus, in some cases, an entity can migrate data from a local storage structure to a network-based distributed storage structure.
    Type: Grant
    Filed: April 12, 2019
    Date of Patent: February 1, 2022
    Assignee: Commvault Systems, Inc.
    Inventors: Manoj Kumar Pradhan, Paramasivam Kumarasamy, Dmitriy Borisovich Zakharkin, Arun Prabu Duraisamy
  • Patent number: 11238083
    Abstract: A method for identifying a desired document is provided to include forming K clusters of documents and, for each cluster: for each respective document of the cluster determining a sum of distances between (i) the respective document and (ii) each of the other documents of the cluster; and identifying a medoid document of the cluster as the document of the cluster having the smallest sum of determined distances of all of the documents of the cluster. The method also includes selecting M representative documents for each cluster, identifying for dynamic display toward the user K groupings of documents, wherein each of the K groupings of documents identifies the selected M representative documents of a corresponding cluster, and, in response to user selection of one of the K groupings of documents, identifying for dynamic display toward the user P documents of the cluster that corresponds to the selected grouping.
    Type: Grant
    Filed: May 11, 2018
    Date of Patent: February 1, 2022
    Assignee: Evolv Technology Solutions, Inc.
    Inventors: Robert Severn, Matthew J. Strom, Diego Guy M. Legrand, James O'Neill, Scott Henning
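    Illustrative sketch (not taken from the patent): the medoid selection step in the abstract above picks the document with the smallest sum of distances to the other documents in its cluster. The function name and the caller-supplied `distance` function are assumptions.

```python
def medoid(cluster, distance):
    """Return the cluster member with the smallest summed distance to the
    other members.

    cluster  -- non-empty list of documents (any representation)
    distance -- function (doc_a, doc_b) -> non-negative number
    """
    def total_distance(i):
        # Sum of distances from document i to every other document.
        return sum(
            distance(cluster[i], other)
            for j, other in enumerate(cluster)
            if j != i
        )

    best_index = min(range(len(cluster)), key=total_distance)
    return cluster[best_index]
```

    For example, with the one-dimensional "documents" 1.0, 2.0 and 10.0 and absolute difference as the distance, the summed distances are 10, 9 and 17, so the medoid is 2.0.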
  • Patent number: 11232074
    Abstract: A search term is received at deduplicated storage storing data segmented into segments. Segment fingerprints are generated and metadata maintained to allow reconstruction of the segmented data. The metadata includes fingerprint listings indicating sequences according to which the segments should be reconstructed. The segments are read to determine whether there are any matches of the search term. Matches are recorded in a results table. A first fingerprint listing associated with a first object is read. The results table is queried for fingerprints in the first fingerprint listing to determine whether the first object references any matches in the results table.
    Type: Grant
    Filed: May 19, 2020
    Date of Patent: January 25, 2022
    Assignee: EMC IP Holding Company LLC
    Inventor: Philip Shilane
  • Patent number: 11218296
    Abstract: A data storage system allows data to be encrypted and de-duplicated at the same system. By way of example, a server of the data storage system may request a client device which intends to upload a data block to transmit a first fingerprint of the data block to the server. The first fingerprint may be derived from the plaintext of the data block. The server may apply a one-way function to the first fingerprint to generate an encryption key and transmit the encryption key to the client device. The client device uses the encryption key to encrypt the data block and generates a second fingerprint which is derived from the ciphertext of the data block. The server uses both the first fingerprint and the second fingerprint to verify the data block and the legitimacy of the client attempting to upload the data block.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: January 4, 2022
    Assignee: Druva Inc.
    Inventors: Srikiran Gottipati, Milind Borate
  • Patent number: 11216199
    Abstract: A technique for managing write requests in a data storage system checks whether newly-arriving data match previously-stored data that have been recorded in a deduplication database. If a match is found, the technique compares mapping metadata for the newly-arriving data with mapping metadata for the matching data. If both sets of metadata point to the same storage location, then the newly-arriving data is a same-data write and a new write to disk is avoided.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: January 4, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Philippe Armangau, Monica Chaudhary, Ajay Karri, Alexander Daniel
  • Patent number: 11216428
    Abstract: A decision support system and method, which receives user inputs comprising: at least one user criterion, and at least one user input tuning parameter representing user tradeoff preferences for producing an output; and selectively produces an output of tagged data from a clustered database in dependence on the at least one user criterion, the at least one user input tuning parameter, and a distance function; receives at least one reference-user input parameter representing the at least one reference-user's analysis of the tagged data and the corresponding user inputs, to adapt the distance function in accordance with the reference-user inputs as a feedback signal; and clusters the database in dependence on at least the distance function, wherein the reference-user acts to optimize the distance function based on the user inputs and the output, and on at least one reference-user inference.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: January 4, 2022
    Assignee: Ool LLC
    Inventor: Gitanjali Swamy
  • Patent number: 11194495
    Abstract: A technique performs best-effort deduplication. The technique involves activating a front-end log deduplication service that is configured and operative to perform deduplication operations on data in front-end log-based storage prior to that data reaching back-end storage that is different from the front-end log-based storage. The technique further involves, after the front-end log deduplication service is activated, receiving new data in the front-end log-based storage. The technique further involves providing the front-end log deduplication service to perform a data deduplication operation on the new data while the new data resides within the front-end log-based storage. The technique further involves, after the data deduplication operation is performed on the new data, updating the back-end storage to indicate storage of the new data within the back-end storage.
    Type: Grant
    Filed: April 27, 2017
    Date of Patent: December 7, 2021
    Assignee: EMC IP Holding Company LLC
    Inventor: Nickolay Alexandrovich Dalmatov
  • Patent number: 11194769
    Abstract: A system and method to ensure the consistency of a data warehouse or backup database with a source database are described. The method alleviates issues of comparing two sets of the same data on disparate network systems and eliminates having to reload the entire target database or compare every field to ensure reasonable consistency of the contents. The process involves loading a unique record identifier, an optional record change timestamp, and an optional record archive field of a source database into a work file or temporary database table. Source work file records or temporary database table records that do not exist in the target database or have timestamp mismatches are retrieved from the source database and added to or updated in the target database. Target database records that are archived or missing in the work file or temporary database table are archived or deleted from the target database.
    Type: Grant
    Filed: April 27, 2020
    Date of Patent: December 7, 2021
    Inventor: Richard Banister
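    Illustrative sketch (not from the patent record): the comparison described in the abstract above can be pictured as a diff over record identifiers and timestamps. The function name `reconcile`, the dictionary shapes, and the assumption that every record carries a timestamp and an archive flag are hypothetical choices made for this example only.

```python
def reconcile(source_index, target):
    """Compare a lightweight source index against the target database.

    source_index: {record_id: (change_timestamp, archived_flag)}  # hypothetical shape
    target:       {record_id: change_timestamp}                   # hypothetical shape
    Returns (ids to fetch from the source and upsert, ids to archive or delete).
    """
    # Missing in the target, or present with a different timestamp.
    to_upsert = sorted(
        rid
        for rid, (ts, archived) in source_index.items()
        if not archived and target.get(rid) != ts
    )
    # Present in the target but archived or absent in the source index.
    to_remove = sorted(
        rid
        for rid in target
        if rid not in source_index or source_index[rid][1]
    )
    return to_upsert, to_remove
```

    Only the identifier, timestamp, and archive flag cross the network for the comparison; full records are fetched just for the ids in the first list.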
  • Patent number: 11194792
    Abstract: A computer-implemented method for blockchain data storage includes generating, by one or more processing devices, a snapshot of a current state tree associated with a fixed depth Merkle tree (FDMT) during creation of a block of a blockchain, wherein the current state tree stores state information corresponding to a newest block of the blockchain; and storing, by the one or more processing devices, the snapshot of the current state tree.
    Type: Grant
    Filed: October 30, 2020
    Date of Patent: December 7, 2021
    Assignee: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventor: Zhonghao Lu
  • Patent number: 11182492
    Abstract: According to some embodiments, a system and method are provided to prevent data on a portable data device from being compromised. The method comprises receiving a password associated with an emergency situation and, in response to the received password, destroying original data files in one or more of a plurality of partitions based on the received password.
    Type: Grant
    Filed: October 1, 2018
    Date of Patent: November 23, 2021
    Inventor: Ivy Wong
  • Patent number: 11184423
    Abstract: Techniques are described herein that are capable of offloading upload processing of a file in a distributed system. A request is received from a requestor to upload a file to a transactional database of a DBMS. Information regarding the requestor and/or the file is extracted from the request. A determination is made that the file is to be uploaded to a non-indexing file storage system in lieu of the transactional database based at least in part on the extracted information satisfying one or more criteria. A key that includes a hash is generated. The hash is created using attribute(s) of the requestor and/or the file from the extracted information. The key is provided to the requestor. The key and at least a portion of the file are received from the requestor. Uploading of the file to the non-indexing file storage system in lieu of the transactional database is initiated.
    Type: Grant
    Filed: October 24, 2018
    Date of Patent: November 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Arun Ramadasan Mannengal, Ashish Basran, Jawad Ahmed Ibrahim Katib, Avinash Chandru, Shreeja Subrata Datta
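    Illustrative sketch (not from the patent record): the abstract above describes a key that includes a hash created from attributes of the requestor and/or the file. One way this could look, assuming SHA-256 and three hypothetical attributes (`requestor_id`, `file_name`, `file_size`), with a made-up `upload-` prefix:

```python
import hashlib


def make_upload_key(requestor_id, file_name, file_size):
    """Build a key whose hash part is derived from request attributes.

    The attribute set, the separator, and the "upload-" prefix are
    assumptions for illustration; the abstract does not specify them.
    """
    material = f"{requestor_id}|{file_name}|{file_size}".encode("utf-8")
    return "upload-" + hashlib.sha256(material).hexdigest()
```

    The same attributes always yield the same key, so the requestor can later present the key together with the file contents.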
  • Patent number: 11182256
    Abstract: In some examples, in response to an event at the deduplication system, a system accesses item metadata of a backup item that is backed up to a remote object storage system, the item metadata of the backup item including range information indicating a range of identifier values for portion objects of the backup item stored in the remote object storage system. The system issues, based on the range information, requests to obtain respective attribute information of the portion objects of the backup item stored in the remote object storage system. The system determines, based on the attribute information, a name of a given portion object of the backup item already used.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: November 23, 2021
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Richard Phillip Mayo, David Malcolm Falkinder, Andrew Todd, Peter Thomas Camble
  • Patent number: 11176107
    Abstract: A computer system processes data records in a multi-tenant environment to ensure data quality. A plurality of records from a plurality of data sources are processed to provide a data quality metric for each field of the plurality of records based on record values in the field. A threshold range satisfying a specificity level of a data source is selected for each data quality metric. The data quality metric is compared to the threshold range to determine whether the data quality metric violates the threshold. A data quality report is provided for the plurality of records, wherein the data quality report indicates whether the data quality metric of each field violates the selected threshold range. Embodiments of the present invention further include a method and program product for processing data records in a multi-tenant environment to ensure data quality in substantially the same manner described above.
    Type: Grant
    Filed: December 7, 2018
    Date of Patent: November 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Yifan Xu, James Natale, Matthew Hagenbuch, Matthew M. Pohlman
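    Illustrative sketch (not from the patent record): the per-field check described in the abstract above can be approximated as follows. Completeness (the share of non-empty values) stands in for the data quality metric, and the `thresholds` mapping of field name to an allowed (low, high) range is a hypothetical input; the abstract names neither.

```python
def field_completeness(records, field):
    """Fraction of records with a non-empty value for `field` (assumed metric)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def quality_report(records, thresholds):
    """Flag each field whose metric falls outside its selected threshold range."""
    report = {}
    for field, (low, high) in thresholds.items():
        metric = field_completeness(records, field)
        report[field] = {
            "metric": metric,
            "violates": not (low <= metric <= high),
        }
    return report
```

    In a multi-tenant setting, a stricter or looser range per data source could be passed in through `thresholds`.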
  • Patent number: 11175842
    Abstract: In general, the invention relates to a method for processing data. The method includes receiving a write request from a host, and in response to the write request, obtaining system metadata for a system, selecting, based on the system metadata, a selected component of the system to perform a data processing operation, and initiating the data processing operation on the selected component.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: November 16, 2021
    Assignee: Dell Products L.P.
    Inventors: Dharmesh M. Patel, Ravikanth Chaganti, Rizwan Ali
  • Patent number: 11169975
    Abstract: A recognition quality management system and method is used to determine a final group quality grade (FGQG) for a database containing data structures pertaining to objects, where the FGQG is a single numeric score indicative of the quality of the recognition that has occurred within the database. The FGQG is calculated using a weighted algorithm incorporating at least three components: a string quality score (SQS) that is determined by a string distance calculation; an input quality score (IQS) that is determined from address confidence codes; and a link quality score (LQS) that evaluates a key field to determine grouping quality. The system and method allows for the determination of recognition quality across an entire database rather than using sampling and extrapolation, and thus leads to a higher quality result, and because the system and method is objective it allows comparison of recognition quality across competing recognition quality solutions.
    Type: Grant
    Filed: July 14, 2017
    Date of Patent: November 9, 2021
    Assignee: Acxiom LLC
    Inventors: Chris Powell, John Tindell, Brandy Walsh, Sarah Davis
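    Illustrative sketch (not from the patent record): the abstract above says the FGQG combines SQS, IQS, and LQS through a weighted algorithm but does not give the weights or the formula. A plain weighted average, with placeholder weights, is one possible reading:

```python
def final_group_quality_grade(sqs, iqs, lqs, weights=(0.4, 0.3, 0.3)):
    """Combine three component scores into a single numeric grade.

    A weighted average is an assumption; the default weights are
    placeholders and are not taken from the patent.
    """
    w_s, w_i, w_l = weights
    total = w_s + w_i + w_l
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_s * sqs + w_i * iqs + w_l * lqs) / total
```

    Because the result is a single number computed over the whole database, two recognition solutions can be compared by running both through the same function.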
  • Patent number: 11169967
    Abstract: Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
    Type: Grant
    Filed: December 17, 2019
    Date of Patent: November 9, 2021
    Assignee: NetApp Inc.
    Inventors: Damarugendra Mallaiah, Jayanta Basak
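    Illustrative sketch (not from the patent record): selective deduplication as described in the abstract above amounts to a gate in front of the inline deduplication path. Here the characteristic is assumed to be the object's type, the probability table is invented for the example, and "a specified relationship to a specified threshold" is read as "greater than or equal to".

```python
# Hypothetical, made-up probabilities of finding duplicates per object type.
DEDUP_PROBABILITY_BY_TYPE = {"vmdk": 0.9, "log": 0.6, "jpg": 0.05}


def should_dedup_inline(object_type, threshold=0.5, default=0.3):
    """Return True if the object should be deduplicated before persistence."""
    probability = DEDUP_PROBABILITY_BY_TYPE.get(object_type, default)
    return probability >= threshold
```

    Objects that fail the gate would be written straight to persistent storage, saving the cost of fingerprinting data that is unlikely to deduplicate.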
  • Patent number: 11157452
    Abstract: A method for in-band de-duplication, the method may include receiving by a hardware accelerator, a received packet of a first sequence of packets that conveys a first data chunk; applying a data chunk hash calculation process on the received packet while taking into account a hash calculation result obtained when applying the data chunk hash calculation process on a last packet of the first sequence that preceded the received packet; wherein the calculating of the first data chunk hash value is initiated before a completion of a reception of the entire first data chunk by the hardware accelerator.
    Type: Grant
    Filed: May 9, 2017
    Date of Patent: October 26, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Nafea Bshara, Leah Shalev, Erez Izenberg, Georgy Machulsky, Ron Diamant
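    Illustrative sketch (not from the patent record): the key idea in the abstract above is that hashing starts before the whole chunk has arrived, with each packet building on the result from the previous one. Python's `hashlib` objects support this kind of incremental update; SHA-256 and the function name are assumptions, and the patent's hardware accelerator is not modeled.

```python
import hashlib


def hash_chunk_from_packets(packets):
    """Hash a data chunk incrementally as its packets arrive.

    `packets` is any iterable of bytes payloads in arrival order.
    """
    running = hashlib.sha256()
    for payload in packets:
        # Each update carries forward the state left by earlier packets,
        # so no packet needs to be buffered once it has been processed.
        running.update(payload)
    return running.hexdigest()
```

    The digest equals the one obtained by hashing the fully assembled chunk, so downstream deduplication lookups are unaffected by the streaming approach.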
  • Patent number: 11157453
    Abstract: An approach for parallel deduplication using automatic chunk sizing. A dynamic chunk deduplicator receives a request to perform data deduplication where the request includes an identification of a dataset. The dynamic chunk deduplicator analyzes file level usage for one or more data files including the dataset to associate a deduplication chunk size with the one or more data files. The dynamic chunk deduplicator creates a collection of data segments from the dataset, based on the deduplication chunk size associated with the one or more data files. The dynamic chunk deduplicator creates a deduplication data chunk size plan where the deduplication data chunk size plan includes deduplication actions for the collection of data segments and outputs the deduplication data chunk size plan.
    Type: Grant
    Filed: October 15, 2019
    Date of Patent: October 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Debora A. Lowry, Jonathan Mendez Chacon, Jose Daniel Ramos Chaves, Blanca R. Navarro Piedra
  • Patent number: 11150827
    Abstract: When the hash of divided data does not duplicate a hash registered in an in-memory hash table, the hash of the divided data is registered in an in-memory non-duplication data list. When a hash registered in the in-memory non-duplication data list duplicates a hash registered in an on-disk hash table, the duplication count of that hash in the on-disk hash table is increased by 1 and the update time of the hash is set to the latest value. When the duplication count of a hash registered in the on-disk hash table exceeds a threshold, that hash is moved from the on-disk hash table to the in-memory hash table.
    Type: Grant
    Filed: September 1, 2020
    Date of Patent: October 19, 2021
    Assignee: HITACHI, LTD.
    Inventors: Kazumasa Matsubara, Mitsuo Hayasaka
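    Illustrative sketch (not from the patent record): the two-tier lookup in the abstract above can be modeled with plain Python containers. The class and method names are hypothetical, a dict stands in for the on-disk table, and registering a previously unseen hash on disk with a count of 0 is an assumption, since the abstract does not say what happens in that case.

```python
class TwoTierHashIndex:
    def __init__(self, promote_threshold=2):
        self.in_memory = set()   # frequently duplicated hashes
        self.on_disk = {}        # hash -> [duplication_count, update_time]
        self.pending = []        # in-memory non-duplication data list
        self.promote_threshold = promote_threshold

    def observe(self, digest):
        """Return True on an in-memory hit; otherwise queue for the disk check."""
        if digest in self.in_memory:
            return True
        self.pending.append(digest)
        return False

    def flush(self, now):
        """Check queued hashes against the on-disk table and promote hot ones."""
        for digest in self.pending:
            if digest in self.in_memory:
                continue  # already promoted earlier in this flush
            entry = self.on_disk.get(digest)
            if entry is None:
                self.on_disk[digest] = [0, now]  # assumption: first sighting
                continue
            entry[0] += 1
            entry[1] = now
            if entry[0] > self.promote_threshold:
                del self.on_disk[digest]
                self.in_memory.add(digest)
        self.pending.clear()
```

    Keeping only hot hashes in memory bounds the memory footprint while still letting the most common duplicates be detected without a disk lookup.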
  • Patent number: 11151128
    Abstract: Disclosed herein are system, method, and computer program product embodiments for providing data partitioning and transferring operations. An embodiment operates by determining a partition size and a number of partitions for an initial data set to be transferred from a first location to a second location. A uniqueness factor for at least a subset of the columns of the dataset is determined, and a set of unique columns is identified from the initial data set based on the uniqueness factor. Based on the partition size, a set of values from the row records from the set of unique columns is identified. Based on the identified set of values, the initial data set is partitioned into the number of partitions. One of transmitting or receiving at least one of the partitions is performed.
    Type: Grant
    Filed: March 25, 2019
    Date of Patent: October 19, 2021
    Assignee: SAP SE
    Inventors: Terrance Mihm, Babu Sathya, Benjamin Lorenz
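    Illustrative sketch (not from the patent record): one simple reading of the abstract above is to score each column by the share of distinct values, pick the most unique column, and cut the sorted rows into partitions of the requested size. The function names and the distinct-over-total definition of the uniqueness factor are assumptions, and a single column is used where the abstract allows a set of columns.

```python
def uniqueness_factor(rows, column):
    """Share of distinct values in a column (assumed definition)."""
    values = [row[column] for row in rows]
    return len(set(values)) / len(values)


def partition_by_most_unique_column(rows, partition_size):
    """Split rows into partitions ordered by the most unique column."""
    if not rows:
        return None, []
    key = max(rows[0].keys(), key=lambda c: uniqueness_factor(rows, c))
    ordered = sorted(rows, key=lambda r: r[key])
    partitions = [
        ordered[i:i + partition_size]
        for i in range(0, len(ordered), partition_size)
    ]
    return key, partitions
```

    The first and last key value of each partition could then serve as range boundaries when transferring partitions independently.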
  • Patent number: 11144533
    Abstract: A method is used in managing deduplication of data in storage systems. A candidate data object is identified for deduplicating a data object by evaluating digests stored in a current digest segment to determine whether another digest matching a digest associated with the data block is stored in the current digest segment. The current digest segment includes a set of digests associated with a set of data blocks previously received for deduplication. Based on the evaluation, a deduplicating technique is applied to the data object. The current digest segment is stored in an index table. A previous digest segment associated with a digest stored in the index table that matches the digest associated with the data block is replaced by the current digest segment. A plurality of digest segments are organized into a segment group and a reference counter is associated with the segment group, wherein if the reference counter reaches zero, storage space consumed by the digest group is reclaimed.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: October 12, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Nickolay Alexandrovich Dalamatov, Richard P. Ruef, Kurt William Everson
  • Patent number: 11144227
    Abstract: Techniques for implementing content-based post-process data deduplication are provided. In one set of embodiments, a computer system can receive a write request comprising write data to be persisted to a storage system and can sample a portion of the write data. The computer system can further execute one or more analyses on the sampled portion in order to determine whether the write data is a good deduplication candidate that is likely to contain redundancies which can be eliminated via data deduplication. If the one or more analyses indicate that the write data is a good deduplication candidate, the computer system can cause the write data to be persisted to a staging storage component of the storage system. Otherwise, the computer system can cause the write data to be persisted to a primary storage component of the storage system that is separate from the staging storage component.
    Type: Grant
    Filed: September 7, 2017
    Date of Patent: October 12, 2021
    Assignee: VMWARE, INC.
    Inventors: Adrian Marinescu, Glen McCready
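    Illustrative sketch (not from the patent record): the routing decision in the abstract above depends on analyzing a sample of the write data. The analysis used here, counting repeated fixed-size blocks in the first few kilobytes, is an assumption, as are the function names, sizes, and the 0.25 cutoff.

```python
def is_good_dedup_candidate(data, sample_size=4096, block_size=512,
                            min_dup_ratio=0.25):
    """Estimate redundancy from a leading sample of the write data."""
    sample = data[:sample_size]
    blocks = [sample[i:i + block_size]
              for i in range(0, len(sample), block_size)]
    if not blocks:
        return False
    dup_ratio = 1 - len(set(blocks)) / len(blocks)
    return dup_ratio >= min_dup_ratio


def route_write(data):
    """Pick the storage component: staging for likely duplicates, else primary."""
    return "staging" if is_good_dedup_candidate(data) else "primary"
```

    Data routed to staging would be deduplicated later by a post-process pass, while everything else goes directly to primary storage.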