Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 11681660
    Abstract: Embodiments presented herein describe techniques for deduplicating chunks of data across multiple clusters. A process executing in a storage system identifies one or more chunks in an incoming stream of data. For each chunk, a first fingerprint corresponding to the chunk is generated. The process determines whether the first fingerprint matches a second fingerprint listed in a corresponding entry in a deduplication map. Each entry of the deduplication map corresponds to a chunk stored in a location in one of the storage clusters. Upon determining that the first fingerprint matches the second fingerprint, the process writes, to a local persistent storage, a pointer referencing the location in that storage cluster.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: June 20, 2023
    Assignee: Cohesity, Inc.
    Inventor: Ganesha Shanmuganathan
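The fingerprint-and-pointer flow in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not Cohesity's claimed implementation. Several pieces are assumptions: SHA-256 as the fingerprint, an in-memory dict standing in for the deduplication map, and the hypothetical names `Location`, `ingest`, and `home_cluster`.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Location:
    cluster_id: str  # hypothetical: storage cluster holding the chunk
    offset: int      # hypothetical: chunk position within that cluster


def fingerprint(chunk: bytes) -> str:
    # SHA-256 is an assumption; the abstract does not name a hash function.
    return hashlib.sha256(chunk).hexdigest()


def ingest(chunks: Iterable[bytes], dedup_map: Dict[str, Location],
           clusters: Dict[str, List[bytes]], local_pointers: List[Location],
           home_cluster: str = "c0") -> None:
    """Store each distinct chunk once; every chunk yields a local pointer."""
    for chunk in chunks:
        fp = fingerprint(chunk)
        loc = dedup_map.get(fp)
        if loc is None:
            # No match in the map: write the chunk to the home cluster
            # and record where it landed.
            store = clusters.setdefault(home_cluster, [])
            loc = Location(home_cluster, len(store))
            store.append(chunk)
            dedup_map[fp] = loc
        # Match (or newly stored): persist only a pointer to the location,
        # which may be in a different cluster than the one ingesting.
        local_pointers.append(loc)
```

If the map already lists a chunk stored in another cluster (say `Location("c1", 7)`), ingesting that chunk writes only the pointer and no new data.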
  • Patent number: 11675741
    Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
    Type: Grant
    Filed: July 8, 2021
    Date of Patent: June 13, 2023
    Assignee: Rubrik, Inc.
    Inventors: Noel Moldvai, Jihang Lim
  • Patent number: 11663196
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11665377
    Abstract: Aspects of the subject disclosure may include, for example, a device having a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations including receiving encrypted hypertext transfer protocol (HTTPS) traffic including media content; separating the HTTPS traffic into audio segments and video segments; calculating a size for each audio segment in the HTTPS traffic; maintaining a sliding window of a plurality of sizes of consecutive audio segments to form a fingerprint; and identifying the media content by matching the fingerprint with a reference in a catalog. Other embodiments are disclosed.
    Type: Grant
    Filed: April 23, 2021
    Date of Patent: May 30, 2023
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yuan Ding, Natalia Schenck, Daniel Sanchez, Umut Akyol, Lawrence E. Bakst, Vinay Sharma
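The sliding-window fingerprint described above can be sketched as follows. This is a simplified illustration, not AT&T's method. It assumes the audio/video separation has already happened upstream, uses exact tuple matching (a real system would likely tolerate size jitter), and treats the window length of 4 and the function name `identify` as hypothetical choices.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Tuple


def identify(audio_sizes: Iterable[int], catalog: Dict[Tuple[int, ...], str],
             window: int = 4) -> Optional[str]:
    """Slide a fixed-length window over consecutive audio-segment sizes and
    return the first catalog title whose reference equals the window."""
    buf: deque = deque(maxlen=window)  # keeps only the most recent sizes
    for size in audio_sizes:
        buf.append(size)
        if len(buf) == window:
            title = catalog.get(tuple(buf))  # exact match is an assumption
            if title is not None:
                return title
    return None
```

Because encryption hides payloads but not segment lengths, the sequence of sizes alone serves as the fingerprint here.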
  • Patent number: 11663178
    Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to assign the deduplication databases based on the type of the client device and automatically create a new deduplication database when critical thresholds are reached. In other embodiments, deduplication databases are further split into multiple database partitions. Based on a data block distribution policy, each data block is then further assigned to a particular database partition within the deduplication database to further improve efficiency and speed of the deduplication process.
    Type: Grant
    Filed: October 23, 2020
    Date of Patent: May 30, 2023
    Assignee: Commvault Systems, Inc.
    Inventor: Prasad Nara
  • Patent number: 11663195
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11663194
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
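The retention-salt scheme shared by the three EMC abstracts above (11663196, 11663195, 11663194) can be illustrated with a small in-memory model. This is a hedged sketch, not the patented system. The plain byte concatenation of salts and data, the `CloudStore` class, and its `receive`/`sweep` methods are all hypothetical.

```python
import hashlib
from typing import Dict, List


def make_chunk(data: bytes, retention_salt: bytes, tenant_salt: bytes) -> str:
    # Hash of a combination of two salts and the object data. Plain
    # concatenation is an assumption; a real scheme would delimit the parts.
    return hashlib.sha256(retention_salt + tenant_salt + data).hexdigest()


class CloudStore:
    def __init__(self) -> None:
        self.live: Dict[str, bytes] = {}      # chunk -> its retention salt
        self.deleted: Dict[str, bytes] = {}   # stand-in "deleted items" store
        self.expiry: Dict[bytes, float] = {}  # retention salt -> expiry time

    def receive(self, chunk: str, retention_salt: bytes, expires_at: float) -> None:
        self.live[chunk] = retention_salt
        self.expiry[retention_salt] = expires_at

    def sweep(self, now: float) -> List[str]:
        """Move every chunk whose retention salt has expired."""
        expired = {s for s, t in self.expiry.items() if t <= now}
        moved = [c for c, s in self.live.items() if s in expired]
        for c in moved:
            self.deleted[c] = self.live.pop(c)
        return moved
```

Since the retention salt is part of the hash input, identical data under two retention periods produces two distinct chunks, so all chunks sharing an expired salt can be removed as a group without affecting the others.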
  • Patent number: 11659006
    Abstract: An assessment component that facilitates assessment and enforcement of policies within a computer environment can comprise a compliance component that determines whether a policy, that defines one or more requirements associated with usage of one or more enterprise components of an enterprise computing system, is in compliance with a plurality of standardized policies that govern operation of the one or more enterprise components of the enterprise computing system. The assessment component can also comprise a policy optimization component that determines one or more changes to the policy that achieve the compliance with the plurality of standardized policies based on a determination that the policy complies with a first standardized policy of the plurality of standardized policies and fails to comply with a second standardized policy of the plurality of standardized policies.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: May 23, 2023
    Assignee: Kyndryl, Inc.
    Inventors: Milton H. Hernandez, Anup Kalia, Brian Peterson, Vugranam C. Sreedhar, Sai Zeng
  • Patent number: 11625167
    Abstract: An embodiment of a semiconductor apparatus may include technology to determine if a threshold is met based on runtime memory usage, and enable foreground memory deduplication if the threshold is determined to be met. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: November 16, 2018
    Date of Patent: April 11, 2023
    Assignee: Intel Corporation
    Inventors: Dujian Wu, Yuping Yang, Donggui Yin
  • Patent number: 11609883
    Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: March 21, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
  • Patent number: 11599507
    Abstract: A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with the file.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: March 7, 2023
    Assignee: Druva Inc.
    Inventors: Milind Borate, Alok Kumar, Aditya Agrawal, Anup Agarwal, Somesh Jain, Aditya Kelkar, Yogendra Acharya, Anand Apte, Amit Kulkarni
  • Patent number: 11593028
    Abstract: A method of operating a computing device for processing data is provided. The method includes (a) monitoring a set of performance characteristics of the processing of the data; (b) periodically calculating, using a predefined set of coefficients, a linear combination of the monitored set of performance characteristics to yield a combined metric; and (c) upon detecting that the combined metric exceeds a threshold while operating in a first processing mode, transitioning from operating in the first processing mode to operating in a second processing mode. (1) The second processing mode has a higher bandwidth than the first processing mode, and (2) processing of data in the second processing mode is less robust than processing of data in the first processing mode. An apparatus, system, and computer program product for performing a similar method are also provided.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: February 28, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Vladimir Shveidel, Alexei Kabishcer
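The mode-switching rule in this abstract reduces to a weighted sum compared against a threshold. The sketch below is illustrative only. The mode names `"robust"` and `"high_bandwidth"` and the example coefficients are hypothetical, and, like the abstract, it does not model switching back.

```python
from typing import Sequence


def combined_metric(characteristics: Sequence[float],
                    coefficients: Sequence[float]) -> float:
    # Linear combination: sum of coefficient_i * characteristic_i.
    if len(characteristics) != len(coefficients):
        raise ValueError("need one coefficient per monitored characteristic")
    return sum(c * x for c, x in zip(coefficients, characteristics))


def next_mode(mode: str, characteristics: Sequence[float],
              coefficients: Sequence[float], threshold: float) -> str:
    # Leave the robust (lower-bandwidth) mode for the higher-bandwidth,
    # less robust mode once the combined metric exceeds the threshold.
    if mode == "robust" and combined_metric(characteristics, coefficients) > threshold:
        return "high_bandwidth"
    return mode
```

With coefficients `[0.5, 2.0]` and characteristics `[10.0, 3.0]`, the combined metric is 0.5·10 + 2.0·3 = 11, which exceeds a threshold of 10 and triggers the switch.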
  • Patent number: 11573924
    Abstract: Methods and systems for storing and managing large numbers of small files. A data processing system includes clients that generate large numbers of small files to be stored on a storage device managed by a File System (FS). An Archive Server (AS) receives multiple files from the client, archives the files in larger archives, and sends the archives to the FS for storage. When requested to read a file, the AS retrieves the archive in which the file is stored, extracts the file and sends it to the requesting client. In other words, the AS communicates with the clients in individual file units, and with the storage device in archive units. The AS is typically constructed as an add-on layer on top of a conventional FS, which enables the FS to handle small files efficiently without modification.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: February 7, 2023
    Assignee: COGNYTE TECHNOLOGIES ISRAEL LTD.
    Inventor: Yossi Chai
  • Patent number: 11573928
    Abstract: Techniques for processing data may include: receiving a data block stored in a data set, wherein a hash value is derived from the data block; determining, in accordance with selection criteria, whether the hash value is included in a subset; responsive to determining the hash value is included in the subset, performing processing that updates a table in accordance with the hash value and the data set, and determining, in accordance with the information in the table, whether to perform deduplication processing for the data block to determine whether the data block is a duplicate of another stored data block. The table may include an entry for the hash value. The entry may include information identifying data sets referencing the data block and, for each of the data sets, may specify a reference count denoting a number of times the data set references the data block.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: February 7, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
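The sampled-hash tracking in this abstract can be approximated as follows. This is a loose sketch under explicit assumptions, not the EMC implementation. The selection criterion (low hash bits equal to zero), the decision rule in `should_dedupe`, and the class name `SampledDedupTracker` are all hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict


class SampledDedupTracker:
    """Track only a sampled subset of block hashes, with per-data-set counts."""

    def __init__(self, sample_mask: int = 0x3) -> None:
        # Assumed selection criterion: keep hashes whose masked low bits are
        # zero, i.e. roughly 1 in (sample_mask + 1) hashes.
        self.sample_mask = sample_mask
        self.table: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
            lambda: defaultdict(int))

    def observe(self, block: bytes, data_set: str) -> bool:
        h = hashlib.sha256(block).hexdigest()
        if int(h, 16) & self.sample_mask:
            return False                  # hash not in the sampled subset
        self.table[h][data_set] += 1      # reference count for this data set
        return True

    def should_dedupe(self, block: bytes) -> bool:
        # Hypothetical rule: dedupe once the sampled hash is referenced by
        # more than one data set or more than once in total.
        refs = self.table.get(hashlib.sha256(block).hexdigest(), {})
        return len(refs) > 1 or sum(refs.values()) > 1
```

Setting `sample_mask=0` samples every hash, which is convenient for testing. Larger masks trade accuracy for a smaller table.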
  • Patent number: 11570196
    Abstract: A method for determining duplication of a vulnerability may include a vulnerability extraction step of extracting vulnerability uniform resource locator (URL) addresses including the vulnerability from an analysis target server; a hash generation step of generating the URL hash value corresponding to the extracted vulnerability from the vulnerability URL address; and a duplication determination step of determining, when the URL hash value is present in the first comparison table, that the vulnerability is duplicated and excluding the corresponding vulnerability from vulnerability information.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: January 31, 2023
    Assignee: NAVER CLOUD CORPORATION
    Inventors: Bong Goo Kang, Min Seob Lee, Won Tae Jang, June Ahn, Jihwan Yoon
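The hash-based duplicate check described in this abstract can be sketched as follows. This is an illustration, not NAVER Cloud's implementation. SHA-256 over the raw URL string (with no normalization), adding new hashes to the table as they are seen, and the function names are assumptions.

```python
import hashlib
from typing import Iterable, List, Set


def url_hash(url: str) -> str:
    # SHA-256 over the URL string; the abstract does not name the hash.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def dedupe_vulnerabilities(urls: Iterable[str],
                           comparison_table: Set[str]) -> List[str]:
    """Keep only vulnerability URLs whose hash is not already in the table."""
    kept = []
    for url in urls:
        h = url_hash(url)
        if h in comparison_table:
            continue  # duplicate: exclude from the vulnerability information
        comparison_table.add(h)  # assumption: remember newly seen URLs
        kept.append(url)
    return kept
```

In practice URLs would probably need canonicalizing (parameter order, case) before hashing, or trivially different URLs would escape detection.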
  • Patent number: 11561863
    Abstract: A method for enabling data set changes to be reverted to a prior point in time or state is disclosed. In one embodiment, such a method includes providing a data set comprising one or more data elements and a specified number of generations of the data elements. In certain embodiments, the data set is a partitioned data set extended (PDSE) data set, and the data elements are “members” within the PDSE data set. The method further includes tracking changes made by a job to data elements of the data set. The method further references, in a data structure (also referred to herein as a “cluster”) associated with the job, previous generations of the data elements changed by the job. In certain embodiments, the data structure is stored in the data set. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: January 24, 2023
    Assignee: International Business Machines Corporation
    Inventors: Trevor A. Geisler, David C. Reed, Thomas C. Reed, Max D. Smith
  • Patent number: 11539811
    Abstract: Systems, devices and methods for adaptive compression of stored information include a memory management computing device programmed to monitor a size of a plurality of data structures stored in a data repository. The computing device compares the size of each of a plurality of data structures to a predetermined threshold. When a size of an uncompressed data structure meets the threshold, the memory management computing device calculates a value of a first compression parameter based on a value of a first parameter and a value of a second parameter of each data element of the uncompressed data structure, calculates a value of a second compression parameter based on the value of the first parameter of each data element of the uncompressed data structure, generates a compressed data structure based on the value of the first compression parameter and the second compression parameter; and replaces, in the data repository, the uncompressed data structure with the compressed data structure.
    Type: Grant
    Filed: June 21, 2022
    Date of Patent: December 27, 2022
    Assignee: Chicago Mercantile Exchange Inc.
    Inventors: Fateen Sharaby, Sriram A. Raju Datla, Dhiraj Subhash Bawadhankar, John Charles Redfield, Justin Yeong-Juin Lee
  • Patent number: 11520744
    Abstract: Described is a system (and method) that intelligently distributes data within a clustered storage environment. To provide such a capability, the system may distribute backup files by considering a source of the data to be backed-up. In particular, the system may leverage the ability of front-end components such as a backup application to perform a granular data source identification of data. Such information may be propagated to back-end components such as a storage filesystem in the form of a data source identifier (e.g. placement tag). The data source identifiers may then be accessed by the clustered storage system to intelligently distribute backup files amongst a set of storage nodes forming a cluster. For example, backup files from the same data source may be stored on the same storage node to obtain the same deduplication efficiency as a single storage system.
    Type: Grant
    Filed: August 21, 2019
    Date of Patent: December 6, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Abhishek Rajimwale, George Mathew, Murthy Mamidi, Donna Barry Lewis
  • Patent number: 11514054
    Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: November 29, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas
  • Patent number: 11514025
    Abstract: Performing snapshot conscious internal file modification for network-attached storage is presented herein. A file system can comprise a first component configured to modify, during a service request, storage for a subset of data blocks of a file—the service request not being recognized by an external entity as a change of content of the file. Further, the file system can comprise a second component configured to prevent, based on the service request, a copy of the storage from being created for servicing of a snapshot—the snapshot comprising a point-in-time copy of the file system.
    Type: Grant
    Filed: August 19, 2019
    Date of Patent: November 29, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Ravi V. Batchu
  • Patent number: 11500841
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate encoding a tree data structure into a vector based on a set of constraints are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a constraint former that can form a set of constraints based on a first tree data structure and a vector encoder that can encode the first tree data structure into a vector based on the set of constraints.
    Type: Grant
    Filed: January 4, 2019
    Date of Patent: November 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Achille Fokoue-Nkoutche, Maxwell Crouse, Michael Witbrock, Ryan A. Musa, Maria Chang
  • Patent number: 11474700
    Abstract: Technologies for compressing communications for accelerator devices are disclosed. An accelerator device may include a communication abstraction logic unit to manage communication with one or more remote accelerator devices. The communication abstraction logic unit may receive communication to and from a kernel on the accelerator device. The communication abstraction logic unit may compress and decompress the communication without instruction from the corresponding kernel. The communication abstraction logic unit may choose when and how to compress communications based on telemetry of the accelerator device and the remote accelerator device.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 18, 2022
    Assignee: Intel Corporation
    Inventors: Susanne M. Balle, Evan Custodio, Francesc Guim Bernat
  • Patent number: 11468063
    Abstract: The subject technology provides information, corresponding to properties of a build side of a join operation, to a bloom filter. The subject technology, based at least in part on the information from the bloom filter, determines, during execution of a query plan, at least one property of the join operation to determine whether to switch an aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate. The subject technology switches, in response to the reduction rate being below a threshold value, the aggregation operator to the pass through mode during runtime of the query plan and, while the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed and the input stream of data matches an output stream of data flowing out of the aggregation operator.
    Type: Grant
    Filed: April 16, 2021
    Date of Patent: October 11, 2022
    Assignee: Snowflake Inc.
    Inventors: Bowei Chen, Thierry Cruanes, Florian Andreas Funke, Allison Waingold Lee, Jiaqi Yan
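The pass-through decision can be illustrated with a toy pre-aggregation step. This sketch swaps in a different estimator from the one the abstract describes. The abstract derives the reduction rate from bloom-filter information about the join's build side, whereas this sketch estimates it from a prefix of the input and decides once rather than continuously at runtime. The threshold, sample size, and function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def pre_aggregate(rows: Iterable[Row], threshold: float = 0.5,
                  sample_size: int = 4) -> Tuple[str, List[Row]]:
    """Sum values per key, unless a sample shows aggregation would barely
    reduce the row count; then emit the rows unchanged (pass-through)."""
    materialized = list(rows)
    sample = materialized[:sample_size]
    distinct = len({k for k, _ in sample})
    # Reduction rate: fraction of input rows aggregation would eliminate.
    reduction_rate = 1 - distinct / len(sample) if sample else 0.0
    if reduction_rate < threshold:
        return "pass_through", materialized  # output stream == input stream
    sums: Dict[str, int] = {}
    for k, v in materialized:
        sums[k] = sums.get(k, 0) + v
    return "aggregate", list(sums.items())
```

When every sampled key is distinct, aggregating would spend work without shrinking the stream, so passing rows through unchanged is the cheaper choice.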
  • Patent number: 11461269
    Abstract: A data management device includes a persistent storage and a processor. The persistent storage includes an object storage. The processor segments a file into file segments. The processor generates meta-data of the file segments. The processor stores a portion of the file segments in a data object of the object storage. The processor stores a portion of the meta-data of the file segments in a meta-data object of the object storage.
    Type: Grant
    Filed: July 21, 2017
    Date of Patent: October 4, 2022
    Assignee: EMC IP HOLDING COMPANY
    Inventors: Shuang Liang, Mahesh Kamat, Bhimsen Bhanjois
  • Patent number: 11429573
    Abstract: A data deduplication system includes a data deduplication subsystem coupled to each of a host system and a storage system. The data deduplication system receives data from the host system, generates a data deduplication identifier for the data, and determines whether the data deduplication identifier for the data is stored in a data deduplication database. In response to determining that the data deduplication identifier is not stored in the data deduplication database, the data deduplication system stores the data deduplication identifier for the data in the data deduplication database in association with a data counter for the data, and transmits the data to the storage system for storage. In response to determining that the data deduplication identifier is stored in the data deduplication database, the data deduplication system increments a data counter that is associated with the data deduplication identifier in the data deduplication database, and discards the data.
    Type: Grant
    Filed: October 16, 2019
    Date of Patent: August 30, 2022
    Assignee: Dell Products L.P.
    Inventors: Dharmesh M. Patel, Ravikanth Chaganti, Rizwan Ali
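The inline deduplication subsystem in this abstract can be modeled with a dict of identifier-to-counter entries. This is a minimal sketch, not Dell's implementation. The SHA-256 identifier, the initial counter value of 1, and the `DedupSubsystem` name are assumptions, and a Python list stands in for the storage system.

```python
import hashlib
from typing import Dict, List


class DedupSubsystem:
    """Sits between a host and a storage system, forwarding only new data."""

    def __init__(self) -> None:
        self.db: Dict[str, int] = {}    # dedup identifier -> data counter
        self.storage: List[bytes] = []  # stand-in for the storage system

    def write(self, data: bytes) -> bool:
        ident = hashlib.sha256(data).hexdigest()  # identifier scheme assumed
        if ident in self.db:
            self.db[ident] += 1  # seen before: increment counter, discard data
            return False
        self.db[ident] = 1          # new: record identifier with a counter
        self.storage.append(data)   # ...and transmit the data for storage
        return True
```

The counter effectively acts as a reference count, which a later delete path could decrement before actually freeing the stored copy.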
  • Patent number: 11429634
    Abstract: In some embodiments, an interface of a content management system manages synchronized content on storage systems. For example, the interface stores, on a metadata storage structure, records of metadata associated with blocks of data stored on a storage, the records including block identifiers that uniquely identify the blocks and timestamps associated with the blocks. The interface identifies a batch of storage operations associated with the blocks, including one or more delete operations. For each delete operation, the interface queries the metadata storage structure for a timestamp corresponding to a block of data associated with the delete operation, determines whether the delete operation creates a race condition between the delete operation and an add operation associated with the block of data, and rejects the delete operation when the delete operation creates the race condition or the timestamp corresponding to the block of data is newer than a predetermined period of time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: August 30, 2022
    Assignee: Dropbox, Inc.
    Inventors: Nipunn Koorapati, Daniel Horn, Elmer Charles Jubb, IV
  • Patent number: 11429575
    Abstract: Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to deduplicate common devices across multiple data sources are disclosed. An example system includes a comparison controller to identify a first device in a first data source and a second device in a second data source as a possible common device.
    Type: Grant
    Filed: July 10, 2020
    Date of Patent: August 30, 2022
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventors: Rachel Worth Olson, Michael Evan Anderson, Rishi Sriram, Margaret M. Orton, Fatemehossadat Miri, Samantha M. Mowrer, David J. Kurzynski, Molly Poppie
  • Patent number: 11423027
    Abstract: A system and method for a text search of a database, including converting a text search expression to a query plan and implementing the text search as the query plan on the database. The implementing of the text search includes a one-pass indexing as a single scan of an inverse index table associated with the database.
    Type: Grant
    Filed: January 29, 2016
    Date of Patent: August 23, 2022
    Assignee: MICRO FOCUS LLC
    Inventors: Qiming Chen, Meichun Hsu, Malu G. Castellanos
  • Patent number: 11416316
    Abstract: A first-to-second correlation engine determines correlations between first objects from a first object feed, and second objects from a second object storage, and generates first correlation messages indicative of the correlations for a first-to-second object direction and a second-to-first object direction. A second-to-first correlation engine determines respective correlations between the second objects from a second object feed and the first objects from a first object storage, and generates second correlation messages indicative of the respective correlations for the second-to-first object direction and the first-to-second object direction. A first-to-second correlation storage engine receives the first and second correlation messages for the first-to-second object direction and updates first-to-second correlation storage based on the received messages.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: August 16, 2022
    Assignee: AMADEUS S.A.S.
    Inventors: Serge Beuzit, Jean-Samuel Pasquali
  • Patent number: 11409766
    Abstract: Disclosed herein is the creation of probabilistic data structures for container reclamation. One method involves retrieving a segment object list of a data container and creating a probabilistic data structure. The segment object list comprises a plurality of segment objects, the data container comprises the plurality of segment objects and a plurality of data objects, and each segment object of the plurality of segment objects comprises a hash value determined by performing a hashing function on a corresponding data object of the plurality of data objects. The creating includes, for each segment object in the segment object list, identifying an element of a plurality of elements of the probabilistic data structure using a hash value of that segment object and setting the element to indicate the segment object references a corresponding data object of the plurality of data objects.
    Type: Grant
    Filed: October 26, 2020
    Date of Patent: August 9, 2022
    Assignee: Veritas Technologies LLC
    Inventors: Yingsong Jia, Xin Wang, Guangbin Zhang
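A Bloom-filter-style structure is one common way to realize the probabilistic data structure described above. The sketch below uses that approach as an assumption; it is not Veritas's implementation. The bit-array size, the number of probes, and the names `SegmentFilter` and `reclaimable` are hypothetical. A filter miss proves a data object is unreferenced, while a hit may be a false positive, so reclamation stays safe.

```python
import hashlib
from typing import Iterable, List


class SegmentFilter:
    """Bloom-filter-style array of elements keyed by segment hash values."""

    def __init__(self, num_bits: int = 1024, num_probes: int = 3) -> None:
        self.bits = bytearray(num_bits)  # one byte per element for clarity
        self.num_probes = num_probes

    def _positions(self, hash_value: str) -> List[int]:
        # Derive several element indices from the segment's hash value.
        return [
            int(hashlib.sha256(f"{i}:{hash_value}".encode()).hexdigest(), 16)
            % len(self.bits)
            for i in range(self.num_probes)
        ]

    def add(self, hash_value: str) -> None:
        for p in self._positions(hash_value):
            self.bits[p] = 1

    def may_be_referenced(self, hash_value: str) -> bool:
        # False means definitely unreferenced; True may be a false positive.
        return all(self.bits[p] for p in self._positions(hash_value))


def reclaimable(data_objects: Iterable[bytes],
                segment_hashes: Iterable[str]) -> List[bytes]:
    f = SegmentFilter()
    for h in segment_hashes:  # built from the container's segment object list
        f.add(h)
    return [d for d in data_objects
            if not f.may_be_referenced(hashlib.sha256(d).hexdigest())]
```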
  • Patent number: 11403266
    Abstract: A method for deleting a row from a first table in a database system comprises logically deleting the row in the first table by inserting a key of the row into a corresponding row of a dedicated table in the database system; querying the dedicated table during a query against the first table to identify the corresponding row in the dedicated table; and in response to identifying the corresponding row in the dedicated table, deleting the row from the first table and the corresponding row from the dedicated table as part of query processing during a subsequent query.
    Type: Grant
    Filed: June 4, 2019
    Date of Patent: August 2, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andreas Brodt, Oliver Koeth, Daniel Martin, Knut Stolze
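The deferred-delete idea can be sketched with a set of logically deleted keys that the next query consults and purges. This is a simplified illustration, not IBM's implementation. For brevity it filters and physically removes rows within a single query, whereas the abstract defers the physical delete to a subsequent query. The `Table` class is hypothetical.

```python
from typing import Dict, List, Set


class Table:
    def __init__(self, rows: Dict[int, str]) -> None:
        self.rows = dict(rows)               # primary key -> row payload
        self.deleted_keys: Set[int] = set()  # stand-in for the dedicated table

    def delete(self, key: int) -> None:
        # Logical delete: record the key only; the row is still physically there.
        self.deleted_keys.add(key)

    def query(self) -> List[str]:
        # Consult the dedicated table, purge matching rows from both tables
        # as a side effect of query processing, then return surviving rows.
        for key in list(self.deleted_keys):
            self.rows.pop(key, None)
            self.deleted_keys.discard(key)
        return [self.rows[k] for k in sorted(self.rows)]
```

The delete itself becomes a cheap insert, and the cost of physically removing the row is absorbed by query processing that touches the table anyway.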
  • Patent number: 11403019
    Abstract: A method includes receiving a request to write a data block to a volume resident on a multi-tenant storage array, wherein the request is associated with a first tenant of the multi-tenant storage array, and determining whether the data block matches an existing data block on the multi-tenant storage array, wherein the existing block corresponds to a second tenant. In response to determining that the decrypted data block matches the existing data block: encrypting the existing data block with a shared volume encryption key; encrypting the shared volume encryption key with a first tenant encryption key and providing the shared volume encryption key encrypted with the first tenant encryption key to the first tenant; and encrypting the shared volume encryption key with a second tenant encryption key and providing the shared volume encryption key encrypted with the second tenant encryption key to the second tenant.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: August 2, 2022
    Assignee: Pure Storage, Inc.
    Inventors: Swapnil Chandrashekhar Nagle, Virendra Prakashaiah, Ronald Karr
  • Patent number: 11360690
    Abstract: There is provided a storage device that is connected to a computer and receives an UNMAP command to cancel a relationship between a logical address and a physical address provided to the computer, in response to data deletion on the computer. The storage device includes a control unit configured to make data stored in a physical address specified by the UNMAP command irreversible.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: June 14, 2022
    Assignee: HITACHI, LTD.
    Inventors: Hirotaka Nakagawa, Akihiro Hara
  • Patent number: 11360954
    Abstract: A method, computer program product, and computing system for receiving a candidate data portion; calculating a distance-preserving hash for the candidate data portion; and performing an entropy analysis on the distance-preserving hash to generate a hash entropy for the candidate data portion.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: June 14, 2022
    Assignee: EMC IP HOLDING COMPANY, LLC
    Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi
  • Patent number: 11354200
    Abstract: One embodiment provides a system which facilitates organization of data. During operation, the system receives data associated with a logical block address (LBA) to be written to a non-volatile memory. The system stores, in a data structure, a mapping of a first physical block address (PBA) corresponding to the LBA to a first status for the data, wherein the first status indicates data validity and recovery being enabled for the data. Responsive to receiving a command to delete the data, the system modifies the first status to indicate data invalidity and recovery being enabled for the data. Responsive to receiving a command to recover the previously deleted data, the system modifies the first status to indicate data validity and recovery being enabled for the data.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: June 7, 2022
    Assignee: Alibaba Group Holding Limited
    Inventor: Shu Li
  • Patent number: 11347690
    Abstract: A method includes retrieving, with a masker controller job, an object and an associated object ID from a masking bucket that is defined in storage; making a copy of the object; masking, with a masker worker microservice, the copy of the object to create a masked object; transmitting the masked object to an object access microservice; transmitting, with the object access microservice, the masked object to a deduplication microservice; deduplicating, with the deduplication microservice, the masked object; and storing the masked object in the storage.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: May 31, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Kimberly R. Lu, Joseph S. Brandt, Philip N. Shilane
  • Patent number: 11347423
    Abstract: A method, computer program product, and computer system for identifying a plurality of blocks. At least one heuristic associated with at least a portion of the plurality of blocks may be determined. It may be determined whether at least the portion of the plurality of blocks is a candidate for deduplication based upon, at least in part, the at least one heuristic. At least the portion of the plurality of blocks may be deduplicated based upon, at least in part, the at least one heuristic.
    Type: Grant
    Filed: July 29, 2019
    Date of Patent: May 31, 2022
    Assignee: EMC IP HOLDING COMPANY, LLC
    Inventors: Ivan Basov, Sorin Faibish, Istvan Gonczi
  • Patent number: 11341106
    Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to update the deduplication database and remove records corresponding to data blocks that have been or will be erased from the secondary copies, without using or tracking reference counting values. Some embodiments described herein use a secondary table (for tracking archive file contents) and a bitmap to mark which primary records are present in the secondary table. In another embodiment, once the marking phase is completed, the deduplication system uses the marked-up bitmap to identify the corresponding records from the primary table that can be moved to another table for storing “zero-reference” data blocks. In other embodiments, the system will then traverse the “zero-reference” table and remove those primary data blocks from secondary storage devices.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: May 24, 2022
    Assignee: Commvault Systems, Inc.
    Inventors: Deepak Raghunath Attarde, Manoj Kumar Vijayan
  • Patent number: 11327935
    Abstract: Examples of an intelligent data quality application are defined. In an example, the system receives a data quality requirement from a user. The system obtains target data from a plurality of data sources. The system implements an artificial intelligence component to sort the target data into a data cascade. The data cascade may include a plurality of attributes associated with the data quality requirement. The system may evaluate the data cascade to identify a data pattern model for each of the attributes. The system may implement a first cognitive learning operation to determine a mapping context from the data cascade and a conversion rule from the data pattern model. The system may establish a data harmonization model corresponding to the data quality requirement by performing a second cognitive learning operation. The system may generate a data cleansing result corresponding to the data quality requirement.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: May 10, 2022
    Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
    Inventors: Sabrina Yamashita, Armando Martines Neto, Vivek Likhar, Acyr Da Luz
  • Patent number: 11321165
    Abstract: A method for log data sampling is disclosed. The method includes receiving logs of a computer system. A log comprises information regarding an operation of the computer system. The method also includes determining a sample of the logs by applying a set of sampling methods to the logs. The method further includes providing the sample of the logs as an input to an anomaly detection model for the computer system. The anomaly detection model identifies a fault in the operation of the computer system based on the input.
    Type: Grant
    Filed: September 22, 2020
    Date of Patent: May 3, 2022
    Assignee: International Business Machines Corporation
    Inventors: Xiaotong Liu, Jiayun Zhao, Anbang Xu, Rama Kalyani T. Akkiraju
  • Patent number: 11314693
    Abstract: A computer implemented system and method for automated estimation of relationships among a plurality of data elements. The approach includes processing elements of one or more data sets to establish linkage relations among the data records, and then extending the linkage relations based on one or more equivalence relations, stored as linkage data structures. The generated data structures are used for computationally simplifying the data sets by consolidating data records or removing redundancies, such as duplicates, and may be used to yield a compressed data representation or data structure.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: April 26, 2022
    Assignee: ROYAL BANK OF CANADA
    Inventors: Hisham Abu-Abed, Xiuzhan Guo, Joel Ian Tousignant-Barnes
  • Patent number: 11314705
    Abstract: A technique for managing deduplication performs partial-block matching opportunistically by leveraging information acquired during times when a storage system has available resources. The information identifies anchor blocks that are likely targets for partial-block matches, based on discovering that the anchor blocks belong to populations of blocks that have high similarity. When processing write requests, inline activities access anchor blocks that closely match newly arriving candidate blocks and perform partial-block deduplication against those anchor blocks.
    Type: Grant
    Filed: October 30, 2019
    Date of Patent: April 26, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Ronen Gazit, Uri Shabi
  • Patent number: 11308127
    Abstract: For a given cross-data-store transaction request at a storage service, a coordinator transmits respective voting transition requests to a plurality of log-based transaction managers (LTMs) configured for the respective data stores to which writes are directed in the transaction. The LTMs transmit responses to the coordinator based on data-store-specific conflict detection performed using contents of the voting transition requests and respective data-store-specific state transition logs. The coordinator determines a termination status of the cross-data-store transaction based on the LTMs' responses, and provides an indication of the termination status to the LTMs.
    Type: Grant
    Filed: February 26, 2018
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Uphendra Bhalchandra Shevade, Gregory Rustin Rogers, Christopher Ian Hendrie
  • Patent number: 11301427
    Abstract: Deduplication, including inline deduplication, of data for a file system can be implemented and managed. A data management component (DMC) can control inline and post-process deduplication of data during write and read operations associated with memory. DMC can determine whether inline data deduplication is to be performed to remove a data chunk from a write operation, preventing the data chunk from being written to a data store, based on whether a hash associated with the data chunk matches a stored hash that is stored in a memory index and associated with a stored data chunk in a shadow store. If there is a match, DMC can perform a byte-by-byte comparison of the data chunk and the stored data chunk to determine whether they match. If they match, DMC can perform inline data deduplication to remove the data chunk from the write operation.
    Type: Grant
    Filed: October 15, 2019
    Date of Patent: April 12, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Lachlan McIlroy, Robert Shelton
  • Patent number: 11301169
    Abstract: A multi-platform data storage system that facilitates sharing of containers including one or more virtual storage resources. The multi-platform data storage system can, for example, include a storage interface configured to enable access to a plurality of storage platforms that use different storage access and/or management protocols, the plurality of storage platforms storing data objects in physical data storage; and a storage mobility and management layer providing virtual management of virtual storage resources corresponding to one or more data objects stored in the plurality of storage platforms, the storage mobility and management layer including at least a transfer module coupled to at least one network and configured to transfer at least one of the data objects. The transfer module can transfer the at least one of the data objects between the multi-platform data storage system and another data storage system.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: April 12, 2022
    Assignee: Arrikto Inc.
    Inventors: Konstantinos Venetsanopoulos, Evangelos Koukis, Christos Stavrakakis, Ilias Tsitsimpis, Dimitrios Aragiorgis, Alexios Pyrgiotis
  • Patent number: 11295049
    Abstract: A method implemented by a data processing system for processing data items of a stream of data items, including: accessing a specification that represents the executable logic, wherein a state of the specification for a particular value of the key specifies one or more portions of the executable logic that are executable in that state; receiving, over an input device or port, data items of a stream of data; for a first one of the data items of the stream, identifying a first state of the specification for a value of the key associated with that first one of the data items; processing, by the data processing system, the first one of the data items according to one or more portions of executable logic that are represented in the specification as being associated with the first state.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: April 5, 2022
    Assignee: Ab Initio Technology LLC
    Inventors: Joel Gould, Scott Studer, Craig W. Stanfill
  • Patent number: 11288132
    Abstract: Described is a system for distributing multiple phases of deduplication processing amongst a set of nodes. The system may perform load balancing in configurations where multiple generations of backup data are redirected to the same host node, and thus require the host node to perform certain storage processes such as writing new backup data to its associated physical storage. Accordingly, the system may perform an initial (or first phase) processing on a first node that is selected based on resource usage or classification (e.g. a metadata storing node). The system may then perform a subsequent (or second phase) processing on a second, or host, node that is selected based on the node already storing previous generations of the backup data. Accordingly, the system still redirects processing to a host node, but provides the ability to delegate certain deduplication operations to additional nodes.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: March 29, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Abhishek Rajimwale, George Mathew
  • Patent number: 11269755
    Abstract: Systems and methods for monitoring one or more social media accounts of one or more users to process potentially relevant or important activity. The system can employ automated filtering methods to select from all social media activity the data that is most likely to be relevant for review. The systems and methods can be employed with user accounts or services not associated with social media.
    Type: Grant
    Filed: March 19, 2019
    Date of Patent: March 8, 2022
    Assignee: Humanity X Technologies
    Inventors: Jordan T. Bates, Bin Hong Lee, Kacie McCollum, Pat Pataranutaporn, Ram N. Polur
  • Patent number: 11263087
    Abstract: Methods and systems for serverless data deduplication are disclosed. A blob of data is received at a cloud services platform, where the blob of data includes incremental data. The blob of data is used to create an object in a first object store included in the cloud services platform. A function as a service (FaaS) function is triggered when the object is created. The FaaS function deduplicates the object to generate a deduplicated object. The deduplicated object is stored in a second object store included in the cloud services platform.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: March 1, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Assaf Natanzon, Saar Cohen
  • Patent number: 11256746
    Abstract: A method and apparatus for a graph database instance (GDI) maintaining a secondary index, that indexes data from a sparse data map storing graph application data, within a sparse data map dedicated to the secondary index. The GDI formulates row-keys, for the secondary index map, by hashing the values of key/value pairs stored in rows of a map storing application data. The GDI stores for each formulated row-key, in the row of the secondary index that is indexed by the formulated row-key, references to rows of the map storing application data that match the key/value pair on which formulation of the row-key was based. The row-keys into the secondary index map may incorporate bucket identifiers, which, for each key/value pair, allows the GDI to spread the references to graph elements that match the key/value pair among a set number of “buckets” for the key/value pair within the secondary index map.
    Type: Grant
    Filed: April 21, 2017
    Date of Patent: February 22, 2022
    Assignee: Oracle International Corporation
    Inventors: Zhe Wu, Gabriela Montiel Moreno, Jiao Tao, Jayanta Banerjee
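Patent 11354200 describes deleting and recovering data by changing a per-PBA status instead of erasing the physical page. A minimal sketch of that idea, assuming an in-memory dict stands in for the non-volatile memory and a simple append-only PBA allocator; the names `RecoverableFlash` and `Status` are hypothetical, not taken from the patent:

```python
from enum import Enum


class Status(Enum):
    # Both states keep "recovery enabled"; only validity changes.
    VALID = "valid, recovery enabled"
    INVALID = "invalid, recovery enabled"


class RecoverableFlash:
    """Toy model: an LBA -> PBA table plus a per-PBA status, so that
    deleted data can later be recovered (hypothetical names)."""

    def __init__(self):
        self.media = {}       # PBA -> stored bytes (simulated flash pages)
        self.lba_to_pba = {}  # logical-to-physical mapping
        self.status = {}      # PBA -> Status
        self._next_pba = 0    # assumed append-only allocator

    def write(self, lba: int, data: bytes) -> int:
        pba = self._next_pba
        self._next_pba += 1
        self.media[pba] = data
        self.lba_to_pba[lba] = pba
        self.status[pba] = Status.VALID
        return pba

    def delete(self, lba: int) -> None:
        # Only the status flips; the physical page is left in place.
        self.status[self.lba_to_pba[lba]] = Status.INVALID

    def recover(self, lba: int) -> None:
        self.status[self.lba_to_pba[lba]] = Status.VALID

    def read(self, lba: int) -> bytes:
        pba = self.lba_to_pba[lba]
        if self.status[pba] is not Status.VALID:
            raise KeyError(f"LBA {lba} has been deleted")
        return self.media[pba]
```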
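Patent 11341106 prunes a deduplication database without reference counts: primary records that still appear in the secondary (archive-contents) table are marked in a bitmap, unmarked records move to a "zero-reference" table, and their blocks are then removed. A rough sketch of that mark-and-sweep flow, assuming plain dicts for the primary table, the secondary table, and storage; the function and argument names are illustrative only:

```python
def prune_dedup_db(primary, secondary, storage):
    """
    primary:   dict record_id -> block_key  (deduplication database)
    secondary: dict archive_file -> set of record_ids it still references
    storage:   dict block_key -> bytes      (secondary storage)
    Returns the sorted record ids that were pruned.
    """
    ids = sorted(primary)
    slot = {rid: i for i, rid in enumerate(ids)}
    bitmap = bytearray((len(ids) + 7) // 8)  # one bit per primary record

    # Marking phase: set the bit of every primary record still listed
    # in the secondary table. No per-record counts are kept.
    for record_ids in secondary.values():
        for rid in record_ids:
            if rid in slot:
                i = slot[rid]
                bitmap[i // 8] |= 1 << (i % 8)

    # Unmarked records are moved into a "zero-reference" table.
    zero_ref = {}
    for rid in ids:
        i = slot[rid]
        if not (bitmap[i // 8] & (1 << (i % 8))):
            zero_ref[rid] = primary.pop(rid)

    # Sweep: traverse the zero-reference table and delete those blocks.
    for block_key in zero_ref.values():
        storage.pop(block_key, None)
    return sorted(zero_ref)
```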
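Patent 11308127 has a coordinator collect votes from per-data-store log-based transaction managers (LTMs) and then send every LTM the termination status. One way to model that exchange, assuming optimistic conflict detection in which an LTM rejects a transaction if a write committed after the transaction's read sequence number touched a key it read; this conflict rule and the class and field names are assumptions for illustration, not the patent's:

```python
class LogBasedTransactionManager:
    """Toy LTM for one data store: a log of committed write sets plus
    per-transaction write sets awaiting the coordinator's decision."""

    def __init__(self):
        self.log = []      # list of (sequence_number, frozenset of written keys)
        self.pending = {}  # txn_id -> write keys awaiting a decision

    def vote(self, txn_id, read_seq, read_keys, write_keys) -> bool:
        # Conflict: a write committed after our read point touched a key we read.
        for seq, written in self.log:
            if seq > read_seq and written & read_keys:
                return False
        self.pending[txn_id] = frozenset(write_keys)
        return True

    def finish(self, txn_id, commit: bool) -> None:
        writes = self.pending.pop(txn_id, None)
        if commit and writes is not None:
            self.log.append((len(self.log) + 1, writes))


def coordinate(txn_id, requests):
    """requests: list of (ltm, read_seq, read_keys, write_keys), one per data store."""
    # Every LTM receives a voting request (no short-circuiting).
    votes = [ltm.vote(txn_id, rs, set(rk), wk) for ltm, rs, rk, wk in requests]
    committed = all(votes)
    for ltm, *_ in requests:
        ltm.finish(txn_id, committed)  # tell each LTM the termination status
    return "COMMITTED" if committed else "ABORTED"
```

A single "no" vote aborts the whole cross-data-store transaction, and LTMs that voted "yes" discard their pending writes when told the outcome.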
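Patent 11301427 treats a hash match only as a candidate and confirms it with a byte-by-byte comparison before removing the chunk from the write. A minimal sketch, assuming SHA-256 as the hash and Python containers in place of the in-memory index and the shadow store; `InlineDeduper` is a hypothetical name:

```python
import hashlib


class InlineDeduper:
    """Sketch of inline deduplication: hash lookup, then byte-by-byte verify."""

    def __init__(self):
        self.index = {}  # SHA-256 digest -> position of a stored chunk
        self.store = []  # chunks actually written (stand-in for the shadow store)

    def write(self, chunk: bytes) -> tuple[int, bool]:
        """Return (position, deduplicated) for the incoming chunk."""
        digest = hashlib.sha256(chunk).digest()
        pos = self.index.get(digest)
        # A matching hash is only a candidate; compare the bytes before skipping.
        if pos is not None and self.store[pos] == chunk:
            return pos, True
        # No verified match: write the chunk and index it for later lookups.
        self.store.append(chunk)
        pos = len(self.store) - 1
        self.index.setdefault(digest, pos)
        return pos, False
```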
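Patent 11256746 forms secondary-index row keys by hashing key/value pairs and adding a bucket identifier, so that references for one pair are spread over a set number of rows. A sketch under stated assumptions: SHA-1 truncated to 16 hex digits as the hash, CRC-32 of the row id to pick a bucket, and `NUM_BUCKETS = 4`; none of these choices comes from the patent:

```python
import hashlib
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical fixed number of buckets per key/value pair


def row_key(key: str, value: str, bucket: int) -> str:
    # Hash the key/value pair, then append the bucket id.
    digest = hashlib.sha1(f"{key}\x00{value}".encode()).hexdigest()[:16]
    return f"{digest}:{bucket}"


def bucket_for(row_id: str, num_buckets: int) -> int:
    # Deterministic choice (Python's built-in hash() is salted per process).
    return zlib.crc32(row_id.encode()) % num_buckets


def build_index(app_rows: dict, num_buckets: int = NUM_BUCKETS) -> dict:
    """app_rows maps an application-data row id to its key/value properties."""
    index = defaultdict(set)
    for row_id, props in app_rows.items():
        for key, value in props.items():
            index[row_key(key, str(value), bucket_for(row_id, num_buckets))].add(row_id)
    return index


def lookup(index: dict, key: str, value, num_buckets: int = NUM_BUCKETS) -> set:
    # A query reads every bucket row for the key/value pair and merges them.
    hits = set()
    for bucket in range(num_buckets):
        hits |= index.get(row_key(key, str(value), bucket), set())
    return hits
```

The trade-off shows up in `lookup`: references to a popular key/value pair are spread over several rows, but every query must read all `NUM_BUCKETS` rows for that pair.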