Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 12074953
    Abstract: The present disclosure relates to generating, updating, modifying, and otherwise managing configurations for virtual services on a cloud computing system. The present disclosure provides example implementations of a configuration management system and configuration handlers on respective server nodes that receive and process requests for modifying one or more configurations that manage operation of virtual services on the cloud. Systems described herein involve leveraging a hierarchical model of configuration characteristics to facilitate both large and small scale modifications. Moreover, the systems described herein leverage a persistent store on server nodes to identify how to update a current base configuration and sub-version as well as synchronize modifications across a set of server nodes.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: August 27, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sameer Kumar Patro, Aritra Basu, Raghavendra Subhash
  • Patent number: 12061581
    Abstract: Example implementations relate to metadata operations in a storage system. An example includes generating, by a storage controller of a deduplication storage system, a candidate list of container indexes for matching operations of a received data segment, each container index in the candidate list having an associated match cost; identifying, by the storage controller, a journal group associated with a first container index listed in the candidate list; reducing, by the storage controller, a match cost associated with the first container index in response to a determination that the identified journal group is in a modified state; and performing, by the storage controller, the matching operations of the received data segment based at least on the reduced match cost of the first container index.
    Type: Grant
    Filed: July 26, 2022
    Date of Patent: August 13, 2024
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Aman Sahil, Richard Phillip Mayo
  • Patent number: 12050879
    Abstract: A device may generate first scores for sentences of text based on a cumulative frequency of words in each sentence, may generate second scores for the sentences based on a cumulative frequency of domain entities in each sentence, and may generate third scores for the sentences based on a sentiment analysis of each sentence. The device may generate a summary of the text, may filter the sentences to extract a first set of sentences, may filter the sentences to extract a second set of sentences, and may filter the sentences to extract a third set of sentences. The device may identify and assign weights to a first group of sentences, a second group of sentences, and a third group of sentences, may generate a ranked list of sentences based on the weighted first group, second group, and third group, and may perform actions based on the final summary.
    Type: Grant
    Filed: May 24, 2022
    Date of Patent: July 30, 2024
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Prakash Ranganathan, Miruna Jayakrishnasamy
  • Patent number: 12050790
    Abstract: Aspects of the present disclosure configure a memory sub-system processor to manage memory operations with repeating data patterns. The processor receives a request to write a block of data comprising a plurality of portions to a set of memory components and determines whether a pattern of data repeats across the plurality of portions of the block of data. In response to determining that the pattern of data repeats across the plurality of portions, the processor stores a representation of the pattern of data in a mapping table and discards the block of data to prevent storing the block of data on the set of memory components.
    Type: Grant
    Filed: August 16, 2022
    Date of Patent: July 30, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Anoop Achuthan Rajendrababu
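
The write path in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `MemorySubsystem`, the function `repeating_pattern`, the 4-byte `PORTION_SIZE`, and the tuple layout of the mapping table are all assumptions made for the example.

```python
from typing import Optional

PORTION_SIZE = 4  # hypothetical portion size in bytes


def repeating_pattern(block: bytes, portion_size: int = PORTION_SIZE) -> Optional[bytes]:
    """Return the pattern if every portion of the block is identical, else None."""
    if len(block) == 0 or len(block) % portion_size != 0:
        return None
    first = block[:portion_size]
    for offset in range(portion_size, len(block), portion_size):
        if block[offset:offset + portion_size] != first:
            return None
    return first


class MemorySubsystem:
    """Toy model: a mapping table plus a list standing in for memory components."""

    def __init__(self):
        self.mapping_table = {}  # address -> ("pattern", pattern, length) or ("media", index)
        self.media = []          # blocks that were actually written

    def write(self, address: int, block: bytes) -> None:
        pattern = repeating_pattern(block)
        if pattern is not None:
            # Record only a representation of the pattern; the block itself is discarded.
            self.mapping_table[address] = ("pattern", pattern, len(block))
            return
        self.media.append(block)
        self.mapping_table[address] = ("media", len(self.media) - 1)

    def read(self, address: int) -> bytes:
        entry = self.mapping_table[address]
        if entry[0] == "pattern":
            _, pattern, length = entry
            # Rebuild the block from the stored pattern.
            return pattern * (length // len(pattern))
        return self.media[entry[1]]
```

In this sketch a block made of one repeated 4-byte portion never reaches `media`; a read rebuilds it from the mapping-table entry.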
  • Patent number: 12045211
    Abstract: One example method includes collaborative deduplication. A deduplication engine implemented at a cloud level collaborates or coordinates with an extension engine of the deduplication at an edge node. This allows data ingested at a node to be collaboratively deduplicated prior to transfer to the cloud and after transfer to the cloud.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: July 23, 2024
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Mohamed Sohail, Karim Fathy, Robert A. Lincourt
  • Patent number: 12032534
    Abstract: A method and system is used in managing deduplication of data in storage systems. A first digest for a deduplication candidate is received. At least one stream associated with the deduplication candidate is detected. At least one neighboring digest segment of a first loaded digest segment associated with the at least one stream is loaded. Whether the digest is located in the at least one neighboring digest segment is determined. If the digest is not located in the at least one neighboring digest segment, the digest is processed.
    Type: Grant
    Filed: August 2, 2019
    Date of Patent: July 9, 2024
    Assignee: EMC IP Holding Company LLC
    Inventors: Nickolay Dalmatov, Richard Ruef, Kurt Everson
  • Patent number: 12026386
    Abstract: A method for differential compression includes receiving input data blocks that are selected for compression. For each input data block, the input data block is divided into at least two segments. For each of the at least two segments, a similarity degree between the respective segment and each of the data blocks excluding the respective data block is computed. For each of the at least two segments, the data block which has a biggest similarity degree with the respective segment among the data blocks excluding the respective data block is selected as an optimal reference data block for the respective segment. The differential compression is applied to the input data block and optimal reference blocks in response to determining a differential compression that is to be applied based on the similarity degree between the segments of the input data block and the corresponding optimal reference blocks.
    Type: Grant
    Filed: September 23, 2022
    Date of Patent: July 2, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventor: Assaf Natanzon
  • Patent number: 12001685
    Abstract: A plurality of data stripes and one or more parity stripes are generated using a plurality of data chunks stored in a write-ahead log based on an erasure coding configuration. The plurality of data stripes and the one or more parity stripes are stored on corresponding different storage devices. The plurality of data stripes and the one or more parity stripes are associated together under a data protection grouping container.
    Type: Grant
    Filed: March 31, 2022
    Date of Patent: June 4, 2024
    Assignee: Cohesity, Inc.
    Inventors: Apurv Gupta, Akshat Agarwal
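
As a rough illustration of generating data stripes and a parity stripe from logged chunks, the sketch below uses single XOR parity, which is only the simplest possible erasure coding configuration and is an assumption here; the abstract does not say which code is used. The function names `make_stripes` and `recover`, the stripe size, and the zero padding are hypothetical, and placement on separate storage devices is left out.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripes(chunks, stripe_size: int):
    """Split the concatenated chunks into fixed-size data stripes plus one XOR parity stripe."""
    data = b"".join(chunks)
    if not data:
        raise ValueError("no data to stripe")
    stripes = [
        data[i:i + stripe_size].ljust(stripe_size, b"\0")  # zero-pad the last stripe
        for i in range(0, len(data), stripe_size)
    ]
    parity = reduce(xor_bytes, stripes)
    return stripes, parity


def recover(stripes, parity: bytes, lost_index: int) -> bytes:
    """Rebuild one lost data stripe from the surviving stripes and the parity stripe."""
    survivors = [s for i, s in enumerate(stripes) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)
```

With one parity stripe, any single lost data stripe can be rebuilt; tolerating more losses needs a stronger code such as Reed-Solomon.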
  • Patent number: 11995467
    Abstract: Systems, devices, and methods are provided for validation, deletion, and/or recovery of resources in a service environment. A machine (e.g., server) may receive a request to identify or discover a list of resources that are unused in a service environment. A machine (e.g., server) may receive a request to delete one or more resources in a service environment. In at least one embodiment, deletion of a resource involves a two-stage process where the resource is recoverably deleted in a first stage (e.g., by deactivating or disabling the resource) such that the resource can be recovered prior to a predetermined time period by reactivating or re-enabling the resource and, in a second stage, the resource is unrecoverably deleted.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: May 28, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Suresh Prakash Goacher, Arun Anilkumar, Nishit Nihal Vas
  • Patent number: 11977527
    Abstract: In certain embodiments, machine learning and lineage data may be used to manage data. In some embodiments, a computing system may use lineage data to identify two datasets that may be related. The computing system may determine that a user has access to a derivative dataset but does not have access to an original dataset that was used to create the derivative dataset. In response, the computing system may use a machine learning model to generate a similarity score indicating a level of similarity between the original dataset and the derivative dataset. If the similarity score satisfies a threshold score, the computing system may modify access rights of the user so that the user is unable to access a portion of the data in the derivative dataset.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: May 7, 2024
    Assignee: Capital One Services, LLC
    Inventors: William Ye, Jon Stofer, Thomas J. O'Connor, Jose Moreno
  • Patent number: 11966630
    Abstract: A data storage device includes a memory device and a controller coupled to the memory device. The controller is configured to segment a key to physical (K2P) table into two or more segments, wherein each segment of the two or more segments corresponds to a caching priority of key value (KV) pair data, organize the K2P table by storing and relocating one or more K2P table entries into a respective segment of the two or more segments, wherein the storing and relocating comprises moving a K2P table entry based on the caching priority of the KV pair data into the respective segment having the caching priority, and utilize the K2P table to manage KV pair data stored in the memory device, wherein utilizing the K2P table comprises applying a same management operation, such as prefetching, to each K2P table entry of a same segment.
    Type: Grant
    Filed: June 27, 2022
    Date of Patent: April 23, 2024
    Assignee: Western Digital Technologies, Inc.
    Inventors: Ran Zamir, Alexander Bazarsky, David Avraham
  • Patent number: 11954331
    Abstract: A computer-implemented method enables workload scheduling in a storage system for optimized deduplication. The method includes determining dynamic correlations of deduplications between workload processes in a prior time window. Workload processes include one or more tasks with defined execution timing parameters. The method further includes determining deduplication ratios based on the correlations of the deduplications between the workload processes. The method further includes scheduling multiple workload processes based on a highest determined deduplication ratio of the determined deduplication ratios.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: April 9, 2024
    Assignee: International Business Machines Corporation
    Inventors: Miles Mulholland, Anuj Chandra, Kirsty G. Rodwell, Jorden Luke Allcock
  • Patent number: 11949751
    Abstract: The present disclosure relates to restricting electronic activities from being linked with record objects. According to at least one aspect of the disclosure, a method can include accessing, by one or more processors, a plurality of electronic activities, accessing a plurality of record objects of one or more systems of record, identifying an electronic activity of the plurality of electronic activities to match to one or more record objects, determining a data source provider associated with providing access to the electronic activity, and identifying a system of record corresponding to the determined data source provider. The system of record can include a plurality of candidate record objects to which to match the electronic activity. The method can include restricting the electronic activity from being linked with the at least one record object.
    Type: Grant
    Filed: January 23, 2023
    Date of Patent: April 2, 2024
    Inventors: Oleg Rogynskyy, Tetiana Lutsaievska, John Wulf, Sathya Hariesh Prakash
  • Patent number: 11934346
    Abstract: A cloud computing infrastructure hosts a web service with customer accounts. In a customer account, files of the customer account are listed in an index. Files indicated in the index are arranged in groups, with files in each group being scanned using scanning serverless functions in the customer account. The files in the customer account include a compressed tar archive of a software container. Member files of a compressed tar archive in a customer account are randomly-accessed by way of locators that indicate a tar offset, a logical offset, and a decompressor state for a corresponding member file. A member file is accessed by seeking to the tar offset in the compressed tar archive, restoring a decompressor to the decompressor state, decompressing the compressed tar archive using the decompressor, and moving to the logical offset in the decompressed data.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: March 19, 2024
    Assignee: Trend Micro Incorporated
    Inventor: Brendan M. Johnson
  • Patent number: 11936931
    Abstract: Methods, apparatus, systems and articles of manufacture to perform media device asset qualification are disclosed. An example apparatus includes at least one memory, and at least one processor to execute instructions to at least identify a first set of candidate media device assets for disqualification, the candidate media device assets including A) a signature and B) a media identifier that identifies media, generate a hash table using a second set of the candidate media device assets, determine one or more counts of matches between C) a first signature and a first media identifier of a first candidate media device asset of the second set and D) respective signatures and media identifiers of multiple ones of the second set using the hash table, the multiple ones of the second set not including the first candidate media device asset, and load the first signature into a reference database as a reference signature.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: March 19, 2024
    Assignee: The Nielsen Company (US), LLC
    Inventors: Daniel Nelson, James Petro, Albert T. Borawski
  • Patent number: 11914554
    Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
    Type: Grant
    Filed: January 30, 2023
    Date of Patent: February 27, 2024
    Assignee: Rubrik, Inc.
    Inventors: Noel Moldvai, Jihang Lim
  • Patent number: 11907133
    Abstract: Standardized address generation from address substrings includes receiving an address string for a place-of-interest, one-to-many mapping at least one of a plurality of address substrings of the address string to respective address components, concatenating the address substrings using a template that specifies an order of concatenating the address substrings, and making the concatenated address substrings available for further use.
    Type: Grant
    Filed: July 29, 2022
    Date of Patent: February 20, 2024
    Assignee: SafeGraph, Inc.
    Inventor: Vera Sazonova
  • Patent number: 11893373
    Abstract: Techniques are disclosed for deploying functions in a cloud computing environment. Parameters are annotated in a plurality of Helm charts with a predetermined token. Duplicated values in the Helm charts are identified and the predetermined token is reused for the duplicated values. Schema files from the plurality of Helm charts are parsed to extract the predetermined tokens. Input data are received as values for the predetermined tokens. The function is deployed in the cloud computing environment using the values for the predetermined tokens as parameters in the Helm charts.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: February 6, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Frank John D'Innocenzo, Kam Yee Lee
  • Patent number: 11886397
    Abstract: Provided are methods and systems for determining multi-faceted trust scores for data. A method may commence with receiving data and determining a plurality of metadata items associated with the data. The method may continue with determining one or more facets associated with each of the plurality of metadata items. The method may further include determining a parameter and a weight associated with each of the one or more facets. Upon determining the parameter and the weight, a trust score associated with each of the plurality of metadata items may be calculated based on the parameter and the weight associated with each of the one or more facets. The method may further include calculating a multi-faceted trust score of the data based on the trust score of each of the plurality of metadata items.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: January 30, 2024
    Assignee: ASG Technologies Group, Inc.
    Inventors: Jean-Philippe Moresmau, Marcus MacNeill
  • Patent number: 11888936
    Abstract: A method for providing a proxy redirect to facilitate a storage and a retrieval of an object is disclosed. The method includes receiving a mapping of a user to a logical container that stores the object and to a storage provider that stores the logical container; receiving a key corresponding to the logical container and associated with the user; storing the mapping and the key in a database; generating, for the user, an application protocol that redirects to a pre-signed web address based on the stored mapping and the stored key; and transmitting, via a communication interface, the application protocol to the one user. The method further includes the user using the application protocol to directly access the storage provider and retrieve the object.
    Type: Grant
    Filed: July 1, 2020
    Date of Patent: January 30, 2024
    Assignee: JPMORGAN CHASE BANK, N.A.
    Inventor: Zachariah Antonas
  • Patent number: 11853326
    Abstract: A technology for retrieving data from a database. The technology includes receiving a search query specifying a target attribute and a target attribute value, accessing an index to determine one or more target files in which the target attribute value appears, the index including a plurality of attribute values, and for each of the attribute values, one or more files in which the attribute value appears, and retrieving data from the one or more target files.
    Type: Grant
    Filed: October 14, 2021
    Date of Patent: December 26, 2023
    Assignee: Google LLC
    Inventors: Hossein Ahmadi, Guang Cheng, Yannis Sismanis, Huong Thi Thu Phan, Shiyu Xie, Leo Chen, Zewen Zhang, Jing Jing Long, Amir Hossein Hormati
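
The index lookup described above resembles an inverted index from attribute values to files. The following is a minimal sketch under that reading; the in-memory `files` layout (file name mapped to a list of row dictionaries) and the function names `build_index` and `search` are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(files):
    """Map each (attribute, value) pair to the set of files in which it appears."""
    index = defaultdict(set)
    for name, rows in files.items():
        for row in rows:
            for attr, value in row.items():
                index[(attr, value)].add(name)
    return index


def search(files, index, attr, value):
    """Use the index to pick target files, then retrieve matching rows from only those files."""
    results = []
    for name in sorted(index.get((attr, value), ())):
        results.extend(row for row in files[name] if row.get(attr) == value)
    return results
```

The point of the design is that files not listed under the target attribute value are never opened.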
  • Patent number: 11836175
    Abstract: Semantic search techniques via focused summarizations are described. For example, a search query is received for a text-based content item in a data set comprising a plurality of text-based content items. A first feature vector representative of the search query is obtained. A respective semantic similarity score is determined between the first feature vector and each of a plurality of second feature vectors. Each of the second feature vectors is representative of a machine-generated summarization of a respective text-based content item. The machine-generated summarization comprises a plurality of multi-word fragments that are selected from the respective text-based content item via a transformer-based machine learning model. A search result is provided responsive to the search query.
    Type: Grant
    Filed: June 29, 2022
    Date of Patent: December 5, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Itzik Malkiel, Noam Koenigstein, Oren Barkan, Jonathan Ephrath, Yonathan Weill, Nir Nice
  • Patent number: 11797486
    Abstract: A device configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine the number of entries is greater than the redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: October 24, 2023
    Assignee: Bank of America Corporation
    Inventors: Pratap Dande, Gilberto R. Dos Santos, Jayabalaji Murugan, Murali M. Atyam, Manoj Bohra
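
A small sketch of the block-hash comparison and threshold check described above. It works on in-memory byte strings instead of a network device, and trimming each group down to exactly `redundancy_threshold` instances is an assumption (the abstract only says one or more instances are deleted). The names `block_hashes`, `prune_redundant`, and `BLOCK_SIZE` are hypothetical.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # hypothetical block size in bytes


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> tuple:
    """Return the sequence of per-block SHA-256 hash codes for one file instance."""
    return tuple(
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    )


def prune_redundant(instances: dict, redundancy_threshold: int) -> dict:
    """Group instances whose block hash sets match and drop copies beyond the threshold."""
    file_list = defaultdict(list)  # block-hash tuple -> paths of matching instances
    for path in sorted(instances):
        file_list[block_hashes(instances[path])].append(path)

    kept = dict(instances)
    for paths in file_list.values():
        if len(paths) > redundancy_threshold:
            # Assumption: keep the first `redundancy_threshold` copies, delete the rest.
            for path in paths[redundancy_threshold:]:
                del kept[path]
    return kept
```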
  • Patent number: 11797220
    Abstract: Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to include one or more entries corresponding to one or more determined duplicate data chunks.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: October 24, 2023
    Assignee: Cohesity, Inc.
    Inventors: Zhihuan Qiu, Sachin Jain, Anubhav Gupta, Apurv Gupta, Mohit Aron
  • Patent number: 11782878
    Abstract: A deduplicated storage system storing objects receives a search term. Storage includes metadata and segments into which the objects have been split and deduplicated. The metadata includes fingerprint sequences according to which the segments should be assembled. A partial match is found when a prefix of the term is found at an end of a segment or a suffix is found at a beginning of the segment. A fingerprint of the segment having the partial match is recorded. A first sequence of fingerprints associated with a first object is read to check whether any fingerprints in the first sequence have been recorded. When a fingerprint in the sequence has been recorded, a check of a next fingerprint in the sequence is made to see if it has been recorded as having the partial match. If the next fingerprint has been recorded, the first object is reported as having the term.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: October 10, 2023
    Assignee: EMC IP Holding Company LLC
    Inventor: Philip Shilane
  • Patent number: 11748014
    Abstract: Host computers running applications that store data on a block-based storage system such as a SAN provide hints that differentiate IO data based on which application generated the IO. The hints may include tags that are associated with IO commands sent to the block-based storage system. Each host application is associated with a unique identifier that is placed in the tag. Application name-to-identifier mappings may be sent from the hosts to the block-based storage system. Per-identifier/application deduplication statistics are maintained by the block-based storage system and shared with other block-based storage systems. Deduplication is disabled or de-emphasized for IO data generated by applications with statistically low deduplication ratios.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: September 5, 2023
    Assignee: DELL PRODUCTS L.P.
    Inventors: Kurumurthy Gokam, Md Haris Iqbal, Prasad Paple, Kundan Kumar
  • Patent number: 11734239
    Abstract: A record processing and storage system is operable to receive a plurality of labeled row data from a data source. Each labeled row data of the plurality of labeled row data includes at least one record and a corresponding row number of a plurality of row numbers. A plurality of pages are generated from records included in the labeled row data. The plurality of pages are stored via a page storage system. A plurality of page metadata corresponding to the plurality of pages is generated, where each of the plurality of page metadata is generated based on at least corresponding one row number of at least one labeled row data with records included in a corresponding one of the plurality of pages. Deduplication of duplicated records included in the plurality of pages is facilitated based on the plurality of page metadata.
    Type: Grant
    Filed: March 15, 2022
    Date of Patent: August 22, 2023
    Assignee: Ocient Holdings LLC
    Inventors: George Kondiles, Ravi V. Khadiwala, Donald Scott Clark, Anna Veselova
  • Patent number: 11704036
    Abstract: Systems and methods for implementing a deduplication process based on performance analyses. The system may include a processing device to determine a first performance metric associated with retrieving a second stored data block that is within a specified range of a duplicate of the first data block and a second performance metric associated with retrieving a hash value corresponding to the second stored data block. The processing device is further to retrieve the second stored data block within a specified range of the duplicate of the first data block in response to the first performance metric not exceeding the second performance metric.
    Type: Grant
    Filed: November 16, 2018
    Date of Patent: July 18, 2023
    Assignee: PURE STORAGE, INC.
    Inventors: John Colgrove, Ronald Karr, Ethan L. Miller
  • Patent number: 11681660
    Abstract: Embodiments presented herein describe techniques for deduplicating chunks of data across multiple clusters. A process executing in a storage system identifies one or more chunks in an incoming stream of data. For each chunk, a first fingerprint corresponding to the chunk is generated. The process determines whether the first fingerprint matches a second fingerprint listed in a corresponding entry in a deduplication map. Each entry of the deduplication map corresponds to a chunk stored in a location in one of the storage clusters. Upon determining that the first fingerprint matches the second fingerprint, the process writes, to a local persistent storage, a pointer referencing the location in that storage cluster.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: June 20, 2023
    Assignee: Cohesity, Inc.
    Inventor: Ganesha Shanmuganathan
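
The fingerprint lookup in the abstract above might look something like the sketch below. Fixed-size chunking, SHA-256 fingerprints, a plain dictionary as the shared deduplication map, and the `(cluster_id, index)` location format are all assumptions for illustration; `DedupStore` and `ingest` are hypothetical names.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Fingerprint a chunk (SHA-256 is an assumption; the abstract names no hash)."""
    return hashlib.sha256(chunk).hexdigest()


class DedupStore:
    """Toy storage cluster that deduplicates against a map shared across clusters."""

    def __init__(self, cluster_id: str):
        self.cluster_id = cluster_id
        self.chunks = {}      # location -> chunk bytes stored in this cluster
        self.local_refs = []  # pointers written to local persistent storage

    def ingest(self, stream: bytes, dedup_map: dict, chunk_size: int = 4) -> None:
        for i in range(0, len(stream), chunk_size):
            chunk = stream[i:i + chunk_size]
            fp = fingerprint(chunk)
            location = dedup_map.get(fp)
            if location is None:
                # No matching fingerprint: store the chunk here and publish its location.
                location = (self.cluster_id, len(self.chunks))
                self.chunks[location] = chunk
                dedup_map[fp] = location
            # Either way, only a pointer to the chunk's location is written locally.
            self.local_refs.append(location)
```

A chunk already held by another cluster is therefore referenced by pointer and not stored a second time.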
  • Patent number: 11675741
    Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
    Type: Grant
    Filed: July 8, 2021
    Date of Patent: June 13, 2023
    Assignee: Rubrik, Inc.
    Inventors: Noel Moldvai, Jihang Lim
  • Patent number: 11663195
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11665377
    Abstract: Aspects of the subject disclosure may include, for example, a device having a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations including receiving encrypted hypertext transport protocol (HTTPS) traffic including media content; separating the HTTPS traffic into audio segments and video segments; calculating a size for each audio segment in the HTTPS traffic; maintaining a sliding window of a plurality of sizes of consecutive audio segments to form a fingerprint; and identifying the media content by matching the fingerprint with a reference in a catalog. Other embodiments are disclosed.
    Type: Grant
    Filed: April 23, 2021
    Date of Patent: May 30, 2023
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yuan Ding, Natalia Schenck, Daniel Sanchez, Umut Akyol, Lawrence E. Bakst, Vinay Sharma
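
The sliding-window fingerprint described above can be illustrated with a short sketch. It assumes the audio segment sizes have already been extracted from the traffic and that the catalog stores exact size sequences; a practical matcher would likely need some tolerance for size variation. The names `build_catalog` and `identify` and the window length of 3 are hypothetical.

```python
from collections import deque


def build_catalog(references: dict, window: int = 3) -> dict:
    """Index every run of `window` consecutive audio segment sizes by media title."""
    catalog = {}
    for title, sizes in references.items():
        for i in range(len(sizes) - window + 1):
            catalog.setdefault(tuple(sizes[i:i + window]), title)
    return catalog


def identify(segment_sizes, catalog: dict, window: int = 3):
    """Slide a window over observed audio segment sizes and look each fingerprint up."""
    recent = deque(maxlen=window)  # the sliding window of consecutive sizes
    for size in segment_sizes:
        recent.append(size)
        if len(recent) == window:
            title = catalog.get(tuple(recent))
            if title is not None:
                return title
    return None
```

Only segment sizes are inspected, which is why the approach can work on encrypted traffic.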
  • Patent number: 11663194
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11663178
    Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to assign the deduplication databases based on the type of the client device and automatically create a new deduplication database when critical thresholds are reached. In other embodiments, deduplication databases are further split into multiple database partitions. Based on a data block distribution policy, each data block is then further assigned to a particular database partition within the deduplication database to further improve efficiency and speed of the deduplication process.
    Type: Grant
    Filed: October 23, 2020
    Date of Patent: May 30, 2023
    Assignee: Commvault Systems, Inc.
    Inventor: Prasad Nara
  • Patent number: 11663196
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11659006
    Abstract: An assessment component that facilitates assessment and enforcement of policies within a computer environment can comprise a compliance component that determines whether a policy, that defines one or more requirements associated with usage of one or more enterprise components of an enterprise computing system, is in compliance with a plurality of standardized policies that govern operation of the one or more enterprise components of the enterprise computing system. The assessment component can also comprise a policy optimization component that determines one or more changes to the policy that achieve the compliance with the plurality of standardized policies based on a determination that the policy complies with a first standardized policy of the plurality of standardized policies and fails to comply with a second standardized policy of the plurality of standardized policies.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: May 23, 2023
    Assignee: Kyndryl, Inc.
    Inventors: Milton H. Hernandez, Anup Kalia, Brian Peterson, Vugranam C. Sreedhar, Sai Zeng
  • Patent number: 11625167
    Abstract: An embodiment of a semiconductor apparatus may include technology to determine if a threshold is met based on runtime memory usage, and enable foreground memory deduplication if the threshold is determined to be met. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: November 16, 2018
    Date of Patent: April 11, 2023
    Assignee: Intel Corporation
    Inventors: Dujian Wu, Yuping Yang, Donggui Yin
  • Patent number: 11609883
    Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: March 21, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
  • Patent number: 11599507
    Abstract: A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with a file.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: March 7, 2023
    Assignee: Druva Inc.
    Inventors: Milind Borate, Alok Kumar, Aditya Agrawal, Anup Agarwal, Somesh Jain, Aditya Kelkar, Yogendra Acharya, Anand Apte, Amit Kulkarni
  • Patent number: 11593028
    Abstract: A method of operating a computing device for processing data is provided. The method includes (a) monitoring a set of performance characteristics of the processing of the data; (b) periodically calculating, using a predefined set of coefficients, a linear combination of the monitored set of performance characteristics to yield a combined metric; and (c) upon detecting that the combined metric exceeds a threshold while operating in a first processing mode, transitioning from operating in the first processing mode to operating in a second processing mode. (1) The second processing mode has a higher bandwidth than the first processing mode, and (2) processing of data in the second processing mode is less robust than processing of data in the first processing mode. An apparatus, system, and computer program product for performing a similar method are also provided.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: February 28, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Vladimir Shveidel, Alexei Kabishcer
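As an illustration of steps (b) and (c) in the abstract above, the following is a minimal sketch, not the patented implementation. The function names, the mode labels "first" and "second", and the sample coefficients are all hypothetical; the abstract only states that a linear combination of monitored characteristics is compared with a threshold while in the first processing mode.

```python
def combined_metric(characteristics, coefficients):
    """Linear combination of monitored performance characteristics.

    Both arguments are equal-length sequences of numbers; which
    characteristics are monitored is not specified in the abstract.
    """
    if len(characteristics) != len(coefficients):
        raise ValueError("one coefficient is required per characteristic")
    return sum(c * x for c, x in zip(coefficients, characteristics))


def next_mode(current_mode, characteristics, coefficients, threshold):
    """Switch from the first mode to the second when the metric exceeds the threshold.

    The mode labels are placeholders. Transitions back to the first mode
    are not covered by the abstract and are not modeled here.
    """
    metric = combined_metric(characteristics, coefficients)
    if current_mode == "first" and metric > threshold:
        return "second"  # higher bandwidth, less robust processing
    return current_mode
```

A caller would invoke `next_mode` periodically with freshly sampled characteristics, for example `next_mode("first", [2.0, 4.0], [0.5, 0.25], 1.5)`.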
  • Patent number: 11573924
    Abstract: Methods and systems for storing and managing large numbers of small files. A data processing system includes clients that generate large numbers of small files to be stored on a storage device managed by a File System (FS). An Archive Server (AS) receives multiple files from the client, archives the files in larger archives, and sends the archives to the FS for storage. When requested to read a file, the AS retrieves the archive in which the file is stored, extracts the file and sends it to the requesting client. In other words, the AS communicates with the clients in individual file units, and with the storage device in archive units. The AS is typically constructed as an add-on layer on top of a conventional FS, which enables the FS to handle small files efficiently without modification.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: February 7, 2023
    Assignee: COGNYTE TECHNOLOGIES ISRAEL LTD.
    Inventor: Yossi Chai
  • Patent number: 11573928
    Abstract: Techniques for processing data may include: receiving a data block stored in a data set, wherein a hash value is derived from the data block; determining, in accordance with selection criteria, whether the hash value is included in a subset; responsive to determining the hash value is included in the subset, performing processing that updates a table in accordance with the hash value and the data set, and determining, in accordance with the information in the table, whether to perform deduplication processing for the data block to determine whether the data block is a duplicate of another stored data block. The table may include an entry for the hash value. The entry may include information identifying data sets referencing the data block and, for each of the data sets, may specify a reference count denoting a number of times the data set references the data block.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: February 7, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
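The sampling and reference-count table described in the abstract above can be sketched roughly as follows. This is an illustrative assumption-laden example, not the patented method: the SHA-256 hash, the low-bit selection criterion (`SAMPLE_MASK`), and the `min_refs` decision rule are all hypothetical choices, since the abstract does not name a hash function, selection criteria, or decision policy.

```python
import hashlib
from collections import defaultdict

# Hypothetical selection criterion: sample hashes whose two low bits are zero.
SAMPLE_MASK = 0x3


def block_hash(block: bytes) -> int:
    """Derive an integer hash value from a data block (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(block).digest()[:8], "big")


def in_subset(hash_value: int) -> bool:
    """Return True when the hash value satisfies the assumed selection criterion."""
    return hash_value & SAMPLE_MASK == 0


def record_block(table, data_set: str, block: bytes):
    """Update the table for a sampled block; return its hash, or None if not sampled.

    table maps hash value -> {data set name -> reference count}.
    """
    h = block_hash(block)
    if not in_subset(h):
        return None
    table[h][data_set] += 1
    return h


def should_deduplicate(table, hash_value: int, min_refs: int = 2) -> bool:
    """Assumed policy: attempt deduplication once total references reach min_refs."""
    return sum(table[hash_value].values()) >= min_refs
```

The table can be created as `defaultdict(lambda: defaultdict(int))`, so that each entry records, per data set, how many times that data set references the block.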
  • Patent number: 11570196
    Abstract: A method for determining duplication of a vulnerability may include a vulnerability extraction step of extracting vulnerability uniform resource locator (URL) addresses including the vulnerability from an analysis target server; a hash generation step of generating the URL hash value corresponding to the extracted vulnerability from the vulnerability URL address; and a duplication determination step of determining, when the URL hash value is present in the first comparison table, that the vulnerability is duplicated and excluding the corresponding vulnerability from vulnerability information.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: January 31, 2023
    Assignee: NAVER CLOUD CORPORATION
    Inventors: Bong Goo Kang, Min Seob Lee, Won Tae Jang, June Ahn, Jihwan Yoon
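The hash generation and duplication determination steps in the abstract above might look roughly like the sketch below. It is an assumption-based illustration only: the abstract does not name a hash algorithm (SHA-256 is used here), does not say how the comparison table is stored (a Python set is used), and does not state that new hashes are added to the table (this sketch does so). The function names are hypothetical.

```python
import hashlib


def url_hash(url: str) -> str:
    """Generate a hash value for a vulnerability URL address (SHA-256 assumed)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def filter_duplicates(vulnerability_urls, comparison_table):
    """Return the URLs whose hash is not already in comparison_table.

    comparison_table is a set of previously seen URL hash values. Hashes of
    newly seen URLs are added to it, which is an assumption of this sketch.
    """
    unique = []
    for url in vulnerability_urls:
        h = url_hash(url)
        if h in comparison_table:
            continue  # duplicated vulnerability: exclude it from the results
        comparison_table.add(h)
        unique.append(url)
    return unique
```

Passing the same `comparison_table` across successive scans lets later scans skip vulnerabilities that earlier scans already reported.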
  • Patent number: 11561863
    Abstract: A method for enabling data set changes to be reverted to a prior point in time or state is disclosed. In one embodiment, such a method includes providing a data set comprising one or more data elements and a specified number of generations of the data elements. In certain embodiments, the data set is a partitioned data set extended (PDSE) data set, and the data elements are “members” within the PDSE data set. The method further includes tracking changes made by a job to data elements of the data set. The method further references, in a data structure (also referred to herein as a “cluster”) associated with the job, previous generations of the data elements changed by the job. In certain embodiments, the data structure is stored in the data set. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: January 24, 2023
    Assignee: International Business Machines Corporation
    Inventors: Trevor A. Geisler, David C. Reed, Thomas C. Reed, Max D. Smith
  • Patent number: 11539811
    Abstract: Systems, devices and methods for adaptive compression of stored information include a memory management computing device programmed to monitor a size of a plurality of data structures stored in a data repository. The computing device compares the size of each of a plurality of data structures to a predetermined threshold. When a size of an uncompressed data structure meets the threshold, the memory management computing device calculates a value of a first compression parameter based on a value of a first parameter and a value of a second parameter of each data element of the uncompressed data structure, calculates a value of a second compression parameter based on the value of the first parameter of each data element of the uncompressed data structure, generates a compressed data structure based on the value of the first compression parameter and the second compression parameter; and replaces, in the data repository, the uncompressed data structure with the compressed data structure.
    Type: Grant
    Filed: June 21, 2022
    Date of Patent: December 27, 2022
    Assignee: Chicago Mercantile Exchange Inc.
    Inventors: Fateen Sharaby, Sriram A. Raju Datla, Dhiraj Subhash Bawadhankar, John Charles Redfield, Justin Yeong-Juin Lee
  • Patent number: 11520744
    Abstract: Described is a system (and method) that intelligently distributes data within a clustered storage environment. To provide such a capability, the system may distribute backup files by considering a source of the data to be backed-up. In particular, the system may leverage the ability of front-end components such as a backup application to perform a granular data source identification of data. Such information may be propagated to back-end components such as a storage filesystem in the form of a data source identifier (e.g. placement tag). The data source identifiers may then be accessed by the clustered storage system to intelligently distribute backup files amongst a set of storage nodes forming a cluster. For example, backup files from the same data source may be stored on the same storage node to obtain the same deduplication efficiency as a single storage system.
    Type: Grant
    Filed: August 21, 2019
    Date of Patent: December 6, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Abhishek Rajimwale, George Mathew, Murthy Mamidi, Donna Barry Lewis
  • Patent number: 11514054
    Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: November 29, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas
  • Patent number: 11514025
    Abstract: Performing snapshot conscious internal file modification for network-attached storage is presented herein. A file system can comprise a first component configured to modify, during a service request, storage for a subset of data blocks of a file—the service request not being recognized by an external entity as a change of content of the file. Further, the file system can comprise a second component configured to prevent, based on the service request, a copy of the storage from being created for servicing of a snapshot—the snapshot comprising a point-in-time copy of the file system.
    Type: Grant
    Filed: August 19, 2019
    Date of Patent: November 29, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Ravi V. Batchu
  • Patent number: 11500841
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate encoding a tree data structure into a vector based on a set of constraints are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a constraint former that can form a set of constraints based on a first tree data structure and a vector encoder that can encode the first tree data structure into a vector based on the set of constraints.
    Type: Grant
    Filed: January 4, 2019
    Date of Patent: November 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Achille Fokoue-Nkoutche, Maxwell Crouse, Michael Witbrock, Ryan A. Musa, Maria Chang
  • Patent number: 11474700
    Abstract: Technologies for compressing communications for accelerator devices are disclosed. An accelerator device may include a communication abstraction logic unit to manage communication with one or more remote accelerator devices. The communication abstraction logic unit may receive communication to and from a kernel on the accelerator device. The communication abstraction logic unit may compress and decompress the communication without instruction from the corresponding kernel. The communication abstraction logic unit may choose when and how to compress communications based on telemetry of the accelerator device and the remote accelerator device.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 18, 2022
    Assignee: Intel Corporation
    Inventors: Susanne M. Balle, Evan Custodio, Francesc Guim Bernat