Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
-
Patent number: 12074953
Abstract: The present disclosure relates to generating, updating, modifying, and otherwise managing configurations for virtual services on a cloud computing system. The present disclosure provides example implementations of a configuration management system and configuration handlers on respective server nodes that receive and process requests for modifying one or more configurations that manage operation of virtual services on the cloud. Systems described herein involve leveraging a hierarchical model of configuration characteristics to facilitate both large and small scale modifications. Moreover, the systems described herein leverage a persistent store on server nodes to identify how to update a current base configuration and sub-version as well as synchronize modifications across a set of server nodes.
Type: Grant
Filed: September 23, 2021
Date of Patent: August 27, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sameer Kumar Patro, Aritra Basu, Raghavendra Subhash
-
Patent number: 12061581
Abstract: Example implementations relate to metadata operations in a storage system. An example includes generating, by a storage controller of a deduplication storage system, a candidate list of container indexes for matching operations of a received data segment, each container index in the candidate list having an associated match cost; identifying, by the storage controller, a journal group associated with a first container index listed in the candidate list; reducing, by the storage controller, a match cost associated with the first container index in response to a determination that the identified journal group is in a modified state; and performing, by the storage controller, the matching operations of the received data segment based at least on the reduced match cost of the first container index.
Type: Grant
Filed: July 26, 2022
Date of Patent: August 13, 2024
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Aman Sahil, Richard Phillip Mayo
-
Patent number: 12050879
Abstract: A device may generate first scores for sentences of text based on a cumulative frequency of words in each sentence, may generate second scores for the sentences based on a cumulative frequency of domain entities in each sentence, and may generate third scores for the sentences based on a sentiment analysis of each sentence. The device may generate a summary of the text, may filter the sentences to extract a first set of sentences, may filter the sentences to extract a second set of sentences, and may filter the sentences to extract a third set of sentences. The device may identify and assign weights to a first group of sentences, a second group of sentences, and a third group of sentences, may generate a ranked list of sentences based on the weighted first group, second group, and third group, and may perform actions based on the final summary.
Type: Grant
Filed: May 24, 2022
Date of Patent: July 30, 2024
Assignee: Verizon Patent and Licensing Inc.
Inventors: Prakash Ranganathan, Miruna Jayakrishnasamy
-
Patent number: 12050790
Abstract: Aspects of the present disclosure configure a memory sub-system processor to manage memory operations with repeating data patterns. The processor receives a request to write a block of data comprising a plurality of portions to a set of memory components and determines whether a pattern of data repeats across the plurality of portions of the block of data. In response to determining that the pattern of data repeats across the plurality of portions, the processor stores a representation of the pattern of data in a mapping table and discards the block of data to prevent storing the block of data on the set of memory components.
Type: Grant
Filed: August 16, 2022
Date of Patent: July 30, 2024
Assignee: Micron Technology, Inc.
Inventor: Anoop Achuthan Rajendrababu
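As a rough illustration of the write path this abstract describes, the following minimal sketch detects a block made of one repeated portion, records only the pattern in a mapping table, and skips the media write. The class and function names, the fixed `portion_size`, and the dictionary standing in for the memory components are all assumptions for illustration; the abstract does not specify how the pattern is detected or encoded.

```python
def find_repeating_pattern(block: bytes, portion_size: int):
    """Return the repeated portion if every portion of the block is identical, else None."""
    if portion_size <= 0 or not block or len(block) % portion_size:
        return None
    pattern = block[:portion_size]
    if pattern * (len(block) // portion_size) == block:
        return pattern
    return None


class PatternAwareStore:
    """Hypothetical store: repeated-pattern blocks live only in the mapping table."""

    def __init__(self, portion_size: int = 4):
        self.portion_size = portion_size
        self.mapping_table = {}  # address -> (pattern, block length)
        self.media = {}          # address -> raw block (stand-in for memory components)

    def write(self, address: int, block: bytes) -> None:
        pattern = find_repeating_pattern(block, self.portion_size)
        if pattern is not None:
            # Keep only a representation of the pattern; the block itself is discarded.
            self.mapping_table[address] = (pattern, len(block))
            self.media.pop(address, None)
        else:
            self.media[address] = block
            self.mapping_table.pop(address, None)

    def read(self, address: int) -> bytes:
        if address in self.mapping_table:
            pattern, length = self.mapping_table[address]
            return pattern * (length // len(pattern))
        return self.media[address]
```

Reads of a pattern-backed address rebuild the block from the table entry, so the caller sees the same bytes either way.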
-
Patent number: 12045211
Abstract: One example method includes collaborative deduplication. A deduplication engine implemented at a cloud level collaborates or coordinates with an extension engine of the deduplication at an edge node. This allows data ingested at a node to be collaboratively deduplicated prior to transfer to the cloud and after transfer to the cloud.
Type: Grant
Filed: October 27, 2020
Date of Patent: July 23, 2024
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Mohamed Sohail, Karim Fathy, Robert A. Lincourt
-
Patent number: 12032534
Abstract: A method and system is used in managing deduplication of data in storage systems. A first digest for a deduplication candidate is received. At least one stream associated with the deduplication candidate is detected. At least one neighboring digest segment of a first loaded digest segment associated with the at least one stream is loaded. Whether the digest is located in the at least one neighboring digest segment is determined. If the digest is not located in the at least one neighboring digest segment, the digest is processed.
Type: Grant
Filed: August 2, 2019
Date of Patent: July 9, 2024
Assignee: EMC IP Holding Company LLC
Inventors: Nickolay Dalmatov, Richard Ruef, Kurt Everson
-
Patent number: 12026386
Abstract: A method for differential compression includes receiving input data blocks that are selected for compression. For each input data block, the input data block is divided into at least two segments. For each of the at least two segments, a similarity degree between the respective segment and each of the data blocks excluding the respective data block is computed. For each of the at least two segments, the data block which has a biggest similarity degree with the respective segment among the data blocks excluding the respective data block is selected as an optimal reference data block for the respective segment. The differential compression is applied to the input data block and optimal reference blocks in response to determining a differential compression that is to be applied based on the similarity degree between the segments of the input data block and the corresponding optimal reference blocks.
Type: Grant
Filed: September 23, 2022
Date of Patent: July 2, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Assaf Natanzon
-
Patent number: 12001685
Abstract: A plurality of data stripes and one or more parity stripes are generated using a plurality of data chunks stored in a write-ahead log based on an erasure coding configuration. The plurality of data stripes and the one or more parity stripes are stored on corresponding different storage devices. The plurality of data stripes and the one or more parity stripes are associated together under a data protection grouping container.
Type: Grant
Filed: March 31, 2022
Date of Patent: June 4, 2024
Assignee: Cohesity, Inc.
Inventors: Apurv Gupta, Akshat Agarwal
-
Patent number: 11995467
Abstract: Systems, devices, and methods are provided for validation, deletion, and/or recovery of resources in a service environment. A machine (e.g., server) may receive a request to identify or discover a list of resources that are unused in a service environment. A machine (e.g., server) may receive a request to delete one or more resources in a service environment. In at least one embodiment, deletion of a resource involves a two-stage process where the resource is recoverably deleted in a first stage (e.g., by deactivating or disabling the resource) such that the resource can be recovered prior to a predetermined time period by reactivating or re-enabling the resource and, in a second stage, the resource is unrecoverably deleted.
Type: Grant
Filed: July 14, 2021
Date of Patent: May 28, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Suresh Prakash Goacher, Arun Anilkumar, Nishit Nihal Vas
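The two-stage deletion in this abstract can be pictured with a small in-memory sketch: a first stage disables a resource so it stays recoverable, and a second stage purges it once a retention period has passed. `ResourceRegistry`, its method names, and the explicit `now` timestamps are hypothetical choices made for illustration, not details taken from the patent.

```python
class ResourceRegistry:
    """Hypothetical registry with recoverable (stage 1) and permanent (stage 2) deletion."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self.active = {}    # name -> payload
        self.disabled = {}  # name -> (payload, time the resource was disabled)

    def soft_delete(self, name: str, now: float) -> None:
        # Stage 1: deactivate the resource but keep it so it can be re-enabled.
        self.disabled[name] = (self.active.pop(name), now)

    def recover(self, name: str, now: float) -> None:
        payload, disabled_at = self.disabled[name]
        if now - disabled_at >= self.retention_seconds:
            raise KeyError(f"{name} is past its recovery window")
        del self.disabled[name]
        self.active[name] = payload

    def purge(self, now: float) -> list:
        # Stage 2: unrecoverably drop everything whose retention period has elapsed.
        expired = [name for name, (_, disabled_at) in self.disabled.items()
                   if now - disabled_at >= self.retention_seconds]
        for name in expired:
            del self.disabled[name]
        return expired
```

Passing `now` explicitly rather than reading a clock keeps the sketch deterministic; a real service would presumably use its own timestamps and durable state.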
-
Patent number: 11977527
Abstract: In certain embodiments, machine learning and lineage data may be used to manage data. In some embodiments, a computing system may use lineage data to identify two datasets that may be related. The computing system may determine that a user has access to a derivative dataset but does not have access to an original dataset that was used to create the derivative dataset. In response, the computing system may use a machine learning model to generate a similarity score indicating a level of similarity between the original dataset and the derivative dataset. If the similarity score satisfies a threshold score, the computing system may modify access rights of the user so that the user is unable to access a portion of the data in the derivative dataset.
Type: Grant
Filed: January 3, 2022
Date of Patent: May 7, 2024
Assignee: Capital One Services, LLC
Inventors: William Ye, Jon Stofer, Thomas J. O'Connor, Jose Moreno
-
Patent number: 11966630
Abstract: A data storage device includes a memory device and a controller coupled to the memory device. The controller is configured to segment a key to physical (K2P) table into two or more segments, wherein each segment of the two or more segments corresponds to a caching priority of key value (KV) pair data, organize the K2P table by storing and relocating one or more K2P table entries into a respective segment of the two or more segments, wherein the storing and relocating comprises moving a K2P table entry based on the caching priority of the KV pair data into the respective segment having the caching priority, and utilize the K2P table to manage KV pair data stored in the memory device, wherein utilizing the K2P table comprises applying a same management operation, such as prefetching, to each K2P table entry of a same segment.
Type: Grant
Filed: June 27, 2022
Date of Patent: April 23, 2024
Assignee: Western Digital Technologies, Inc.
Inventors: Ran Zamir, Alexander Bazarsky, David Avraham
-
Patent number: 11954331
Abstract: A computer-implemented method enables workload scheduling in a storage system for optimized deduplication. The method includes determining dynamic correlations of deduplications between workload processes in a prior time window. Workload processes include one or more tasks with defined execution timing parameters. The method further includes determining deduplication ratios based on the correlations of the deduplications between the workload processes. The method further includes scheduling multiple workload processes based on a highest determined deduplication ratio of the determined deduplication ratios.
Type: Grant
Filed: October 7, 2021
Date of Patent: April 9, 2024
Assignee: International Business Machines Corporation
Inventors: Miles Mulholland, Anuj Chandra, Kirsty G. Rodwell, Jorden Luke Allcock
-
Patent number: 11949751
Abstract: The present disclosure relates to restricting electronic activities from being linked with record objects. According to at least one aspect of the disclosure, a method can include accessing, by one or more processors, a plurality of electronic activities, accessing a plurality of record objects of one or more systems of record, identifying an electronic activity of the plurality of electronic activities to match to one or more record objects, determining a data source provider associated with providing access to the electronic activity, and identifying a system of record corresponding to the determined data source provider. The system of record can include a plurality of candidate record objects to which to match the electronic activity. The method can include restricting the electronic activity from being linked with the at least one record object.
Type: Grant
Filed: January 23, 2023
Date of Patent: April 2, 2024
Inventors: Oleg Rogynskyy, Tetiana Lutsaievska, John Wulf, Sathya Hariesh Prakash
-
Patent number: 11934346
Abstract: A cloud computing infrastructure hosts a web service with customer accounts. In a customer account, files of the customer account are listed in an index. Files indicated in the index are arranged in groups, with files in each group being scanned using scanning serverless functions in the customer account. The files in the customer account include a compressed tar archive of a software container. Member files of a compressed tar archive in a customer account are randomly-accessed by way of locators that indicate a tar offset, a logical offset, and a decompressor state for a corresponding member file. A member file is accessed by seeking to the tar offset in the compressed tar archive, restoring a decompressor to the decompressor state, decompressing the compressed tar archive using the decompressor, and moving to the logical offset in the decompressed data.
Type: Grant
Filed: October 17, 2022
Date of Patent: March 19, 2024
Assignee: Trend Micro Incorporated
Inventor: Brendan M. Johnson
-
Patent number: 11936931
Abstract: Methods, apparatus, systems and articles of manufacture to perform media device asset qualification are disclosed. An example apparatus includes at least one memory, and at least one processor to execute instructions to at least identify a first set of candidate media device assets for disqualification, the candidate media device assets including A) a signature and B) a media identifier that identifies media, generate a hash table using a second set of the candidate media device assets, determine one or more counts of matches between C) a first signature and a first media identifier of a first candidate media device asset of the second set and D) respective signatures and media identifiers of multiple ones of the second set using the hash table, the multiple ones of the second set not including the first candidate media device asset, and load the first signature into a reference database as a reference signature.
Type: Grant
Filed: October 17, 2022
Date of Patent: March 19, 2024
Assignee: The Nielsen Company (US), LLC
Inventors: Daniel Nelson, James Petro, Albert T. Borawski
-
Patent number: 11914554
Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
Type: Grant
Filed: January 30, 2023
Date of Patent: February 27, 2024
Assignee: Rubrik, Inc.
Inventors: Noel Moldvai, Jihang Lim
-
Patent number: 11907133
Abstract: Standardized address generation from address substrings includes receiving an address string for a place-of-interest, one-to-many mapping at least one of a plurality of address substrings of the address string to respective address components, concatenating the address substrings using a template that specifies an order of concatenating the address substrings, and making the concatenated address substrings available for further use.
Type: Grant
Filed: July 29, 2022
Date of Patent: February 20, 2024
Assignee: SafeGraph, Inc.
Inventor: Vera Sazonova
-
Patent number: 11893373
Abstract: Techniques are disclosed for deploying functions in a cloud computing environment. Parameters are annotated in a plurality of Helm charts with a predetermined token. Duplicated values in the Helm charts are identified and the predetermined token is reused for the duplicated values. Schema files from the plurality of Helm charts are parsed to extract the predetermined tokens. Input data are received as values for the predetermined tokens. The function is deployed in the cloud computing environment using the values for the predetermined tokens as parameters in the Helm charts.
Type: Grant
Filed: January 28, 2022
Date of Patent: February 6, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Frank John D'Innocenzo, Kam Yee Lee
-
Patent number: 11886397
Abstract: Provided are methods and systems for determining multi-faceted trust scores for data. A method may commence with receiving data and determining a plurality of metadata items associated with the data. The method may continue with determining one or more facets associated with each of the plurality of metadata items. The method may further include determining a parameter and a weight associated with each of the one or more facets. Upon determining the parameter and the weight, a trust score associated with each of the plurality of metadata items may be calculated based on the parameter and the weight associated with each of the one or more facets. The method may further include calculating a multi-faceted trust score of the data based on the trust score of each of the plurality of metadata items.
Type: Grant
Filed: February 19, 2020
Date of Patent: January 30, 2024
Assignee: ASG Technologies Group, Inc.
Inventors: Jean-Philippe Moresmau, Marcus MacNeill
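The abstract leaves the scoring formula open, so the sketch below is only one plausible reading: each metadata item's trust score is assumed to be a weight-normalized average of its facet parameters, and the overall score is assumed to be the plain mean of the item scores. Both formulas, the function names, and the sample metadata items are assumptions made for illustration.

```python
def item_trust_score(facets):
    """facets: list of (parameter, weight) pairs for one metadata item.

    Assumed formula: weighted average of the facet parameters.
    """
    total_weight = sum(weight for _, weight in facets)
    if total_weight == 0:
        return 0.0
    return sum(parameter * weight for parameter, weight in facets) / total_weight


def multi_faceted_trust_score(items):
    """items: metadata item name -> list of (parameter, weight) facets.

    Assumed formula: unweighted mean of the per-item trust scores.
    """
    scores = [item_trust_score(facets) for facets in items.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical metadata items with illustrative facet values.
example_items = {
    "owner": [(1.0, 2.0), (0.5, 2.0)],  # item score: 0.75
    "freshness": [(0.25, 1.0)],         # item score: 0.25
}
```

With these sample values the per-item scores are 0.75 and 0.25, so the combined score comes out at 0.5.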
-
Patent number: 11888936
Abstract: A method for providing a proxy redirect to facilitate a storage and a retrieval of an object is disclosed. The method includes receiving a mapping of a user to a logical container that stores the object and to a storage provider that stores the logical container; receiving a key corresponding to the logical container and associated with the user; storing the mapping and the key in a database; generating, for the user, an application protocol that redirects to a pre-signed web address based on the stored mapping and the stored key; and transmitting, via a communication interface, the application protocol to the one user. The method further includes the user using the application protocol to directly access the storage provider and retrieve the object.
Type: Grant
Filed: July 1, 2020
Date of Patent: January 30, 2024
Assignee: JPMORGAN CHASE BANK, N.A.
Inventor: Zachariah Antonas
-
Patent number: 11853326
Abstract: A technology for retrieving data from a database. The technology includes receiving a search query specifying a target attribute and a target attribute value, accessing an index to determine one or more target files in which the target attribute value appears, the index including a plurality of attribute values, and for each of the attribute values, one or more files in which the attribute value appears, and retrieving data from the one or more target files.
Type: Grant
Filed: October 14, 2021
Date of Patent: December 26, 2023
Assignee: Google LLC
Inventors: Hossein Ahmadi, Guang Cheng, Yannis Sismanis, Huong Thi Thu Phan, Shiyu Xie, Leo Chen, Zewen Zhang, Jing Jing Long, Amir Hossein Hormati
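The index in this abstract behaves like an inverted index from attribute values to the files that contain them. A minimal sketch, assuming files are held in memory as lists of row dictionaries and the index is keyed by `(attribute, value)` pairs (both are illustrative assumptions, as are the function names):

```python
from collections import defaultdict


def build_index(files):
    """files: file name -> list of row dicts. Returns (attribute, value) -> set of file names."""
    index = defaultdict(set)
    for name, rows in files.items():
        for row in rows:
            for attribute, value in row.items():
                index[(attribute, value)].add(name)
    return index


def search(files, index, attribute, value):
    """Consult the index first, then read only the files it names."""
    results = []
    for name in sorted(index.get((attribute, value), ())):
        results.extend(row for row in files[name] if row.get(attribute) == value)
    return results
```

The point of the lookup is that files absent from the index entry are never opened, which is where the saving over a full scan would come from.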
-
Patent number: 11836175
Abstract: Semantic search techniques via focused summarizations are described. For example, a search query is received for a text-based content item in a data set comprising a plurality of text-based content items. A first feature vector representative of the search query is obtained. A respective semantic similarity score is determined between the first feature vector and each of a plurality of second feature vectors. Each of the second feature vectors is representative of a machine-generated summarization of a respective text-based content item. The machine-generated summarization comprises a plurality of multi-word fragments that are selected from the respective text-based content item via a transformer-based machine learning model. A search result is provided responsive to the search query.
Type: Grant
Filed: June 29, 2022
Date of Patent: December 5, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Itzik Malkiel, Noam Koenigstein, Oren Barkan, Jonathan Ephrath, Yonathan Weill, Nir Nice
-
Patent number: 11797486
Abstract: A device configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine the number of entries is greater than the redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
Type: Grant
Filed: January 3, 2022
Date of Patent: October 24, 2023
Assignee: Bank of America Corporation
Inventors: Pratap Dande, Gilberto R. Dos Santos, Jayabalaji Murugan, Murali M. Atyam, Manoj Bohra
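The comparison-and-threshold logic in this abstract might look roughly like the sketch below. It assumes SHA-256 as the block hash, treats two instances as the same file when their sequences of block hashes match, and keeps exactly `redundancy_threshold` copies once the count exceeds the threshold. The abstract only says that one or more instances are deleted, so that retention rule, like the function names, is an assumption.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096) -> tuple:
    """Hash each fixed-size block of a file instance (SHA-256 is an assumed choice)."""
    return tuple(hashlib.sha256(data[i:i + block_size]).hexdigest()
                 for i in range(0, len(data), block_size))


def instances_to_delete(instances: dict, redundancy_threshold: int,
                        block_size: int = 4096) -> list:
    """instances: location -> file bytes. Returns the locations selected for deletion."""
    file_list = {}  # block-hash sequence -> locations holding a matching instance
    for location, data in instances.items():
        file_list.setdefault(block_hashes(data, block_size), []).append(location)

    doomed = []
    for locations in file_list.values():
        if len(locations) > redundancy_threshold:
            # Assumed policy: keep the first `redundancy_threshold` copies by name.
            doomed.extend(sorted(locations)[redundancy_threshold:])
    return doomed
```

For example, three identical copies with a threshold of 2 would yield one location to delete, while a threshold of 3 would leave all copies in place.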
-
Patent number: 11797220
Abstract: Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to include one or more entries corresponding to one or more determined duplicate data chunks.
Type: Grant
Filed: August 20, 2021
Date of Patent: October 24, 2023
Assignee: Cohesity, Inc.
Inventors: Zhihuan Qiu, Sachin Jain, Anubhav Gupta, Apurv Gupta, Mohit Aron
-
Patent number: 11782878
Abstract: A deduplicated storage system storing objects receives a search term. Storage includes metadata and segments into which the objects have been split and deduplicated. The metadata includes fingerprint sequences according to which the segments should be assembled. A partial match is found when a prefix of the term is found at an end of a segment or a suffix is found at a beginning of the segment. A fingerprint of the segment having the partial match is recorded. A first sequence of fingerprints associated with a first object is read to check whether any fingerprints in the first sequence have been recorded. When a fingerprint in the sequence has been recorded, a check of a next fingerprint in the sequence is made to see if it has been recorded as having the partial match. If the next fingerprint has been recorded, the first object is reported as having the term.
Type: Grant
Filed: December 14, 2021
Date of Patent: October 10, 2023
Assignee: EMC IP Holding Company LLC
Inventor: Philip Shilane
-
Patent number: 11748014
Abstract: Host computers running applications that store data on a block-based storage system such as a SAN provide hints that differentiate IO data based on which application generated the IO. The hints may include tags that are associated with IO commands sent to the block-based storage system. Each host application is associated with a unique identifier that is placed in the tag. Application name-to-identifier mappings may be sent from the hosts to the block-based storage system. Per-identifier/application deduplication statistics are maintained by the block-based storage system and shared with other block-based storage systems. Deduplication is disabled or de-emphasized for IO data generated by applications with statistically low deduplication ratios.
Type: Grant
Filed: February 14, 2020
Date of Patent: September 5, 2023
Assignee: DELL PRODUCTS L.P.
Inventors: Kurumurthy Gokam, Md Haris Iqbal, Prasad Paple, Kundan Kumar
-
Patent number: 11734239
Abstract: A record processing and storage system is operable to receive a plurality of labeled row data from a data source. Each labeled row data of the plurality of labeled row data includes at least one record and a corresponding row number of a plurality of row numbers. A plurality of pages are generated from records included in the labeled row data. The plurality of pages are stored via a page storage system. A plurality of page metadata corresponding to the plurality of pages is generated, where each of the plurality of page metadata is generated based on at least corresponding one row number of at least one labeled row data with records included in a corresponding one of the plurality of pages. Deduplication of duplicated records included in the plurality of pages is facilitated based on the plurality of page metadata.
Type: Grant
Filed: March 15, 2022
Date of Patent: August 22, 2023
Assignee: Ocient Holdings LLC
Inventors: George Kondiles, Ravi V. Khadiwala, Donald Scott Clark, Anna Veselova
-
Patent number: 11704036
Abstract: Systems and methods for implementing a deduplication process based on performance analyses. The system may include a processing device to determine a first performance metric associated with retrieving a second stored data block that is within a specified range of a duplicate of the first data block and a second performance metric associated with retrieving a hash value corresponding to the second stored data block. The processing device is further to retrieve the second stored data block within a specified range of the duplicate of the first data block in response to the first performance metric not exceeding the second performance metric.
Type: Grant
Filed: November 16, 2018
Date of Patent: July 18, 2023
Assignee: PURE STORAGE, INC.
Inventors: John Colgrove, Ronald Karr, Ethan L. Miller
-
Patent number: 11681660
Abstract: Embodiments presented herein describe techniques for deduplicating chunks of data across multiple clusters. A process executing in a storage system identifies one or more chunks in an incoming stream of data. For each chunk, a first fingerprint corresponding to the chunk is generated. The process determines whether the first fingerprint matches a second fingerprint listed in a corresponding entry in a deduplication map. Each entry of the deduplication map corresponds to a chunk stored in a location in one of the storage clusters. Upon determining that the first fingerprint matches the second fingerprint, the process writes, to a local persistent storage, a pointer referencing the location in that storage cluster.
Type: Grant
Filed: January 22, 2021
Date of Patent: June 20, 2023
Assignee: Cohesity, Inc.
Inventor: Ganesha Shanmuganathan
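A toy version of the cross-cluster lookup in this abstract: each incoming chunk is fingerprinted, the fingerprint is checked against a shared deduplication map, and a match results in only a pointer being written locally. The SHA-256 fingerprint, the `ingest` function, and the list-backed cluster store are assumptions for illustration; the abstract does not say how fingerprints are computed or how the map is shared.

```python
import hashlib


def ingest(chunks, dedup_map, cluster_id, cluster_store, local_pointers):
    """Deduplicate chunks against a map shared by all clusters.

    dedup_map:      fingerprint -> (cluster id, location) of the stored chunk
    cluster_store:  list standing in for this cluster's chunk storage
    local_pointers: list standing in for local persistent storage of pointers
    """
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in dedup_map:
            # First sighting anywhere: store the chunk here and register its location.
            cluster_store.append(chunk)
            dedup_map[fingerprint] = (cluster_id, len(cluster_store) - 1)
        # Whether new or duplicate, the local record is a pointer to the stored copy.
        local_pointers.append(dedup_map[fingerprint])
```

A chunk first stored on one cluster is then referenced, not copied, when another cluster ingests the same bytes.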
-
Patent number: 11675741
Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
Type: Grant
Filed: July 8, 2021
Date of Patent: June 13, 2023
Assignee: Rubrik, Inc.
Inventors: Noel Moldvai, Jihang Lim
-
Patent number: 11663195
Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
Type: Grant
Filed: October 28, 2021
Date of Patent: May 30, 2023
Assignee: EMC IP HOLDING COMPANY LLC
Inventor: Peter Marelas
-
Patent number: 11665377
Abstract: Aspects of the subject disclosure may include, for example, a device having a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations including receiving encrypted hypertext transport protocol (HTTPS) traffic including media content; separating the HTTPS traffic into audio segments and video segments; calculating a size for each audio segment in the HTTPS traffic; maintaining a sliding window of a plurality of sizes of consecutive audio segments to form a fingerprint; and identifying the media content by matching the fingerprint with a reference in a catalog. Other embodiments are disclosed.
Type: Grant
Filed: April 23, 2021
Date of Patent: May 30, 2023
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Yuan Ding, Natalia Schenck, Daniel Sanchez, Umut Akyol, Lawrence E. Bakst, Vinay Sharma
-
Patent number: 11663194
Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
Type: Grant
Filed: October 28, 2021
Date of Patent: May 30, 2023
Assignee: EMC IP HOLDING COMPANY LLC
Inventor: Peter Marelas
-
Patent number: 11663178
Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to assign the deduplication databases based on the type of the client device and automatically create a new deduplication database when critical thresholds are reached. In other embodiments, deduplication databases are further split into multiple database partitions. Based on a data block distribution policy, each data block is then further assigned to a particular database partition within the deduplication database to further improve efficiency and speed of the deduplication process.
Type: Grant
Filed: October 23, 2020
Date of Patent: May 30, 2023
Assignee: Commvault Systems, Inc.
Inventor: Prasad Nara
-
Patent number: 11663196
Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
Type: Grant
Filed: October 28, 2021
Date of Patent: May 30, 2023
Assignee: EMC IP HOLDING COMPANY LLC
Inventor: Peter Marelas
-
Patent number: 11659006
Abstract: An assessment component that facilitates assessment and enforcement of policies within a computer environment can comprise a compliance component that determines whether a policy, that defines one or more requirements associated with usage of one or more enterprise components of an enterprise computing system, is in compliance with a plurality of standardized policies that govern operation of the one or more enterprise components of the enterprise computing system. The assessment component can also comprise a policy optimization component that determines one or more changes to the policy that achieve the compliance with the plurality of standardized policies based on a determination that the policy complies with a first standardized policy of the plurality of standardized policies and fails to comply with a second standardized policy of the plurality of standardized policies.
Type: Grant
Filed: December 23, 2020
Date of Patent: May 23, 2023
Assignee: Kyndryl, Inc.
Inventors: Milton H. Hernandez, Anup Kalia, Brian Peterson, Vugranam C. Sreedhar, Sai Zeng
-
Patent number: 11625167Abstract: An embodiment of a semiconductor apparatus may include technology to determine if a threshold is met based on runtime memory usage, and enable foreground memory deduplication if the threshold is determined to be met. Other embodiments are disclosed and claimed.Type: GrantFiled: November 16, 2018Date of Patent: April 11, 2023Assignee: Intel CorporationInventors: Dujian Wu, Yuping Yang, Donggui Yin
-
Patent number: 11609883Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table.Type: GrantFiled: May 29, 2018Date of Patent: March 21, 2023Assignee: EMC IP Holding Company LLCInventors: Anton Kucherov, David Meiri
-
Patent number: 11599507Abstract: A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with a file.Type: GrantFiled: December 9, 2021Date of Patent: March 7, 2023Assignee: Druva Inc.Inventors: Milind Borate, Alok Kumar, Aditya Agrawal, Anup Agarwal, Somesh Jain, Aditya Kelkar, Yogendra Acharya, Anand Apte, Amit Kulkarni
-
Patent number: 11593028Abstract: A method of operating a computing device for processing data is provided. The method includes (a) monitoring a set of performance characteristics of the processing of the data; (b) periodically calculating, using a predefined set of coefficients, a linear combination of the monitored set of performance characteristics to yield a combined metric; and (c) upon detecting that the combined metric exceeds a threshold while operating in a first processing mode, transitioning from operating in the first processing mode to operating in a second processing mode. (1) The second processing mode has a higher bandwidth than the first processing mode, and (2) processing of data in the second processing mode is less robust than processing of data in the first processing mode. An apparatus, system, and computer program product for performing a similar method are also provided.Type: GrantFiled: March 11, 2021Date of Patent: February 28, 2023Assignee: EMC IP Holding Company LLCInventors: Vladimir Shveidel, Alexei Kabishcer
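The mode switch described in this abstract can be pictured with a minimal Python sketch. It is only an illustration of the general idea (a weighted sum of monitored metrics compared against a threshold), not the patented implementation; the class name, the metric names, the coefficients, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModeController:
    """Hypothetical controller: switches mode when a weighted metric sum exceeds a threshold."""
    coefficients: dict          # assumed mapping: metric name -> predefined weight
    threshold: float            # assumed threshold for the combined metric
    mode: str = "first"         # start in the more robust, lower-bandwidth mode

    def combined_metric(self, metrics: dict) -> float:
        # Linear combination of the monitored performance characteristics.
        return sum(weight * metrics[name] for name, weight in self.coefficients.items())

    def update(self, metrics: dict) -> str:
        # Called periodically; transitions only from the first mode to the second.
        if self.mode == "first" and self.combined_metric(metrics) > self.threshold:
            self.mode = "second"
        return self.mode


controller = ModeController({"queue_depth": 0.5, "latency_ms": 0.1}, threshold=10.0)
light_load_mode = controller.update({"queue_depth": 4, "latency_ms": 20})
heavy_load_mode = controller.update({"queue_depth": 30, "latency_ms": 50})
```

In this sketch the controller never switches back to the first mode, because the abstract only describes the transition in one direction.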
-
Patent number: 11573924Abstract: Methods and systems for storing and managing large numbers of small files. A data processing system includes clients that generate large numbers of small files to be stored on a storage device managed by a File System (FS). An Archive Server (AS) receives multiple files from the client, archives the files in larger archives, and sends the archives to the FS for storage. When requested to read a file, the AS retrieves the archive in which the file is stored, extracts the file and sends it to the requesting client. In other words, the AS communicates with the clients in individual file units, and with the storage device in archive units. The AS is typically constructed as an add-on layer on top of a conventional FS, which enables the FS to handle small files efficiently without modification.Type: GrantFiled: September 23, 2019Date of Patent: February 7, 2023Assignee: COGNYTE TECHNOLOGIES ISRAEL LTD.Inventor: Yossi Chai
-
Patent number: 11573928Abstract: Techniques for processing data may include: receiving a data block stored in a data set, wherein a hash value is derived from the data block; determining, in accordance with selection criteria, whether the hash value is included in a subset; responsive to determining the hash value is included in the subset, performing processing that updates a table in accordance with the hash value and the data set, and determining, in accordance with the information in the table, whether to perform deduplication processing for the data block to determine whether the data block is a duplicate of another stored data block. The table may include an entry for the hash value. The entry may include information identifying data sets referencing the data block and, for each of the data sets, may specify a reference count denoting a number of times the data set references the data block.Type: GrantFiled: March 13, 2020Date of Patent: February 7, 2023Assignee: EMC IP Holding Company LLCInventors: Anton Kucherov, David Meiri
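As a rough illustration of the sampled-hash table this abstract describes, the following Python sketch tracks per-data-set reference counts only for hashes that pass a selection criterion. Everything specific here is an assumption made for the example: the use of SHA-256, the bit-mask selection rule, the `min_refs` decision rule, and all function names. It is not the method claimed in the patent.

```python
import hashlib
from collections import defaultdict


def new_table():
    # hash value -> {data set name -> reference count}
    return defaultdict(lambda: defaultdict(int))


def block_hash(block: bytes) -> int:
    # Assumed hash function: first 8 bytes of SHA-256 as an integer.
    return int.from_bytes(hashlib.sha256(block).digest()[:8], "big")


def in_subset(h: int, mask: int) -> bool:
    # Hypothetical selection criterion: keep hashes whose masked bits are all zero.
    # mask=0 selects every hash; a larger mask samples a smaller subset.
    return h & mask == 0


def record(block: bytes, data_set: str, table, mask: int = 0x3) -> bool:
    """Update the table only if the block's hash falls in the sampled subset."""
    h = block_hash(block)
    if not in_subset(h, mask):
        return False
    table[h][data_set] += 1
    return True


def should_dedupe(h: int, table, min_refs: int = 2) -> bool:
    # Assumed decision rule: attempt deduplication once the sampled hash
    # has been referenced at least min_refs times across all data sets.
    return sum(table.get(h, {}).values()) >= min_refs
```

Sampling keeps the table small: with a mask of `0x3`, roughly one hash in four is tracked, and the counts for those hashes serve as an estimate of how much duplication each data set contains.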
-
Patent number: 11570196Abstract: A method for determining duplication of a vulnerability may include a vulnerability extraction step of extracting vulnerability uniform resource locator (URL) addresses including the vulnerability from an analysis target server; a hash generation step of generating a URL hash value corresponding to the extracted vulnerability from the vulnerability URL address; and a duplication determination step of determining, when the URL hash value is present in a first comparison table, that the vulnerability is duplicated and excluding the corresponding vulnerability from vulnerability information.Type: GrantFiled: February 26, 2020Date of Patent: January 31, 2023Assignee: NAVER CLOUD CORPORATIONInventors: Bong Goo Kang, Min Seob Lee, Won Tae Jang, June Ahn, Jihwan Yoon
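The hash-and-lookup step in this abstract can be sketched in a few lines of Python. This is a simplified illustration rather than the patented method: the choice of SHA-256, the use of a Python set as the comparison table, the function names, and the example URLs are all assumptions, and adding newly seen hashes to the table is a design choice of the sketch that the abstract does not state.

```python
import hashlib


def url_hash(url: str) -> str:
    # Assumed hash function for the vulnerability URL address.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def filter_duplicates(vulnerability_urls, comparison_table: set) -> list:
    """Return only URLs whose hash is not yet in the comparison table."""
    unique = []
    for url in vulnerability_urls:
        h = url_hash(url)
        if h in comparison_table:
            # Hash already present: treat the vulnerability as a duplicate and exclude it.
            continue
        comparison_table.add(h)  # assumption: remember the hash for later comparisons
        unique.append(url)
    return unique


seen = set()
urls = [
    "https://example.test/login?id=1",
    "https://example.test/search?q=a",
    "https://example.test/login?id=1",
]
unique_urls = filter_duplicates(urls, seen)
```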
-
Patent number: 11561863Abstract: A method for enabling data set changes to be reverted to a prior point in time or state is disclosed. In one embodiment, such a method includes providing a data set comprising one or more data elements and a specified number of generations of the data elements. In certain embodiments, the data set is a partitioned data set extended (PDSE) data set, and the data elements are “members” within the PDSE data set. The method further includes tracking changes made by a job to data elements of the data set. The method further references, in a data structure (also referred to herein as a “cluster”) associated with the job, previous generations of the data elements changed by the job. In certain embodiments, the data structure is stored in the data set. A corresponding system and computer program product are also disclosed.Type: GrantFiled: August 20, 2015Date of Patent: January 24, 2023Assignee: International Business Machines CorporationInventors: Trevor A. Geisler, David C. Reed, Thomas C. Reed, Max D. Smith
-
Patent number: 11539811Abstract: Systems, devices and methods for adaptive compression of stored information include a memory management computing device programmed to monitor a size of a plurality of data structures stored in a data repository. The computing device compares the size of each of a plurality of data structures to a predetermined threshold. When a size of an uncompressed data structure meets the threshold, the memory management computing device calculates a value of a first compression parameter based on a value of a first parameter and a value of a second parameter of each data element of the uncompressed data structure, calculates a value of a second compression parameter based on the value of the first parameter of each data element of the uncompressed data structure, generates a compressed data structure based on the value of the first compression parameter and the second compression parameter; and replaces, in the data repository, the uncompressed data structure with the compressed data structure.Type: GrantFiled: June 21, 2022Date of Patent: December 27, 2022Assignee: Chicago Mercantile Exchange Inc.Inventors: Fateen Sharaby, Sriram A. Raju Datla, Dhiraj Subhash Bawadhankar, John Charles Redfield, Justin Yeong-Juin Lee
-
Patent number: 11520744Abstract: Described is a system (and method) that intelligently distributes data within a clustered storage environment. To provide such a capability, the system may distribute backup files by considering a source of the data to be backed-up. In particular, the system may leverage the ability of front-end components such as a backup application to perform a granular data source identification of data. Such information may be propagated to back-end components such as a storage filesystem in the form of a data source identifier (e.g. placement tag). The data source identifiers may then be accessed by the clustered storage system to intelligently distribute backup files amongst a set of storage nodes forming a cluster. For example, backup files from the same data source may be stored on the same storage node to obtain the same deduplication efficiency as a single storage system.Type: GrantFiled: August 21, 2019Date of Patent: December 6, 2022Assignee: EMC IP Holding Company LLCInventors: Abhishek Rajimwale, George Mathew, Murthy Mamidi, Donna Barry Lewis
-
Patent number: 11514054Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.Type: GrantFiled: September 27, 2018Date of Patent: November 29, 2022Assignee: Amazon Technologies, Inc.Inventors: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas
-
Patent number: 11514025Abstract: Performing snapshot conscious internal file modification for network-attached storage is presented herein. A file system can comprise a first component configured to modify, during a service request, storage for a subset of data blocks of a file—the service request not being recognized by an external entity as a change of content of the file. Further, the file system can comprise a second component configured to prevent, based on the service request, a copy of the storage from being created for servicing of a snapshot—the snapshot comprising a point-in-time copy of the file system.Type: GrantFiled: August 19, 2019Date of Patent: November 29, 2022Assignee: EMC IP HOLDING COMPANY LLCInventor: Ravi V. Batchu
-
Patent number: 11500841Abstract: Systems, computer-implemented methods, and computer program products that can facilitate encoding a tree data structure into a vector based on a set of constraints are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a constraint former that can form a set of constraints based on a first tree data structure and a vector encoder that can encode the first tree data structure into a vector based on the set of constraints.Type: GrantFiled: January 4, 2019Date of Patent: November 15, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Achille Fokoue-Nkoutche, Maxwell Crouse, Michael Witbrock, Ryan A. Musa, Maria Chang
-
Patent number: 11474700Abstract: Technologies for compressing communications for accelerator devices are disclosed. An accelerator device may include a communication abstraction logic unit to manage communication with one or more remote accelerator devices. The communication abstraction logic unit may receive communication to and from a kernel on the accelerator device. The communication abstraction logic unit may compress and decompress the communication without instruction from the corresponding kernel. The communication abstraction logic unit may choose when and how to compress communications based on telemetry of the accelerator device and the remote accelerator device.Type: GrantFiled: April 30, 2019Date of Patent: October 18, 2022Assignee: Intel CorporationInventors: Susanne M. Balle, Evan Custodio, Francesc Guim Bernat