Patents by Inventor Guilherme Menezes
Guilherme Menezes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220365852
Abstract: Methods and systems for backing up and restoring files that have multiple hard links using master file references and index node-based mappings are described. In some cases, file fetching and restoration may be performed by a storage appliance using master file references in which a master file is identified for each multi-link file that is backed-up on the storage appliance and then referenced by one or more hard links to the multi-link file. In other cases, file fetching and restoration may be performed by a storage appliance using index node-based mappings for multi-link files that provide mappings between index node identifiers (e.g., inode numbers) for the multi-link files on a primary system and hard link paths for storing the file contents of the multi-link files on a storage appliance used for backing up the primary system.
Type: Application
Filed: July 20, 2022
Publication date: November 17, 2022
Inventors: Looi Chow Lee, Ziqi Liu, Guilherme Menezes
-
Patent number: 11474912
Abstract: Methods and systems for backing up and restoring files that have multiple hard links using master file references and index node-based mappings are described. In some cases, file fetching and restoration may be performed by a storage appliance using master file references in which a master file is identified for each multi-link file that is backed-up on the storage appliance and then referenced by one or more hard links to the multi-link file. In other cases, file fetching and restoration may be performed by a storage appliance using index node-based mappings for multi-link files that provide mappings between index node identifiers (e.g., inode numbers) for the multi-link files on a primary system and hard link paths for storing the file contents of the multi-link files on a storage appliance used for backing up the primary system.
Type: Grant
Filed: January 31, 2019
Date of Patent: October 18, 2022
Assignee: Rubrik, Inc.
Inventors: Looi Chow Lee, Ziqi Liu, Guilherme Menezes
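As a rough illustration of the index node-based mapping idea in the abstract above, the Python sketch below groups paths by inode so that file contents are copied once and the remaining paths are recreated as hard links. The function names (`build_inode_map`, `restore_links`) and the in-memory dictionary are assumptions made for this example; the patent does not publish this code, and a real storage appliance would persist the mapping rather than hold it in memory.

```python
import os
import shutil
from collections import defaultdict


def build_inode_map(root):
    """Map (device, inode) -> relative paths under root that share that inode.

    Only regular files with a link count above 1 are recorded.
    """
    inode_map = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symbolic links are out of scope for this sketch
            st = os.stat(path)
            if st.st_nlink > 1:
                rel = os.path.relpath(path, root)
                inode_map[(st.st_dev, st.st_ino)].append(rel)
    return dict(inode_map)


def restore_links(inode_map, source_root, target_root):
    """Copy each multi-link file once, then recreate the other paths as hard links."""
    for paths in inode_map.values():
        first, *rest = sorted(paths)
        dst_first = os.path.join(target_root, first)
        os.makedirs(os.path.dirname(dst_first), exist_ok=True)
        shutil.copy2(os.path.join(source_root, first), dst_first)
        for rel in rest:
            dst = os.path.join(target_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.link(dst_first, dst)  # new path, same inode as dst_first
```

Keying the map on the device number as well as the inode number is a design choice of this sketch: inode numbers are only unique within a single file system.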
-
Patent number: 11269817
Abstract: In one example, a method includes measuring an amount of physical storage space used, or expected to be used, by a portion of a dataset S of segments, and measuring the amount of physical storage space includes receiving information that identifies an ad-hoc group of size ‘n’ of files F1 . . . Fn that makes up a subset of the dataset S, determining a number of unique segments in the dataset S, identifying a respective unique segment set UF1 . . . UFN for each of the ‘n’ files in the ad-hoc group of files, performing a set union operation on the unique segment sets UF1 . . . UFN, and determining a sum of sizes of the unique segment sets UF1 . . . UFN, where the sum is the amount of physical storage space used or expected to be used by the ad-hoc group of size ‘n’ of files F1 . . . Fn.
Type: Grant
Filed: April 10, 2019
Date of Patent: March 8, 2022
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Guilherme Menezes, Fabiano Botelho, Abdullah Reza
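The set-union step described in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: segments are identified by fingerprint strings, each fingerprint has a known size in bytes, and the names `physical_space_for_group`, `file_segments`, and `segment_sizes` are hypothetical.

```python
def physical_space_for_group(file_segments, segment_sizes):
    """Estimate the physical space used by an ad-hoc group of files.

    file_segments: dict mapping file name -> iterable of segment fingerprints
    segment_sizes: dict mapping segment fingerprint -> size in bytes
    """
    unique_segments = set()
    for fingerprints in file_segments.values():
        # A segment shared by several files is stored once, so it counts once.
        unique_segments |= set(fingerprints)
    return sum(segment_sizes[fp] for fp in unique_segments)


group = {"F1": ["a", "b", "a"], "F2": ["b", "c"]}
sizes = {"a": 4096, "b": 8192, "c": 4096}
total = physical_space_for_group(group, sizes)  # union is {a, b, c} -> 16384 bytes
```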
-
Patent number: 11151030
Abstract: A first set of garbage collection (GC) features and non-GC features associated with a storage system are received, the first set of features being associated with a predetermined start date and a time window. A learning equation is generated having a plurality of vectors of GC features and a plurality of vectors of non-GC features. For a current iteration representing a current GC process, it is determined whether a first prior GC process was started within the time window. An entry of vectors of the non-GC features of the learning equation is populated based on corresponding feature values of the first set of non-GC features, in response to determining that the first prior GC process was started within the time window. A predetermined regression algorithm is applied to the learning equation to generate a GC duration predictive model to predict a GC duration of a subsequent GC process.
Type: Grant
Filed: August 31, 2016
Date of Patent: October 19, 2021
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Fabiano C. Botelho, Mark Chamness, Dmitry Serdyuk, Guilherme Menezes
-
Patent number: 10838923
Abstract: Identifying files that do not deduplicate well in a storage system with deduplication facilitates optimizing storage capacity by moving the identified files to less expensive storage without deduplication. Any set of files can be examined to remove files that are identified as files that do not deduplicate well. The process of identification includes arranging the files in a predefined order and using bitmap representations of the unique segments in the files to determine a count of different segments in neighboring next files compared to the previous files, and removing from deduplication any next files that exceed a difference threshold. The bitmap representations of the files allows the identification processes to be performed efficiently for large datasets. Any over-identification of files is minimized by repeating the identification processes on the set of files after arranging them in the reverse order.
Type: Grant
Filed: December 18, 2015
Date of Patent: November 17, 2020
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Guilherme Menezes, Abdullah Reza
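To make the bitmap comparison in the abstract above more concrete, here is a hedged Python sketch of a single forward pass. It assumes each file's unique segments are encoded as bits of a Python integer and that the difference threshold is a plain count of new segments; the function name `flag_poor_dedup` is hypothetical, and the reverse-order pass the abstract mentions is not shown.

```python
def flag_poor_dedup(ordered_files, threshold):
    """Return names of files that add more than `threshold` previously unseen segments.

    ordered_files: list of (name, bitmap) pairs in the predefined order, where
    bit i of the integer bitmap is set if the file contains segment i.
    """
    seen = 0  # bitmap of every segment found in the files examined so far
    flagged = []
    for index, (name, bitmap) in enumerate(ordered_files):
        new_segments = bitmap & ~seen  # segments absent from all previous files
        # The first file has no predecessors to compare against, so skip it.
        if index > 0 and bin(new_segments).count("1") > threshold:
            flagged.append(name)
        seen |= bitmap
    return flagged


files = [("A", 0b00000011), ("B", 0b00000111), ("C", 0b11110000)]
poor = flag_poor_dedup(files, threshold=2)  # C shares nothing with A or B
```

Integer bitmaps keep the comparison to one AND-NOT and one population count per file, which is the kind of efficiency the abstract attributes to the bitmap representation.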
-
Publication number: 20200250049
Abstract: Methods and systems for backing up and restoring files that have multiple hard links using master file references and index node-based mappings are described. In some cases, file fetching and restoration may be performed by a storage appliance using master file references in which a master file is identified for each multi-link file that is backed-up on the storage appliance and then referenced by one or more hard links to the multi-link file. In other cases, file fetching and restoration may be performed by a storage appliance using index node-based mappings for multi-link files that provide mappings between index node identifiers (e.g., inode numbers) for the multi-link files on a primary system and hard link paths for storing the file contents of the multi-link files on a storage appliance used for backing up the primary system.
Type: Application
Filed: January 31, 2019
Publication date: August 6, 2020
Applicant: RUBRIK, INC.
Inventors: Looi Chow Lee, Ziqi Liu, Guilherme Menezes
-
Patent number: 10459648
Abstract: File measurements are computed and stored in persistent memory of a deduplicated storage system as files are written or on demand, where the file measurements are used to estimate storage requirements for storing a subset of files. The file measurements are accumulated into an initial measurement at a first point in time and a final measurement at a second point in time to obtain an estimate of any change in a quantity of unique segments required to store the subset of files in the deduplicated storage system between the first and second points in time. Future storage requirements can be estimated based on a computed rate of change in the amount of storage required to store the subset of files between the first and second points in time.
Type: Grant
Filed: December 14, 2015
Date of Patent: October 29, 2019
Assignee: EMC IP Holding Company LLC
Inventors: Guilherme Menezes, Abdullah Reza
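The rate-of-change estimate described in the abstract above can be illustrated with a simple linear extrapolation. The sketch below assumes the growth rate between the two measurements stays constant, which the abstract does not state; `project_storage` and its parameters are hypothetical names.

```python
from datetime import datetime


def project_storage(t_first, bytes_first, t_second, bytes_second, t_future):
    """Linearly extrapolate storage needs from two measurements of a file subset."""
    elapsed = (t_second - t_first).total_seconds()
    if elapsed <= 0:
        raise ValueError("the second measurement must be later than the first")
    rate = (bytes_second - bytes_first) / elapsed  # bytes per second
    return bytes_second + rate * (t_future - t_second).total_seconds()


projected = project_storage(
    datetime(2015, 1, 1), 100e9,   # initial measurement: 100 GB
    datetime(2015, 1, 11), 200e9,  # final measurement, 10 days later: 200 GB
    datetime(2015, 1, 21),         # projecting 10 more days gives about 300 GB
)
```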
-
Patent number: 10430383
Abstract: In one example, a method for processing data includes receiving information that identifies an ad hoc group of size ‘n’ of files F1 . . . Fn, each file F including a respective file sequence S that includes K data segments. Next, each file sequence S is sampled to obtain a sequence SS of data segments from the file sequence S, and a non-random sampling of data segments is sampled from each sequence SS to obtain a set SSU of the sequence SS. The data segments of each set SSU are then sampled to obtain a sample subset SSUS of the set SSU, and a compression ratio is determined for each data segment in each sample subset SSUS. Finally, an average data compression RF1 . . . Fn is estimated and output for the files F in the group of size ‘n’, based on the compression ratios.
Type: Grant
Filed: September 30, 2015
Date of Patent: October 1, 2019
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Guilherme Menezes, Teng Xu, Abdullah Reza
-
Publication number: 20190236054
Abstract: In one example, a method includes measuring an amount of physical storage space used, or expected to be used, by a portion of a dataset S of segments, and measuring the amount of physical storage space includes receiving information that identifies an ad-hoc group of size ‘n’ of files F1 . . . Fn that makes up a subset of the dataset S, determining a number of unique segments in the dataset S, identifying a respective unique segment set UF1 . . . UFN for each of the ‘n’ files in the ad-hoc group of files, performing a set union operation on the unique segment sets UF1 . . . UFN, and determining a sum of sizes of the unique segment sets UF1 . . . UFN, where the sum is the amount of physical storage space used or expected to be used by the ad-hoc group of size ‘n’ of files F1 . . . Fn.
Type: Application
Filed: April 10, 2019
Publication date: August 1, 2019
Inventors: Guilherme Menezes, Fabiano Botelho, Abdullah Reza
-
Patent number: 10303662
Abstract: In one example, a method for processing data includes receiving information that identifies an ad-hoc group of size ‘n’ of files F1 . . . Fn, each file F including a respective segment set S, and then sampling a representation of each unique segment in the segment set S to obtain a sampled unique segment count for each file F. A unique segment count is then obtained for each file F by applying a sampling ratio R to each sampled unique segment count, and an average segment size for each file F is determined. Next, a physical space measurement is generated for each file F based on the average segment size and the unique segment count, and then a total physical space measurement p is generated based on the individual physical space measurements for each file F.
Type: Grant
Filed: September 30, 2015
Date of Patent: May 28, 2019
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Guilherme Menezes, Fabiano Botelho, Abdullah Reza
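The sampling approach in the abstract above might look roughly like the following Python sketch. Several details are assumptions that the abstract leaves open: fingerprints are sampled by hashing and keeping those whose hash is divisible by a chosen denominator, the sampling ratio R is applied by multiplying the sampled count by that denominator, the average segment size is taken over all segments of the file, and the total is a plain sum of the per-file measurements. All function and parameter names are hypothetical.

```python
import hashlib


def is_sampled(fingerprint, denominator):
    """Deterministically keep roughly 1 in `denominator` fingerprints."""
    digest = hashlib.sha1(fingerprint.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % denominator == 0


def estimate_file_space(segments, denominator):
    """segments: list of (fingerprint, size_in_bytes) pairs for one file."""
    if not segments:
        return 0.0
    sampled_unique = {fp for fp, _size in segments if is_sampled(fp, denominator)}
    unique_count = len(sampled_unique) * denominator  # scale up by the sampling ratio
    average_size = sum(size for _fp, size in segments) / len(segments)
    return unique_count * average_size


def estimate_group_space(files, denominator):
    """files: dict mapping file name -> list of (fingerprint, size) pairs."""
    return sum(estimate_file_space(segs, denominator) for segs in files.values())
```

With `denominator=1` every fingerprint is sampled, so the per-file estimate reduces to the exact unique segment count multiplied by the average segment size; larger denominators trade accuracy for less work.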
-
Patent number: 10303797
Abstract: Clustering files in deduplication systems is based on an estimate of similarity between files in a file system. The estimates of similarity are based on how much content the files share, where the estimate of how much content is shared is based on an estimate of segments shared. The estimate of segments shared is based on segment offsets found in the files' bitmap vectors of segment offsets. The found segment offsets are used to generate a cluster definition approximating an optimal data structure for clustering files that share content. The approximated optimal data structure defines clusters hierarchically arranged based on the offset numbers of the found segment offsets.
Type: Grant
Filed: December 18, 2015
Date of Patent: May 28, 2019
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Guilherme Menezes, Abdullah Reza
-
Patent number: 9460389
Abstract: Mechanisms for predicting a GC duration are described herein. In one embodiment, the mechanisms include receiving a first set of features determined based on current operating status and prior garbage collection (GC) statistics of a first storage system. In one embodiment, the mechanisms include predicting a GC duration of a first GC process being performed at the first storage system by applying a predictive model on the first set of features, wherein the predictive model was generated based on a second set of features received periodically from a plurality of storage systems.
Type: Grant
Filed: May 31, 2013
Date of Patent: October 4, 2016
Assignee: EMC Corporation
Inventors: Fabiano C. Botelho, Mark Chamness, Dmitry Serdyuk, Guilherme Menezes