Patents by Inventor Sorin Faibish
Sorin Faibish has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11210230
Abstract: Techniques are provided for inline deduplication based on a number of physical blocks having common fingerprints among multiple entries of a buffer cache. One method comprises storing input/output operations in a first cache comprising a plurality of entries each corresponding to a physical storage entity comprising a plurality of physical blocks. A given entry is maintained in the first cache based on a first number of physical blocks of the given entry having a duplicate fingerprint with at least one physical block of another entry in the first cache. A second number can be determined of the physical blocks of each entry having a fingerprint in a second cache, and a first ratio is determined for two entries in the first cache using the second number and the first number. A comparison of the first ratios can be performed to sort and possibly evict entries in the first cache based on the comparison.
Type: Grant
Filed: April 30, 2020
Date of Patent: December 28, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Sorin Faibish, Philip Shilane, Philippe Armangau
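A small Python sketch can illustrate the counting in this abstract, though it is not the patented implementation. It assumes each cache entry is simply a list of block fingerprints. The "first number" is read as the count of an entry's blocks whose fingerprint also occurs in another entry, and the "second number" as the count whose fingerprint is already in a separate fingerprint cache. Entries are then ranked by second/first. The names (`CacheEntry`, `first_number`, `second_number`, `eviction_order`), the infinite ratio for entries with no cross-entry duplicates, and evicting from the front of the ordering are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical buffer-cache entry standing for one physical storage entity."""
    name: str
    fingerprints: list  # one fingerprint per physical block


def first_number(entry, entries):
    """Blocks of `entry` whose fingerprint also appears in some other cache entry."""
    other_fps = set()
    for other in entries:
        if other is not entry:
            other_fps.update(other.fingerprints)
    return sum(1 for fp in entry.fingerprints if fp in other_fps)


def second_number(entry, fingerprint_cache):
    """Blocks of `entry` whose fingerprint is present in the separate fingerprint cache."""
    return sum(1 for fp in entry.fingerprints if fp in fingerprint_cache)


def eviction_order(entries, fingerprint_cache):
    """Rank entries by second/first ratio, highest first (assumed: evict from the front)."""
    def ratio(entry):
        first = first_number(entry, entries)
        # Hypothetical rule: an entry sharing no fingerprints with others ranks first.
        return second_number(entry, fingerprint_cache) / first if first else float("inf")
    return sorted(entries, key=ratio, reverse=True)
```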
-
Publication number: 20210342271
Abstract: Techniques are provided for inline deduplication based on a number of physical blocks having common fingerprints among multiple entries of a buffer cache. One method comprises storing input/output operations in a first cache comprising a plurality of entries each corresponding to a physical storage entity comprising a plurality of physical blocks. A given entry is maintained in the first cache based on a first number of physical blocks of the given entry having a duplicate fingerprint with at least one physical block of another entry in the first cache. A second number is determined of the physical blocks of each entry having a fingerprint in a second cache, and a first ratio is determined for two entries in the first cache using the second number and the first number. A comparison of the first ratios can be performed to sort and possibly evict entries in the first cache based on the comparison.
Type: Application
Filed: April 30, 2020
Publication date: November 4, 2021
Inventors: Sorin Faibish, Philip Shilane, Philippe Armangau
-
Patent number: 11163449
Abstract: A method of accepting writes in a multilayered storage system is provided. The method includes (a) monitoring a rate of flushing of data from a first data storage component to a second data storage component; (b) setting an intake rate for the first data storage component based on the monitored flushing rate; and (c) throttling writes to the first data storage component based on the set intake rate. An apparatus, system, and computer program product for performing a similar method are also provided.
Type: Grant
Filed: October 17, 2019
Date of Patent: November 2, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Sorin Faibish, Istvan Gonczi, Ivan Bassov
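Here is a minimal Python sketch of steps (a) to (c). It assumes a token bucket as the throttling mechanism and an exponentially weighted moving average as the flush-rate monitor. Neither choice comes from the abstract, and `WriteThrottle`, `headroom`, `alpha`, and `burst` are hypothetical names.

```python
class WriteThrottle:
    """Token bucket whose refill rate follows the observed flush rate (hypothetical sketch)."""

    def __init__(self, headroom=1.0, alpha=0.5, burst=1 << 20, start=0.0):
        self.headroom = headroom  # assumed policy: intake rate = headroom * flush rate
        self.alpha = alpha        # EWMA weight given to the newest flush observation
        self.burst = burst        # cap on accumulated write credit, in bytes
        self.flush_rate = 0.0     # smoothed flush rate, bytes per second
        self.tokens = 0.0         # write credit currently available, in bytes
        self.last = start         # time of the last credit refill, in seconds

    def record_flush(self, nbytes, seconds):
        """(a) Monitor the rate at which the first component flushes to the second."""
        observed = nbytes / seconds
        self.flush_rate = self.alpha * observed + (1 - self.alpha) * self.flush_rate

    @property
    def intake_rate(self):
        """(b) Intake rate derived from the monitored flush rate."""
        return self.headroom * self.flush_rate

    def try_write(self, nbytes, now):
        """(c) Admit the write only if enough credit has accrued at the intake rate."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.intake_rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

The caller supplies `now` and `start` as timestamps in seconds, for example from `time.monotonic()`. A `headroom` below 1.0 would keep intake slower than flushing, so the first component drains over time.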
-
Patent number: 11157188
Abstract: Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance is an entropy-based distance and denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. If the distance is less than a threshold, it may be expected to have a matching sub-block between the candidate and target data blocks. The distance may be a difference between entropy values for the candidate and target data blocks. The first entropy value may be used to determine whether to compress or perform partial deduplication for the candidate data block.
Type: Grant
Filed: April 24, 2019
Date of Patent: October 26, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Ivan Bassov, Sorin Faibish, Istvan Gonczi, Philippe Armangau
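The entropy-distance test can be sketched in Python. The sketch assumes byte-level Shannon entropy, takes the absolute difference of the two entropy values as the distance, and uses an arbitrary threshold of 0.5 bits per byte. The function names and the threshold are illustrative, not taken from the patent.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    n = len(block)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(block).values())


def entropy_distance(candidate, target):
    """Distance taken as the absolute difference of the two blocks' entropy values."""
    return abs(shannon_entropy(candidate) - shannon_entropy(target))


def worth_partial_dedup(candidate, target, threshold=0.5):
    """Attempt sub-block deduplication only when the distance is below the (assumed) threshold."""
    return entropy_distance(candidate, target) < threshold
```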
-
Patent number: 11153385
Abstract: A technique for transferring data over a network leverages a standard NAS (Network Attached Storage) protocol to augment its inherent file-copying ability with fingerprint matching, enabling the NAS protocol to limit its data copying over the network to unique data segments while avoiding copying of redundant data segments.
Type: Grant
Filed: August 22, 2019
Date of Patent: October 19, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Sorin Faibish, Philip Shilane
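The fingerprint-matching idea can be sketched independently of any NAS protocol. The sender fingerprints each segment, keeps the full fingerprint list as a recipe, and ships only the segments whose fingerprints the receiver does not already hold. Fixed 4 KiB segments, SHA-256 fingerprints, and the names `plan_transfer` and `rebuild` are assumptions. How the patent integrates this with the NAS protocol is not shown.

```python
import hashlib


def split_segments(data, size=4096):
    """Fixed-size segmentation (assumed; real systems may use variable-size chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_transfer(data, remote_fingerprints, size=4096):
    """Return (recipe, payload): every segment's fingerprint, plus only the segments the remote lacks."""
    recipe, payload = [], {}
    for seg in split_segments(data, size):
        fp = hashlib.sha256(seg).hexdigest()
        recipe.append(fp)
        if fp not in remote_fingerprints and fp not in payload:
            payload[fp] = seg  # unique segment: must cross the network once
    return recipe, payload


def rebuild(recipe, payload, remote_store):
    """Receiver side: reassemble the file from transferred segments plus segments it already had."""
    return b"".join(payload[fp] if fp in payload else remote_store[fp] for fp in recipe)
```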
-
Patent number: 11144206
Abstract: A method and system for sharing data reduction metadata with storage systems. Specifically, the disclosed method and system entail communicating, to a storage system, information known to host devices from which data (submitted to-be-written to the storage system) may originate. This a priori reduction-pertinent information, which may include the potential to improve storage system efficiency and/or performance at least with respect to data reduction processing of the data submitted to-be-written, had previously been considered incommunicable to the storage system. The disclosed method and system, however, lift this previous limitation and enable communication of any storage system performance-improving information, applicable to the data submitted to-be-written, to the storage system.
Type: Grant
Filed: November 1, 2019
Date of Patent: October 12, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Jeremy O'Hare, Alexandre Lemay, Matthew Fredette, Sorin Faibish
-
Patent number: 11138154
Abstract: A method, computer program product, and computing system for performing an entropy analysis on each of a plurality of candidate data chunks associated with a potential candidate to generate a plurality of candidate data chunk entropies; performing an entropy analysis on each of a plurality of target data chunks associated with a potential target to generate a plurality of target data chunk entropies; identifying a candidate data chunk entropy limit, chosen from the plurality of candidate data chunk entropies, and a target data chunk entropy limit, chosen from the plurality of target data chunk entropies; and comparing a specific candidate data chunk associated with the candidate data chunk entropy limit to a specific target data chunk associated with the target data chunk entropy limit to determine if the specific candidate data chunk and the specific target data chunk are identical.
Type: Grant
Filed: May 3, 2019
Date of Patent: October 5, 2021
Assignee: EMC IP Holding Company, LLC
Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Vamsi Vankamamidi
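Here is a rough Python sketch of the comparison step. It assumes that the "entropy limit" on each side is the maximum chunk entropy, and that the two chunks holding those maxima are compared byte for byte. The abstract does not say how the limits are chosen, and the function names and 512-byte chunk size are hypothetical.

```python
import math
from collections import Counter


def chunk_entropy(chunk):
    """Byte-level Shannon entropy of one chunk, in bits per byte."""
    n = len(chunk)
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values()) if n else 0.0


def split_chunks(data, size):
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def limit_chunks_identical(candidate, target, size=512):
    """Compare the highest-entropy chunk of each side (max entropy assumed to be the 'limit')."""
    cand_chunks, targ_chunks = split_chunks(candidate, size), split_chunks(target, size)
    if not cand_chunks or not targ_chunks:
        return False
    cand_limit = max(cand_chunks, key=chunk_entropy)
    targ_limit = max(targ_chunks, key=chunk_entropy)
    return cand_limit == targ_limit
```

Comparing a single chunk is cheap. High-entropy chunks are also less likely than low-entropy ones (such as runs of zeros) to match by coincidence, which is one plausible reason to key on the maximum.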
-
Patent number: 11132334
Abstract: Methods and apparatus are provided for filtering dynamically loadable namespaces (DLNs). An exemplary method comprises, in response to a job submitted by an application, obtaining a DLN portion of a global single namespace of a file system, wherein the DLN is associated with the job and is maintained in a capacity tier of a storage system; obtaining filtering directives from a user; reducing the DLN using a filtering mechanism on a directory tree associated with the DLN, based on the filtering directives, by removing files in the directory tree of the DLN that do not satisfy requirements of the filtering directives to generate a filtered DLN; and dynamically loading the filtered DLN, including reduced metadata for the filtered DLN relative to the DLN, from the capacity tier into a performance tier of the storage system for processing by the application.
Type: Grant
Filed: September 21, 2018
Date of Patent: September 28, 2021
Assignee: EMC IP Holding Company LLC
Inventors: John M. Bent, Sorin Faibish, Patrick S. Combes, Eriks S. Paegle, James M. Pedone, Jr.
-
Publication number: 20210286783
Abstract: A technique for performing data deduplication operates at sub-block granularity by searching a deduplication database for a match between a candidate sub-block of a candidate block and a target sub-block of a previously-stored target block. When a match is found, the technique identifies a duplicate range shared between the candidate block and the target block and effects persistent storage of the duplicate range by configuring mapping metadata of the candidate block so that it points to the duplicate range in the target block.
Type: Application
Filed: March 17, 2021
Publication date: September 16, 2021
Inventors: Philippe Armangau, Sorin Faibish, Istvan Gonczi, Ivan Bassov, Vamsi K. Vankamamidi
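The sub-block matching and duplicate-range extension can be sketched in Python, for illustration only. The sketch assumes aligned 8-byte sub-blocks and SHA-256 digests kept in an in-memory dictionary that stands in for the deduplication database. A hypothetical `RangeMapping` record takes the place of the patent's mapping metadata.

```python
import hashlib
from dataclasses import dataclass

SUB = 8  # sub-block size in bytes; tiny hypothetical value so the example stays readable


@dataclass
class RangeMapping:
    """Hypothetical mapping metadata: candidate[cand_off:cand_off + length] lives in the target."""
    target_id: str
    target_off: int
    cand_off: int
    length: int


def index_target(db, block_id, block):
    """Add a digest for every aligned sub-block of a previously stored target block."""
    for off in range(0, len(block), SUB):
        digest = hashlib.sha256(block[off:off + SUB]).digest()
        db.setdefault(digest, (block_id, off))


def find_duplicate_range(db, blocks, candidate):
    """Match one aligned candidate sub-block, then grow the match in both directions."""
    for c_off in range(0, len(candidate), SUB):
        sub = candidate[c_off:c_off + SUB]
        hit = db.get(hashlib.sha256(sub).digest())
        if hit is None:
            continue
        t_id, t_off = hit
        target = blocks[t_id]
        if target[t_off:t_off + len(sub)] != sub:  # guard against digest collisions
            continue
        start_c, start_t = c_off, t_off
        while start_c > 0 and start_t > 0 and candidate[start_c - 1] == target[start_t - 1]:
            start_c, start_t = start_c - 1, start_t - 1
        end_c, end_t = c_off + len(sub), t_off + len(sub)
        while end_c < len(candidate) and end_t < len(target) and candidate[end_c] == target[end_t]:
            end_c, end_t = end_c + 1, end_t + 1
        return RangeMapping(t_id, start_t, start_c, end_c - start_c)
    return None  # no shared sub-block: the candidate would be stored in full
```

Only aligned sub-blocks are looked up. A duplicate whose offset differs by a non-multiple of the sub-block size is therefore missed. Once an aligned match exists, byte-wise extension recovers the unaligned edges of the range.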
-
Patent number: 11112987
Abstract: Techniques for processing data may include: receiving a candidate block; performing partial deduplication processing of the candidate block; receiving a second candidate block subsequent to performing partial deduplication processing for the candidate block; and performing first processing to determine whether to perform promotion processing for the entry. The partial deduplication processing may include: partially deduplicating at least one sub-block of the candidate block; and creating an entry in a deduplication database for the candidate block, wherein the entry includes a digest of the candidate block and the entry denotes a potential target block having the digest, and wherein the entry includes a counter that tracks a number of missed full block deduplications between the potential target block and subsequently processed candidate blocks. The promotion processing promotes the potential target block, having the first digest of the entry, to a new target block.
Type: Grant
Filed: April 24, 2019
Date of Patent: September 7, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Istvan Gonczi, Philippe Armangau, Sorin Faibish, Ivan Bassov
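The counter-based promotion can be sketched in Python. Several parts are assumptions for illustration: the entry fields, the `PROMOTE_AFTER` threshold of 3, and reading a "missed full block deduplication" as a later candidate that fails to match this potential target in full.

```python
from dataclasses import dataclass

PROMOTE_AFTER = 3  # hypothetical threshold; the abstract does not give one


@dataclass
class DedupEntry:
    """Hypothetical dedup-database entry describing a potential target block."""
    digest: str
    block_id: str
    missed_full_dedups: int = 0
    is_target: bool = False


def record_partial_dedup(db, digest, block_id):
    """After partially deduplicating a candidate, register it as a potential target block."""
    return db.setdefault(digest, DedupEntry(digest, block_id))


def note_missed_full_dedup(entry):
    """Count a later candidate that missed full-block dedup; promote once the counter is high enough."""
    entry.missed_full_dedups += 1
    if not entry.is_target and entry.missed_full_dedups >= PROMOTE_AFTER:
        entry.is_target = True  # promotion: the potential target becomes a new target block
    return entry.is_target
```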
-
Patent number: 11112985
Abstract: Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of the contents of the candidate data block and the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of digests computed for the candidate and target data blocks using a distance preserving hash function. The target and candidate block may be similar if the distance is less than a threshold.
Type: Grant
Filed: April 24, 2019
Date of Patent: September 7, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Ivan Bassov, Philippe Armangau, Sorin Faibish, Istvan Gonczi
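The first distance variant in this abstract, a bit-wise XOR of the block contents, amounts to a Hamming distance when the set bits are counted. Here is a minimal Python sketch. It assumes equal-length blocks and a caller-chosen threshold, and it does not cover the distance-preserving-hash variant.

```python
def xor_distance(a, b):
    """Count of set bits in the bit-wise XOR of two equal-length blocks (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similar(a, b, threshold):
    """Treat the blocks as similar (worth a sub-block dedup attempt) below the threshold."""
    return xor_distance(a, b) < threshold
```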
-
Patent number: 11093468
Abstract: A computer-executable method, system, and computer program product for managing metadata in a distributed data storage system, wherein the distributed data storage system includes a first burst buffer having a key-value store enabled to store metadata, the computer-executable method, system, and computer program product comprising receiving, from a compute node, metadata related to data stored within the distributed data storage system, indexing the metadata at the first burst buffer, and processing the metadata in the first burst buffer.
Type: Grant
Filed: March 31, 2014
Date of Patent: August 17, 2021
Assignee: EMC IP Holding Company LLC
Inventors: John M. Bent, Sorin Faibish, Zhenhua Zhang, Xuezhao Liu, Jingwang Zhang
-
Patent number: 11080196
Abstract: Techniques are provided for pattern-aware prefetching using a parallel log-structured file system. At least a portion of one or more files is accessed by detecting at least one pattern in a non-sequential access of the one or more files; and obtaining at least a portion of the one or more files based on the detected at least one pattern. The obtaining step comprises, for example, a prefetching or pre-allocation of the at least the portion of the one or more files. A prefetch cache can store the portion of the one or more obtained files. The cached portion of the one or more files can be provided from the prefetch cache to an application requesting the at least a portion of the one or more files.
Type: Grant
Filed: December 17, 2019
Date of Patent: August 3, 2021
Assignees: EMC IP Holding Company LLC, Triad National Security, LLC
Inventors: John M. Bent, Sorin Faibish, Gary Grider, Aaron Torres, Jun He
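Pattern detection for non-sequential access can be as simple as spotting a constant stride. The Python sketch below assumes that pattern type. It uses a caller-supplied `read_fn` as a stand-in for the underlying file-system read and an unbounded dictionary as the prefetch cache. It is not the pattern detector used with the parallel log-structured file system, and the class and parameter names are hypothetical.

```python
def detect_stride(offsets, min_run=3):
    """Return the common gap if the last `min_run` gaps between accesses are equal and nonzero."""
    if len(offsets) < min_run + 1:
        return None
    recent = offsets[-(min_run + 1):]
    gaps = [b - a for a, b in zip(recent, recent[1:])]
    return gaps[0] if gaps[0] != 0 and all(g == gaps[0] for g in gaps) else None


class StridePrefetcher:
    """Serves reads and prefetches `depth` records ahead once a stride is detected."""

    def __init__(self, read_fn, depth=2, min_run=3):
        self.read_fn = read_fn  # hypothetical backend: callable(offset) -> bytes
        self.depth = depth      # how many predicted records to prefetch
        self.min_run = min_run  # equal gaps required before trusting a stride
        self.history = []       # offsets requested so far
        self.cache = {}         # prefetch cache: offset -> data (unbounded in this sketch)

    def read(self, offset):
        self.history.append(offset)
        data = self.cache.pop(offset, None)  # serve from the prefetch cache when possible
        if data is None:
            data = self.read_fn(offset)
        stride = detect_stride(self.history, self.min_run)
        if stride is not None:
            for k in range(1, self.depth + 1):
                nxt = offset + k * stride
                if nxt >= 0 and nxt not in self.cache:
                    self.cache[nxt] = self.read_fn(nxt)
        return data
```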
-
Publication number: 20210132814
Abstract: A method and system for sharing data reduction metadata with storage systems. Specifically, the disclosed method and system entail communicating, to a storage system, information known to host devices from which data (submitted to-be-written to the storage system) may originate. This a priori reduction-pertinent information, which may include the potential to improve storage system efficiency and/or performance at least with respect to data reduction processing of the data submitted to-be-written, had previously been considered incommunicable to the storage system. The disclosed method and system, however, lift this previous limitation and enable communication of any storage system performance-improving information, applicable to the data submitted to-be-written, to the storage system.
Type: Application
Filed: November 1, 2019
Publication date: May 6, 2021
Inventors: Jeremy O'Hare, Alexandre Lemay, Matthew Fredette, Sorin Faibish
-
Patent number: 10997126
Abstract: Methods and apparatus are provided for reorganizing dynamically loadable namespaces (DLNs). In one exemplary embodiment, a method comprises the steps of, in response to a job submitted by an application, obtaining a DLN portion of a global single namespace of a file system, wherein the DLN is associated with the job and is maintained in a capacity tier of object storage of a storage system; obtaining one or more reordering directives from a user; rearranging one or more files in the DLN into a new directory hierarchy based on the one or more reordering directives to generate a reordered DLN; and dynamically loading the reordered DLN, including the metadata only for the reordered DLN, from the capacity tier of object storage into a performance tier of storage of the storage system for processing by the application. The reordered DLN is merged into the DLN following one or more modifications to the reordered DLN.
Type: Grant
Filed: December 8, 2015
Date of Patent: May 4, 2021
Assignee: EMC IP Holding Company LLC
Inventors: John M. Bent, Sorin Faibish, Patrick S. Combes, Eriks S. Paegle, James M. Pedone
-
Publication number: 20210125053
Abstract: Continuous learning may include receiving a first neural network trained using a first training data set to predict outputs; determining whether the first neural network has a successful prediction rate greater than a prediction threshold; and responsive to determining the first neural network does not have a successful prediction rate greater than the prediction threshold, performing processing.
Type: Application
Filed: October 25, 2019
Publication date: April 29, 2021
Applicant: EMC IP Holding Company LLC
Inventor: Sorin Faibish
-
Patent number: 10990565
Abstract: A method, computer program product, and computing system for processing a data portion to divide the data portion into a plurality of data chunks; performing an entropy analysis on each of the plurality of data chunks to generate a plurality of data chunk entropies; and determining an average data chunk entropy from the plurality of data chunk entropies.
Type: Grant
Filed: May 3, 2019
Date of Patent: April 27, 2021
Assignee: EMC IP Holding Company, LLC
Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi
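The three steps in this abstract map onto a short Python function. The sketch assumes byte-level Shannon entropy and fixed-size chunks, with a default size of 1 KiB chosen arbitrarily. The function names are hypothetical.

```python
import math
from collections import Counter


def chunk_entropy(chunk):
    """Byte-level Shannon entropy of one non-empty chunk, in bits per byte."""
    n = len(chunk)
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def average_chunk_entropy(data, chunk_size=1024):
    """Divide the data into chunks, compute each chunk's entropy, and average the results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return 0.0
    entropies = [chunk_entropy(c) for c in chunks]
    return sum(entropies) / len(entropies)
```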
-
Patent number: 10990310
Abstract: Techniques for data processing may include: determining one or more sub-blocks of a target block that match one or more sub-blocks of a candidate block; creating a shared sub-block mapping (SSM) structure having a plurality of entries, wherein each of the plurality of entries corresponds to a different one of the sub-blocks in the candidate block and wherein a value stored in said each entry, corresponding to one of the sub-blocks of the candidate block, identifies a sub-block of the target block matching said one sub-block of the candidate block; and storing the candidate block as a deduplicated block sharing at least one sub-block with the target block. The SSM structure may be stored as a metadata structure of the candidate block to identify deduplicated sub-blocks of the candidate block and to identify sub-blocks of the target block providing content for the deduplicated sub-blocks of the candidate block.
Type: Grant
Filed: April 24, 2019
Date of Patent: April 27, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Istvan Gonczi, Ivan Bassov, Sorin Faibish, Philippe Armangau
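Here is a toy Python version of the shared sub-block mapping. It assumes 4-byte sub-blocks and represents the SSM as a list. The i-th value of the list is the index of the matching target sub-block, or `None` when the candidate sub-block is unique and must be stored separately. Names and sizes are hypothetical.

```python
SUB = 4  # sub-block size in bytes; hypothetical and tiny for readability


def split(block):
    """Cut a block into fixed-size sub-blocks."""
    return [block[i:i + SUB] for i in range(0, len(block), SUB)]


def build_ssm(candidate, target):
    """One SSM entry per candidate sub-block: index of the matching target sub-block, or None."""
    target_index = {}
    for j, sub in enumerate(split(target)):
        target_index.setdefault(sub, j)
    return [target_index.get(sub) for sub in split(candidate)]


def rebuild_candidate(ssm, target, unique_subs):
    """Reassemble the candidate: shared sub-blocks come from the target, the rest from unique_subs."""
    target_subs = split(target)
    return b"".join(
        target_subs[j] if j is not None else unique_subs[i] for i, j in enumerate(ssm)
    )
```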
-
Publication number: 20210117799
Abstract: A method of monitoring storage performance of a remote data storage apparatus (DSA) is provided. The method includes (a) receiving performance metrics of the DSA and a first set of behavioral estimates generated by a first neural network (NN) running on the DSA operating on the performance metrics; (b) operating a second NN on the computing device with the received performance metrics as inputs, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics, the second NN running at a higher level of precision than the first NN; and (c) sending to the remote DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates. Apparatuses, systems, and computer program products for performing similar methods are also provided.
Type: Application
Filed: October 17, 2019
Publication date: April 22, 2021
Inventors: Sorin Faibish, Istvan Gonczi, Ivan Bassov
-
Publication number: 20210117099
Abstract: A method of accepting writes in a multilayered storage system is provided. The method includes (a) monitoring a rate of flushing of data from a first data storage component to a second data storage component; (b) setting an intake rate for the first data storage component based on the monitored flushing rate; and (c) throttling writes to the first data storage component based on the set intake rate. An apparatus, system, and computer program product for performing a similar method are also provided.
Type: Application
Filed: October 17, 2019
Publication date: April 22, 2021
Inventors: Sorin Faibish, Istvan Gonczi, Ivan Bassov