Patents by Inventor Istvan Gonczi

Istvan Gonczi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200349132
    Abstract: A method, computer program product, and computing system for identifying a potential deduplication candidate and a related deduplication target; executing a comparison operation with respect to the potential deduplication candidate and the related deduplication target to generate a comparison result; and determining a level of similarity between the potential deduplication candidate and the related deduplication target by processing the comparison result.
    Type: Application
    Filed: May 3, 2019
    Publication date: November 5, 2020
    Inventors: Istvan Gonczi, Ivan Basov, Sorin Faibish, Philippe Armangau, Anton Kucherov
  • Publication number: 20200341666
    Abstract: Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of the contents of the candidate data block and the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of digests computed for the candidate and target data blocks using a distance preserving hash function. The target and candidate blocks may be considered similar if the distance is less than a threshold.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Philippe Armangau, Sorin Faibish, Istvan Gonczi
  • Publication number: 20200341667
    Abstract: Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance is an entropy-based distance and denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. If the distance is less than a threshold, the candidate and target data blocks may be expected to have a matching sub-block. The distance may be a difference between entropy values for the candidate and target data blocks. The first entropy value may be used to determine whether to compress or perform partial deduplication for the candidate data block.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Sorin Faibish, Istvan Gonczi, Philippe Armangau
  • Publication number: 20200341671
    Abstract: Techniques for processing data may include: receiving a candidate block; performing partial deduplication processing of the candidate block; receiving a second candidate block subsequent to performing partial deduplication processing for the candidate block; and performing first processing to determine whether to perform promotion processing for the entry. The partial deduplication processing may include: partially deduplicating at least one sub-block of the candidate block; and creating an entry in a deduplication database for the candidate block, wherein the entry includes a digest of the candidate block and the entry denotes a potential target block having the digest, and wherein the entry includes a counter that tracks a number of missed full block deduplications between the potential target block and subsequently processed candidate blocks. The promotion processing promotes the potential target block, having the first digest of the entry, to a new target block.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Istvan Gonczi, Philippe Armangau, Sorin Faibish, Ivan Bassov
  • Publication number: 20200341668
    Abstract: Techniques for data processing may include: determining one or more sub-blocks of a target block that match one or more sub-blocks of a candidate block; creating a shared sub-block mapping (SSM) structure having a plurality of entries, wherein each of the plurality of entries corresponds to a different one of the sub-blocks in the candidate block and wherein a value stored in said each entry, corresponding to one of the sub-blocks of the candidate block, identifies a sub-block of the target block matching said one sub-block of the candidate block; and storing the candidate block as a deduplicated block sharing at least one sub-block with the target block. The SSM structure may be stored as a metadata structure of the candidate block to identify deduplicated sub-blocks of the candidate block and to identify sub-blocks of the target block providing content for the deduplicated sub-blocks of the candidate block.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Istvan Gonczi, Ivan Bassov, Sorin Faibish, Philippe Armangau
  • Patent number: 10817475
    Abstract: A method, computer program product, and computing system for encoding a candidate data portion to generate an encoded candidate data portion; identifying one or more portion similarities between the encoded candidate data portion and an encoded target data portion to position the one or more portion similarities with respect to the encoded target data portion, thus generating one or more portion similarity measurements; identifying one or more portion differences between the encoded candidate data portion and the encoded target data portion to generate one or more portion difference measurements; and combining the one or more portion similarity measurements and the one or more portion difference measurements to generate a candidate similarity measurement for the candidate data portion.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: October 27, 2020
    Assignee: EMC IP Holding Company, LLC
    Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi
  • Publication number: 20200327098
    Abstract: Techniques for processing data may include: receiving a plurality of data chunks for a data set; performing data deduplication processing for the plurality of data chunks; determining, in accordance with one or more criteria, whether a frequency distribution of a frequency histogram of digest byte frequencies is sufficiently uniform; and responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set. Updating the data deduplication settings may include using a stronger hash algorithm and/or a larger size digest when generating subsequent digests. The data deduplication processing may include: determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set; and updating the frequency histogram of digest byte frequencies for the data set in accordance with the plurality of digests.
    Type: Application
    Filed: April 11, 2019
    Publication date: October 15, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Istvan Gonczi, Ivan Bassov, Sorin Faibish
  • Patent number: 10768843
    Abstract: Techniques for data processing may include: receiving a candidate block including a plurality of uniformly-sized sub-blocks, wherein a tag is stored at a first location in the candidate block; performing data deduplication processing of the candidate block, wherein the data deduplication processing excludes content stored from a first offset to a second offset corresponding to the first location; determining whether at least one sub-block of the candidate block has been deduplicated by the data deduplication processing; and responsive to determining that at least one sub-block of the candidate block has been deduplicated, storing the candidate block as a deduplicated data block having at least one sub-block matching an existing target sub-block, wherein a tag descriptor describing the tag is stored and associated with the candidate block, such as in block-level metadata of the candidate block. The tag descriptor may include tag content and tag location information.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: September 8, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Sorin Faibish, Philippe Armangau, Istvan Gonczi, Ivan Bassov, Anton Kucherov
  • Publication number: 20200249860
    Abstract: Techniques for data processing may include: receiving a candidate block including a plurality of uniformly-sized sub-blocks, wherein a tag is stored at a first location in the candidate block; performing data deduplication processing of the candidate block, wherein the data deduplication processing excludes content stored from a first offset to a second offset corresponding to the first location; determining whether at least one sub-block of the candidate block has been deduplicated by the data deduplication processing; and responsive to determining that at least one sub-block of the candidate block has been deduplicated, storing the candidate block as a deduplicated data block having at least one sub-block matching an existing target sub-block, wherein a tag descriptor describing the tag is stored and associated with the candidate block, such as in block-level metadata of the candidate block. The tag descriptor may include tag content and tag location information.
    Type: Application
    Filed: February 4, 2019
    Publication date: August 6, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Sorin Faibish, Philippe Armangau, Istvan Gonczi, Ivan Bassov, Anton Kucherov
  • Patent number: 10733158
    Abstract: A method, computer program product, and computing system for receiving a candidate data portion; calculating a distance-preserving hash for the candidate data portion; and performing an entropy analysis on the distance-preserving hash to generate a hash entropy for the candidate data portion.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: August 4, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi
  • Publication number: 20200241778
    Abstract: Techniques for data processing may include: receiving a data chunk and an associated digest; and performing data deduplication processing for the data chunk comprising: determining whether there is an existing entry in a deduplication digest cache for the digest; and responsive to determining there is no existing entry in the deduplication digest cache, performing processing including: determining whether there is an existing entry in a mapping structure for the digest, the mapping structure mapping digests to associated pages of related entries in a deduplication data store; and responsive to determining there is an existing entry in the mapping structure, performing second processing including: obtaining, from the existing entry, a page mapped to the digest; and loading the page of entries from the deduplication data store into the deduplication digest cache. At least some entries of the page may be prefetched and loaded into the deduplication digest cache prior to use.
    Type: Application
    Filed: January 24, 2019
    Publication date: July 30, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Istvan Gonczi
  • Patent number: 10664165
    Abstract: A method is used in managing inline data compression and deduplication in storage systems. A block of data from data stored in a cache of a storage system is identified based on entropy. Entropy of the block of data is compared with a first threshold value. Based on the comparison, the block of data is either deduplicated or compressed without deduplication.
    Type: Grant
    Filed: May 10, 2019
    Date of Patent: May 26, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Sorin Faibish, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi, Ivan Bassov
  • Publication number: 20200133923
    Abstract: Techniques for processing a data set may comprise: performing first processing that forms a first compression unit, wherein the first compression unit includes data chunks including a first data chunk having a first entropy value less than an entropy threshold, the first processing including: receiving a second data chunk; determining, in accordance with criteria, whether to add the second data chunk to the first compression unit; and responsive to determining to add the second data chunk to the first compression unit, adding the second data chunk to the first compression unit; and compressing the first compression unit as a single compressible unit. The second chunk may be added if its entropy value is less than the entropy threshold and if entropy values of the first and second chunks are similar. The second chunk may be added if the resulting compression unit provides sufficient storage/compression benefit.
    Type: Application
    Filed: September 9, 2019
    Publication date: April 30, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Sorin Faibish, Istvan Gonczi
  • Publication number: 20200133546
    Abstract: A technique for managing cache in a storage system that supports data deduplication renders each of a set of data blocks as multiple sub-blocks and loads a cache-resident digest database on a per-block basis, selectively creating new digest entries in the database for all sub-blocks in a block, but only for blocks that contain no duplicate sub-blocks. Sub-blocks of blocks containing duplicates are excluded. By limiting digest entries to sub-blocks of blocks that contain no duplicates, the storage system limits the size of the digest database, and thus of the cache, while also biasing the contents of the digest database toward entries that are likely to produce deduplication matches in the future.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Sorin Faibish, Philippe Armangau, Istvan Gonczi, Ivan Bassov, Vamsi K. Vankamamidi
  • Publication number: 20200133928
    Abstract: A technique for performing data deduplication operates at sub-block granularity by searching a deduplication database for a match between a candidate sub-block of a candidate block and a target sub-block of a previously-stored target block. When a match is found, the technique identifies a duplicate range shared between the candidate block and the target block and effects persistent storage of the duplicate range by configuring mapping metadata of the candidate block so that it points to the duplicate range in the target block.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Philippe Armangau, Sorin Faibish, Istvan Gonczi, Ivan Bassov, Vamsi K. Vankamamidi
  • Publication number: 20200134047
    Abstract: Techniques for data processing may include: receiving a data chunk of the data set; determining, in accordance with criteria including a compressibility ratio for the data set and a cost ratio of compression computation cost and entropy computation cost, whether to activate or deactivate entropy computation for the data set, wherein the compressibility ratio is a ratio of a number of compressible data chunks of the data set and a number of uncompressible data chunks of the data set; and responsive to determining to activate entropy computation for the data set, performing first processing comprising: determining an entropy value for the data chunk; and determining, in accordance with the entropy value for the data chunk, whether to compress the data chunk.
    Type: Application
    Filed: October 30, 2018
    Publication date: April 30, 2020
    Applicant: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Philippe Armangau, Sorin Faibish, Istvan Gonczi
  • Patent number: 10509676
    Abstract: Techniques for data processing may include: computing an entropy value for a data chunk; determining, in accordance with the entropy value for the data chunk, whether the data chunk is compressible; and responsive to determining the data chunk is compressible based on the entropy value for the chunk, compressing the data chunk. The entropy value may be determined by maintaining counters for data items, where the counters denote current frequencies of different allowable data items in the data chunk, and performing second processing using the counters to determine an entropy value for the data chunk, wherein said second processing includes selecting a precomputed binary logarithmic value from a table for each of the counters. The table may include integer representations of binary logarithmic values. The second processing may include loading multiple data items of the chunk into a register, extracting each data item from the register and incrementing a corresponding counter.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: December 17, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Istvan Gonczi, Sorin Faibish
  • Patent number: 10505563
    Abstract: Techniques for data processing may include: determining a data layout for a configuration of counters stored in registers, wherein each of the registers is configured to store at least two counters, and each counter is associated with a particular data item allowable in the data set and denotes a current frequency of the particular data item; receiving data items of a data chunk of the data set; for each data item received, performing processing including: determining a first of the counters corresponding to the data item, wherein the first counter is stored in a first of the registers and denotes a current frequency of the data item; and incrementing the first counter stored in the first register by one; and determining, in accordance with the counters stored in the registers, an entropy value for the data chunk.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: December 10, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Istvan Gonczi, Ivan Bassov, Sorin Faibish
  • Publication number: 20190324675
    Abstract: In response to a cache flush event indicating that host data accumulated in a cache of a storage processor of a data storage system is to be flushed to a lower deck file system, an aggregation set of blocks is formed within the cache, and a digest calculation group is selected from within the aggregation set. Hardware vector processing logic is caused to simultaneously calculate crypto-digests from the blocks in the digest calculation group. If one of the resulting crypto-digests matches a previously generated crypto-digest, deduplication is performed that i) causes the lower deck file system to indicate the block of data from which the previously generated crypto-digest was generated and ii) discards the block that corresponds to the matching crypto-digest. Objects required by a digest generation component may be allocated in a just in time manner to avoid having to manage a pool of pre-allocated objects.
    Type: Application
    Filed: June 24, 2019
    Publication date: October 24, 2019
    Inventors: Istvan Gonczi, Ivan Bassov, Philippe Armangau
  • Patent number: 10452616
    Abstract: Techniques for processing a data set may comprise: performing first processing that forms a first compression unit, wherein the first compression unit includes data chunks including a first data chunk having a first entropy value less than an entropy threshold, the first processing including: receiving a second data chunk; determining, in accordance with criteria, whether to add the second data chunk to the first compression unit; and responsive to determining to add the second data chunk to the first compression unit, adding the second data chunk to the first compression unit; and compressing the first compression unit as a single compressible unit. The second chunk may be added if its entropy value is less than the entropy threshold and if entropy values of the first and second chunks are similar. The second chunk may be added if the resulting compression unit provides sufficient storage/compression benefit.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: October 22, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Ivan Bassov, Sorin Faibish, Istvan Gonczi
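
Publication 20200341666 above describes measuring the similarity of a candidate block and a target block with a bit-wise exclusive-or. The following is a minimal Python sketch of that general idea, not the patented implementation. It assumes the distance is the number of set bits in the XOR of two equal-length blocks (a Hamming distance) and that blocks below a caller-chosen threshold count as similar. The function names and threshold values are hypothetical, and the sketch compares raw contents rather than digests from a distance-preserving hash.

```python
def xor_distance(candidate: bytes, target: bytes) -> int:
    """Return the number of bits that differ between two equal-length blocks."""
    if len(candidate) != len(target):
        raise ValueError("candidate and target blocks must have the same length")
    # XOR the blocks as integers; every set bit in the result is a differing bit.
    diff = int.from_bytes(candidate, "big") ^ int.from_bytes(target, "big")
    return bin(diff).count("1")


def is_similar(candidate: bytes, target: bytes, threshold: int) -> bool:
    """Treat blocks as similar (worth trying sub-block dedup) if the distance is below threshold."""
    return xor_distance(candidate, target) < threshold


if __name__ == "__main__":
    block_a = b"\x00" * 4096
    block_b = b"\x00" * 4095 + b"\x01"  # differs from block_a in exactly one bit
    print(xor_distance(block_a, block_b))               # 1
    print(is_similar(block_a, block_b, threshold=64))   # True
```

Counting differing bits keeps the measure cheap, because XOR and a population count are simple word-level operations. The threshold trades missed partial-dedup opportunities against wasted sub-block comparisons.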
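
The shared sub-block mapping (SSM) structure in publication 20200341668 records, for each candidate sub-block, which target sub-block supplies its content. The Python sketch below shows one simple way to build such a mapping. It assumes fixed-size sub-blocks matched by exact content (a real system would more likely compare digests) and uses None for candidate sub-blocks with no match. The names build_ssm and is_deduplicated are hypothetical.

```python
from typing import List, Optional


def build_ssm(candidate: bytes, target: bytes, sub_block_size: int) -> List[Optional[int]]:
    """Map each candidate sub-block to the index of an identical target sub-block, or None."""
    # Index target sub-blocks by content; keep the first index if a sub-block repeats.
    target_index = {}
    for offset in range(0, len(target), sub_block_size):
        target_index.setdefault(target[offset:offset + sub_block_size], offset // sub_block_size)

    # One SSM entry per candidate sub-block, in candidate order.
    return [
        target_index.get(candidate[offset:offset + sub_block_size])
        for offset in range(0, len(candidate), sub_block_size)
    ]


def is_deduplicated(ssm: List[Optional[int]]) -> bool:
    """The candidate can be stored as a deduplicated block if any sub-block is shared."""
    return any(entry is not None for entry in ssm)


if __name__ == "__main__":
    target = b"AAAABBBBCCCCDDDD"
    candidate = b"CCCCxxxxAAAAyyyy"
    print(build_ssm(candidate, target, sub_block_size=4))  # [2, None, 0, None]
```

In this sketch, the non-None entries play the role of the metadata that points deduplicated candidate sub-blocks at target content, while the None entries mark sub-blocks whose own data would still need to be stored.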
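
Publication 20200327098 checks whether a histogram of digest byte frequencies is sufficiently uniform and, if it is not, moves to a stronger hash or a larger digest. The abstract does not name the uniformity criterion. The Python sketch below therefore assumes a chi-square statistic compared with a hypothetical threshold. It also assumes, for illustration only, that "digest settings" means a SHA-256 digest truncated to a configurable size that doubles when the check fails.

```python
import hashlib
from typing import List


def new_histogram() -> List[int]:
    """One counter per possible digest byte value."""
    return [0] * 256


def update_histogram(histogram: List[int], digest: bytes) -> None:
    """Add every byte of a digest to the frequency histogram."""
    for value in digest:
        histogram[value] += 1


def chi_square(histogram: List[int]) -> float:
    """Chi-square statistic of the histogram against a uniform distribution."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    expected = total / len(histogram)
    return sum((count - expected) ** 2 / expected for count in histogram)


def is_sufficiently_uniform(histogram: List[int], max_chi_square: float = 330.0) -> bool:
    # Hypothetical threshold, roughly the 99.9th percentile of a chi-square
    # distribution with 255 degrees of freedom.
    return chi_square(histogram) <= max_chi_square


def digest_chunk(chunk: bytes, digest_size: int) -> bytes:
    """Hypothetical digest setting: SHA-256 truncated to digest_size bytes."""
    return hashlib.sha256(chunk).digest()[:digest_size]


def next_digest_size(histogram: List[int], current_size: int, max_size: int = 32) -> int:
    """Keep the current digest size if uniform; otherwise double it, up to max_size."""
    if is_sufficiently_uniform(histogram):
        return current_size
    return min(current_size * 2, max_size)
```

A skewed histogram suggests that the current digests are not spreading chunks evenly, which raises the risk of collisions. A larger or stronger digest is the remedy the abstract names.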
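
Patent 10452616 and publication 20200133923 describe growing a compression unit from low-entropy chunks whose entropy values are similar, then compressing the whole unit at once. The Python sketch below illustrates those two criteria under stated assumptions. It uses Shannon entropy in bits per byte, hypothetical values for the entropy threshold, the allowed entropy difference, and the unit-size cap, and zlib as a stand-in compressor. It omits the third criterion in the abstract, the check for sufficient storage/compression benefit.

```python
import math
import zlib
from collections import Counter
from typing import List, Tuple


def entropy(chunk: bytes) -> float:
    """Shannon entropy of a chunk in bits per byte."""
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values()) if n else 0.0


def form_compression_units(
    chunks: List[bytes],
    entropy_threshold: float = 7.0,  # hypothetical: chunks at or above this stay uncompressed
    max_entropy_delta: float = 0.5,  # hypothetical: how close entropies must be to share a unit
    max_chunks_per_unit: int = 4,    # hypothetical cap on unit size
) -> List[Tuple[str, List[bytes]]]:
    """Group consecutive low-entropy chunks with similar entropy into compression units."""
    units: List[Tuple[str, List[bytes]]] = []
    current: List[bytes] = []
    first_entropy = 0.0

    def flush() -> None:
        # Close the open unit, if any, preserving chunk order.
        if current:
            units.append(("unit", current.copy()))
            current.clear()

    for chunk in chunks:
        e = entropy(chunk)
        if e >= entropy_threshold:
            # High-entropy chunk: close the open unit and keep this chunk as-is.
            flush()
            units.append(("raw", [chunk]))
        elif current and abs(e - first_entropy) <= max_entropy_delta and len(current) < max_chunks_per_unit:
            # Low-entropy chunk similar to the unit's first chunk: add it.
            current.append(chunk)
        else:
            # Start a new unit anchored on this chunk's entropy.
            flush()
            current.append(chunk)
            first_entropy = e
    flush()
    return units


def compress_units(units: List[Tuple[str, List[bytes]]]) -> List[bytes]:
    """Compress each unit as a single compressible unit; leave raw chunks uncompressed."""
    return [
        zlib.compress(b"".join(parts)) if kind == "unit" else b"".join(parts)
        for kind, parts in units
    ]
```

Compressing similar chunks together gives the compressor a longer, more homogeneous input than any single chunk offers. The entropy gate avoids spending CPU on chunks that are unlikely to shrink.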
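
Patent 10505563 describes keeping several byte-frequency counters packed into each register, so that counting a data item is a single add into the correct slot. The Python sketch below simulates this with integers standing in for 64-bit registers. It assumes 16-bit counters (four per register, 64 registers for 256 byte values) and rejects chunks long enough to overflow a counter. The counter width, register width, and function names are assumptions, not details from the patent.

```python
import math
from typing import List

COUNTER_BITS = 16                                      # assumed width of each packed counter
REGISTER_BITS = 64                                     # assumed width of each simulated register
COUNTERS_PER_REGISTER = REGISTER_BITS // COUNTER_BITS  # 4 counters per register
COUNTER_MASK = (1 << COUNTER_BITS) - 1
NUM_REGISTERS = 256 // COUNTERS_PER_REGISTER           # 64 registers cover all byte values


def count_bytes_packed(chunk: bytes) -> List[int]:
    """Count byte frequencies with several counters packed into each register."""
    if len(chunk) > COUNTER_MASK:
        raise ValueError("chunk too large: a 16-bit counter could overflow")
    registers = [0] * NUM_REGISTERS
    for value in chunk:
        # Locate the register holding this byte value's counter, and the counter's slot.
        register, slot = divmod(value, COUNTERS_PER_REGISTER)
        # Incrementing the counter is one add of 1 shifted into its slot.
        registers[register] += 1 << (slot * COUNTER_BITS)
    return registers


def unpack_counters(registers: List[int]) -> List[int]:
    """Extract the 256 per-byte-value counters from the packed registers."""
    counts = []
    for register in registers:
        for slot in range(COUNTERS_PER_REGISTER):
            counts.append((register >> (slot * COUNTER_BITS)) & COUNTER_MASK)
    return counts


def entropy_bits_per_byte(counts: List[int]) -> float:
    """Shannon entropy of the counted chunk, in bits per byte."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c) if total else 0.0
```

Packing counters means fewer registers are needed to hold the histogram. The cost is a bound on chunk size that prevents a counter from carrying into its neighbor.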
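
Patent 10509676 computes a chunk's entropy from frequency counters by looking up precomputed binary logarithms stored as integers, then uses the entropy to decide whether to compress. The Python sketch below follows that outline. It assumes Shannon entropy in bits per byte, computed as log2(n) minus the sum of c·log2(c) divided by n. It also assumes a table of log2 values in fixed point with a hypothetical scale of 2^16 and a hypothetical compressibility threshold of 7.0 bits per byte.

```python
import math
from typing import List

SCALE = 1 << 16  # hypothetical fixed-point scale for the integer log2 table


def build_log2_table(max_count: int) -> List[int]:
    """table[c] = round(log2(c) * SCALE) for c >= 1; entry 0 is unused."""
    table = [0] * (max_count + 1)
    for c in range(1, max_count + 1):
        table[c] = round(math.log2(c) * SCALE)
    return table


def chunk_entropy(chunk: bytes, table: List[int]) -> float:
    """Entropy in bits per byte: log2(n) - sum(c * log2(c)) / n, via table lookups."""
    n = len(chunk)
    if n == 0:
        return 0.0
    if n >= len(table):
        raise ValueError("log2 table too small for this chunk size")
    counters = [0] * 256
    for value in chunk:
        counters[value] += 1
    # Integer accumulation of c * log2(c) using precomputed table entries.
    acc = sum(c * table[c] for c in counters if c)
    return (table[n] - acc / n) / SCALE


def should_compress(chunk: bytes, table: List[int], threshold_bits: float = 7.0) -> bool:
    """Compress only chunks whose entropy is below the (hypothetical) threshold."""
    return chunk_entropy(chunk, table) < threshold_bits


if __name__ == "__main__":
    table = build_log2_table(4096)
    print(chunk_entropy(b"a" * 4096, table))        # 0.0
    print(chunk_entropy(b"ab" * 2048, table))       # 1.0
    print(should_compress(bytes(range(256)), table))  # False (entropy is 8.0)
```

With the table, the per-chunk work reduces to counting, table lookups, and integer multiply-adds, with a single division at the end. The table is built once for the largest chunk size in use.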