Patents by Inventor Danny Harnik

Danny Harnik has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180210658
    Abstract: Embodiments for optimizing dual-layered data compression in a storage environment. In a data storage system having a primary compressor and a secondary compressor, the primary compressor is selectively used to perform a first one of a plurality of actions on Input/Output (I/O) data while a second one of the plurality of actions is performed on the I/O data by the secondary compressor, thereby reducing latency and improving an overall compression performance while processing the I/O data.
    Type: Application
    Filed: January 25, 2017
    Publication date: July 26, 2018
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Danny HARNIK, Sergey MARENKOV, Yosef SHATSKY
  • Publication number: 20180203769
    Abstract: Machines, systems and methods for enhancing data recovery in a data storage system, the method comprising determining whether one or more data storage mediums in a data storage system are unavailable; determining data that are at a risk of loss, due to said one or more data storage mediums being unavailable; from among the data that is determined to be at the risk of loss, identifying data that is highly vulnerable to loss; and creating one or more temporary replicas of the data that is highly vulnerable to loss.
    Type: Application
    Filed: March 16, 2018
    Publication date: July 19, 2018
    Inventors: Danny Harnik, Elliot K. Kolodner, Dmitry Sotnikov, Paula K. Ta-Shma
  • Publication number: 20180150474
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Application
    Filed: August 16, 2017
    Publication date: May 31, 2018
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Publication number: 20180150473
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Application
    Filed: November 30, 2016
    Publication date: May 31, 2018
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Patent number: 9984092
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Patent number: 9965182
    Abstract: Various embodiments for optimizing deduplication in a computing storage environment by a processor. Links between data regions are intelligently formed, based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions.
    Type: Grant
    Filed: October 21, 2015
    Date of Patent: May 8, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Danny Harnik, Ben Sasson, Yosef Shatsky, Dmitry Sotnikov
  • Patent number: 9946602
    Abstract: Machines, systems and methods for enhancing data recovery in a data storage system, the method comprising determining whether one or more data storage mediums in a data storage system are unavailable; determining data that are at a risk of loss, due to said one or more data storage mediums being unavailable; from among the data that is determined to be at the risk of loss, identifying data that is highly vulnerable to loss; and creating one or more temporary replicas of the data that is highly vulnerable to loss.
    Type: Grant
    Filed: February 17, 2016
    Date of Patent: April 17, 2018
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Elliot K. Kolodner, Dmitry Sotnikov, Paula K. Ta-Shma
  • Publication number: 20180074745
    Abstract: This invention relates to a system and method for managing deduplication of volume regions. A storage controller receives the volume hashes of the data stored on a set of storage devices and produces a set of volume sketches. The volume sketches represent a fraction of the hashes. From these sketches and mergers of sketches, the deduped storage size is estimated. The sketches also integrate compression ratios in systems combining both deduplication and compression and are combined with off-line scans in systems that do not have deduplication.
    Type: Application
    Filed: September 12, 2016
    Publication date: March 15, 2018
    Inventors: Danny Harnik, Ronen Kat, Ety Khaitzin
  • Patent number: 9916320
    Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include configuring a storage system to store multiple storage entities, and defining, in a memory, a lookup table including multiple entries, each of the entries referencing a unique storage entity. Upon receiving a storage entity to be stored on the storage system, a compressibility of the received storage entity is determined upon detecting that the received storage entity is not identical to any of the unique storage entities referenced by the lookup table, and an entry referencing the received storage entity is added to the lookup table upon meeting a duplication condition based on the determined compressibility.
    Type: Grant
    Filed: April 26, 2015
    Date of Patent: March 13, 2018
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Dmitry Sotnikov
  • Patent number: 9817865
    Abstract: Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.
    Type: Grant
    Filed: December 7, 2015
    Date of Patent: November 14, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David D. Chambliss, Jonathan Fischer-Toubol, Joseph S. Glider, Danny Harnik, Ety Khaitzin, Yifat Kuttner, Michael Moser, Yosef Shatsky
  • Patent number: 9792350
    Abstract: For real-time classification of data into data compression domains, a decision is made for which of the data compression domains write operations should be forwarded by reading randomly selected data of the write operations for computing a set of classifying heuristics thereby creating a fingerprint for each of the write operations. The write operations having a similar fingerprint are compressed together in a similar compression stream.
    Type: Grant
    Filed: January 10, 2013
    Date of Patent: October 17, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan Amit, Lilia Demidov, George Goldberg, Nir Halowani, Danny Harnik, Chaim Koifman, Sergey Marenkov, Oded Margalit, Kat I. Ronen, Dmitry Sotnikov
  • Publication number: 20170286444
    Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.
    Type: Application
    Filed: March 29, 2016
    Publication date: October 5, 2017
    Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
  • Publication number: 20170262467
    Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.
    Type: Application
    Filed: March 8, 2016
    Publication date: September 14, 2017
    Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov
  • Publication number: 20170262466
    Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.
    Type: Application
    Filed: March 8, 2016
    Publication date: September 14, 2017
    Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
  • Publication number: 20170262468
    Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.
    Type: Application
    Filed: May 22, 2017
    Publication date: September 14, 2017
    Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov
  • Patent number: 9760578
    Abstract: Calculating fingerprints for each one of a multiplicity of alignment combinations of fixed-size deduplication data blocks and comparing each of the fingerprints to stored deduplicated data fingerprints in a lookup database for determining a preferred deduplication data block alignment. A deduplication data block comprises each of the fixed-size deduplication data blocks.
    Type: Grant
    Filed: July 23, 2014
    Date of Patent: September 12, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Aviv Caro, Danny Harnik, Ety Khaitzin, Chaim Koifman, Sergey Marenkov, Ben Sasson, Yosef Shatsky, Dmitry Sotnikov, Shai I. Tahar
  • Publication number: 20170199904
    Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.
    Type: Application
    Filed: January 13, 2016
    Publication date: July 13, 2017
    Applicant: International Business Machines Corporation
    Inventors: Danny Harnik, Ety KHAITZIN, Dmitry SOTNIKOV
  • Publication number: 20170201602
    Abstract: Methods and systems for data transfer include adding a data chunks to a priority queue in an order based on utilization priority. A reducibility score for the data chunks is determined. A data reduction operation is performed on a data chunk having a highest reducibility in the priority queue using a processor if sufficient resources are available. The data chunk having the lowest reducibility score is moved from the priority queue to a transfer queue for transmission if the transfer queue is not full.
    Type: Application
    Filed: January 13, 2016
    Publication date: July 13, 2017
    Inventors: Danny Harnik, Alexei Karve, Andrzej Kochut, Dmitry Sotnikov
  • Publication number: 20170199895
    Abstract: A method, including partitioning a dataset into a first number of data units, and selecting, based on a sampling ratio, a second number of the data units. A hash value is calculated for each of the selected data units, and a first histogram is computed indicating a first duplication count for each of the calculated hash values. Based on respective frequencies of the calculated hash values, a second histogram is computed indicating an observed frequency for each of the first duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A third histogram that minimizes the target function is derived, the third histogram including, for the first number of the storage units, second duplication counts and a respective predicted frequency for each of the second duplication counts. Finally, a deduplication ratio is determined based on the third histogram.
    Type: Application
    Filed: January 13, 2016
    Publication date: July 13, 2017
    Inventors: Danny Harnik, David Chambliss, Oded Margalit, Dmitry Sotnikov
  • Publication number: 20170199892
    Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.
    Type: Application
    Filed: January 13, 2016
    Publication date: July 13, 2017
    Applicant: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov