Patents by Inventor Danny Harnik

Danny Harnik has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

OPTIMIZING DUAL-LAYERED COMPRESSION IN STORAGE SYSTEMS

Publication number: 20180210658

Abstract: Embodiments for optimizing dual-layered data compression in a storage environment. In a data storage system having a primary compressor and a secondary compressor, the primary compressor is selectively used to perform a first one of a plurality of actions on Input/Output (I/O) data while a second one of the plurality of actions is performed on the I/O data by the secondary compressor, thereby reducing latency and improving an overall compression performance while processing the I/O data.

Type: Application

Filed: January 25, 2017

Publication date: July 26, 2018

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danny HARNIK, Sergey MARENKOV, Yosef SHATSKY
Reliability Enhancement in a Distributed Storage System

Publication number: 20180203769

Abstract: Machines, systems and methods for enhancing data recovery in a data storage system, the method comprising determining whether one or more data storage mediums in a data storage system are unavailable; determining data that are at a risk of loss, due to said one or more data storage mediums being unavailable; from among the data that is determined to be at the risk of loss, identifying data that is highly vulnerable to loss; and creating one or more temporary replicas of the data that is highly vulnerable to loss.

Type: Application

Filed: March 16, 2018

Publication date: July 19, 2018

Inventors: Danny Harnik, Elliot K. Kolodner, Dmitry Sotnikov, Paula K. Ta-Shma
IDENTIFICATION OF HIGH DEDUPLICATION DATA

Publication number: 20180150474

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Application

Filed: August 16, 2017

Publication date: May 31, 2018

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
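
The abstract above describes sampling fixed-size chunks from each region, hashing them into an index, and reporting regions that share a hash. The following is a minimal sketch of that flow, not the patented method itself. The chunk size, the use of SHA-256, the uniform random sampling, and the function names `sample_region_hashes` and `find_collisions` are all assumptions made for illustration; the abstract does not specify how the sample size is derived, so it is passed in as a parameter.

```python
import hashlib
import random
from collections import defaultdict

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (not given in the abstract)


def sample_region_hashes(region: bytes, sample_size: int, rng: random.Random) -> set[str]:
    """Hash a uniform random sample of the fixed-size chunks in one region."""
    chunks = [region[i:i + CHUNK_SIZE] for i in range(0, len(region), CHUNK_SIZE)]
    picked = rng.sample(chunks, min(sample_size, len(chunks)))
    return {hashlib.sha256(chunk).hexdigest() for chunk in picked}


def find_collisions(regions: dict[str, bytes], sample_size: int, seed: int = 0) -> set[tuple[str, str]]:
    """Return pairs of region names whose sampled chunks share a hash value."""
    rng = random.Random(seed)
    index = defaultdict(set)  # hash value -> names of regions where it was sampled
    for name, data in regions.items():
        for digest in sample_region_hashes(data, sample_size, rng):
            index[digest].add(name)

    collisions = set()
    for names in index.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                collisions.add((first, second))
    return collisions
```

With a sample size smaller than the number of chunks per region, two regions that share data are found only with some probability, which is the trade-off the abstract refers to when it ties the sample size to an accepted likelihood of finding a collision.
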
IDENTIFICATION OF HIGH DEDUPLICATION DATA

Publication number: 20180150473

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Application

Filed: November 30, 2016

Publication date: May 31, 2018

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
Identification of high deduplication data

Patent number: 9984092

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Grant

Filed: August 16, 2017

Date of Patent: May 29, 2018

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
Optimization of data deduplication

Patent number: 9965182

Abstract: Various embodiments for optimizing deduplication in a computing storage environment by a processor. Links between data regions are intelligently formed, based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions.

Type: Grant

Filed: October 21, 2015

Date of Patent: May 8, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danny Harnik, Ben Sasson, Yosef Shatsky, Dmitry Sotnikov
Reliability enhancement in a distributed storage system

Patent number: 9946602

Abstract: Machines, systems and methods for enhancing data recovery in a data storage system, the method comprising determining whether one or more data storage mediums in a data storage system are unavailable; determining data that are at a risk of loss, due to said one or more data storage mediums being unavailable; from among the data that is determined to be at the risk of loss, identifying data that is highly vulnerable to loss; and creating one or more temporary replicas of the data that is highly vulnerable to loss.

Type: Grant

Filed: February 17, 2016

Date of Patent: April 17, 2018

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Elliot K. Kolodner, Dmitry Sotnikov, Paula K. Ta-Shma
Managing Volumes with Deduplication using Volume Sketches

Publication number: 20180074745

Abstract: This invention relates to a system and method for managing deduplication of volume regions. A storage controller receives the volume hashes of the data stored on a set of storage devices and produces a set of volume sketches. The volume sketches represent a fraction of the hashes. From these sketches and mergers of sketches, the deduped storage size is estimated. The sketches also integrate compression ratios in systems combining both deduplication and compression and are combined with off-line scans in systems that do not have deduplication.

Type: Application

Filed: September 12, 2016

Publication date: March 15, 2018

Inventors: Danny Harnik, Ronen Kat, Ety Khaitzin
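
One way to picture a sketch that "represents a fraction of the hashes" is to keep only the hashes whose low-order bits are zero, then scale the count back up. The code below is an illustrative sketch under that assumption; the selection rule, the `sample_bits` parameter, and the helper names are hypothetical and are not taken from the application. Compression ratios and off-line scans, which the abstract also mentions, are not modeled.

```python
import hashlib


def volume_sketch(chunks: list[bytes], sample_bits: int = 8) -> set[int]:
    """Keep roughly 1 in 2**sample_bits of the distinct chunk hashes of a volume."""
    mask = (1 << sample_bits) - 1
    sketch = set()
    for chunk in chunks:
        value = int.from_bytes(hashlib.sha256(chunk).digest()[:8], "big")
        if value & mask == 0:  # assumed selection rule: low bits are all zero
            sketch.add(value)
    return sketch


def merge_sketches(*sketches: set[int]) -> set[int]:
    """The sketch of a group of volumes is the union of their sketches."""
    return set().union(*sketches)


def estimate_distinct_chunks(sketch: set[int], sample_bits: int = 8) -> int:
    """Scale the sketch size back up to estimate the number of distinct chunks."""
    return len(sketch) << sample_bits
```

Because the same hash is selected or rejected in every volume, merging sketches counts data shared between volumes only once, so the estimate for a merged sketch approximates the deduplicated size of the group.
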
Compression-based filtering for deduplication

Patent number: 9916320

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include configuring a storage system to store multiple storage entities, and defining, in a memory, a lookup table including multiple entries, each of the entries referencing a unique storage entity. Upon receiving a storage entity to be stored on the storage system, a compressibility of the received storage entity is determined upon detecting that the received storage entity is not identical to any of the unique storage entities referenced by the lookup table, and an entry referencing the received storage entity is added to the lookup table upon meeting a duplication condition based on the determined compressibility.

Type: Grant

Filed: April 26, 2015

Date of Patent: March 13, 2018

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Dmitry Sotnikov
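
The decision flow in the abstract above (check the lookup table first, measure compressibility only for non-duplicates, then add an entry only if a condition holds) might look roughly like this. It is a sketch under stated assumptions: the abstract does not say which direction the condition takes or what threshold is used, so `default_condition` and its 0.9 cutoff are hypothetical, as are the class name and the use of `zlib` and SHA-256.

```python
import hashlib
import zlib
from typing import Callable


def default_condition(ratio: float) -> bool:
    # ASSUMPTION: index only entities that compress to at most 90% of their size.
    # The patent abstract does not state the actual condition.
    return ratio <= 0.9


class CompressibilityFilteredIndex:
    def __init__(self, condition: Callable[[float], bool] = default_condition):
        self.condition = condition
        self.lookup: dict[str, int] = {}  # fingerprint -> position in self.entities
        self.entities: list[bytes] = []

    def store(self, entity: bytes) -> str:
        fingerprint = hashlib.sha256(entity).hexdigest()
        if fingerprint in self.lookup:
            return "duplicate"  # identical to an entity the table already references

        # Not a duplicate: store it, then decide whether it earns a table entry.
        self.entities.append(entity)
        ratio = len(zlib.compress(entity)) / max(len(entity), 1)
        if self.condition(ratio):
            self.lookup[fingerprint] = len(self.entities) - 1
            return "indexed"
        return "stored_unindexed"
```

Passing a different `condition` callable lets the same structure express the opposite policy, which keeps the example neutral about the rule the patent actually claims.
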
Direct lookup for identifying duplicate data in a data deduplication system

Patent number: 9817865

Abstract: Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.

Type: Grant

Filed: December 7, 2015

Date of Patent: November 14, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: David D. Chambliss, Jonathan Fischer-Toubol, Joseph S. Glider, Danny Harnik, Ety Khaitzin, Yifat Kuttner, Michael Moser, Yosef Shatsky
Real-time classification of data into data compression domains

Patent number: 9792350

Abstract: For real-time classification of data into data compression domains, a decision is made for which of the data compression domains write operations should be forwarded by reading randomly selected data of the write operations for computing a set of classifying heuristics thereby creating a fingerprint for each of the write operations. The write operations having a similar fingerprint are compressed together in a similar compression stream.

Type: Grant

Filed: January 10, 2013

Date of Patent: October 17, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan Amit, Lilia Demidov, George Goldberg, Nir Halowani, Danny Harnik, Chaim Koifman, Sergey Marenkov, Oded Margalit, Kat I. Ronen, Dmitry Sotnikov
REGION-INTEGRATED DATA DEDUPLICATION IMPLEMENTING A MULTI-LIFETIME DUPLICATE FINDER

Publication number: 20170286444

Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.

Type: Application

Filed: March 29, 2016

Publication date: October 5, 2017

Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
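
The write path in the abstract above checks a short term dictionary, then a long term dictionary, and falls back to a plain write when neither has the fingerprint. A minimal sketch of that control flow follows. The abstract does not say how entries enter either dictionary, so recording new fingerprints in the short term dictionary and the `promote` helper are assumptions, as are the class name and the use of SHA-256.

```python
import hashlib


class MultiLifetimeDedupStore:
    def __init__(self):
        self.short_term: dict[str, int] = {}  # fingerprint -> location in data_store
        self.long_term: dict[str, int] = {}
        self.data_store: list[bytes] = []

    def write(self, data: bytes) -> tuple[int, bool]:
        """Return (location, deduplicated) for one write request."""
        fingerprint = hashlib.sha256(data).hexdigest()

        # Short term dictionary is consulted first, then the long term one.
        for dictionary in (self.short_term, self.long_term):
            if fingerprint in dictionary:
                return dictionary[fingerprint], True  # deduplicating write

        # Neither dictionary has the entry: non-deduplicating write.
        self.data_store.append(data)
        location = len(self.data_store) - 1
        self.short_term[fingerprint] = location  # ASSUMPTION: remember recent writes
        return location, False

    def promote(self, fingerprint: str) -> None:
        """Hypothetical helper: move an entry to the long term dictionary."""
        self.long_term[fingerprint] = self.short_term.pop(fingerprint)
```
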
DEDUPLICATION RATIO ESTIMATION USING AN EXPANDABLE BASIS SET

Publication number: 20170262467

Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.

Type: Application

Filed: March 8, 2016

Publication date: September 14, 2017

Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov
DEDUPLICATION RATIO ESTIMATION USING REPRESENTATIVE IMAGES

Publication number: 20170262466

Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.

Type: Application

Filed: March 8, 2016

Publication date: September 14, 2017

Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
DEDUPLICATION RATIO ESTIMATION USING AN EXPANDABLE BASIS SET

Publication number: 20170262468

Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.

Type: Application

Filed: May 22, 2017

Publication date: September 14, 2017

Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov
Lookup-based data block alignment for data deduplication

Patent number: 9760578

Abstract: Calculating fingerprints for each one of a multiplicity of alignment combinations of fixed-size deduplication data blocks and comparing each of the fingerprints to stored deduplicated data fingerprints in a lookup database for determining a preferred deduplication data block alignment. A deduplication data block comprises each of the fixed-size deduplication data blocks.

Type: Grant

Filed: July 23, 2014

Date of Patent: September 12, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Aviv Caro, Danny Harnik, Ety Khaitzin, Chaim Koifman, Sergey Marenkov, Ben Sasson, Yosef Shatsky, Dmitry Sotnikov, Shai I. Tahar
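
To illustrate the idea of trying several alignments and checking each against a lookup database, the sketch below fingerprints the fixed-size blocks at every candidate offset and keeps the offset with the most matches. This simplifies the abstract, which describes sub-blocks within a larger deduplication block; the "most matches wins" rule, the `step` parameter, and the function names are assumptions for illustration only.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def best_alignment(data: bytes, block_size: int, step: int, known: set[str]) -> tuple[int, int]:
    """Return (offset, hits) for the alignment whose blocks match `known` most often."""
    best_offset, best_hits = 0, -1
    for offset in range(0, block_size, step):
        hits = 0
        # Walk the whole blocks that fit when the grid starts at this offset.
        for start in range(offset, len(data) - block_size + 1, block_size):
            if fingerprint(data[start:start + block_size]) in known:
                hits += 1
        if hits > best_hits:  # ties keep the smallest offset
            best_offset, best_hits = offset, hits
    return best_offset, best_hits
```

For example, if previously stored data arrives again with a two-byte prefix, the aligned grid at offset 0 finds no matches while the grid at offset 2 recovers them.
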
LOW MEMORY SAMPLING-BASED ESTIMATION OF DISTINCT ELEMENTS AND DEDUPLICATION

Publication number: 20170199904

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Applicant: International Business Machines Corporation

Inventors: Danny Harnik, Ety KHAITZIN, Dmitry SOTNIKOV
NETWORK UTILIZATION IMPROVEMENT BY DATA REDUCTION BASED MIGRATION PRIORITIZATION

Publication number: 20170201602

Abstract: Methods and systems for data transfer include adding data chunks to a priority queue in an order based on utilization priority. A reducibility score for the data chunks is determined. A data reduction operation is performed on a data chunk having a highest reducibility in the priority queue using a processor if sufficient resources are available. The data chunk having the lowest reducibility score is moved from the priority queue to a transfer queue for transmission if the transfer queue is not full.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Inventors: Danny Harnik, Alexei Karve, Andrzej Kochut, Dmitry Sotnikov
SAMPLING-BASED DEDUPLICATION ESTIMATION

Publication number: 20170199895

Abstract: A method, including partitioning a dataset into a first number of data units, and selecting, based on a sampling ratio, a second number of the data units. A hash value is calculated for each of the selected data units, and a first histogram is computed indicating a first duplication count for each of the calculated hash values. Based on respective frequencies of the calculated hash values, a second histogram is computed indicating an observed frequency for each of the first duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A third histogram that minimizes the target function is derived, the third histogram including, for the first number of the storage units, second duplication counts and a respective predicted frequency for each of the second duplication counts. Finally, a deduplication ratio is determined based on the third histogram.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Inventors: Danny Harnik, David Chambliss, Oded Margalit, Dmitry Sotnikov
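
The first two histograms named in the abstract above can be made concrete with a short example. The sketch below samples data units, counts how often each hash occurs (the first histogram), and then counts how many hashes share each duplication count (the second histogram). It stops there: the target function and the third histogram are not specified in the abstract and are not reproduced. The function name, the uniform sampling, and the use of SHA-256 are assumptions.

```python
import hashlib
import random
from collections import Counter


def duplication_histograms(units: list[bytes], sampling_ratio: float, seed: int = 0):
    """Return (first, second) histograms for a random sample of the data units.

    first:  hash value -> number of sampled units with that hash
    second: duplication count -> number of distinct hashes seen that many times
    """
    rng = random.Random(seed)
    sample_count = min(len(units), max(1, round(len(units) * sampling_ratio)))
    sample = rng.sample(units, sample_count)

    first = Counter(hashlib.sha256(unit).hexdigest() for unit in sample)
    second = Counter(first.values())
    return first, second
```

When the sampling ratio is 1.0 the second histogram describes the whole dataset exactly, and the number of distinct units is simply the sum of its values; with a smaller ratio the observed counts are skewed toward low duplication, which is why the abstract derives a further histogram from them.
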
GAUGING ACCURACY OF SAMPLING-BASED DISTINCT ELEMENT ESTIMATION

Publication number: 20170199892

Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Applicant: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
