Patents by Inventor Ety KHAITZIN

Ety KHAITZIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

FINE-GRAINED PRIVACY ENFORCEMENT AND POLICY-BASED DATA ACCESS CONTROL AT SCALE

Publication number: 20190286828

Abstract: Embodiments of the present systems and methods may provide a data access approval process that supports complex and fine-grained policies and can be applied to different data items at scale, which provides improvement over current technologies. For example, in an embodiment, a computer-implemented method for controlling access to data by computer systems may comprise generating an intermediate representation by integrating a combination of data access policies, data attributes including attributes per data subject, and the data itself to form the intermediate representation, receiving a request for access to the data, rewriting the request for access to the data to incorporate the intermediate representation so as to provide access only to data allowed by the policies integrated into the intermediate representation, and executing the rewritten request and providing only data allowed by the policies integrated into the intermediate representation.

Type: Application

Filed: March 19, 2018

Publication date: September 19, 2019

Inventors: Maya Anderson, Ronen Itshak Kat, Roee Shlomo, Ety Khaitzin

REGION-INTEGRATED DATA DEDUPLICATION IMPLEMENTING A MULTI-LIFETIME DUPLICATE FINDER

Publication number: 20190272258

Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: computing a fingerprint of data included in a write request; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.

Type: Application

Filed: May 21, 2019

Publication date: September 5, 2019

Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin

Region-integrated data deduplication implementing a multi-lifetime duplicate finder

Patent number: 10394764

Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.

Type: Grant

Filed: March 29, 2016

Date of Patent: August 27, 2019

Assignee: International Business Machines Corporation

Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
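
The write path described in the abstract above (short term dictionary first, then the namespace's long term dictionary, otherwise a non-deduplicating write) might be sketched as follows. The class, the in-memory dictionaries, and the `promote` step are hypothetical; the abstract does not say how either dictionary is populated or aged.

```python
import hashlib


class MultiLifetimeDeduper:
    """Toy model of a two-tier duplicate finder (illustrative only)."""

    def __init__(self):
        self.short_term = {}  # fingerprint -> block index, recent writes
        self.long_term = {}   # namespace -> {fingerprint -> block index}
        self.store = []       # physical blocks
        self.refs = []        # one block index per logical write

    def write(self, namespace, data):
        """Return True if the write was deduplicated."""
        fp = hashlib.sha256(data).hexdigest()
        loc = self.short_term.get(fp)
        if loc is None:
            loc = self.long_term.get(namespace, {}).get(fp)
        if loc is not None:
            self.refs.append(loc)  # deduplicating write: reference only
            return True
        self.store.append(data)    # non-deduplicating write: new block
        loc = len(self.store) - 1
        self.refs.append(loc)
        self.short_term[fp] = loc  # assumption: new data enters short term
        return False

    def promote(self, namespace):
        """Assumed aging policy: move short term entries to long term."""
        self.long_term.setdefault(namespace, {}).update(self.short_term)
        self.short_term.clear()
```

A second write of the same data is deduplicated through the short term dictionary; after `promote`, it is still deduplicated, but only within the namespace whose long term dictionary holds the fingerprint.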

Heterogeneous compression in replicated storage

Patent number: 10394846

Abstract: Various embodiments for data management in a replicated storage environment, by a processor device, are provided. In one embodiment, a method comprises storing a plurality of data replicas under a plurality of heterogeneous compression algorithms, wherein one of the data replicas is optimized for a data operation.

Type: Grant

Filed: August 25, 2015

Date of Patent: August 27, 2019

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
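
As a rough illustration of storing replicas under heterogeneous compression algorithms, the sketch below keeps one replica per codec using Python's standard `zlib`, `bz2`, and `lzma` modules. The codec choice and the helper names are assumptions; the abstract does not name specific algorithms or say which operation each replica is optimized for.

```python
import bz2
import lzma
import zlib

# Assumed codec set: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def store_replicas(data):
    """Store one replica of the data under each compression algorithm."""
    return {name: comp(data) for name, (comp, _) in CODECS.items()}


def read_replica(replicas, name):
    """Read the data back from the replica stored under the named codec."""
    return CODECS[name][1](replicas[name])


def smallest_replica(replicas):
    """Pick the replica that is best for one operation, here space."""
    return min(replicas, key=lambda name: len(replicas[name]))
```

A reader that cares about latency could pick a different replica than a process that cares about capacity; `smallest_replica` shows only the capacity case.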

Deduplication ratio estimation using representative images

Patent number: 10395145

Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.

Type: Grant

Filed: March 8, 2016

Date of Patent: August 27, 2019

Assignee: International Business Machines Corporation

Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
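
One simple reading of the abstract above is sketched here: fingerprint every chunk of the representative regions, fingerprint a sample of chunks from the measured region, and report the fraction of sampled fingerprints already present. The 4-byte chunk size, the every-`step`-th sampling rule, and the metric itself are assumptions chosen for brevity.

```python
import hashlib

CHUNK = 4  # assumed chunk size in bytes, tiny for illustration


def fingerprints(region, step=1):
    """Fingerprint every step-th fixed-size chunk of a region."""
    chunks = [region[i:i + CHUNK] for i in range(0, len(region), CHUNK)]
    return [hashlib.sha1(c).hexdigest() for c in chunks[::step]]


def dedup_metric(representative_regions, measured_region, step=2):
    """Fraction of sampled chunks whose fingerprint is representative."""
    representative = set()
    for region in representative_regions:
        representative.update(fingerprints(region))
    sampled = fingerprints(measured_region, step)
    if not sampled:
        return 0.0
    return sum(fp in representative for fp in sampled) / len(sampled)
```

With representative data `b"AAAABBBBCCCC"` and measured data `b"AAAAXXXXBBBBYYYY"`, sampling every chunk gives 0.5, since two of the four measured chunks already exist in the representative set.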

Performance of Dispersed Location-Based Deduplication

Publication number: 20190121563

Abstract: A mechanism is provided for dispersed location-based data storage. A request is received to write a data file to a referrer memory region in a set of memory regions. For each data chunk of the data file, responsive to a comparison of a hash value for the data chunk to other hash values for other stored data chunks referenced in the referrer memory region indicating that the data chunk fails to exist in the referrer memory region, responsive to the data chunk existing in another memory region in the set of memory regions, responsive to the memory region failing to be one of the predetermined number N of owner memory regions associated with the referrer memory region, and responsive to the predetermined number N of owner memory regions failing to have been met, a reference to the data chunk is stored in the referrer memory region.

Type: Application

Filed: October 25, 2017

Publication date: April 25, 2019

Inventors: Reut Cohen, Jonathan Fischer-Toubol, Afief Halumi, Danny Harnik, Ety Khaitzin, Sergey Marenkov, Asaf Porat-Stoler, Yosef Shatsky, Tom Sivan

Identification of high deduplication data

Patent number: 10255290

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Grant

Filed: April 17, 2018

Date of Patent: April 9, 2019

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
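
The pipeline in the abstract above (regions, fixed-size chunks, sampling, hashing into an index, collision detection) could look roughly like the sketch below. The sample size is simply passed in as a parameter; the abstract's likelihood-based rule for choosing it is not reproduced, and all names are hypothetical.

```python
import hashlib
import random
from collections import defaultdict


def find_collisions(regions, chunk_size, sample_size, seed=0):
    """Return pairs of region names that share a sampled chunk hash.

    regions maps a region name to its bytes.
    """
    rng = random.Random(seed)
    index = defaultdict(set)  # chunk hash -> names of regions containing it
    for name, data in regions.items():
        chunks = [data[i:i + chunk_size]
                  for i in range(0, len(data), chunk_size)]
        for chunk in rng.sample(chunks, min(sample_size, len(chunks))):
            index[hashlib.sha256(chunk).digest()].add(name)
    pairs = set()
    for names in index.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs
```

Regions that collide on sampled chunks are candidates for high deduplication when placed together; regions with no collisions, such as one holding entirely unrelated data, are not paired.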

Identification of high deduplication data

Patent number: 10235379

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Grant

Filed: April 17, 2018

Date of Patent: March 19, 2019

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

Gauging accuracy of sampling-based distinct element estimation

Patent number: 10169364

Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.

Type: Grant

Filed: January 13, 2016

Date of Patent: January 1, 2019

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
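
The two histograms named in the abstract above can be computed as in this minimal sketch. It covers only the sampling and histogram steps; the target function, the range of acceptable results, and the plausible-histogram bounds are not reproduced. The per-unit coin-flip sampling and the function name are assumptions.

```python
import hashlib
import random
from collections import Counter


def duplication_histograms(units, sampling_ratio, seed=0):
    """Sample logical data units and build the two histograms.

    first:  hash -> number of sampled units with that hash
    second: duplication count -> number of hashes with that count
    """
    rng = random.Random(seed)
    sample = [u for u in units if rng.random() < sampling_ratio]
    first = Counter(hashlib.sha256(u).hexdigest() for u in sample)
    second = Counter(first.values())
    return first, second
```

For the units `a, a, b, c, c, c` with a sampling ratio of 1.0, the second histogram records one hash seen once, one seen twice, and one seen three times.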

Low memory sampling-based estimation of distinct elements and deduplication

Patent number: 10162867

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.

Type: Grant

Filed: January 13, 2016

Date of Patent: December 25, 2018

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
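
The memory saving described in the abstract above comes from tracking only unit hashes that fall in a chosen target set. The sketch below assumes, purely for illustration, that the target set is "hash values divisible by `modulus`"; the abstract does not specify how targets are selected, and the final estimation step is not reproduced.

```python
import hashlib
import random
from collections import Counter


def unit_hash(unit):
    """64-bit integer hash of a logical data unit."""
    return int.from_bytes(hashlib.sha256(unit).digest()[:8], "big")


def targeted_histogram(units, sampling_ratio, modulus, seed=0):
    """Count duplicates only for sampled units whose hash is a target.

    Memory grows with the number of targeted hashes, not with the
    number of distinct units in the sample.
    """
    rng = random.Random(seed)
    histogram = Counter()
    for unit in units:
        if rng.random() >= sampling_ratio:
            continue  # unit is not part of the random sample
        h = unit_hash(unit)
        if h % modulus == 0:  # assumed target selection rule
            histogram[h] += 1
    return histogram
```

With `modulus=1` every hash is a target and the histogram degenerates to a full duplication count; larger values keep roughly one hash in `modulus`.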

IDENTIFICATION OF HIGH DEDUPLICATION DATA

Publication number: 20180225301

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Application

Filed: April 17, 2018

Publication date: August 9, 2018

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

IDENTIFICATION OF HIGH DEDUPLICATION DATA

Publication number: 20180225300

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Application

Filed: April 17, 2018

Publication date: August 9, 2018

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

IDENTIFICATION OF HIGH DEDUPLICATION DATA

Publication number: 20180150474

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Application

Filed: August 16, 2017

Publication date: May 31, 2018

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

IDENTIFICATION OF HIGH DEDUPLICATION DATA

Publication number: 20180150473

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Application

Filed: November 30, 2016

Publication date: May 31, 2018

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

Identification of high deduplication data

Patent number: 9984092

Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.

Type: Grant

Filed: August 16, 2017

Date of Patent: May 29, 2018

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

Managing Volumes with Deduplication using Volume Sketches

Publication number: 20180074745

Abstract: This invention relates to a system and method for managing deduplication of volume regions. A storage controller receives the volume hashes of the data stored on a set of storage devices and produces a set of volume sketches. The volume sketches represent a fraction of the hashes. From these sketches and mergers of sketches, the deduped storage size is estimated. The sketches also integrate compression ratios in systems combining both deduplication and compression and are combined with off-line scans in systems that do not have deduplication.

Type: Application

Filed: September 12, 2016

Publication date: March 15, 2018

Inventors: Danny Harnik, Ronen Kat, Ety Khaitzin
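
A volume sketch that "represents a fraction of the hashes" can be modeled, under assumptions, as the subset of chunk hashes divisible by some modulus. Merging sketches is then a set union, and scaling the merged size by the modulus gives a rough estimate of distinct chunks. The selection rule and the scaling estimate are illustrative choices, not taken from the application.

```python
def sketch(volume_hashes, modulus):
    """Keep roughly 1/modulus of a volume's chunk hashes."""
    return {h for h in volume_hashes if h % modulus == 0}


def estimate_deduped_chunks(sketches, modulus):
    """Estimate distinct chunks across volumes from merged sketches."""
    merged = set().union(*sketches)
    return len(merged) * modulus


# Two toy volumes whose integer "hashes" overlap on 50..99.
vol_a = sketch(range(0, 100), 10)
vol_b = sketch(range(50, 150), 10)
estimate = estimate_deduped_chunks([vol_a, vol_b], 10)
print(estimate)  # 150
```

Multiplying the estimate by the chunk size would give an estimated deduplicated storage size for the merged volumes.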

Direct lookup for identifying duplicate data in a data deduplication system

Patent number: 9817865

Abstract: Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.

Type: Grant

Filed: December 7, 2015

Date of Patent: November 14, 2017

Assignee: International Business Machines Corporation

Inventors: David D. Chambliss, Jonathan Fischer-Toubol, Joseph S. Glider, Danny Harnik, Ety Khaitzin, Yifat Kuttner, Michael Moser, Yosef Shatsky

REGION-INTEGRATED DATA DEDUPLICATION IMPLEMENTING A MULTI-LIFETIME DUPLICATE FINDER

Publication number: 20170286444

Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.

Type: Application

Filed: March 29, 2016

Publication date: October 5, 2017

Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin

DEDUPLICATION RATIO ESTIMATION USING REPRESENTATIVE IMAGES

Publication number: 20170262466

Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.

Type: Application

Filed: March 8, 2016

Publication date: September 14, 2017

Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov

DEDUPLICATION RATIO ESTIMATION USING AN EXPANDABLE BASIS SET

Publication number: 20170262467

Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.

Type: Application

Filed: March 8, 2016

Publication date: September 14, 2017

Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov
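
The expansion rule in the abstract above (add an unprocessed region to the basis set when its similarity metric is below a threshold) is sketched below. The similarity metric used here, the fraction of sampled chunk fingerprints already in the basis, is an assumption, as are the chunk size and function names.

```python
import hashlib


def chunk_fingerprints(region, chunk=4):
    """Fingerprint each fixed-size chunk of a region."""
    return [hashlib.sha1(region[i:i + chunk]).hexdigest()
            for i in range(0, len(region), chunk)]


def process_region(basis_fps, region, threshold, chunk=4, step=1):
    """Add the region to the basis set if it is not similar enough.

    basis_fps is a set of fingerprints and is updated in place.
    Returns True when the region was added to the basis.
    """
    sampled = chunk_fingerprints(region, chunk)[::step]
    if sampled:
        similarity = sum(fp in basis_fps for fp in sampled) / len(sampled)
    else:
        similarity = 0.0
    if similarity < threshold:
        basis_fps.update(chunk_fingerprints(region, chunk))
        return True
    return False
```

Starting from an empty basis, the first region is always added; a later region that mostly repeats known chunks is left out, so the basis grows only when genuinely new content appears.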
