Patents by Inventor Ety KHAITZIN

Ety KHAITZIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190286828
    Abstract: Embodiments of the present systems and methods may provide a data access approval process that supports complex and fine-grained policies and can be applied to different data items at scale, which provides improvement over current technologies. For example, in an embodiment, a computer-implemented method for controlling access to data by computer systems may comprise generating an intermediate representation by integrating a combination of data access policies, data attributes including attributes per data subject, and the data itself to form the intermediate representation, receiving a request for access to the data, rewriting the request for access to the data to incorporate the intermediate representation so as to provide access only to data allowed by the policies integrated into the intermediate representation, and executing the rewritten request and providing only data allowed by the policies integrated into the intermediate representation.
    Type: Application
    Filed: March 19, 2018
    Publication date: September 19, 2019
    Inventors: MAYA ANDERSON, RONEN Itshak KAT, ROEE SHLOMO, ETY KHAITZIN
  • Publication number: 20190272258
    Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: computing a fingerprint of data included in a write request; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.
    Type: Application
    Filed: May 21, 2019
    Publication date: September 5, 2019
    Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
  • Patent number: 10394764
    Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.
    Type: Grant
    Filed: March 29, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
  • Patent number: 10394846
    Abstract: Various embodiments for data management in a replicated storage environment, by a processor device, are provided. In one embodiment, a method comprises storing a plurality of data replicas under a plurality of heterogeneous compression algorithms, wherein one of the data replicas is optimized for a data operation.
    Type: Grant
    Filed: August 25, 2015
    Date of Patent: August 27, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Patent number: 10395145
    Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.
    Type: Grant
    Filed: March 8, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
  • Publication number: 20190121563
    Abstract: A mechanism is provided for dispersed location-based data storage. A request is received to write a data file to a referrer memory region in a set of memory regions. For each data chunk of the data file, responsive to a comparison of a hash value for the data chunk to other hash values for other stored data chunks referenced in the referrer memory region indicating that the data chunk fails to exist in the referrer memory region, responsive to the data chunk existing in another memory region in the set of memory regions, responsive to the memory region failing to be one of the predetermined number N of owner memory regions associated with the referrer memory region, and responsive to the predetermined number N of owner memory regions failing to have been met, a reference to the data chunk is stored in the referrer memory region.
    Type: Application
    Filed: October 25, 2017
    Publication date: April 25, 2019
    Inventors: Reut Cohen, Jonathan Fischer-Toubol, Afief Halumi, Danny Harnik, Ety Khaitzin, Sergey Marenkov, Asaf Porat-Stoler, Yosef Shatsky, Tom Sivan
  • Patent number: 10255290
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: April 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Patent number: 10235379
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: March 19, 2019
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Patent number: 10169364
    Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: January 1, 2019
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
  • Patent number: 10162867
    Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
  • Publication number: 20180225301
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Application
    Filed: April 17, 2018
    Publication date: August 9, 2018
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Publication number: 20180225300
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Application
    Filed: April 17, 2018
    Publication date: August 9, 2018
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Publication number: 20180150474
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Application
    Filed: August 16, 2017
    Publication date: May 31, 2018
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Publication number: 20180150473
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Application
    Filed: November 30, 2016
    Publication date: May 31, 2018
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Patent number: 9984092
    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
  • Publication number: 20180074745
    Abstract: This invention relates to a system and method for managing deduplication of volume regions. A storage controller receives the volume hashes of the data stored on a set of storage devices and produces a set of volume sketches. The volume sketches represent a fraction of the hashes. From these sketches and mergers of sketches, the deduped storage size is estimated. The sketches also integrate compression ratios in systems combining both deduplication and compression and are combined with off-line scans in systems that do not have deduplication.
    Type: Application
    Filed: September 12, 2016
    Publication date: March 15, 2018
    Inventors: Danny Harnik, Ronen Kat, Ety Khaitzin
  • Patent number: 9817865
    Abstract: Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.
    Type: Grant
    Filed: December 7, 2015
    Date of Patent: November 14, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David D. Chambliss, Jonathan Fischer-Toubol, Joseph S. Glider, Danny Harnik, Ety Khaitzin, Yifat Kuttner, Michael Moser, Yosef Shatsky
  • Publication number: 20170286444
    Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.
    Type: Application
    Filed: March 29, 2016
    Publication date: October 5, 2017
    Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
  • Publication number: 20170262466
    Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.
    Type: Application
    Filed: March 8, 2016
    Publication date: September 14, 2017
    Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
  • Publication number: 20170262467
    Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.
    Type: Application
    Filed: March 8, 2016
    Publication date: September 14, 2017
    Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov