Patents by Inventor Ety KHAITZIN
Ety KHAITZIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190286828Abstract: Embodiments of the present systems and methods may provide a data access approval process that supports complex and fine-grained policies and can be applied to different data items at scale, which provides improvement over current technologies. For example, in an embodiment, a computer-implemented method for controlling access to data by computer systems may comprise generating an intermediate representation by integrating a combination of data access policies, data attributes including attributes per data subject, and the data itself to form the intermediate representation, receiving a request for access to the data, rewriting the request for access to the data to incorporate the intermediate representation so as to provide access only to data allowed by the policies integrated into the intermediate representation, and executing the rewritten request and providing only data allowed by the policies integrated into the intermediate representation.Type: ApplicationFiled: March 19, 2018Publication date: September 19, 2019Inventors: MAYA ANDERSON, RONEN Itshak KAT, ROEE SHLOMO, ETY KHAITZIN
-
Publication number: 20190272258Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: computing a fingerprint of data included in a write request; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.Type: ApplicationFiled: May 21, 2019Publication date: September 5, 2019Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
-
Patent number: 10394764Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.Type: GrantFiled: March 29, 2016Date of Patent: August 27, 2019Assignee: International Business Machines CorporationInventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
-
Patent number: 10394846Abstract: Various embodiments for data management in a replicated storage environment, by a processor device, are provided. In one embodiment, a method comprises storing a plurality of data replicas under a plurality of heterogeneous compression algorithms, wherein one of the data replicas is optimized for a data operation.Type: GrantFiled: August 25, 2015Date of Patent: August 27, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Patent number: 10395145Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.Type: GrantFiled: March 8, 2016Date of Patent: August 27, 2019Assignee: International Business Machines CorporationInventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
-
Publication number: 20190121563Abstract: A mechanism is provided for dispersed location-based data storage. A request is received to write a data file to a referrer memory region in a set of memory regions. For each data chunk of the data file, responsive to a comparison of a hash value for the data chunk to other hash values for other stored data chunks referenced in the referrer memory region indicating that the data chunk fails to exist in the referrer memory region, responsive to the data chunk existing in another memory region in the set of memory regions, responsive to the memory region failing to be one of the predetermined number N of owner memory regions associated with the referrer memory region, and responsive to the predetermined number N of owner memory regions failing to have been met, a reference to the data chunk is stored in the referrer memory region.Type: ApplicationFiled: October 25, 2017Publication date: April 25, 2019Inventors: Reut Cohen, Jonathan Fischer-Toubol, Afief Halumi, Danny Harnik, Ety Khaitzin, Sergey Marenkov, Asaf Porat-Stoler, Yosef Shatsky, Tom Sivan
-
Patent number: 10255290Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: GrantFiled: April 17, 2018Date of Patent: April 9, 2019Assignee: International Business Machines CorporationInventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Patent number: 10235379Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: GrantFiled: April 17, 2018Date of Patent: March 19, 2019Assignee: International Business Machines CorporationInventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Patent number: 10169364Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.Type: GrantFiled: January 13, 2016Date of Patent: January 1, 2019Assignee: International Business Machines CorporationInventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
-
Patent number: 10162867Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.Type: GrantFiled: January 13, 2016Date of Patent: December 25, 2018Assignee: International Business Machines CorporationInventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
-
Publication number: 20180225301Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: ApplicationFiled: April 17, 2018Publication date: August 9, 2018Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Publication number: 20180225300Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: ApplicationFiled: April 17, 2018Publication date: August 9, 2018Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Publication number: 20180150474Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: ApplicationFiled: August 16, 2017Publication date: May 31, 2018Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Publication number: 20180150473Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: ApplicationFiled: November 30, 2016Publication date: May 31, 2018Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Patent number: 9984092Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.Type: GrantFiled: August 16, 2017Date of Patent: May 29, 2018Assignee: International Business Machines CorporationInventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
-
Publication number: 20180074745Abstract: This invention relates to a system and method for managing deduplication of volume regions. A storage controller receives the volume hashes of the data stored on a set of storage devices and produces a set of volume sketches. The volume sketches represent a fraction of the hashes. From these sketches and mergers of sketches, the deduped storage size is estimated. The sketches also integrate compression ratios in systems combining both deduplication and compression and are combined with off-line scans in systems that do not have deduplication.Type: ApplicationFiled: September 12, 2016Publication date: March 15, 2018Inventors: Danny Harnik, Ronen Kat, Ety Khaitzin
-
Patent number: 9817865Abstract: Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.Type: GrantFiled: December 7, 2015Date of Patent: November 14, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: David D. Chambliss, Jonathan Fischer-Toubol, Joseph S. Glider, Danny Harnik, Ety Khaitzin, Yifat Kuttner, Michael Moser, Yosef Shatsky
-
Publication number: 20170286444Abstract: Computer program products, as well as corresponding systems and methods are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.Type: ApplicationFiled: March 29, 2016Publication date: October 5, 2017Inventors: David D. Chambliss, Joseph S. Glider, Danny Harnik, Ety Khaitzin
-
Publication number: 20170262466Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.Type: ApplicationFiled: March 8, 2016Publication date: September 14, 2017Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
-
Publication number: 20170262467Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.Type: ApplicationFiled: March 8, 2016Publication date: September 14, 2017Inventors: Danny Harnik, Ronen I. Kat, Ety Khaitzin, Sergey Marenkov