Patents by Inventor Shmuel T. Klein
Shmuel T. Klein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10649854
Abstract: A computer-implemented method, according to one embodiment, includes, for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs). Each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk; applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk; applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes; and defining the second subset of hashes as the set of RDCs.
Type: Grant
Filed: August 1, 2016
Date of Patent: May 12, 2020
Assignee: International Business Machines Corporation
Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
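The two-stage selection in this abstract can be illustrated with a small sketch. Everything concrete here is an assumption, not the patented method: a polynomial rolling hash over hypothetical 8-byte windows, a "first function" that keeps the k largest hash values, and a "second function" that scans those by position and keeps hashes spaced at least `min_gap` bytes apart.

```python
# Hypothetical sketch of selecting distinguishing characteristics from a chunk.
WINDOW = 8             # hypothetical window length in bytes
MOD = (1 << 61) - 1    # hypothetical modulus for the polynomial hash
BASE = 257


def window_hashes(chunk: bytes, window: int = WINDOW) -> list[tuple[int, int]]:
    """Return (hash_value, position) for every window, using a rolling hash."""
    if len(chunk) < window:
        return []
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in chunk[:window]:
        h = (h * BASE + b) % MOD
    out = [(h, 0)]
    for pos in range(1, len(chunk) - window + 1):
        h = (h - chunk[pos - 1] * top) % MOD          # drop the leftmost byte
        h = (h * BASE + chunk[pos + window - 1]) % MOD  # append the new byte
        out.append((h, pos))
    return out


def distinguishing_characteristics(chunk: bytes, k: int = 16, count: int = 4,
                                   min_gap: int = WINDOW) -> list[tuple[int, int]]:
    hashes = window_hashes(chunk)
    # First function (assumed): the k largest hash values. Since the hash is
    # pseudo-random, these tend to be spread across the chunk.
    first = sorted(hashes, reverse=True)[:k]
    # Second function (assumed): walk the first subset in position order and keep
    # hashes at least min_gap bytes after the previously kept one.
    second: list[tuple[int, int]] = []
    for value, pos in sorted(first, key=lambda hp: hp[1]):
        if not second or pos - second[-1][1] >= min_gap:
            second.append((value, pos))
        if len(second) == count:
            break
    return second
```

Because the selection depends only on content, two chunks sharing long identical regions tend to share some of these (value, position-offset) pairs.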
-
Patent number: 10282257
Abstract: A computer program product for searching a repository of binary uninterpreted data, according to one embodiment, includes a computer readable storage medium having program instructions executable by a computer to cause the computer to perform a method comprising: analyzing, by the computer, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing, by the computer, the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.
Type: Grant
Filed: July 25, 2016
Date of Patent: May 7, 2019
Assignee: International Business Machines Corporation
Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
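The first analysis step, finding a similar repository segment through an index of representation values, might look roughly like the sketch below. Assumptions (all hypothetical): representation values are Python `hash()` values of 16-byte seeds, only seeds whose hash is divisible by 8 are indexed (so the index is a fraction of the repository), and the segment with the most matching values wins. Each input seed costs one hash-table lookup, so the work grows with the input, not the repository, provided each value occurs in only a few places.

```python
from collections import Counter, defaultdict

SEED = 16    # hypothetical seed length in bytes
SAMPLE = 8   # hypothetical: index roughly 1 in SAMPLE seeds, chosen by content


def seeds(data: bytes):
    """Yield (representation_value, position) for content-selected seeds."""
    for pos in range(len(data) - SEED + 1):
        value = hash(data[pos:pos + SEED])
        if value % SAMPLE == 0:
            yield value, pos


def build_index(segments: list[bytes]) -> dict[int, list[tuple[int, int]]]:
    """Map each sampled representation value to the (segment_id, position) pairs holding it."""
    index: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for seg_id, seg in enumerate(segments):
        for value, pos in seeds(seg):
            index[value].append((seg_id, pos))
    return index


def most_similar_segment(index, input_segment: bytes):
    """Return the id of the repository segment sharing the most values, or None."""
    votes: Counter = Counter()
    for value, _ in seeds(input_segment):
        for seg_id, _ in index.get(value, ()):
            votes[seg_id] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Content-based sampling keeps repository and input aligned: the same bytes are sampled in both, wherever they sit.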
-
Patent number: 9747055
Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of the characters in the chunk and a representation of their frequencies, and the signature is used in selecting data to be deduplicated. The c-spectrum of a small data chunk is the sequence of representations of its different characters, ordered by frequency of occurrence in the chunk, and the f-spectrum is the corresponding sequence of frequencies of those characters.
Type: Grant
Filed: June 8, 2015
Date of Patent: August 29, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lior Aronovich, Ron Asher, Michael Hirsch, Shmuel T. Klein, Ehud Meiri, Yair Toaff
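The c-spectrum and f-spectrum definitions can be computed directly with a character count. The tie-breaking rule (equal frequencies ordered by byte value) and the `signature` combination (top `m` characters plus a log2-scale frequency class) are hypothetical choices for illustration, not the patented signature.

```python
from collections import Counter


def spectra(chunk: bytes) -> tuple[bytes, list[int]]:
    """Return (c_spectrum, f_spectrum) of a chunk."""
    counts = Counter(chunk)  # byte value -> number of occurrences
    # Order distinct bytes by decreasing frequency; ties broken by byte value (assumed).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    c_spectrum = bytes(ch for ch, _ in ordered)
    f_spectrum = [freq for _, freq in ordered]
    return c_spectrum, f_spectrum


def signature(chunk: bytes, m: int = 4) -> tuple[bytes, tuple[int, ...]]:
    """Hypothetical signature: the m most frequent characters and coarse frequency classes."""
    c_spec, f_spec = spectra(chunk)
    return c_spec[:m], tuple(f.bit_length() for f in f_spec[:m])
```

For example, `spectra(b"abracadabra")` returns `(b"abrcd", [5, 2, 2, 1, 1])`, and `signature(b"abracadabra", 3)` returns `(b"abr", (3, 2, 2))`. Quantizing frequencies lets chunks with slightly different counts still share a signature.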
-
Publication number: 20160342482
Abstract: A computer-implemented method, according to one embodiment, includes, for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs). Each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk; applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk; applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes; and defining the second subset of hashes as the set of RDCs.
Type: Application
Filed: August 1, 2016
Publication date: November 24, 2016
Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
-
Publication number: 20160335285
Abstract: A computer program product for searching a repository of binary uninterpreted data, according to one embodiment, includes a computer readable storage medium having program instructions executable by a computer to cause the computer to perform a method comprising: analyzing, by the computer, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing, by the computer, the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.
Type: Application
Filed: July 25, 2016
Publication date: November 17, 2016
Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
-
Patent number: 9448854
Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. A transition function is devised from the indices of the set of the parallel processors at one time step to the indices of the set of the parallel processors at the following time step.
Type: Grant
Filed: April 11, 2016
Date of Patent: September 20, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
-
Patent number: 9436511
Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. Each of the processors is assigned to different layers in different data chunks such that each of the processors is busy and the data chunks are fully processed within a number of time steps equal to the number of layers. A transition function is devised from the indices of the set of the parallel processors at one time step to the indices of the set of the parallel processors at the following time step.
Type: Grant
Filed: February 17, 2015
Date of Patent: September 6, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
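The "every processor busy, all chunks done in as many steps as there are layers" property can be illustrated with a simple cyclic (Latin-square) schedule. This is a swapped-in illustration, not the patent's transition function: it assumes the layers of a chunk may be processed in any order, keeps each processor in its layer's subset, and rotates the chunks instead. `layer_sizes` (processors per layer) is a hypothetical input.

```python
def partition(layer_sizes: list[int]) -> list[list[int]]:
    """Split processor indices 0..P-1 into disjoint contiguous subsets, one per layer."""
    subsets, start = [], 0
    for size in layer_sizes:
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def assignment(layer_sizes: list[int], t: int) -> dict[int, tuple[int, int]]:
    """Return {processor_index: (chunk, layer)} at time step t.

    With L layers and L chunks, the subset for layer j works on chunk (j + t) % L,
    so each step covers every layer once and each chunk once.
    """
    num_layers = len(layer_sizes)
    return {p: ((layer + t) % num_layers, layer)
            for layer, subset in enumerate(partition(layer_sizes))
            for p in subset}
```

Over steps `t = 0 .. L-1`, every (chunk, layer) pair is visited exactly once and no processor is ever idle.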
-
Patent number: 9430486
Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
Type: Grant
Filed: March 19, 2009
Date of Patent: August 30, 2016
Assignee: International Business Machines Corporation
Inventors: Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
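The second stage, finding identical sections between two similar segments regardless of their order and position, can be sketched with seed matching plus greedy extension. Assumptions: a hypothetical minimum match length of 8 bytes, and a seed table proportional to the segment size (so, unlike the abstract's claim, this sketch does not use constant space).

```python
SEED = 8  # hypothetical minimum length of a reported common section


def common_sections(repo_seg: bytes, input_seg: bytes, seed: int = SEED) -> list[tuple[int, int, int]]:
    """Return (input_pos, repo_pos, length) for identical runs of at least `seed` bytes."""
    table: dict[bytes, int] = {}
    for pos in range(len(repo_seg) - seed + 1):
        table.setdefault(repo_seg[pos:pos + seed], pos)  # keep first occurrence
    matches, i = [], 0
    while i <= len(input_seg) - seed:
        j = table.get(input_seg[i:i + seed])
        if j is None:
            i += 1
            continue
        # Extend the match forward as far as both segments agree.
        length = seed
        while (i + length < len(input_seg) and j + length < len(repo_seg)
               and input_seg[i + length] == repo_seg[j + length]):
            length += 1
        matches.append((i, j, length))
        i += length  # resume scanning after the matched run
    return matches
```

If the repository segment is `A + B` and the input is `B + A` (each part 100 random bytes), the result is `[(0, 100, 100), (100, 0, 100)]`: both sections are found even though their order is swapped.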
-
Publication number: 20160224391
Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. A transition function is devised from the indices of the set of the parallel processors at one time step to the indices of the set of the parallel processors at the following time step.
Type: Application
Filed: April 11, 2016
Publication date: August 4, 2016
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
-
Patent number: 9405509
Abstract: Methods, computer systems, and computer program products for calculating the remainder by division of a sequence of bytes, interpreted as a first number, by a second number are provided. A first subset of bytes is read, and an associated first remainder by division is calculated and stored in the memory location from which the subset was read. A second subset of bytes is read, and an associated second remainder by division is calculated with a second processor. The calculation of the second remainder by division may occur at least partially during the calculation of the first remainder by division. A third and a fourth subset of bytes are read and the associated remainders are calculated.
Type: Grant
Filed: December 17, 2014
Date of Patent: August 2, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
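The reason the remainders of separate byte subsets can be computed independently is the identity value(prefix + part) = value(prefix) * 256^len(part) + value(part), taken modulo the divisor. A minimal sketch, assuming big-endian interpretation and using a thread pool as a stand-in for the second processor (in CPython this shows the structure rather than a real speedup); `num_parts` is a hypothetical parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def remainder(data: bytes, m: int, num_parts: int = 4) -> int:
    """Compute int.from_bytes(data, "big") % m from independently computed part remainders."""
    size = max(1, -(-len(data) // num_parts))  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    # Each part's remainder is independent of the others, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        part_rems = list(pool.map(lambda part: int.from_bytes(part, "big") % m, parts))
    # Combine left to right: shift the running remainder past the part, then add it.
    r = 0
    for part, rp in zip(parts, part_rems):
        r = (r * pow(256, len(part), m) + rp) % m
    return r
```

For instance, `remainder(b"\x01\x00", 7)` returns 4, since the bytes encode 256 and 256 = 36 * 7 + 4.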
-
Patent number: 9400796
Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
Type: Grant
Filed: March 19, 2009
Date of Patent: July 26, 2016
Assignee: International Business Machines Corporation
Inventors: Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
-
Patent number: 9378211
Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
Type: Grant
Filed: March 19, 2009
Date of Patent: June 28, 2016
Assignee: International Business Machines Corporation
Inventors: Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
-
Publication number: 20150286443
Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of the characters in the chunk and a representation of their frequencies, and the signature is used in selecting data to be deduplicated. The c-spectrum of a small data chunk is the sequence of representations of its different characters, ordered by frequency of occurrence in the chunk, and the f-spectrum is the corresponding sequence of frequencies of those characters.
Type: Application
Filed: June 8, 2015
Publication date: October 8, 2015
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lior ARONOVICH, Ron ASHER, Michael HIRSCH, Shmuel T. KLEIN, Ehud MEIRI, Yair TOAFF
-
Publication number: 20150269182
Abstract: Segment sizes are controlled by setting the size of a segment boundary in a hash-based deduplication system. A subsequence of size K of a sequence of characters S is set. Segment boundaries are set using a sequence of decreasingly restrictive logical tests: a boundary is declared if one of these tests returns a true value when applied on the sequence of characters S.
Type: Application
Filed: June 1, 2015
Publication date: September 24, 2015
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
-
Publication number: 20150261447
Abstract: Segment sizes are controlled by setting the size of a segment boundary in a hash-based backup deduplication system in a distributed computing environment. A subsequence of size K of a sequence of characters S is set. Segment boundaries are set using a sequence of decreasingly restrictive logical tests: a boundary is declared if one of these tests returns a true value when applied on the sequence of characters S.
Type: Application
Filed: June 1, 2015
Publication date: September 17, 2015
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
-
Publication number: 20150234685
Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. Each of the processors is assigned to different layers in different data chunks such that each of the processors is busy and the data chunks are fully processed within a number of time steps equal to the number of layers. A transition function is devised from the indices of the set of the parallel processors at one time step to the indices of the set of the parallel processors at the following time step.
Type: Application
Filed: February 17, 2015
Publication date: August 20, 2015
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
-
Patent number: 9081809
Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of the characters that appear in the small data chunk with a representation of the character frequencies of the small data chunk. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
Type: Grant
Filed: June 27, 2013
Date of Patent: July 14, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lior Aronovich, Ron Asher, Michael Hirsch, Shmuel T. Klein, Ehud Meiri, Yair Toaff
-
Patent number: 9075842
Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of the characters that appear in the small data chunk with a representation of the character frequencies of the small data chunk. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
Type: Grant
Filed: June 27, 2013
Date of Patent: July 7, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lior Aronovich, Ron Asher, Michael Hirsch, Shmuel T. Klein, Ehud Meiri, Yair Toaff
-
Patent number: 9069478
Abstract: Segment sizes are controlled by setting the size of a segment boundary in a hash-based deduplication system. A subsequence of size K of a sequence of characters S is set. An increasing sequence of n probabilities and a corresponding sequence of n decreasingly restrictive logical tests are chosen to be applied on the sequence of characters S. Segment boundaries are set by using the sequence of the decreasingly restrictive logical tests, by deciding to declare a segment boundary at a current position if one of the sequence of the decreasingly restrictive logical tests, with a corresponding probability of the sequence of n probabilities, returns a true value when applied on the sequence of characters S.
Type: Grant
Filed: January 2, 2013
Date of Patent: June 30, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
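One plausible reading of "decreasingly restrictive tests with increasing probabilities" is content-defined chunking in which the test becomes easier to pass as the current segment grows, which narrows the spread of segment sizes. The sketch below assumes exactly that; the window size, bit counts, length thresholds, maximum size, and the use of BLAKE2b as the window hash are all hypothetical choices, not the patent's parameters.

```python
import bisect
import hashlib

K = 16                                # hypothetical window size (the subsequence of size K)
MASK_BITS = [13, 12, 11, 10]          # test i: low MASK_BITS[i] bits of the hash are zero,
                                      # which holds with probability 2**-MASK_BITS[i]
THRESHOLDS = [0, 8192, 12288, 16384]  # hypothetical segment lengths at which test i applies
MAX_SIZE = 20480                      # hypothetical hard limit on segment length


def boundaries(data: bytes) -> list[int]:
    """Return the end offset of each segment; the last entry is len(data) if data is non-empty."""
    cuts, start = [], 0
    for pos in range(K, len(data) + 1):
        length = pos - start
        if length < K:
            continue  # enforce a minimum segment length of K
        if length >= MAX_SIZE:
            cuts.append(pos)
            start = pos
            continue
        # Use the least restrictive test whose length threshold has been reached.
        level = bisect.bisect_right(THRESHOLDS, length) - 1
        window = data[pos - K:pos]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=8).digest(), "big")
        if h & ((1 << MASK_BITS[level]) - 1) == 0:
            cuts.append(pos)
            start = pos
    if start < len(data):
        cuts.append(len(data))
    return cuts
```

Because each test depends only on the K bytes in the window (and on the length since the last cut), inserting bytes early in the data disturbs boundaries only until the next common cut.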
-
Publication number: 20150143197
Abstract: A basic property of flash memory is that a 0-bit can be changed into a 1-bit, but not vice versa, which severely limits the possibilities of reusing storage space for new data. A family of new coding methods is presented that enables double use of the memory, effectively expanding the combined amount of stored data. This can then be used as a compression booster, adding an additional layer to, and improving the compression of, some rewriting methods that are not context sensitive.
Type: Application
Filed: June 29, 2014
Publication date: May 21, 2015
Inventor: Shmuel T. KLEIN
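Rewriting under the 0-to-1-only constraint is the setting of write-once-memory (WOM) codes. As a well-known illustration (the classic Rivest-Shamir code, not the coding family of this application), two bits can be written twice into three cells, storing four bits of data in three cells. The tables below follow the abstract's convention that cells may only change from 0 to 1.

```python
# Rivest-Shamir WOM code: 2 data bits written twice into 3 cells (bits of an int).
FIRST = {0b00: 0b000, 0b01: 0b100, 0b10: 0b010, 0b11: 0b001}  # weight <= 1
SECOND = {v: c ^ 0b111 for v, c in FIRST.items()}              # complements, weight >= 2


def decode(cells: int) -> int:
    """Decode either generation: weight <= 1 means first write, otherwise second."""
    table = FIRST if bin(cells).count("1") <= 1 else SECOND
    return next(v for v, c in table.items() if c == cells)


def can_overwrite(old: int, new: int) -> bool:
    """True if going from old to new only turns 0-bits into 1-bits."""
    return old & ~new == 0


def write_second(cells: int, value: int) -> int:
    """Rewrite a first-generation codeword so that it decodes to value."""
    if decode(cells) == value:
        return cells  # nothing to change
    new = SECOND[value]
    assert can_overwrite(cells, new), "would require a 1 -> 0 change"
    return new
```

The second write is always legal: a first-generation codeword has at most one 1-bit, and the complement of a different first-generation codeword has a 1 in that position.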