Patents by Inventor Shmuel T. Klein

Shmuel T. Klein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10649854
    Abstract: A computer-implemented method, according to one embodiment includes, for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs). Each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk, applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk, applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes, and defining the second subset of hashes as the set of RDCs.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: May 12, 2020
    Assignee: International Business Machines Corporation
    Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
  • Patent number: 10282257
    Abstract: A computer program product for searching a repository of binary uninterpretted data, according to one embodiment, includes a computer readable storage medium having program instructions executable by a computer to cause the computer to perform a method comprising: analyzing, by the computer, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing, by the computer, the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.
    Type: Grant
    Filed: July 25, 2016
    Date of Patent: May 7, 2019
    Assignee: International Business Machines Corporation
    Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
  • Patent number: 9747055
    Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunk in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of characters used in selecting data to be deduplicated. A c-spectrum of the small data chunk being a sequence of representations of different characters ordered by a frequency of occurrence in the small data chunk, and an f-spectrum of the small data chunk being a corresponding sequence of frequencies of the different characters in the small data chunk.
    Type: Grant
    Filed: June 8, 2015
    Date of Patent: August 29, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lior Aronovich, Ron Asher, Michael Hirsch, Shmuel T. Klein, Ehud Meiri, Yair Toaff
  • Publication number: 20160342482
    Abstract: A computer-implemented method, according to one embodiment includes, for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs). Each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk, applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk, applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes, and defining the second subset of hashes as the set of RDCs.
    Type: Application
    Filed: August 1, 2016
    Publication date: November 24, 2016
    Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
  • Publication number: 20160335285
    Abstract: A computer program product for searching a repository of binary uninterpretted data, according to one embodiment, includes a computer readable storage medium having program instructions executable by a computer to cause the computer to perform a method comprising: analyzing, by the computer, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing, by the computer, the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.
    Type: Application
    Filed: July 25, 2016
    Publication date: November 17, 2016
    Inventors: Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, Shmuel T. Klein
  • Patent number: 9448854
    Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. A transition function is devised from the indices of the set of the parallel processors at one time steps to the indices of the set of the parallel processors at a following time step.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: September 20, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
  • Patent number: 9436511
    Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. Each of the processors are assigned to different layers in different data chunks such that each of processors are busy and the data chunks are fully processed within a number of the time steps equal to the number of the layers. A transition function is devised from the indices of the set of the parallel processors at one time steps to the indices of the set of the parallel processors at a following time step.
    Type: Grant
    Filed: February 17, 2015
    Date of Patent: September 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
  • Patent number: 9430486
    Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
    Type: Grant
    Filed: March 19, 2009
    Date of Patent: August 30, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
  • Publication number: 20160224391
    Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. A transition function is devised from the indices of the set of the parallel processors at one time steps to the indices of the set of the parallel processors at a following time step.
    Type: Application
    Filed: April 11, 2016
    Publication date: August 4, 2016
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
  • Patent number: 9405509
    Abstract: Methods, computer systems, and computer program products for calculating a remainder by division of a sequence of bytes interpreted as a first number by a second number are provided. A first subset of bytes is read, and an associated first remainder by division is calculated and stored in the memory location from which the subset was read. A second subset of bytes is read, and an associated second remainder by division is calculated with a second processor. The calculating of the second remainder by division may occur at least partially during the calculating of the first remainder by division. A third and fourth subset of bytes is read and associated remainders are calculated.
    Type: Grant
    Filed: December 17, 2014
    Date of Patent: August 2, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
  • Patent number: 9400796
    Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
    Type: Grant
    Filed: March 19, 2009
    Date of Patent: July 26, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
  • Patent number: 9378211
    Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
    Type: Grant
    Filed: March 19, 2009
    Date of Patent: June 28, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
  • Publication number: 20150286443
    Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunk in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of characters used in selecting data to be deduplicated. A c-spectrum of the small data chunk being a sequence of representations of different characters ordered by a frequency of occurrence in the small data chunk, and an f-spectrum of the small data chunk being a corresponding sequence of frequencies of the different characters in the small data chunk.
    Type: Application
    Filed: June 8, 2015
    Publication date: October 8, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lior ARONOVICH, Ron ASHER, Michael HIRSCH, Shmuel T. KLEIN, Ehud MEIRI, Yair TOAFF
  • Publication number: 20150269182
    Abstract: Segment sizes are controlled by setting the size of a segment boundary in a hash-based deduplication system. A subsequence of size K of a sequence of characters S is set. Segment boundaries are set by using the sequence of the decreasingly restrictive logical tests if one of the sequence of the decreasingly restrictive logical tests returns a true value when applied on the sequence of characters S.
    Type: Application
    Filed: June 1, 2015
    Publication date: September 24, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
  • Publication number: 20150261447
    Abstract: Segment sizes are controlled by setting the size of a segment boundary in a hash-based backup deduplication system in a distributed computing environment. A subsequence of size K of a sequence of characters S is set. Segment boundaries are set by using the sequence of the decreasingly restrictive logical tests if one of the sequence of the decreasingly restrictive logical tests returns a true value when applied on the sequence of characters S.
    Type: Application
    Filed: June 1, 2015
    Publication date: September 17, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
  • Publication number: 20150234685
    Abstract: Exemplary method, system, and computer program product embodiments for full exploitation of parallel processors for data processing are provided. In one embodiment, by way of example only, a set of parallel processors is partitioned into disjoint subsets according to indices of the set of the parallel processors. The size of each of the disjoint subsets corresponds to a number of processors assigned to the processing of the data chunks at one of the layers. Each of the processors are assigned to different layers in different data chunks such that each of processors are busy and the data chunks are fully processed within a number of the time steps equal to the number of the layers. A transition function is devised from the indices of the set of the parallel processors at one time steps to the indices of the set of the parallel processors at a following time step.
    Type: Application
    Filed: February 17, 2015
    Publication date: August 20, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael HIRSCH, Shmuel T. KLEIN, Yair TOAFF
  • Patent number: 9081809
    Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunk in a computing environment are provided. In one embodiment, by way of example only, for each of the small data chunk, a signature is generated based on a combination of a representation of characters that appear in the small data chunk with a representation of frequencies of the small data chunk. A signature is generated based on a combination of a representation of characters that appear. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
    Type: Grant
    Filed: June 27, 2013
    Date of Patent: July 14, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lior Aronovich, Ron Asher, Michael Hirsch, Shmuel T. Klein, Ehud Meiri, Yair Toaff
  • Patent number: 9075842
    Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunk in a computing environment are provided. In one embodiment, by way of example only, for each of the small data chunk, a signature is generated based on a combination of a representation of characters that appear in the small data chunk with a representation of frequencies of the small data chunk. A signature is generated based on a combination of a representation of characters that appear. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
    Type: Grant
    Filed: June 27, 2013
    Date of Patent: July 7, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lior Aronovich, Ron Asher, Michael Hirsch, Shmuel T. Klein, Ehud Meiri, Yair Toaff
  • Patent number: 9069478
    Abstract: Segment sizes are controlled by setting the size of a segment boundary in a hash-based deduplication system. A subsequence of size K of a sequence of characters S is set. An increasing sequence of n probabilities and a corresponding sequence of n decreasingly restrictive logical tests are chosen to be applied on the sequence of characters S. Segment boundaries are set by using the sequence of the decreasingly restrictive logical tests by deciding to declare a segment boundary at a current position if one of the sequence of the decreasingly restrictive logical tests, with a corresponding probability of the sequence of n probabilities, returns a true value when applied on the sequence of characters S.
    Type: Grant
    Filed: January 2, 2013
    Date of Patent: June 30, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Hirsch, Shmuel T. Klein, Yair Toaff
  • Publication number: 20150143197
    Abstract: A basic property of flash memory is that: a 0-bit can be changed into a 1-bit, but not vice-versa, which severely limits the possibilities of reusing storage space with new data. A family of new coding methods is presented that enables double use of the memory, effectively expanding the combined amount of stored data. This can then be used as a compression booster, adding an additional layer to, and improving the compression of some rewriting methods that are not context sensitive.
    Type: Application
    Filed: June 29, 2014
    Publication date: May 21, 2015
    Inventor: Shmuel T. KLEIN